Script to undelete all files in Git

13 Aug 2019 - by 'Maurits van der Schee'

I have written a Bash script that quickly undeletes all files that were deleted from a Git repository (in any previous commit). The script only recovers the last known state of files whose filename is not currently in use. It is optimized to run quickly when large numbers of files have been deleted.

Usage

Copy the script into the repository in which you want to undelete files and run:

bash git-undelete-all.sh

It should output something like:

1 files restored in 0 seconds

If you run it with "-v", the script prints the filename of each undeleted file on a separate line. With "-s" the script suppresses its own output.

How it works

The script automates the following process (where "undelete.sh" is a deleted file):

1) List the files that have been deleted from the Git repository:

git log --pretty=format: --name-only --diff-filter=D | sort -u

2) This gives you the filenames of the deleted files:

undelete.sh

3) Then get the hash of the commit in which this file was deleted (the last commit that touched it):

git rev-list -n 1 HEAD -- undelete.sh

4) This gives you the hash of the deletion commit:

ae85c23372a8a45b788ed857800b3b424b1c15f8

5) Now you can check out the version of the file from just before the deletion (the "^" suffix selects the parent of the deletion commit):

git checkout ae85c23372a8a45b788ed857800b3b424b1c15f8^ -- undelete.sh

The script does this for every file in the list retrieved in step 2.
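As a rough sketch, the five steps could be combined into a naive loop like the one below. The function names undelete_one and undelete_all_naive are made up for illustration and do not appear in the script itself. The sketch also assumes plain filenames that Git does not need to quote, and it leaves out the directory optimization described further down.

```bash
#!/bin/bash
# undelete_one: hypothetical helper, restores one deleted path
# to its state just before the deletion.
undelete_one() {
    local path="$1" hash
    # step 3: the last commit that touched the path is the deletion commit
    hash=$(git rev-list -n 1 HEAD -- "$path")
    [ -n "$hash" ] || return 1
    # step 5: check out the path from the parent of that commit
    git checkout "$hash^" -- "$path"
}

# undelete_all_naive: hypothetical helper, applies undelete_one to every
# deleted path (steps 1 and 2), one file at a time.
undelete_all_naive() {
    local path
    git log --pretty=format: --name-only --diff-filter=D | sort -u |
    while IFS= read -r path; do
        # skip blank lines and names that are in use again
        if [ -z "$path" ] || [ -e "$path" ]; then
            continue
        fi
        undelete_one "$path"
    done
}
```

Calling undelete_all_naive from the root of a repository would then restore the files one by one, which is the slow path that the script tries to avoid.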

Limitations

The script does not recover all deleted content, as it only looks at deleted files. Files that were partially deleted (or made empty) are not recovered. Files are recovered to their last state, which is not necessarily their best state. Files that have been recreated (a new file with the same name exists) are not recovered either. Only the simple case in which files were deleted outright is handled.
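To see which deleted filenames fall under the "recreated" limitation, a small filter along these lines might help. It is only a sketch: in_use_names is a hypothetical helper that is not part of the script, and it simply prints every name from its input that currently exists on disk.

```bash
#!/bin/bash
# in_use_names: hypothetical helper, reads one path per line on stdin
# and prints the paths that currently exist in the working directory.
in_use_names() {
    local name
    while IFS= read -r name; do
        if [ -n "$name" ] && [ -e "$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Possible usage inside a repository (these names would be skipped):
#   git log --pretty=format: --name-only --diff-filter=D | sort -u | in_use_names
```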

Optimization

It may take a lot of time to recover all deleted files one by one. Instead, for each deleted file the script first tries to undelete its parent directories, starting with the topmost one, and only then the file itself. Anything that already exists (for example because an earlier directory checkout restored it) is skipped. This speeds up the process when entire directories (with lots of files) have been removed. I have tried this script on a large repository and the recovery took under 2 minutes for 30 thousand files.
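The order in which paths are tried can be illustrated with a small function. This is a sketch, not the code from the script (which builds a Bash array in a loop): path_prefixes is a hypothetical name, and the function only prints the candidates, from the topmost directory down to the file itself.

```bash
#!/bin/bash
# path_prefixes: hypothetical helper, prints every parent directory of a
# path (topmost first), followed by the path itself, one per line.
path_prefixes() {
    local path="$1" parent
    parent=$(dirname "$path")
    case "$parent" in
        .|/|"") ;;                       # reached the top, stop recursing
        *) path_prefixes "$parent" ;;    # print the parents first
    esac
    printf '%s\n' "$path"
}
```

For example, path_prefixes src/lib/util.sh prints "src", "src/lib" and "src/lib/util.sh" on separate lines. If the whole "src" directory was deleted in one commit, restoring that first candidate brings back everything below it in a single checkout.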

The code

Here is the Bash code for the script:

#!/bin/bash
verbose=false
silent=false

while getopts 'hsv' flag; do
    case "$flag" in
        v)  verbose=true ;;
        s)  silent=true ;;
        *)  echo "Usage: git-undelete-all.sh [OPTION...]"
            echo
            echo "  -h  help: print this usage information"
            echo "  -s  silent: do not print output"
            echo "  -v  verbose: print every file recovered"
            exit 1 ;;
    esac
done

start_time=$(date +%s)
file_count=0
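while IFS= read -r file; do
    # git log may emit blank separator lines; they are not filenames
    if [ -z "$file" ]; then
        continue
    fi
    ((file_count++))
    if [ "false" == "$silent" ]; then
        if [ "true" == "$verbose" ]; then
            echo "$file"
        else
            printf "\r%d files restored" "$file_count"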
        fi
    fi
    if [ -e "$file" ]; then
        continue
    fi
    files=()
    while [ ! -z "$file" ] && [ ! "." == "$file" ] && [ ! "/" == "$file" ]; do
        files=("$file" "${files[@]}")
        file=$(dirname "$file")
    done
    for file in "${files[@]}"; do
        if [ ! -e "$file" ]; then
            # the last commit that touched this path is the one that deleted it
            hash=$(git rev-list -n 1 HEAD -- "$file")
            if [ -n "$hash" ]; then
                git checkout "$hash^" -- "$file"
            fi
        fi
    done
done < <(git log --pretty=format: --name-only --diff-filter=D | sort -u)

if [ "false" == "$silent" ] && [ "false" == "$verbose" ]; then
    end_time=$(date +%s)
    printf " in %d seconds\n" "$((end_time - start_time))"
fi

You can find the code on my GitHub:

https://github.com/mevdschee/git-undelete-all.sh

Enjoy!
