13 Aug 2019 - by 'Maurits van der Schee'
I have written a Bash script that quickly undeletes all files that were deleted from a Git repository in any previous commit. The script only recovers the last known state of files whose filename is not currently in use. It is optimized to run quickly even when large numbers of files have been deleted.
Copy the script into the repository that you want to undelete files in and run:
bash git-undelete-all.sh
It should output something like:
1 files restored in 0 seconds
If you run with "-v" the script will print the filename of each undeleted file on a separate line.
The script automates the following process (where "undelete.sh" is a deleted file):
1) List the files that were deleted from the Git repository:
git log --pretty=format: --name-only --diff-filter=D | sort -u
2) Which gives you the filenames of the deleted files:
undelete.sh
3) Then get the hash of the commit in which this file was deleted:
git rev-list -n 1 HEAD -- undelete.sh
4) Which gives you the hash of the deletion commit:
ae85c23372a8a45b788ed857800b3b424b1c15f8
5) Now you can check out the version of the file from just before the deletion:
git checkout ae85c23372a8a45b788ed857800b3b424b1c15f8^ -- undelete.sh
The script repeats steps 3 to 5 for every file in the list retrieved in step 2.
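As a minimal sketch, steps 3 to 5 can be wrapped in a small function for a single file. The name restore_deleted_file is hypothetical and not part of the script; the sketch assumes it is run from the root of the repository.

```bash
#!/bin/bash

# Hypothetical helper: restore one deleted path by hand (steps 3 to 5).
# Assumes the current directory is the root of the Git repository.
restore_deleted_file() {
  local path="$1"
  local hash

  # Last commit that touched the path; for a deleted file this is the
  # commit that deleted it.
  hash=$(git rev-list -n 1 HEAD -- "$path")
  if [ -z "$hash" ]; then
    echo "no history found for $path" >&2
    return 1
  fi

  # Check out the path as it was in the parent of the deleting commit.
  git checkout "$hash^" -- "$path"
}
```

Calling restore_deleted_file undelete.sh would then bring back the example file from the steps above. Note that git checkout also stages the restored file.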
The script does not recover all deleted content as it only focuses on deleted files. Files that are partially deleted (or are made empty) are not recovered. Files are recovered to their last state, so not necessarily to the best state. Also files that have been recreated (a new file with the same name has been made) are not recovered. Only the simple case in which files are directly deleted is handled.
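To see in advance which paths fall under the "recreated" limitation, something like the following sketch could help. The function name classify_deleted_paths and its output labels are made up for illustration; it assumes it is run from the root of the repository, because the paths printed by git log are relative to that root.

```bash
#!/bin/bash

# Hypothetical helper: report, for every path that was ever deleted,
# whether the name is in use again (the script skips it) or still
# missing (the script tries to restore it).
classify_deleted_paths() {
  git log --pretty=format: --name-only --diff-filter=D | sort -u |
  while IFS= read -r path; do
    # The log output contains blank separator lines; ignore them.
    [ -z "$path" ] && continue
    if [ -e "$path" ]; then
      echo "in use (skipped): $path"
    else
      echo "missing (recoverable): $path"
    fi
  done
}
```

This only reports and does not change the working tree, so it can be run safely before the real script.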
Recovering all deleted files one by one may take a lot of time. Instead, the script first tries to undelete the parent directories of each deleted file. This speeds up the process when entire directories (with lots of files) were removed, because a single checkout restores everything in them. I have tried this script on a large repository and the recovery took under 2 minutes for 30 thousand files.
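The parent directory trick comes down to walking up from the deleted file with dirname and collecting the ancestors, outermost first. Here is a small sketch of just that part; the function name list_ancestors is hypothetical, the loop itself mirrors what the script does.

```bash
#!/bin/bash

# Hypothetical helper: print a path and all of its parent directories,
# outermost directory first, one per line.
list_ancestors() {
  local path="$1"
  local parts=()

  # Walk up until dirname reaches "." (relative) or "/" (absolute).
  while [ -n "$path" ] && [ "$path" != "." ] && [ "$path" != "/" ]; do
    parts=("$path" "${parts[@]}")   # prepend, so the outermost ends up first
    path=$(dirname "$path")
  done

  if [ "${#parts[@]}" -gt 0 ]; then
    printf '%s\n' "${parts[@]}"
  fi
}

list_ancestors "src/lib/util.c"
# prints:
#   src
#   src/lib
#   src/lib/util.c
```

The script walks this list in order and checks out every entry that does not exist yet. If "src" is missing, restoring it usually brings back "src/lib" and "src/lib/util.c" as well, so the deeper entries are skipped without running Git again.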
Here is the Bash code for the script:
#!/bin/bash

# Parse the command-line flags.
verbose=false
silent=false
while getopts 'hsv' flag; do
  case "$flag" in
    v) verbose=true ;;
    s) silent=true ;;
    *) echo "Usage git-undelete-all [OPTION...]"
       echo
       echo "  -h  help: print this usage information"
       echo "  -s  silent: do not print output"
       echo "  -v  verbose: print every file recovered"
       exit 1 ;;
  esac
done

start_time=$(date +%s)
file_count=0
# Loop over every path that was deleted in any commit reachable from HEAD.
while IFS= read -r file; do
  ((file_count++))
  if [ "false" == "$silent" ]; then
    if [ "true" == "$verbose" ]; then
      echo "$file"
    else
      printf "\r%d files restored" "$file_count"
    fi
  fi
  # Skip paths that exist (recreated, or already restored via a parent).
  if [ -e "$file" ]; then
    continue
  fi
  # Collect the path and its parent directories, outermost first.
  files=()
  while [ -n "$file" ] && [ ! "." == "$file" ] && [ ! "/" == "$file" ]; do
    files=("$file" "${files[@]}")
    file=$(dirname "$file")
  done
  # Restore each missing entry from the parent of the last commit that touched it.
  for file in "${files[@]}"; do
    if [ ! -e "$file" ]; then
      hash=$(git rev-list -n 1 HEAD -- "$file" | xargs)
      if [ -n "$hash" ]; then
        git checkout "$hash^" -- "$file"
      fi
    fi
  done
done < <(git log --pretty=format: --name-only --diff-filter=D | sort -u)

# Print the elapsed time after the progress counter.
if [ "false" == "$silent" ] && [ "false" == "$verbose" ]; then
  end_time=$(date +%s)
  printf " in %d seconds\n" "$((end_time - start_time))"
fi
You can find the code on my GitHub:
https://github.com/mevdschee/git-undelete-all.sh
Enjoy!