How to Use Git-Filter-Repo to Remove Files From Your Git Repository

If you commit a file to a Git repository, it stays in the history forever. Even after you delete it, Git keeps it in the .git folder; otherwise you could not go back to a version where this file was still part of your project.
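To see this for yourself, here is a minimal sketch that builds a throwaway repository in a temporary directory. The file name big-file.txt and the demo identity are made up for illustration and have nothing to do with our real project.

```shell
# Throwaway demo repository in a temporary directory (all names are made up).
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" config user.name "Demo"

# Commit a file, then delete it again in a second commit.
echo "large template" > "$demo/big-file.txt"
git -C "$demo" add big-file.txt
git -C "$demo" commit -q -m "Add big file"
git -C "$demo" rm -q big-file.txt
git -C "$demo" commit -q -m "Remove big file"

# The working tree no longer contains the file (only .git is left) ...
ls -A "$demo"

# ... but the history still lists the blob under its old path.
git -C "$demo" rev-list --objects --all | grep big-file.txt
```

The last command still finds the file, which is exactly why deleting a folder in a new commit does not make the repository any smaller.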

 

The Problem

At the beginning of our user group site renewal, we decided to put the whole template for our site in the Template folder. If we ran into a part that we had missed, we could use the examples there and add them to our site in no time.

Fast forward 3 years and multiple updates, and the Template folder had grown to 500 MB. That means 500 MB has to be downloaded for every CI build, copied around on the build server (from /s to /b) and cleaned up afterwards. Oops…
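If you are not sure what is bloating your own repository, a sketch like the following can help. It is not what we ran at the time; it only uses standard Git plumbing commands, and the function name largest_blobs is my own invention.

```shell
# List the largest blobs anywhere in the history of a repository.
# $1 = path to the repository (default: current directory)
# $2 = number of entries to show (default: 10)
largest_blobs() {
  local repo=${1:-.}
  # rev-list prints "<object id> <path>" for every object reachable from any ref;
  # cat-file adds type and size, and %(rest) carries the path through.
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k2,2nr |
    head -n "${2:-10}"
}
```

Calling largest_blobs . 20 inside a clone prints one line per blob with its type, its size in bytes and its path, largest first.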

 

Our Solution

There is a tool called git-filter-repo that rewrites your Git history and removes a file from every commit it was involved in. You end up with a repository without those files or folders, but everyone on the team has to throw away their current clone and clone the repository again. The same goes for the hosted repository on GitHub or Azure DevOps. If you skip this and the old history gets pushed or merged back in, you end up with duplicates of all commits and an even bigger repository than before!

You can install git-filter-repo or just copy the single script file git-filter-repo from GitHub into a folder that is in your $PATH. I strongly suggest you do that on a Linux system. In WSL it took much longer.
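A quick way to check whether Git can find the tool is sketched below. It assumes your version of git-filter-repo supports the --version option. The pip command in the hint is one of the documented installation routes and assumes Python 3 and pip are available. The variable name filter_repo_status is only for illustration.

```shell
# Check whether Git can find the filter-repo subcommand.
if git filter-repo --version >/dev/null 2>&1; then
  filter_repo_status="installed"
else
  filter_repo_status="missing"
  # Hint only; nothing is installed automatically.
  echo "git-filter-repo not found; try: python3 -m pip install --user git-filter-repo"
fi
echo "git-filter-repo: $filter_repo_status"
```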

Before you do anything, make a backup of your current repository and put it somewhere you can find it later in case you make a mistake.
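One possible way to take such a backup is a bare mirror clone, which copies all branches and tags. This is a sketch, not necessarily how we did it; the function name backup_repo and the paths in the usage line are placeholders.

```shell
# Create a complete backup (all branches, tags and other refs) as a bare mirror clone.
# $1 = repository to back up
# $2 = destination directory for the backup (must not exist yet)
backup_repo() {
  git clone --quiet --mirror "$1" "$2"
}

# Usage (hypothetical paths):
#   backup_repo ~/src/usergroup-site ~/backups/usergroup-site-before-filter.git
```

If something goes wrong later, you can clone from the backup directory like from any other remote.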

Go to your repository and run git filter-repo with the path to the folder you no longer need AND the option --invert-paths – otherwise you remove all but the Template/ folder:
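Based on the description above, the call looks roughly like the sketch below. The wrapper function drop_path is only there for illustration; Template/ is the folder from our case, so replace it with your own path.

```shell
# Remove one folder (and everything in it) from every commit of the current repository.
# Run this inside the clone you want to rewrite.
# $1 = path to drop, e.g. Template/
# --invert-paths means "keep everything EXCEPT the given path".
drop_path() {
  git filter-repo --path "$1" --invert-paths
}

# Usage:
#   drop_path Template/
```

Keep in mind that git filter-repo refuses to run on a repository that does not look like a fresh clone unless you pass --force, so working in a new clone is the easy route. On our repository, the run produced the following output: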

Parsed 539 commits
New history written in 0.99 seconds; now repacking/cleaning…
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 7f5d6f75 Link hinzugefügt
Enumerating objects: 15463, done.
Counting objects: 100% (15463/15463), done.
Delta compression using up to 4 threads
Compressing objects: 100% (8544/8544), done.

Writing objects: 100% (15463/15463), done.
Total 15463 (delta 6005), reused 15453 (delta 5996), pack-reused 0

You can rerun this command with other folders you want to remove. In my case the command only took a few seconds and it was no problem to rerun it a few times.
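After each run it is worth checking that the folder is really gone from the history. The helper below is a sketch using plain git log; the name path_is_gone is made up.

```shell
# Succeeds (exit status 0) only if no commit on any branch or tag still touches the path.
# $1 = repository
# $2 = path that should be gone, e.g. Template/
path_is_gone() {
  [ -z "$(git -C "$1" log --all --oneline -- "$2")" ]
}

# Usage:
#   path_is_gone . Template/ && echo "Template/ is gone from the history"
#   git count-objects -vH    # shows the size of the object store on disk
```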

Now to the scary part: delete the repository in Azure DevOps, create a new empty repository and push your smaller local repository to it. This worked for us without any impact on the work items or the build pipelines.
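The push to the new, empty repository can be sketched as follows. git-filter-repo removes the origin remote during the rewrite, so it has to be added again. The function name push_all_branches and the URL in the usage line are placeholders; use the clone URL that Azure DevOps shows for your new repository.

```shell
# Point the rewritten repository at the new, empty remote and push all branches.
# Run this inside the rewritten clone.
# $1 = URL of the new repository
push_all_branches() {
  # Drop a leftover "origin" remote if one still exists, then add the new one.
  git remote remove origin 2>/dev/null
  git remote add origin "$1" && git push --quiet -u origin --all
}

# Usage (placeholder URL):
#   push_all_branches https://example.invalid/our-org/our-project/_git/website
```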

Do not forget to push your tags up to the new repository:
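A sketch of that step, assuming the new repository is registered as the remote origin (the wrapper name push_tags is made up):

```shell
# Push every local tag to a remote.
# Run this inside the rewritten clone.
# $1 = remote name (default: origin)
push_tags() {
  git push --quiet "${1:-origin}" --tags
}

# Usage:
#   push_tags
```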

Run a check and build your project with your CI pipeline. If everything works, inform your team that they need to delete their local repository and clone it again.

 

Conclusion

We needed two attempts to get everything out of our repository that we no longer needed. In the end we shrank the repository by 60% and cut the build time in half. Our templates got their own repository and we get feedback a lot faster thanks to the improved build time.

It was an hour well spent and I can highly recommend git-filter-repo if you need to get rid of some files.
