
How to reduce the git repository size

October 23rd, 2019

If you have a git repo that holds a lot of data and keeps growing, you should reduce its size so that it keeps working smoothly. Bitbucket, for example, allows up to 2 GB per repository. That 2 GB does not only cover the files you currently have in the repository; it also includes the entire history of your git repository.

In git, everything is based on commits, and you can make any number of commits to a repo. The advantage of commits is that you can recreate the state of the source code from any one of them: a commit gives you the entire source tree, with all the changes made in the repository up to that point. That is the power of version control. It is cool, isn’t it?

Keeping that in mind, we should understand its side effect, although it only becomes a real problem when git is used improperly. As I said, git keeps track of every change you make, so its history can grow huge. To counter this, git compresses the objects it stores, and in packfiles it stores many objects as differences (deltas) against similar objects to reduce the size further.

Cloning a repository clones the entire history — including every version of every source code file. If a user commits a huge file, such as a JAR, every clone thereafter includes this file. Even if a user ends up removing the file from the project with a subsequent commit, the file still exists in the repository history.
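To see this effect for yourself, here is a minimal sketch that builds a throwaway repository in a temporary directory. The file name big.bin, the commit messages, and the demo identity are all made up for illustration.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit a 1 MiB file of random data (random data barely compresses).
dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
git add big.bin
git commit -q -m "Add big.bin"

# Remove the file again in a later commit.
git rm -q big.bin
git commit -q -m "Remove big.bin"

# The working tree no longer has the file, but the previous commit still
# references its blob, so the data stays in the repository.
blob=$(git rev-parse HEAD~1:big.bin)
blob_size=$(git cat-file -s "$blob")
echo "big.bin is gone from the working tree, but history still stores $blob_size bytes"
```

Anyone who clones this repository downloads that blob too, even though the latest commit does not contain the file.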

So, thanks to compression, the stored history is often smaller than you might expect from the size of your source code. Even so, if you keep committing large files, your git repo size will grow because every version stays in the history. You then have to reduce the repo size to keep it working smoothly.

Ideally, you should keep your repository size between 100 MB and 300 MB. To give you some examples: Git itself is 222 MB, Mercurial itself is 64 MB, and Apache is 225 MB.

In Bitbucket, there are two git storage limits: a soft limit and a hard limit.

Soft limit (1 GB): You reach the soft limit when your repository size reaches 1 GB. Bitbucket will notify the users about the storage limit. Users can still commit and push changes, but you may have to perform maintenance to avoid hitting the hard limit.

Hard limit (2 GB): This is the repository stop limit. You cannot perform further actions unless you reduce the storage size.

There are several ways to reduce the storage space of your git repository. First of all, you need to know the actual size of your repository.

git count-objects -v

This displays object counts and sizes for your repository. The size-pack value is the disk space used by packed objects, in KiB.
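Knowing the total is only half the story; it also helps to know which objects are responsible for it. The sketch below is one possible way to list the largest objects in the whole history. It runs against a throwaway repository so that it is safe to try; the file names and the demo identity are hypothetical.

```shell
set -eu
# Throwaway repository with one large and one small file (names are hypothetical).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
dd if=/dev/urandom of=big.bin bs=1024 count=512 2>/dev/null
echo "hello" > small.txt
git add .
git commit -q -m "Add files"

# Pack loose objects so that size-pack reflects what is stored.
git gc -q

# Same report as above, with human-readable units.
git count-objects -vH

# Five largest objects in the whole history:
# object type, object name, size in bytes, path.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sort -k3,3nr \
  | head -n 5)
echo "$largest"
```

In a real repository you would run only the last two commands from inside the working copy. The paths at the top of the list are the first candidates for removal from history.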

To reduce storage space, you have to rewrite your git history. Note that rewriting git history is a risky operation: once the history has been rewritten and the old objects are removed, you cannot go back to the previous versions or commits. So take a backup of your git repository before rewriting the history.

You can back up the repository in several ways.

git clone repo_name.git

git clone --bare repo_name.git
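Before rewriting anything, it is worth checking that the backup is actually usable. As a hedged sketch, the script below makes a mirror clone (a bare copy that includes every branch and tag) and a single-file bundle of a throwaway repository, then verifies both. The directory names repo, repo-backup.git, and repo.bundle are assumptions made for this example.

```shell
set -eu
# Throwaway source repository (all names are hypothetical).
work=$(mktemp -d)
cd "$work"
git init -q repo
git -C repo config user.name "Demo User"
git -C repo config user.email "demo@example.com"
echo "hello" > repo/file.txt
git -C repo add file.txt
git -C repo commit -q -m "Initial commit"
git -C repo tag v1

# Mirror clone: a bare copy that carries over all branches and tags.
git clone -q --mirror repo repo-backup.git

# Alternative: one file containing all refs and their history.
git -C repo bundle create "$work/repo.bundle" --all

# Sanity checks: object integrity of the mirror, validity of the bundle.
git -C repo-backup.git fsck
git -C repo bundle verify "$work/repo.bundle"
```

For a hosted repository you would pass its URL to git clone instead of the local path used here.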

Once you have a backup, you can start cleaning up the repository.

If you committed a large file and want to remove it entirely from the git history, deleting it in a new commit is not enough; the history itself has to be rewritten.

git gc (garbage collection) removes all data from the repository that is not actually used, or in some way referenced, by any of your branches or tags. For that to be useful, we need to rewrite all the repository history that contained the unwanted file so that it no longer references it; git gc will then be able to discard the now-unused data.
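The rewrite-then-collect sequence can be sketched as follows. This example uses git filter-branch because it ships with git; that is a choice made for this sketch, and the Git documentation itself now recommends newer tools such as git filter-repo for this job. The script works on a throwaway repository, and the file name big.bin is hypothetical.

```shell
set -eu
# Throwaway repository whose history contains an unwanted large file.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "source" > app.txt
dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
git add .
git commit -q -m "Add app.txt and big.bin"
git rm -q big.bin
git commit -q -m "Remove big.bin"
blob=$(git rev-parse HEAD~1:big.bin)

# Rewrite every commit on every ref so that none of them contains big.bin.
# --prune-empty drops commits that become empty after the rewrite.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
  --prune-empty -- --all

# filter-branch keeps the old commits under refs/original/, and the reflog
# points at them too; remove both so that nothing references the old history.
git for-each-ref --format='%(refname)' refs/original/ \
  | while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all

# Now git gc can discard the unreferenced blob.
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then
  still_present=yes
else
  still_present=no
fi
echo "big.bin still stored: $still_present"
```

After a rewrite like this, the rewritten branches have to be force-pushed, and every other clone still holds the old history until it is cloned again.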

Another approach is to tell git that your current commit is the initial commit. To do that, first check out the commit that you want to make the initial commit. Then run the following commands:
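A minimal sketch of such commands, using an orphan branch, which is one common way to do this. The script sets up a throwaway repository and a local stand-in for the remote so that it can be tried safely; the branch name fresh-start, the remote name origin, and the file contents are assumptions made for the example.

```shell
set -eu
# Throwaway repository plus a local bare repository acting as the remote.
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"
echo "v1" > app.txt
git add app.txt
git commit -q -m "First version"
echo "v2" > app.txt
git commit -q -am "Second version"
git push -q origin master

# 1. Check out the commit that should become the new initial commit.
start=$(git rev-parse HEAD)
git checkout -q "$start"

# 2. Create a branch with no history; the files stay staged as they are.
git checkout -q --orphan fresh-start
git add -A
git commit -q -m "Initial commit"

# 3. Replace the old master with the new single-commit branch.
git branch -q -D master
git branch -m master

# 4. Overwrite the remote branch with the new history.
git push -q -f origin master
```

In a real repository only steps 1 to 4 apply, with your own commit in place of HEAD.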

The above commands will forcefully push the current source code to the master branch as the first commit.

Note: You should also delete all other branches and tags, because they may still contain the old history.
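As a rough sketch of that cleanup, the script below deletes every local branch except master and every local tag in a throwaway repository. The names old-feature and v0.1 are hypothetical. On a hosted repository you would also have to delete them on the remote, for example with git push origin --delete followed by the branch or tag name.

```shell
set -eu
# Throwaway repository with an extra branch and a tag (names are hypothetical).
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch old-feature
git tag v0.1

# Delete every local branch except master.
git for-each-ref --format='%(refname:short)' refs/heads/ \
  | grep -v -x 'master' \
  | while read -r branch; do git branch -q -D "$branch"; done

# Delete every local tag.
git tag -l | while read -r tag; do git tag -d "$tag" >/dev/null; done

remaining_branches=$(git for-each-ref --format='%(refname:short)' refs/heads/)
remaining_tags=$(git tag -l)
echo "branches left: $remaining_branches"
```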
