If you have a Git repo that holds a lot of data and keeps growing, you should reduce its size so that it keeps working smoothly. Bitbucket, for example, allows you to store up to 2 GB per repository. That 2 GB is not only the files you currently have in the repository. It also includes the history of your Git repository.

In Git, everything is based on commits. You can make any number of commits to a repo. The advantage of commits is that you can recreate the state of the source code from any commit: it contains all the code and every change made in the repository up to that commit. That is the power of version control. It is cool, isn't it?

Keeping that in mind, we should understand the side effect of it. We cannot really call it a side effect unless we are using Git improperly. As I said, Git keeps track of every change you make, which makes its history huge. But Git uses a powerful compression mechanism that shrinks your code into tiny chunks. When it packs objects, it also stores the differences between similar files to reduce the size.

Cloning a repository clones the entire history — including every version of every source code file. If a user commits a huge file, such as a JAR, every clone thereafter includes this file. Even if a user ends up removing the file from the project with a subsequent commit, the file still exists in the repository history.

So, thanks to compression, your Git data can be smaller than you might expect from the size of the source code. Even so, if you keep committing large files, the repo keeps growing because of the version history. You have to reduce the repo size to keep it working smoothly.
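You can check the point about deleted files in a throwaway repository. The following is only a small sketch: the file name big.bin, the demo identity and the temporary directory are made up for the demo, and it assumes git is installed.

```shell
set -e

# Throwaway repository; every name below is hypothetical.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit a 1 MiB file of random data (random data compresses poorly).
dd if=/dev/urandom of=big.bin bs=1024 count=1024 2>/dev/null
git add big.bin
git commit -q -m "Add big.bin"

# Remove the file again in a follow-up commit.
git rm -q big.bin
git commit -q -m "Remove big.bin"

# The working tree no longer has the file, but the previous commit
# still references the blob, so it stays in the repository.
blob_id=$(git rev-parse HEAD~1:big.bin)
blob_size=$(git cat-file -s "$blob_id")
echo "big.bin is deleted, but history still holds $blob_size bytes for it"
```

Anyone who clones this repo still downloads those bytes, even though the latest commit does not contain the file.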

Ideally, you should keep your repository size between 100 MB and 300 MB. To give you some examples: Git itself is 222 MB, Mercurial itself is 64 MB, and Apache is 225 MB.

In Bitbucket, there are two Git storage limits: a soft limit and a hard limit.

Soft limit (1 GB): You reach the soft limit when your repository size reaches 1 GB. Bitbucket notifies the users about the storage limit. Users can still commit and push changes to their account, but you might have to perform maintenance to keep from hitting the hard limit.

Hard limit (2 GB): This is the repository stop limit. You cannot perform further actions unless you reduce the storage size.

There are several ways to reduce the storage space of your Git repository. First of all, you need to know the actual size of your repository.

git count-objects -v

This displays the size of your repository's object storage. The size-pack value is the disk space used by the packed objects, in KiB.
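Knowing the total is a start, but it also helps to see which files take up the space. Here is a rough sketch that lists the largest blobs in the whole history. The helper name list_largest_blobs is my own, the demo repo and its files are invented so the snippet runs anywhere, and paths that contain spaces will be cut off by the awk step.

```shell
set -e

# Hypothetical helper: print "<bytes> <path>" for the N largest blobs
# reachable from any ref, largest first (default N = 10).
list_largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $2, $3 }' |
    sort -rn |
    head -n "${1:-10}"
}

# Throwaway repository with one small and one larger file.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'small\n' > small.txt
dd if=/dev/zero of=large.bin bs=1024 count=64 2>/dev/null
git add .
git commit -q -m "Add a small and a large file"

# Same numbers as "git count-objects -v", in human-readable units.
git count-objects -vH

# Largest blob in this demo repo.
largest=$(list_largest_blobs 1)
echo "$largest"
```

In your own repository you only need the function and a call such as list_largest_blobs 20.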

To reduce storage space, you have to rewrite your Git history. Note that rewriting Git history is a serious operation: once the rewritten history is pushed and the old data is cleaned up, you cannot go back to the previous versions or commits. So take a backup of your Git repository before rewriting the history.

You can back up the Git code in several ways.

  • Download the source code as a zip. Make sure you download it from a stable, up-to-date branch so that you can reuse the source code. Bitbucket and GitHub provide an option to download the code of a particular branch as a zip; log in to your account and find the download option. Note that a zip contains only the files of that branch, not the Git history.
  • Clone the repo locally and keep it somewhere. As I mentioned earlier, a clone includes the full Git history, so its size will be close to that of the remote.

git clone repo_name.git

  • Clone the repo with the --bare option. A bare clone is a copy of the remote repository's Git data without a working tree, and you can make normal clones from it. A bare repository holds the data the same way the remote stores it.

git clone --bare repo_name.git
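Before you rely on a backup, it is worth checking that you can restore from it. The sketch below makes a bare clone of a stand-in repository and then clones from the backup again. The names source_repo, backup.git and restored are invented for the demo; in real use source_repo would be the URL of your remote.

```shell
set -e

work_dir=$(mktemp -d)
cd "$work_dir"

# Stand-in for the real remote repository (hypothetical content).
git init -q source_repo
git -C source_repo config user.name "Demo User"
git -C source_repo config user.email "demo@example.com"
echo "hello" > source_repo/readme.txt
git -C source_repo add readme.txt
git -C source_repo commit -q -m "First commit"

# Take the backup as a bare clone and check the integrity of its objects.
git clone -q --bare source_repo backup.git
is_bare=$(git -C backup.git rev-parse --is-bare-repository)
git -C backup.git fsck

# Restore test: a normal clone from the backup brings the files back.
git clone -q backup.git restored
restored_commits=$(git -C restored rev-list --count HEAD)
echo "bare: $is_bare, commits restored: $restored_commits"
```

If you also want remote-tracking refs and other non-branch refs in the backup, git clone --mirror is a stricter variant of the same idea.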

Once you have a backup, you can start cleaning up the repository.

If you committed a large file and want to remove it entirely from the Git history, you need to:

  • remove the file from your project's current file tree
  • remove the file from the repository history by rewriting Git history, deleting the file from all commits that contain it
  • remove all reflog history that refers to the old commit history
  • repack the repository, garbage-collecting the now-unused data using git gc

git gc (garbage collection) removes all data from the repository that is not actually used, or in some way referenced, by any of your branches or tags. For that to be useful, we need to rewrite all the Git history that contained the unwanted file so that it no longer references it. git gc will then be able to discard the now-unused data.
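The four steps above can be tried out in a throwaway repository. This is a sketch of one possible way, using git filter-branch because it ships with Git; it is not the only tool for the job, and the Git documentation itself points to git filter-repo as a faster and safer option. The file names big.jar and app.txt and the demo identity are made up.

```shell
set -e
# Skip the warning (and pause) that newer Git versions show for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# History with an unwanted large file (hypothetical names).
echo "code" > app.txt
dd if=/dev/urandom of=big.jar bs=1024 count=512 2>/dev/null
git add .
git commit -q -m "Add app and big.jar"

# Step 1: remove the file from the current file tree.
git rm -q big.jar
git commit -q -m "Remove big.jar"
big_blob=$(git rev-parse HEAD~1:big.jar)

# Step 2: rewrite every commit so that none of them contains big.jar.
git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch big.jar' -- --all

# filter-branch keeps the old history under refs/original; drop it.
git for-each-ref --format='%(refname)' refs/original/ |
  xargs -n 1 git update-ref -d

# Step 3: remove reflog entries that still point at the old commits.
git reflog expire --expire=now --all

# Step 4: repack and discard the now-unreferenced objects.
git gc --quiet --prune=now

# The blob should no longer exist in the object store.
if git cat-file -e "$big_blob" 2>/dev/null; then
  blob_gone=no
else
  blob_gone=yes
fi
echo "big.jar blob removed: $blob_gone"
```

On a real repository you would then force-push the rewritten branches, and everyone else would need a fresh clone.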

Another approach is to tell Git that your current commit is the initial commit. First, check out the commit that you want to make the initial commit. Then run the following commands:

  • git checkout --orphan latest_branch # new branch with no history
  • git add -A
  • git commit -am "Initial commit message" # commit the changes
  • git branch -D master # delete the master branch
  • git branch -m master # rename the current branch to master
  • git push -f origin master # force-push to the master branch
  • git reflog expire --expire=now --all # drop reflog entries for the old commits
  • git gc --aggressive --prune=all # remove the old files

The above commands force-push the current source code to the master branch as the first commit.

Note: You should delete all other branches and tags, because they may still contain the old history.
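To see why the note matters, here is a local-only sketch of the whole approach, including the cleanup of leftover branches and tags. The names notes.txt, old_feature and v1.0 are invented for the demo, and the push step is left out because the demo has no remote.

```shell
set -e

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.com"

# Some old history, plus a tag and a branch that point into it.
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done
git tag v1.0
git branch old_feature

# Replace master with a single new root commit.
git checkout -q --orphan latest_branch
git add -A
git commit -q -m "Initial commit"
git branch -D master
git branch -m master

# Delete every other branch and every tag; they still hold the old commits.
git for-each-ref --format='%(refname:short)' refs/heads |
  grep -v '^master$' |
  xargs -n 1 git branch -D
git tag -l | xargs -n 1 git tag -d

# Drop the reflog entries and unreferenced objects.
git reflog expire --expire=now --all
git gc --quiet --aggressive --prune=all

commit_count=$(git rev-list --count --all)
echo "commits left in the repo: $commit_count"
```

On a real remote, the old branches and tags have to be deleted there too, for example with git push origin --delete followed by the branch or tag name.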
