Git rewrite history: Git repository only gets bigger after using BFG
We are in the process of migrating our SVN repo to Git (hosted at Bitbucket). I used SubGit to import all our branches and history into a bare repo that I have locally on my (Windows) PC.
The repo is quite big (7.42 GB after import), but that is because it also contains information on SVN revision numbers to provide a way to have a two-way sync between Git and SVN (I'm only interested in a one-way sync, SVN to Git).
I created a local clone of the imported bare repo and pushed all branches to Bitbucket. After a couple of hours (!) the repo was uploaded. Bitbucket gave me warnings about the repo size. I checked the size, and it was 1.1 GB. That's not as big as the imported bare repo, but it is still too big to have a fast repository.
After playing around with BFG, I managed to remove some large DLL and SQL export files using these commands on the bare repo (I use the clone for pushing without the SVN-related refs):
java -jar bfg.jar --delete-files '{''specialized 2015''','''specialized,''insert-pcreeks''}.sql' --no-blob-protection
java -jar bfg.jar --delete-files 'incara.*.dll' --no-blob-protection incara.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
This took a while, but afterwards the git_find_big.sh script did not show these large SQL files anymore. However, after pushing things to Bitbucket (as a new repo, not as a force push), the repo only got bigger (1.8 GB).
Can anyone provide a possible explanation for this behavior?
I don't know if it matters, but we used a non-standard branch/tag model in SVN. This resulted in branches like: /refs/heads/archive/some/path/to/branch. These branches seemed to work fine, and removing them did not affect the size.
Next to these problems, I noticed we had some XML files showing up in the git_find_big.sh output:
size,pack,sha,location
12180,1011,56731c772febd7db11de5a66674fe6a1a9ec00a7 repository/frontend.xml
12074,1002,0cefaee608c06621adfa4a9120ed7ef651076c33 repository/frontend.xml
12073,1002,a1c36cf49ec736a7fc069dcc834b784ada4b6a06 repository/frontend.xml
12073,1002,1ba5bd92817347739d3fba375fc42641016a5c1d repository/frontend.xml
12073,1002,e9182762bfc5849bc6645fdd6358265c3930779f repository/frontend.xml
12073,1002,dff5733d67cb0306534ac41a4c55b3bbaa436a2e repository/frontend.xml
12072,1002,8ee628f645ce53d970c3cf9fdae8d2697224e64c repository/frontend.xml
12072,1002,1266dee72b33f7a05ca67488c485ea8afc323615 repository/frontend.xml
These files contain the frontend logic of the web platform we are using, and they are indeed quite big. But they should be treated as text, right? Therefore I don't understand why they show up as separate objects in the output above. Am I right that this should not be happening?
The SVN import also resulted in some empty commits (for example, when SVN creates or moves a branch, it needs a new commit). I guess these can be removed using filter-branch?
Sorry, I have a lot of questions! Can anyone help me with this?
Thanks,
Piet
I've asked for some more diagnostic information in the comments on the question, which is needed to give a reasonable answer to the main part. As for the secondary questions (which Stack Overflow encourages you to ask separately, incidentally!), here are some pointers:
Next to these problems, I noticed we had some XML files showing up in the git_find_big.sh output: [snip]
These files contain the frontend logic of the web platform we are using, and they are indeed quite big. But they should be treated as text, right? Therefore I don't understand why they show up as separate objects in the output above. Am I right that this should not be happening?
Git allocates IDs based on the contents of files (a SHA hash), and as far as that goes, it doesn't care whether the files are text or not. If the files are different, the IDs will be different, and they are stored as separate objects (Git may do delta compression under the hood, but that doesn't stop the files being defined as logically separate). So it's not surprising that different versions of the same file show up more than once in the git_find_big.sh output.
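To illustrate the point about content-based IDs, here is a minimal Python sketch of how a blob ID is derived in a repository that uses the default SHA-1 object format. The function name `git_blob_id` and the two sample XML snippets are made up for illustration; they are not taken from the repository in the question.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    Git hashes a small header ("blob <size>" followed by a NUL byte)
    and then the raw bytes of the file. The file name and whether the
    content is text or binary play no part in the ID.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical revisions of the same file, differing by one character.
version_1 = b"<frontend><page name='home'/></frontend>\n"
version_2 = b"<frontend><page name='home2'/></frontend>\n"

print(git_blob_id(version_1))
print(git_blob_id(version_2))
# Different content gives different IDs, so each revision is its own object.
print(git_blob_id(version_1) == git_blob_id(version_2))
```

Every committed revision of repository/frontend.xml therefore gets its own ID and its own row in the listing, even when Git stores most of them as small deltas inside the pack.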
The SVN import also resulted in some empty commits (for example, when SVN creates or moves a branch, it needs a new commit). I guess these can be removed using filter-branch?
Yep, the BFG doesn't do that out of the box. However, it's one task that filter-branch does reasonably well (even if it is fiddly to use).
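As a rough sketch of what that could look like, the Python helper below builds a `git filter-branch --prune-empty` invocation and only runs it when asked to. The helper names (`prune_empty_argv`, `prune_empty_commits`) and the `dry_run` parameter are hypothetical. Rewriting tags with `--tag-name-filter cat` is an assumption about what you want, not something stated in the question. Try it on a throwaway clone first, since it rewrites history.

```python
import subprocess
from typing import List


def prune_empty_argv(rewrite_tags: bool = True) -> List[str]:
    """Build the argv for removing empty commits on all refs."""
    argv = ["git", "filter-branch", "--prune-empty"]
    if rewrite_tags:
        # Assumption: tags should follow the rewritten commits.
        argv += ["--tag-name-filter", "cat"]
    # "--" separates filter-branch options from rev-list options;
    # "--all" selects every ref in the repository.
    argv += ["--", "--all"]
    return argv


def prune_empty_commits(repo_path: str, dry_run: bool = True) -> List[str]:
    """Run the command in repo_path unless dry_run is set; return the argv."""
    argv = prune_empty_argv()
    if not dry_run:
        subprocess.run(argv, cwd=repo_path, check=True)
    return argv


# Dry run: only show the command that would be executed.
print(" ".join(prune_empty_commits("incara.git", dry_run=True)))
```

With `--prune-empty`, filter-branch drops commits that leave the tree untouched and have at most one non-pruned parent, so merge commits are kept.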