<div>GitHub does run garbage collection automatically, triggered by each push, so simply removing gh-pages may do the trick.  They certainly aren't aggressive about it, though, so they may keep the two-week default grace period before gc actually prunes the unreachable objects.  Also note that garbage collection can't be pushed.  It's unfortunate, since if I run 'git gc --prune=now --aggressive' on a freshly cloned IPython, the repo shrinks by a factor of 5.  But since gc is purely local, there's nothing for me to push once I've done that, and there's no way to instruct GitHub to do aggressive gc (in fact, I think I read that they have a policy against it, due to something about how their forks work).</div>
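<div><br></div><div>For anyone who wants to check the shrinkage on their own clone, here is a rough sketch. The helper names pack_size_kib and gc_report are made up for illustration, and the numbers will depend on the clone; it only touches the local repository.</div>
<pre>
```shell
#!/bin/sh
# Report the packed size of the repository in the current directory, in KiB.
# (pack_size_kib is a hypothetical helper name, not a git command.)
pack_size_kib() {
    git count-objects -v | awk '/^size-pack:/ {print $2}'
}

# Compare the packed size before and after an aggressive local gc.
# Nothing here changes what is stored on the remote.
gc_report() {
    before=$(pack_size_kib)
    git gc --prune=now --aggressive --quiet
    after=$(pack_size_kib)
    echo "size-pack before: ${before} KiB, after: ${after} KiB"
}
```
</pre>
<div>Running gc_report from inside a fresh clone prints the two sizes side by side.</div>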
<div><br></div><div>In investigating why IPython is so huge (it's always been inappropriately large on GitHub), I found that the largest blobs are the davinci ebooks in docs/examples/kernel/davinci*.txt, and the recent SVG connection diagrams.</div>
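<div><br></div><div>One way to list the largest blobs in a repo's history is sketched below. This is not necessarily how the list above was produced; largest_blobs is a hypothetical helper name, and it assumes a git new enough to support the --batch-check format string.</div>
<pre>
```shell
#!/bin/sh
# List the N largest blobs anywhere in history, biggest first,
# one per line as: size-in-bytes path. Run from inside the repository.
# (largest_blobs is a hypothetical helper name.)
largest_blobs() {
    n=${1:-10}
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "$n"
}
```
</pre>
<div>For example, largest_blobs 20 shows the top twenty; the default is ten.</div>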
<div><br></div><div><div><div>I'd be happy to sit down with you when you get back, to go over ways to clean up the repo, because I'm definitely not comfortable scrubbing it with filter-branch on my own.</div><div>
<br>
</div><div>Sorry for causing more trouble,</div>-MinRK<br>
<br><br><div class="gmail_quote">On Sun, Jan 23, 2011 at 20:56, Fernando Perez <span dir="ltr"><<a href="http://fperez.net">fperez.net</a>@<a href="http://gmail.com">gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hey folks,<br>
<br>
I'm really sorry that (due to 'real life' getting in the way) I missed<br>
several important discussions, in particular this one:<br>
<br>
<a href="https://github.com/ipython/ipython/issues/closed/#issue/239" target="_blank">https://github.com/ipython/ipython/issues/closed/#issue/239</a><br>
<br>
As I just mentioned to Brian during a chat we had, there's a problem<br>
with the default approach GitHub took to pages hosting: they put the<br>
pages in a separate DAG inside the repo (the gh-pages branch), but<br>
this means polluting the repo forever with all builds of the docs.<br>
Furthermore, if we want to host multiple versions of the docs (as we<br>
do today for each release and should continue doing), the storage<br>
requirements are going to balloon.<br>
<br>
Fortunately, there's a different approach that's easy to implement,<br>
adds only minimal work, and is completely clean.  And I'd already<br>
written the tools for it :)  For datarray, here's the gh-pages-hosted<br>
docs:<br>
<br>
<a href="http://fperez.github.com/datarray-doc/" target="_blank">http://fperez.github.com/datarray-doc/</a><br>
<br>
The basic idea is just to have a *separate* git repo that *only* hosts<br>
the gh-pages docs, in this case this one:<br>
<br>
<a href="https://github.com/fperez/datarray-doc" target="_blank">https://github.com/fperez/datarray-doc</a><br>
<br>
A couple of scripts in the docs build file auto-generate all the<br>
necessary commits and info, and the only manual step needed is to do a<br>
single push.<br>
<br>
So I think what we should do is:<br>
<br>
- remove the gh-pages branch right away from the repo, so it doesn't<br>
grow any larger.<br>
<br>
- see if we can do a full purge of that data from the repo (I think<br>
the added size is ~6MB right now) with git's filter-branch tool<br>
(<a href="http://help.github.com/removing-sensitive-data" target="_blank">http://help.github.com/removing-sensitive-data</a> has some tips).<br>
<br>
- add the tools from my datarray repo to handle the process smoothly.<br>
<br>
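The first step above could look roughly like the sketch below. It assumes the<br>
remote is named origin, and drop_branch is a made-up helper name. It only<br>
removes the branch; it does not purge anything from history.<br>
<pre>
```shell
#!/bin/sh
# Delete a branch on the remote and, if present, locally.
# Defaults match the thread (remote "origin", branch "gh-pages");
# drop_branch is a hypothetical helper name.
# Note: the local delete fails if that branch is currently checked out.
drop_branch() {
    remote=${1:-origin}
    branch=${2:-gh-pages}
    git push "$remote" --delete "$branch"
    # Remove the local branch too, if one exists.
    if git show-ref --verify --quiet "refs/heads/$branch"; then
        git branch -D "$branch"
    fi
}
```
</pre>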
<br>
Min, if you think you're up for some/all of this let me know, I'm<br>
still in Colombia but we can skype for me to give you some pointers.<br>
If not, we can get together back home next week when I return and<br>
clean this up.<br>
<br>
Sorry I didn't catch this earlier when the pull request went up...<br>
<br>
Cheers,<br>
<br>
f<br>
<br>
ps - obviously, please don't add anything at all to the gh-pages<br>
branch anymore, so we have as little to clean up as possible.<br>
_______________________________________________<br>
IPython-dev mailing list<br>
<a href="mailto:IPython-dev@scipy.org">IPython-dev@scipy.org</a><br>
<a href="http://mail.scipy.org/mailman/listinfo/ipython-dev" target="_blank">http://mail.scipy.org/mailman/listinfo/ipython-dev</a><br>
</blockquote></div><br></div></div>