[IPython-dev] gh-pages problem...

MinRK benjaminrk at gmail.com
Mon Jan 24 04:13:18 EST 2011


GitHub does do garbage collection automatically, triggered by each push, so
simply removing gh-pages may do the trick.  They certainly aren't aggressive
about it, so they may keep the two-week default before gc actually prunes
the unreachable objects.  Also note that garbage collection is not pushable.
It's unfortunate, since if I run 'git gc --prune=now --aggressive' on a
freshly cloned IPython, the repo shrinks by a factor of 5.  But since gc is
purely local, there's nothing for me to push once I've done that, and there's
no way to instruct GitHub to do an aggressive gc (in fact, I think I read that
they have a policy against it, due to something about how their forks work).
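To reproduce a measurement like the factor-of-5 shrink, one option is to
compare `git count-objects -v` before and after a local aggressive gc.  The
Python below is a minimal sketch, not a tool from this thread.  The helper
names are hypothetical, and it assumes `git` is on the PATH and that it runs
from inside a clone.  Because gc only rewrites the local object store, the
result says nothing about what GitHub itself keeps.

```python
import subprocess


def parse_count_objects(output: str) -> dict[str, int]:
    """Parse the numeric "key: value" lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Skip non-numeric entries (e.g. "alternate:" lines, which hold paths).
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def on_disk_kib(stats: dict[str, int]) -> int:
    """Loose objects ("size") plus packs ("size-pack"); both are reported in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0)


def repo_size_kib(repo: str = ".") -> int:
    """Run `git count-objects -v` in `repo` and return its on-disk size in KiB."""
    result = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return on_disk_kib(parse_count_objects(result.stdout))


def main() -> None:
    before = repo_size_kib()
    # Repacks the *local* object store only; nothing here can be pushed.
    subprocess.run(["git", "gc", "--prune=now", "--aggressive"], check=True)
    after = repo_size_kib()
    print(f"{before} KiB -> {after} KiB ({before / max(after, 1):.1f}x smaller)")
```

Calling `main()` inside a clone prints the before/after sizes; the parsing
helpers are kept separate so they can be checked without a repository.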

In investigating why IPython is so huge (it's always been inappropriately
large on GitHub), I found that the largest blobs are the davinci ebooks
in docs/examples/kernel/davinci*.txt and the recent SVG connection
diagrams.
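One way to find the largest blobs is to list every object reachable from any
ref and ask `git cat-file` for each one's type and size.  A minimal Python
sketch, assuming a git recent enough to support `--batch-check=<format>` with
`%(rest)` (newer than the git of this thread); the function names are
hypothetical:

```python
import subprocess

BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def largest_blobs(batch_output: str, limit: int = 10) -> list[tuple[int, str]]:
    """Return (size_in_bytes, path) pairs for the biggest blobs, largest first.

    Each input line should look like "<type> <sha> <size> <path>", i.e.
    `git cat-file --batch-check` output produced with BATCH_FORMAT.
    """
    blobs = []
    for line in batch_output.splitlines():
        parts = line.split(" ", 3)  # maxsplit=3 keeps spaces inside paths
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    blobs.sort(reverse=True)
    return blobs[:limit]


def describe_objects(repo: str = ".") -> str:
    """Type, id, size and path of every object reachable from any ref."""
    listing = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return subprocess.run(
        ["git", "cat-file", f"--batch-check={BATCH_FORMAT}"],
        cwd=repo, input=listing, capture_output=True, text=True, check=True,
    ).stdout


def main() -> None:
    for size, path in largest_blobs(describe_objects()):
        print(f"{size / 1e6:8.2f} MB  {path}")
```

Since `rev-list --all` walks every ref, a separate DAG such as gh-pages shows
up in this listing too, as long as its branch still exists.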

I'd be happy to sit down with you when you get back to go over ways to
clean up the repo, because I'm definitely not comfortable scrubbing it with
filter-branch on my own.
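For reference, the usual filter-branch recipe (of the kind the GitHub help
page linked in the quoted message describes) is: rewrite every branch and tag
without the offending paths, drop the backup refs, expire the reflog, then gc.
The sketch below only prints the commands for review (a dry run).
`purge_commands` is a hypothetical helper, and the path is the davinci example
named above.  All of it is local: publishing the result means force-pushing
every rewritten branch and tag, and everyone else has to re-clone.

```python
import shlex


def purge_commands(paths: list[str]) -> list[list[str]]:
    """Commands that drop `paths` from every branch and tag, then reclaim space."""
    # The index filter runs in a shell, so quote each path; git then matches
    # the glob itself as a pathspec.
    rm = "git rm --cached -r --ignore-unmatch " + " ".join(shlex.quote(p) for p in paths)
    return [
        ["git", "filter-branch", "--force", "--index-filter", rm,
         "--prune-empty", "--tag-name-filter", "cat", "--", "--all"],
        # filter-branch keeps backups under refs/original/, which pin the old objects.
        ["rm", "-rf", ".git/refs/original/"],
        ["git", "reflog", "expire", "--expire=now", "--all"],
        ["git", "gc", "--prune=now", "--aggressive"],
    ]


def main() -> None:
    # Dry run: print the plan so a person can read it before running anything.
    for argv in purge_commands(["docs/examples/kernel/davinci*.txt"]):
        print(shlex.join(argv))
```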

Sorry for causing more trouble,
-MinRK


On Sun, Jan 23, 2011 at 20:56, Fernando Perez <fperez.net at gmail.com> wrote:

> Hey folks,
>
> I'm really sorry that (due to 'real life' getting in the way) I missed
> several important discussions, in particular this one:
>
> https://github.com/ipython/ipython/issues/closed/#issue/239
>
> As I just mentioned to Brian during a chat we had, there's a problem
> with the default approach GitHub took to pages hosting: they put the
> pages in a separate DAG inside the repo (the gh-pages branch), but
> this means polluting the repo forever with all builds of the docs.
> Furthermore, if we want to host multiple versions of the docs (as we
> do today for each release and should continue doing), the storage
> requirements are going to balloon.
>
> Fortunately, there's a different approach that's easy to implement,
> adds only minimal work, and is completely clean.  And I'd already
> written the tools for it :)  For datarray, here's the gh-pages-hosted
> docs:
>
> http://fperez.github.com/datarray-doc/
>
> The basic idea is just to have a *separate* git repo that *only* hosts
> the gh-pages docs, in this case this one:
>
> https://github.com/fperez/datarray-doc
>
> A couple of scripts in the docs build file auto-generate all the
> necessary commits and info, and the only manual step needed is to do a
> single push.
>
> So I think what we should do is:
>
> - remove the gh-pages branch right away from the repo, so it doesn't
> grow any larger.
>
> - see if we can do a full purge of that data from the repo (I think
> the added size is ~6MB right now) with git's filter-branch tool
> (http://help.github.com/removing-sensitive-data has some tips).
>
> - add the tools from my datarray repo to handle the process smoothly.
>
>
> Min, if you think you're up for some/all of this, let me know; I'm
> still in Colombia, but we can Skype so I can give you some pointers.
> If not, we can get together back home next week when I return and
> clean this up.
>
> Sorry I didn't catch this earlier when the pull request went up...
>
> Cheers,
>
> f
>
> ps - obviously, please don't add anything at all to the gh-pages
> branch anymore, so we have as little to clean up as possible.
> _______________________________________________
> IPython-dev mailing list
> IPython-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev
>
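The separate-repo approach can be sketched in Python as below.  This is not
the datarray tooling the message refers to; it is a hypothetical sketch that
assumes Sphinx HTML in `build/html`, a local clone of the docs-only repo with
its gh-pages branch checked out in `gh-pages/`, and one subdirectory per docs
version (`dev` here).  It commits but deliberately stops short of pushing,
matching the single manual push described above.

```python
import shutil
import subprocess
from pathlib import Path


def stage_docs(build_html: Path, pages_checkout: Path, version: str) -> Path:
    """Replace <pages_checkout>/<version>/ with a fresh copy of the built HTML."""
    dest = pages_checkout / version
    if dest.exists():
        shutil.rmtree(dest)  # drop pages that no longer exist in this build
    shutil.copytree(build_html, dest)
    return dest


def commit_docs(pages_checkout: Path, version: str) -> None:
    """Record the new build in the docs-only repository (no push)."""
    git = ["git", "-C", str(pages_checkout)]
    # --all also stages deletions inside the version directory.
    subprocess.run(git + ["add", "--all", version], check=True)
    subprocess.run(git + ["commit", "-m", f"Update {version} docs"], check=True)


def main() -> None:
    # Hypothetical layout; adjust the paths and version name to the real build.
    pages = Path("gh-pages")
    staged = stage_docs(Path("build/html"), pages, "dev")
    commit_docs(pages, "dev")
    print(f"Committed {staged}; review it, then run: git -C {pages} push")
```

Keeping each version in its own directory means older release docs stay
untouched when a new build is staged, and the main code repo never sees any of
these commits.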

