[IPython-dev] tracking nbconvert-generated images in version control

Fri Apr 18 15:00:35 EDT 2014

The answer to this question may well be, "You shouldn't be trying to do 
that," but here goes anyway:

1. The websites for Software Carpentry bootcamps are hosted on GitHub, 
which generates them automatically by running a tool called Jekyll 
whenever content is committed to a repository's gh-pages branch.

2. Jekyll knows how to convert Markdown and HTML, but doesn't 
understand IPython Notebooks, so if people have notebooks in their 
bootcamp repository, they have to run nbconvert on their own machine and 
add the generated .md file to the repository.  (Yes, we could do 
something clever with post-commit hooks and continuous integration 
systems, but this seems simpler for our users.)

3. When nbconvert runs, it creates image files on disk for the plots and 
other code-generated visuals in the notebook.  These image files have 
auto-generated names like 01-numpy_76_0.png, and the Markdown/HTML 
generated by nbconvert links to them.

4. We can easily add those images to the version control repository as 
well - but if we move cells around in the notebook, nbconvert will give 
them different names the next time it runs.  We can add *those* images 
to version control too, but what do we do about cleaning out the old 
ones?  One suggestion is to 'git rm' all the generated images before 
re-running nbconvert and trust git to detect the new images and infer 
that we meant to 'git mv', but that feels dangerous.
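
One possible alternative to wildcards, sketched here in Python: compare 
the image names that the freshly generated Markdown/HTML links to 
against the image files already on disk, and only report the ones that 
are no longer referenced.  This is a rough sketch, not a tested tool.  
It assumes the generated text uses ordinary Markdown image links or HTML 
img tags, and the function names referenced_images and stale_images are 
made up for illustration.

```python
import re
from pathlib import Path

# Matches Markdown image links such as ![png](01-numpy_76_0.png) and
# HTML tags such as <img src="01-numpy_76_0.png">.  (Assumed link styles.)
IMAGE_LINK = re.compile(
    r'!\[[^\]]*\]\(([^)\s]+)[^)]*\)|<img[^>]*\bsrc="([^"]+)"'
)


def referenced_images(generated_text):
    """Return the file names of all images linked from the generated text."""
    names = set()
    for md_path, html_path in IMAGE_LINK.findall(generated_text):
        # Exactly one of the two groups is non-empty for each match.
        names.add(Path(md_path or html_path).name)
    return names


def stale_images(generated_text, image_names):
    """Return the images on disk that the generated text no longer links to."""
    return sorted(set(image_names) - referenced_images(generated_text))


if __name__ == "__main__":
    generated = "![png](01-numpy_80_0.png)"
    on_disk = ["01-numpy_76_0.png", "01-numpy_80_0.png"]
    print(stale_images(generated, on_disk))  # ['01-numpy_76_0.png']
```

The sketch deliberately deletes nothing: it only lists candidates, so a 
person can look the list over and then 'git rm' those files by name.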

Is there a cleaner solution?  One that we can explain and justify to 
people who are relatively new to both the notebook and version control, 
and is unlikely to go horribly, horribly wrong (which 'git rm' with 
wildcards well could)?

Thanks,
Greg