[IPython-dev] tracking nbconvert-generated images in version control

W. Trevor King wking at tremily.us
Sat Apr 19 12:04:21 EDT 2014


On Sat, Apr 19, 2014 at 05:07:28PM +0200, Matthias BUSSONNIER wrote:
> I can try to hack nbconvert to name the figure by their hash.
> Hence same figure would have same hash across conversions.
> (supposing the generated figures are identical of course)

I don't think the file-name changes themselves are the problem; the
problem is distinguishing auto-generated files (which can be safely
removed) from manually-generated files (which we don't want to remove,
or even clobber).

On Fri, Apr 18, 2014 at 03:00:35PM -0400, Greg Wilson wrote:
> 3. When nbconvert runs, it creates image files on disk for the plots
> and other code-generated visuals in the notebook.  These image files
> have auto-generated names like 01-numpy_76_0.png, and the
> Markdown/HTML generated by nbconvert links to them.
> 
> 4. We can easily add those images to the version control repository
> as well - but if we move cells around in the notebook, nbconvert
> will give them different names the next time it runs.  We can add
> *those* images to version control too, but what do we do about
> cleaning out the old ones?

If someone edited the notebook by only reordering cells (or making
other tweaks that don't change the generated images), then hashed
names would work.  However, if the generated files change (which could
happen if one builder just has a different version of matplotlib),
we're still going to have the “cleanup old files and add the new
files” problem.
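
A minimal sketch of the hash-based naming Matthias suggests makes the
trade-off concrete.  `hashed_name` and the 12-character SHA-1 prefix
are hypothetical choices for illustration, not anything nbconvert
currently does:

```python
import hashlib


def hashed_name(image_bytes, ext=".png", digest_len=12):
    """Hypothetical scheme: name a figure after a digest of its own bytes."""
    return hashlib.sha1(image_bytes).hexdigest()[:digest_len] + ext


# Stand-ins for rendered PNG data; real bytes would come from notebook outputs.
figures = [b"plot A", b"plot B"]

# Reordering cells does not change any figure's bytes, so the names are stable.
print(sorted(map(hashed_name, figures)) == sorted(map(hashed_name, reversed(figures))))

# A builder that renders plot A slightly differently gets a brand-new name.
print(hashed_name(b"plot A") == hashed_name(b"plot A, re-rendered"))
```

The first comparison holds and the second does not, which is the
limitation above: whenever the images themselves change, the old files
still have to be found and removed.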

I think a better solution would be to shift the auto-generated
files into a location that cannot be confused with manually-generated
files.  There is already work in this direction with 8ec29ff (Move
extracted files into their own subdir, 2013-07-22, landed in 1.0.0),
which started saving images (for example) in:

  01-numpy_files/76-0.png

However, the output files are still going to depend (obviously) on
which versions of IPython (and of other tools, such as matplotlib) the
builder is using.  Maybe Greg is still using an IPython release from
before 1.0.0?

Interestingly, my IPython 1.2.1 seems to have an issue with nesting,
repeating the output path inside the support directory:

  $ make novice/python/01-numpy.md
  ipython nbconvert --template=_templates/ipynb.tpl --to=markdown --output="novice/python/01-numpy" "novice/python/01-numpy.ipynb"
  [NbConvertApp] Using existing profile dir: u'/home/wking/.config/ipython/profile_default'
  [NbConvertApp] Converting notebook novice/python/01-numpy.ipynb to markdown
  [NbConvertApp] Support files will be in novice/python/01-numpy_files/
  [NbConvertApp] Loaded template _templates/ipynb.tpl
  [NbConvertApp] Making directory novice/python/01-numpy_files/novice/python
  [NbConvertApp] Writing 24888 bytes to novice/python/01-numpy.md
  $ ls novice/python/01-numpy_files/novice/python/
  …
  01-numpy_76_0.png
  …
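
One guess at the mechanism, reconstructed only from the log above and
not from nbconvert's actual code: if the full --output value
(directories included) is used as the image-name prefix and then joined
under the support directory, you get exactly this doubled path.  The
`image_path` helper and its `use_basename` switch are hypothetical:

```python
import posixpath


def image_path(output, cell_index, output_index, ext=".png", use_basename=False):
    """Hypothetical reconstruction of how an extracted image path is assembled."""
    support_dir = output + "_files"
    # Using the whole --output value as a prefix drags its directories along.
    prefix = posixpath.basename(output) if use_basename else output
    name = "{}_{}_{}{}".format(prefix, cell_index, output_index, ext)
    return posixpath.join(support_dir, name)


print(image_path("novice/python/01-numpy", 76, 0))
print(image_path("novice/python/01-numpy", 76, 0, use_basename=True))
```

The first call reproduces the nested location from the transcript; the
second, which keeps only the basename, gives the flat
01-numpy_files/01-numpy_76_0.png layout that 8ec29ff seems to intend.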

Anyhow, it should be safe to 'git rm -r novice/python/01-numpy_files'
and just have manually-generated files live in the same directory as
the notebook (novice/python/my-manual-image.png).
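
With that convention, telling the two kinds of file apart is a pure
path check.  A minimal sketch, assuming the `<stem>_files/` layout from
8ec29ff; `is_generated` is a hypothetical helper, and anything outside
the support directory (such as a hand-made image next to the notebook)
is treated as manual and left alone:

```python
from pathlib import PurePosixPath


def is_generated(path, notebook):
    """True if `path` lies inside the support directory derived from `notebook`.

    Assumes nbconvert's `<stem>_files/` convention; everything else is
    treated as a manually-generated file.
    """
    nb = PurePosixPath(notebook)
    support = nb.with_name(nb.stem + "_files")
    p = PurePosixPath(path)
    return p == support or support in p.parents


print(is_generated("novice/python/01-numpy_files/novice/python/01-numpy_76_0.png",
                   "novice/python/01-numpy.ipynb"))
print(is_generated("novice/python/my-manual-image.png",
                   "novice/python/01-numpy.ipynb"))
```

A cleanup script could then delete only paths for which `is_generated`
is true before re-running nbconvert, even with the nesting quirk above.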

On Fri, Apr 18, 2014 at 12:06:31PM -0700, Nathan Goldbaum wrote:
> RunNotebook consists of two sphinx extensions that take an
> unevaluated notebook and convert it into a form suitable for
> inclusion in sphinx HTML documentation.  That way we get small
> notebooks that contain only text added to our documentation version
> control but also full evaluated notebooks in the documentation we
> publish online.

I like this approach best ;).  Instead of guessing which parts of the
source tree are auto-generated and which are not, just keep all the
source in one branch, and put all the auto-generated stuff in another
(à la Git's git-htmldocs and git-manpages repositories [1]).
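
The source-branch half of that split starts with committing notebooks
without their outputs.  A minimal standard-library sketch;
`strip_outputs` and `strip_file` are hypothetical helpers, not part of
RunNotebook, and the code assumes the nbformat 4 JSON layout (a
top-level `cells` list).  The IPython 1.x notebooks in this thread use
the older v3 layout, where cells sit under `worksheets`, so the keys
would need adjusting:

```python
import copy
import json


def strip_outputs(nb):
    """Return a copy of an nbformat-4 notebook dict with code outputs removed."""
    nb = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop plots, text output, etc.
            cell["execution_count"] = None  # drop the In [n] prompt number
    return nb


def strip_file(src, dst):
    """Write a text-only copy of the notebook at `src` to `dst`."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Only the stripped notebooks go on the source branch; the evaluated
notebooks and their images are rebuilt and committed to the other one.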

Cheers,
Trevor

[1]: http://git-blame.blogspot.com/p/git-public-repositories.html

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy