[Python-Dev] PEP 552: deterministic pycs
Gregory P. Smith
greg at krypto.org
Thu Sep 7 19:58:26 EDT 2017
+1 on this PEP.
The TL;DR summary of this PEP:
The pyc date+length metadata check was a convenient hack. It still works
well for many people and use cases, and it isn't going away.
PEP 552 proposes a new alternate hack that relies on file contents
instead of OS and filesystem date metadata.
Assumption: The hash function is significantly faster than re-parsing
the source. (guaranteed to be true)
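To put a rough number on that assumption, here is a minimal sketch that times hashing a source blob against compiling it. It assumes Python 3.7 or later, where the PEP as eventually implemented exposes the hash as importlib.util.source_hash. The synthetic SOURCE and the helper name time_hash_vs_compile are made up for illustration, and the absolute numbers will vary by machine.

```python
import importlib.util
import timeit

# Hypothetical stand-in for a real module: a couple thousand tiny functions.
SOURCE = "".join(
    f"def f{i}(x):\n    return x + {i}\n\n" for i in range(2000)
).encode()


def time_hash_vs_compile(source: bytes, repeat: int = 5):
    """Return (best_hash_seconds, best_compile_seconds) for one source blob."""
    hash_time = min(
        timeit.repeat(
            lambda: importlib.util.source_hash(source), number=1, repeat=repeat
        )
    )
    compile_time = min(
        timeit.repeat(
            lambda: compile(source, "<sketch>", "exec"), number=1, repeat=repeat
        )
    )
    return hash_time, compile_time


if __name__ == "__main__":
    h, c = time_hash_vs_compile(SOURCE)
    print(f"hash: {h:.6f}s  compile: {c:.6f}s")
```

This only compares the CPU cost of hashing with the cost of compiling; it says nothing about the I/O cost of reading the file versus stat'ing it.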
Input from OS package distributors would be interesting. Would they use
this? Which way would it impact their startup time (loading the .py file
vs. just stat'ing it)? Does that even matter? Source files are often
eventually loaded for linecache use in tracebacks anyway.
Would they benefit from a pyc that can contain _both_ timestamp+length and
the source_hash? If both were present, I assume that only one would be
checked at startup. I'm not sure what would decide which one to
check. If one fails, check the other? I personally do not have a use for
this case, so I'd omit the complexity without a demonstrated need.
Something to also state in the PEP:
This is intentionally not a "secure" hash. Security is explicitly a
Rationale behind my support:
We use a superset of Bazel at Google (unsurprising) and have had to jump
through a lot of messy hoops to deal with timestamp metadata winding up in
output files vs deterministic builds. What Benjamin describes here sounds
exactly like what we would want.
It allows deterministic builds in distributed build and cached operation
systems where timestamps are never going to be guaranteed.
It allows the check to work on filesystems which do not preserve timestamps.
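As a rough illustration of the determinism point, the sketch below compiles the same source twice with two different forced mtimes and returns the 16-byte pyc header from each run. It assumes the py_compile API as it later shipped in Python 3.7 (the invalidation_mode argument and PycInvalidationMode enum); the helper names, file names and mtime values are hypothetical.

```python
import os
import py_compile
import tempfile


def pyc_bytes(src_path, mode, mtime):
    """Force the source mtime, compile it, and return the pyc contents."""
    os.utime(src_path, (mtime, mtime))
    with tempfile.TemporaryDirectory() as out_dir:
        cfile = os.path.join(out_dir, "out.pyc")
        py_compile.compile(
            src_path, cfile=cfile, doraise=True, invalidation_mode=mode
        )
        with open(cfile, "rb") as f:
            return f.read()


def headers_for_two_mtimes(mode):
    """Return the pyc headers produced for one source file at two mtimes."""
    with tempfile.TemporaryDirectory() as src_dir:
        src = os.path.join(src_dir, "mod.py")
        with open(src, "w") as f:
            f.write("ANSWER = 42\n")
        first = pyc_bytes(src, mode, 1_000_000_000)
        second = pyc_bytes(src, mode, 1_500_000_000)
        # Header: magic (4 bytes), flags (4), then mtime+size or source hash (8).
        return first[:16], second[:16]
```

With PycInvalidationMode.CHECKED_HASH the two headers should be identical, because the header carries a hash of the source bytes; with PycInvalidationMode.TIMESTAMP they differ in the mtime field.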
Also importantly, it allows the check to be disabled via the check_source
bit. Today we use a modified importer at work that skips checking
timestamps anyway, since we ship applications in which the entire set of
dependencies is already guaranteed at build time to be correct, and
modification at runtime is either not possible or not a concern. This PEP
would avoid the need for an extra importer or modified interpreter logic to make
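
For reference, a small sketch of how a tool might read the check_source bit from a pyc header. The layout used here is an assumption taken from the PEP as it was later implemented in Python 3.7, not from this message: a 4-byte magic number, then a 4-byte little-endian flags word in which bit 0 marks a hash-based pyc and bit 1 is check_source. The function name describe_pyc_header is hypothetical.

```python
import struct


def describe_pyc_header(header: bytes) -> str:
    """Classify a pyc by its header flags (layout assumed from Python 3.7+)."""
    if len(header) < 16:
        raise ValueError("need at least 16 header bytes")
    (flags,) = struct.unpack("<I", header[4:8])
    if not flags & 0b01:
        # Bit 0 clear: classic mtime+size invalidation.
        return "timestamp"
    # Bit 0 set: hash-based; bit 1 says whether the source is re-hashed on import.
    return "checked-hash" if flags & 0b10 else "unchecked-hash"
```

A pyc that reports "unchecked-hash" is the case described above: the importer trusts the pyc without consulting the source file at all.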
On Thu, Sep 7, 2017 at 3:47 PM Benjamin Peterson <benjamin at python.org> wrote:
> On Thu, Sep 7, 2017, at 14:43, Guido van Rossum wrote:
> > On Thu, Sep 7, 2017 at 2:40 PM, Benjamin Peterson <benjamin at python.org>
> > wrote:
> > >
> > >
> > > On Thu, Sep 7, 2017, at 14:19, Guido van Rossum wrote:
> > > > Nice one.
> > > >
> > > > It would be nice to specify the various APIs needed as well.
> > >
> > > The compileall and py_compile ones?
> > >
> > Yes, and the SipHash mod to specify the key you mentioned.
> > >
> > > > Why do you keep the mtime-based format as an option? (Maybe because
> > > > faster? Did you measure it?)
> > >
> > > I haven't actually measured anything, but stat'ing a file will
> > > be faster than reading it completely and hashing it. I suppose if the
> > > speed difference between timestamp-based and hash-based pycs turned out
> > > to be small we could feel good about dropping the timestamp format
> > > completely. However, that difference might be hard to determine
> > > definitively as I expect the speed hit will vary widely based on system
> > > parameters such as disk speed and page cache size.
> > >
> > > My goal in this PEP was to preserve the current pyc invalidation
> > > behavior, which works well today for many use cases, as the default.
> > > hash-based pycs are reserved for distribution and other power use
> > >
> > OK, maybe you can clarify that a bit in the PEP.
> I've added a paragraph to the Rationale section.