[Python-Dev] PEP 552: single magic number

Antoine Pitrou solipsis at pitrou.net
Fri Sep 8 06:38:29 EDT 2017


On Fri, 8 Sep 2017 12:04:52 +0200
Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 7 Sep 2017 18:47:20 -0700
> Nick Coghlan <ncoghlan at gmail.com> wrote:
> > However, I do wonder whether we could encode *all* the mode settings
> > into the magic number, such that we did something like reserving the
> > top 3 bits for format flags:
> > 
> > * number & 0x1FFF -> the traditional magic number
> > * number & 0x8000 -> timestamp or hash?
> > * number & 0x4000 -> checked or not?
> > * number & 0x2000 -> reserved for future format changes  
> 
> I'd rather a single magic number and a separate bitfield that tells
> what the header encodes exactly.  We don't *have* to fight for a tiny
> size reduction of pyc files.

Let me expand a bit on this.  Currently, the format is:

- bytes 0..3: magic number
- bytes 4..7: source file timestamp
- bytes 8..11: source file size
- bytes 12+: pyc file body (marshal format)

What I'm proposing is:

- bytes 0..3: magic number
- bytes 4..7: header options (bitfield)
- bytes 8..15: header contents
   Depending on header options:
    - bytes 8..11: source file timestamp
    - bytes 12..15: source file size
   or:
    - bytes 8..15: 64-bit source file hash
- bytes 16+: pyc file body (marshal format)
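
To make the proposed layout concrete, here is a minimal sketch of a
reader for it.  The flag value FLAG_HASH_BASED, the choice of bit 0, the
function name parse_header and the little-endian byte order are
assumptions made for illustration; the proposal above does not fix them.

```python
import struct

# Assumption: bit 0 of the options bitfield selects a hash-based header.
# The proposal does not assign bit positions; this value is hypothetical.
FLAG_HASH_BASED = 0x1

# Fixed header size in the proposed layout: 4 (magic) + 4 (options) + 8.
HEADER_SIZE = 16


def parse_header(data):
    """Split a pyc blob in the proposed layout into (magic, options, info, body)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated pyc header")
    magic = data[0:4]
    # Assumption: fields are little-endian 32-bit unsigned integers.
    (options,) = struct.unpack_from("<I", data, 4)
    if options & FLAG_HASH_BASED:
        # bytes 8..15: 64-bit source file hash, kept as raw bytes
        info = {"source_hash": data[8:16]}
    else:
        # bytes 8..11: source timestamp, bytes 12..15: source size
        mtime, size = struct.unpack_from("<II", data, 8)
        info = {"mtime": mtime, "size": size}
    # The marshalled body always starts at the same offset.
    return magic, options, info, data[HEADER_SIZE:]
```

Whichever variant is in use, the body offset stays at 16, so a reader
only has to branch on the options field and never on the magic number.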

This way, we keep a single magic number, a single header size, and
there's only a per-build variation in the middle of the header.


Of course, there are other possible ways to encode this information.  For
example, the header could be a sequence of Type-Length-Value triplets,
perhaps prefixed with header size or body offset for easy seeking.
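
As a rough sketch of that Type-Length-Value alternative, the part of the
header after the magic number could look like the following.  The 16-bit
type and length fields, the 32-bit body offset prefix, the little-endian
byte order and the names encode_tlv and decode_tlv are all hypothetical
choices for illustration, not part of any proposal.

```python
import struct


def encode_tlv(fields):
    """Encode (type, value_bytes) pairs, prefixed with the body offset."""
    payload = b"".join(
        struct.pack("<HH", ftype, len(value)) + value
        for ftype, value in fields
    )
    # The offset counts from the start of this block, including the prefix.
    body_offset = 4 + len(payload)
    return struct.pack("<I", body_offset) + payload


def decode_tlv(data):
    """Return ([(type, value_bytes), ...], body) from an encoded block."""
    (body_offset,) = struct.unpack_from("<I", data, 0)
    fields = []
    pos = 4
    while pos < body_offset:
        ftype, length = struct.unpack_from("<HH", data, pos)
        pos += 4
        fields.append((ftype, data[pos:pos + length]))
        pos += length
    # A reader that only wants the body can seek straight to body_offset.
    return fields, data[body_offset:]
```

A reader that does not recognise a given type can skip it using its
length, which is the usual reason for preferring this kind of encoding.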

My whole point here is that we can easily avoid the annoyance of dual
magic numbers and encodings which must be maintained in parallel.

Regards

Antoine.



