[Python-ideas] Add more information in the header of pyc files

Serhiy Storchaka storchaka at gmail.com
Tue Apr 10 11:49:36 EDT 2018

The format of the header of pyc files has been stable for a long time and
has changed only a few times. It was first changed in 3.3, which added the
size of the corresponding source file mod 2**32. [1]  It was changed a
second time in 3.7, which added a 32-bit flags field and support for
hash-based pyc files (PEP 552). [2] [3]
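For reference, here is a minimal sketch of how the current 16-byte header can be read. It assumes the PEP 552 layout: a 4-byte magic number, a little-endian 32-bit flags field, and then either the source mtime and source size (timestamp-based pyc) or an 8-byte source hash (hash-based pyc). `parse_pyc_header` is a hypothetical helper for illustration, not an importlib API.

```python
import importlib.util
import struct


def parse_pyc_header(data: bytes) -> dict:
    """Hypothetical helper: split a 3.7+ pyc header into its fields (PEP 552 layout)."""
    if len(data) < 16:
        raise ValueError("a 3.7+ pyc header is 16 bytes long")
    magic = data[:4]
    # 32-bit flags field added in 3.7, stored little-endian.
    (flags,) = struct.unpack("<I", data[4:8])
    header = {"magic": magic, "flags": flags}
    if flags & 0b01:
        # Hash-based pyc: bit 1 is check_source, the next 8 bytes are the source hash.
        header["check_source"] = bool(flags & 0b10)
        header["source_hash"] = data[8:16]
    else:
        # Timestamp-based pyc: source mtime, then source size mod 2**32 (since 3.3).
        mtime, size = struct.unpack("<II", data[8:16])
        header["mtime"] = mtime
        header["source_size"] = size
    return header


example = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, 1523375376, 1234)
print(parse_pyc_header(example))
```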

I think it is worth making more changes.

1. A more stable file signature. Currently the magic number changes in
every feature release. Only the third and fourth bytes are stable
(b'\r\n'); the first two bytes change unpredictably. The 'py' launcher
and third-party software like the 'file' command have to maintain a list
of magic numbers for all existing Python releases, and they can't detect
pyc files for future versions. There is also a chance that the pyc file
signature will accidentally match the signature of another file type. It
would be better if the first 4 bytes of pyc files were the same for all
Python versions (or at least for all Python versions with the same major
number).
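To illustrate the difference, a rough sketch of what a file-type sniffer can do today versus with a fixed signature. The table of known magic numbers is seeded only with the running interpreter's `importlib.util.MAGIC_NUMBER`, and `HYPOTHETICAL_SIGNATURE` is an arbitrary placeholder; this proposal does not specify the actual bytes.

```python
import importlib.util

# Placeholder only: the proposal does not say what the fixed signature would be.
HYPOTHETICAL_SIGNATURE = b"\x7fPYC"


def sniff_today(data: bytes, known_magics: set) -> str:
    """Classify a file the way a tool has to today."""
    # Only bytes 2-3 (b'\r\n') are the same in every release.
    if len(data) < 4 or data[2:4] != b"\r\n":
        return "not pyc"
    if data[:4] in known_magics:
        return "pyc"
    # Could be a pyc from a newer Python, or an unrelated file that matches by accident.
    return "unknown"


def sniff_proposed(data: bytes) -> str:
    """With a fixed signature, a single comparison is enough for any version."""
    return "pyc" if data[:4] == HYPOTHETICAL_SIGNATURE else "not pyc"


known = {importlib.util.MAGIC_NUMBER}
print(sniff_today(importlib.util.MAGIC_NUMBER + bytes(12), known))
print(sniff_today(b"\x99\x99\r\n" + bytes(12), known))
print(sniff_proposed(HYPOTHETICAL_SIGNATURE + bytes(12)))
```

The "unknown" case is exactly what a fixed signature would remove.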

2. Include the Python version. Currently the 'py' launcher needs to
maintain a table that maps magic numbers to Python versions. It can
recognize only Python versions released before the launcher was built. If
the major and minor components of the Python version were included in the
header, it would not need such a table.
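A minimal sketch of the two lookups, under stated assumptions: the table is seeded only with the running interpreter (a real launcher ships one entry per release), and the proposed layout, with the major and minor version stored as single bytes right after a 4-byte signature, is purely hypothetical.

```python
import importlib.util
import struct
import sys

# What the launcher does today: a table fixed at build time.
# Seeded here with the running interpreter only, for illustration.
MAGIC_TO_VERSION = {importlib.util.MAGIC_NUMBER: tuple(sys.version_info[:2])}


def version_today(magic: bytes):
    # Returns None for any release newer than the table.
    return MAGIC_TO_VERSION.get(magic)


def version_proposed(header: bytes):
    # Hypothetical layout: 4-byte signature, then major and minor version bytes.
    major, minor = struct.unpack("<BB", header[4:6])
    return (major, minor)


print(version_today(importlib.util.MAGIC_NUMBER))
print(version_today(b"\x00\x00\r\n"))
print(version_proposed(b"SIGN" + bytes([3, 99])))
```

The second lookup needs no update when a new release appears, which is the point of including the version in the header.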

3. A number for compatible subversions. Currently the interpreter
supports only a single magic number. If an updated version of the
compiler produces more optimal or more correct but compatible bytecode
(like ), there is no way to say that the new bytecode is preferable but
the old bytecode can still be used. Changing the magic number invalidates
all pyc files compiled by the old compiler (see [4] for an example of the
problems this causes). The header could contain two magic numbers: the
major magic number would be bumped for incompatible changes; the minor
magic number would be reset to 0 when the major magic number is bumped,
and bumped when the compiler starts producing different but compatible
bytecode. If the import system reads a pyc file with a minor magic number
equal to or greater than the current one, it just uses the pyc file. If it
reads a pyc file with a minor magic number less than the current one, it
can regenerate the pyc file if it is writable. And the compileall module
should regenerate all pyc files with minor magic numbers less than the
current one.
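The rules above can be sketched as a small decision function. The names `Action`, `choose_action` and `compileall_should_rewrite` are hypothetical, and the sketch assumes that an older but compatible pyc which cannot be rewritten is simply used as is.

```python
from enum import Enum


class Action(Enum):
    USE = "use the pyc file"
    REGENERATE = "recompile the source and rewrite the pyc file"
    RECOMPILE = "incompatible: recompile, as a magic mismatch does today"


def choose_action(file_major, file_minor, cur_major, cur_minor, writable):
    """Hypothetical import-system rule for a header with two magic numbers."""
    if file_major != cur_major:
        # Incompatible bytecode: the pyc file cannot be used at all.
        return Action.RECOMPILE
    if file_minor >= cur_minor:
        # Same or newer compatible bytecode: just use it.
        return Action.USE
    # Older but compatible bytecode: refresh it if possible, otherwise keep using it.
    return Action.REGENERATE if writable else Action.USE


def compileall_should_rewrite(file_major, file_minor, cur_major, cur_minor):
    """compileall regenerates anything incompatible or with an older minor number."""
    return file_major != cur_major or file_minor < cur_minor


print(choose_action(10, 0, 10, 1, writable=False))
```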

[1] https://bugs.python.org/issue13645
[2] https://bugs.python.org/issue31650
[3] http://www.python.org/dev/peps/pep-0552/
[4] https://bugs.python.org/issue27286
