[Python-ideas] Merge .pyc into .pyo--store multiple code objects in one file?

Larry Hastings larry at hastings.org
Tue Feb 2 05:07:24 CET 2010


A .pyc file is made up of these three elements:
    4 byte magic number
    4 byte timestamp
    marshaled code object
A .pyo file is the same except the code object has been optimized.
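
As a rough illustration of the three-element layout just described, here is a minimal sketch that builds and parses such a file in memory. The helper names `pack_pyc` and `unpack_pyc` are hypothetical, and the little-endian 4-byte timestamp is an assumption; note that newer CPython releases write a longer header than the 8 bytes described here, so this mirrors the layout in this message rather than what a current interpreter produces.

```python
import importlib.util
import marshal
import struct

def pack_pyc(source, mtime):
    """Build bytes in the layout above: magic, timestamp, marshaled code."""
    code = compile(source, "<example>", "exec")
    magic = importlib.util.MAGIC_NUMBER           # 4-byte magic number
    stamp = struct.pack("<I", mtime & 0xFFFFFFFF)  # 4-byte timestamp (assumed little-endian)
    return magic + stamp + marshal.dumps(code)

def unpack_pyc(data):
    """Split the bytes back into (magic, timestamp, code object)."""
    magic = data[:4]
    (mtime,) = struct.unpack("<I", data[4:8])
    code = marshal.loads(data[8:])
    return magic, mtime, code
```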

I ask you: why gunk up the filesystem with two files when one would do?  
I propose we change the pyc file so it can contain multiple code 
objects.  Or, indeed, multiple arbitrary objects.  The .pyc file could 
become a general-purpose cache for data relevant to the .py file.  
For example, perhaps the Unladen Swallow guys could cache post-JITted 
code.  Or the wordcode-based interpreter could cache its wordcode 
equivalent.

Lots of implementation choices suggest themselves.  Here's the cheapest 
approach that seems suitable:
    4 byte magic number
    4 byte timestamp
    marshaled array of pairs of ints, alternating "id of object"
        with "relative offset of object from the end of this
        marshaled array"
    marshaled code object 1
    marshaled code object 2
    ...
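
The layout above could be sketched roughly as follows. This is only one possible reading of it: the function names, the choice of a flat marshaled list for the index, and the integer ids used for lookups are all assumptions made for illustration. The index is read with `marshal.load` on a stream so that the position right after it can serve as the base for the relative offsets.

```python
import io
import marshal
import struct

HEADER_SIZE = 8  # 4-byte magic number + 4-byte timestamp

def build_cache(magic, mtime, objects):
    """objects maps an int id to any marshalable object (hypothetical ids)."""
    index, blobs, offset = [], [], 0
    for obj_id, obj in objects.items():
        blob = marshal.dumps(obj)
        index += [obj_id, offset]  # flat list: id, offset, id, offset, ...
        blobs.append(blob)
        offset += len(blob)
    header = magic + struct.pack("<I", mtime)
    return header + marshal.dumps(index) + b"".join(blobs)

def read_index(data):
    """Return ({id: offset}, base), where base is the end of the index."""
    stream = io.BytesIO(data)
    stream.seek(HEADER_SIZE)
    flat = marshal.load(stream)  # consumes exactly the marshaled index
    base = stream.tell()
    return dict(zip(flat[0::2], flat[1::2])), base

def lookup(data, obj_id):
    """Return the cached object for obj_id, or None if it is absent."""
    index, base = read_index(data)
    if obj_id not in index:
        return None
    # marshal.loads ignores trailing bytes, so slicing to the end is enough.
    return marshal.loads(data[base + index[obj_id]:])
```

With, say, id 0 for the plain code object and id 1 for the optimized one (made-up values), `lookup(data, 1)` would return what a .pyo file holds today.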

If you look for your cached object inside and it's not there, you 
compute your object then rewrite the file adding your id and offset to 
the end of the array.  (Your offset will always be the former size of 
the file.)  If the timestamp has changed, blow away all objects and 
start over with an empty array.

If it'd be too disruptive to change .pyc/.pyo files this way, would 
switching to a new file extension be better?

I suspect this probably isn't actually a good idea,


/larry/


