[Python-ideas] A new .pyc file format

Mike Meyer mwm at mired.org
Fri Apr 25 16:35:48 CEST 2008

On Fri, 25 Apr 2008 07:44:19 -0300
"Gabriel Genellina" <gagsl-py2 at yahoo.com.ar> wrote:

> Hello
> (Sorry if you get this twice, I can't see my original post from gmane)
> I want to propose a new .pyc file format. Currently .pyc files use a very
> simple format:
> - MAGIC number (4 bytes, little-endian)
> - last modification time of source file (4 bytes, little-endian)
> - code object (marshaled)
> The problem is that this format is *too* simple. It can't be changed, nor
> can it accommodate other fields if desired. I propose using a more flexible
> .pyc format (resembling RIFF files with multiple levels). The layout would
> be as follows:

Ok, *why* is this a problem? What proposed other fields do you have,
other than putting in multiple code segments with different flags?

Beyond that:

> - A section has an identifier (4 bytes, usually ASCII letters), followed
> by its size (4 bytes, not counting the section identifier nor the size
> itself), followed by the actual section content.

AKA Tag/Length/Value triples. While TLV is the common order, it's
slightly easier to deal with them if you go with LTV. You *have* to
deal with the length in order to read things in. Beyond that, you can
treat TV as atomic if you don't care about the tag for some reason.
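
To make the ordering point concrete, here is a minimal sketch in Python.
The function names are made up for illustration and are not part of any
proposal. It assumes 4-byte tags and 4-byte little-endian lengths as in
the quoted layout, and, for the LTV variant, that the length covers the
tag plus the value (my assumption; the quoted text counts only the
content).

```python
import io
import struct

def write_tlv(out, tag, value):
    # Tag (4 bytes), then the value's length (4 bytes, little-endian), then the value.
    out.write(tag + struct.pack("<I", len(value)) + value)

def write_ltv(out, tag, value):
    # Length first. Assumption: it covers tag + value, so the pair is one blob.
    out.write(struct.pack("<I", len(tag) + len(value)) + tag + value)

def read_tlv(inp):
    # The reader has to pick the tag apart from the length before it can skip ahead.
    header = inp.read(8)
    if len(header) < 8:
        return None
    tag = header[:4]
    (length,) = struct.unpack("<I", header[4:])
    return tag, inp.read(length)

def read_ltv_blob(inp):
    # The reader only needs the length; tag + value comes back undivided.
    raw = inp.read(4)
    if len(raw) < 4:
        return None
    (length,) = struct.unpack("<I", raw)
    return inp.read(length)

# Hypothetical usage: a "CODE" section with three bytes of content.
buf = io.BytesIO()
write_ltv(buf, b"CODE", b"\x01\x02\x03")
buf.seek(0)
blob = read_ltv_blob(buf)
tag, value = blob[:4], blob[4:]
```

With LTV, a generic copy or skip loop never has to look inside the blob.
With TLV, it has to step over the tag to find out how much to read.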

> - 32 bits should be enough for all sizes (and 640k should be enough for
> all people...)

Given that there are people who write code that writes code, and the
memory and disk capacities of modern systems, I'd say this is likely
to cause problems. Given those capacities, 8 byte lengths instead of 4
shouldn't be a problem. For embedded devices - well, they're not going
to like the idea in the first place.
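
As a rough illustration of the trade-off, here is a small Python sketch
using the standard struct module. The helper names are hypothetical. A
4-byte unsigned length tops out at 2**32 - 1 bytes, while an 8-byte
length costs four extra bytes per section and removes that ceiling for
any practical purpose.

```python
import struct

MAX_U32 = 2**32 - 1  # largest size a 4-byte unsigned length can describe

def pack_length(n, wide=False):
    # "<I" is a 4-byte little-endian unsigned int, "<Q" an 8-byte one.
    return struct.pack("<Q" if wide else "<I", n)

def fits_in_32_bits(n):
    # struct refuses values outside the field's range instead of truncating.
    try:
        struct.pack("<I", n)
    except struct.error:
        return False
    return True

# Hypothetical section sizes: one just under the limit, one just over it.
small = MAX_U32
large = MAX_U32 + 1
overhead_per_section = len(pack_length(0, wide=True)) - len(pack_length(0))
```

Here fits_in_32_bits(small) is true, fits_in_32_bits(large) is false,
and overhead_per_section comes out to 4 bytes.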

Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
