Summary of .pyc-Discussion so far (was Re: [Python-Dev] Proposal: .pyc file format change)
Peter Funk
pf@artcom-gmbh.de
Tue, 30 May 2000 09:08:15 +0200 (MEST)
Greg Stein:
> I don't think we should have a two-byte magic value. Especially where
> those two bytes are printable, 7-bit ASCII.
[...]
> To ensure uniqueness, I think a four-byte magic should stay.
Looking at /etc/magic I see many 16-bit magic numbers kept around
from the good old days. But you are right: Choosing a four-byte magic
value would make a clash with some other file format much less
likely.
> I would recommend the approach of adding opcodes into the marshal format.
> Specifically, 'V' followed by a single byte. That can only occur at the
> beginning. If it is not present, then you know that you have an old
> marshal value.
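A minimal sketch of how I read Greg's proposal, in Python. The helper names 'dump_versioned' and 'load_versioned' are hypothetical, and the code assumes that no existing marshal type code is 'V', so an old stream can never start with that byte:

```python
import marshal

VERSION_TAG = b"V"  # hypothetical opcode, only allowed at the very start


def dump_versioned(obj, version):
    """Prefix a marshal stream with 'V' plus a one-byte version."""
    if not 0 <= version <= 255:
        raise ValueError("version must fit into a single byte")
    return VERSION_TAG + bytes([version]) + marshal.dumps(obj)


def load_versioned(data):
    """Return (version, obj); version is None for an old, untagged stream."""
    if data[:1] == VERSION_TAG:
        return data[1], marshal.loads(data[2:])
    return None, marshal.loads(data)
```

A reader that finds no 'V' at the start simply treats the data as an old marshal value.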
But this would not solve the problem with 8 byte versus 4 byte timestamps
in the header on 64-bit OSes. Trent Mick pointed this out.
I think the situation we have now is very unsatisfactory: I don't
see a reasonable solution that allows us to keep the header before
the marshal block at a fixed length of 8 bytes together with a
frozen 4-byte magic number.
Moving the version number into the marshal doesn't help to resolve
this conflict. So either you have to accept a new magic on 64 bit
systems or you have to enlarge the header.
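To make the conflict concrete, here is a small sketch using the 'struct' module. The magic value is a placeholder, and the little-endian layout is an assumption of this sketch:

```python
import struct

PLACEHOLDER_MAGIC = b"MAGI"  # placeholder, not a real .pyc magic

# Today's layout: 4-byte magic + 4-byte timestamp = 8 bytes.
OLD_HEADER = struct.Struct("<4sI")
# Same magic with an 8-byte timestamp: the header grows to 12 bytes.
WIDE_HEADER = struct.Struct("<4sQ")


def fits_old_header(mtime):
    """True if mtime can be stored in the 8-byte header without loss."""
    try:
        OLD_HEADER.pack(PLACEHOLDER_MAGIC, mtime)
    except struct.error:
        return False
    return True
```

Any timestamp of 2**32 or more fails 'fits_old_header', so such a value needs either a different header layout (and thus a new magic) or a larger header.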
To come up with a new proposal, the following questions should be answered:
1. Is there really too much code out there that depends on
the hardcoded assumption that the marshal part of a .pyc file
starts at byte 8? I see no further evidence for or against this.
MAL pointed this out in
<http://www.python.org/pipermail/python-dev/2000-May/005756.html>
2. If we decide to enlarge the header, do we really need a new
header field defining the length of the header?
This was proposed by Christian Tismer in
<http://www.python.org/pipermail/python-dev/2000-May/005792.html>
3. The 'imp' module partly exposes the structure of a .pyc file
through the function 'get_magic()'. I proposed changing the signature of
'imp.get_magic()' in an upward compatible way. I also proposed
adding a new function 'imp.get_version()'. What do you think about
this idea?
4. Greg proposed prepending the version number to the marshal
format. If we do this, we definitely need a frozen way to find
out where the marshalled code object actually starts. This also
has the disadvantage of making it slightly harder to come up with
an /etc/magic definition which displays the version number of a
.pyc file.
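As an illustration of questions 2 and 4, a header length field would give readers a frozen way to locate the marshalled code object. The following is only a sketch: the magic 'PYC0', the field order and the helper names are all hypothetical and were not proposed in this form by anybody:

```python
import marshal
import struct

MAGIC = b"PYC0"  # hypothetical frozen 4-byte magic


def build(payload, mtime, version):
    """Build magic + header length + extra fields + marshal data."""
    # Extra fields of this sketch: 1-byte version, 8-byte timestamp.
    fields = struct.pack("<BQ", version, mtime)
    header_len = len(MAGIC) + 4 + len(fields)
    return MAGIC + struct.pack("<I", header_len) + fields + marshal.dumps(payload)


def payload_offset(data):
    """Return the offset at which the marshal data starts."""
    if data[:4] != MAGIC:
        raise ValueError("bad magic")
    (header_len,) = struct.unpack_from("<I", data, 4)
    return header_len
```

A reader written this way never hardcodes byte 8; it skips whatever number of bytes the header announces, so later fields can be added without breaking it.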
If we decide to move the version number into the marshal, we can
also move the .py-timestamp there. This way the timestamp will be handled
in the same way as large integer literals. Quoting from the docs:
"""Caveat: On machines where C's long int type has more than 32 bits
(such as the DEC Alpha), it is possible to create plain Python
integers that are longer than 32 bits. Since the current marshal
module uses 32 bits to transfer plain Python integers, such values
are silently truncated. This particularly affects the use of very
long integer literals in Python modules -- these will be accepted
by the parser on such machines, but will be silently be truncated
when the module is read from the .pyc instead.
[...]
A solution would be to refuse such literals in the parser, since
they are inherently non-portable. Another solution would be to let
the marshal module raise an exception when an integer value would
be truncated. At least one of these solutions will be implemented
in a future version."""
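The truncation described in the caveat can be emulated as follows. This sketch does not call the marshal module itself; it assumes that only the low 32 bits are written and that they are read back as a signed value. The function name is hypothetical:

```python
import struct


def truncate_to_int32(value):
    """Emulate storing a 64-bit integer in a 32-bit marshal slot."""
    low_four_bytes = struct.pack("<q", value)[:4]  # keep the low 32 bits only
    return struct.unpack("<i", low_four_bytes)[0]  # read back as signed 32-bit
```

Under these assumptions 2**32 + 5 comes back as 5, and a timestamp stored the same way would lose its upper bits just as silently.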
Should this be 1.6? Changing the format of .pyc files over and over
again in the 1.x series doesn't look very attractive.
Regards, Peter