Summary of .pyc-Discussion so far (was Re: [Python-Dev] Proposal: .pyc file format change)
M.-A. Lemburg
mal@lemburg.com
Tue, 30 May 2000 10:10:25 +0200
Peter Funk wrote:
>
> Greg Stein:
> > I don't think we should have a two-byte magic value. Especially where
> > those two bytes are printable, 7-bit ASCII.
> [...]
> > To ensure uniqueness, I think a four-byte magic should stay.
>
> Looking at /etc/magic I see many 16-bit magic numbers kept around
> from the good old days. But you are right: choosing a four-byte magic
> value would make a clash with some other file format much less
> likely.
Just for the record: the current /etc/magic I have on my Linux
machine doesn't know anything about PYC or PYO files, so I
don't really see much of a problem here -- no one seems to
be interested in finding out the file type for these
files anyway ;-)
Also, I don't really get the 16-bit magic argument: we still
have a 32-bit magic number -- one with a 16-bit fixed value and
predefined ranges for the remaining 16 bits. This is already
much better than what we have now with respect to making file(1)
work on PYC files.
> > I would recommend the approach of adding opcodes into the marshal format.
> > Specifically, 'V' followed by a single byte. That can only occur at the
> > beginning. If it is not present, then you know that you have an old
> > marshal value.
>
> But this would not solve the problem of 8-byte versus 4-byte timestamps
> in the header on 64-bit OSes. Trent Mick pointed this out.
The switch to 8-byte timestamps is only needed when the current
4 bytes can no longer hold the timestamp value. That will happen
in 2038...
Note that import.c writes the timestamp in 4 bytes until it
reaches an overflow situation.
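The 2038 date follows from treating the 4-byte timestamp as a signed 32-bit count of seconds since the Unix epoch. The sketch below is an illustration under that assumption (an unsigned field would last until 2106 instead); `fits_in_int32` is a hypothetical helper, not part of import.c.

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit timestamp field can hold (assumed signed).
INT32_MAX = 2**31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def fits_in_int32(seconds: int) -> bool:
    """True if a seconds-since-epoch value still fits in 4 signed bytes."""
    return -2**31 <= seconds <= INT32_MAX

# The last moment a 4-byte signed timestamp can represent.
last_representable = EPOCH + timedelta(seconds=INT32_MAX)
print(last_representable.isoformat())  # 2038-01-19T03:14:07+00:00
```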
> I think the situation we have now is very unsatisfactory: I don't
> see a reasonable solution that allows us to keep the header before
> the marshal block at a fixed length of 8 bytes together
> with a frozen 4-byte magic number.
Adding a version to the marshal format is a Good Thing --
independent of this discussion.
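Greg's suggestion quoted above (a 'V' opcode followed by a single version byte, allowed only at the very beginning of the marshal data) might look roughly like this. It is a hypothetical helper layered on top of the existing marshal module; the names `split_version` and `write_versioned`, and returning `None` for old unversioned data, are illustrative assumptions. It also shows the point raised later in the thread: a reader needs a fixed rule (here, "skip 2 bytes if 'V' is present, else 0") to find where the marshalled object starts.

```python
import marshal
from typing import Optional, Tuple

VERSION_OPCODE = b"V"  # hypothetical prefix opcode from the proposal

def split_version(data: bytes) -> Tuple[Optional[int], bytes]:
    """Return (version, payload) for marshal data with an optional 'V' prefix.

    If the data starts with 'V', the next byte is the format version and
    the marshalled object starts at offset 2. Otherwise the data is an
    old, unversioned marshal value and the version is reported as None.
    """
    if data[:1] == VERSION_OPCODE:
        if len(data) < 2:
            raise ValueError("'V' opcode without a version byte")
        return data[1], data[2:]
    return None, data

def write_versioned(obj, version: int) -> bytes:
    """Prepend the hypothetical 'V' opcode and one version byte."""
    return VERSION_OPCODE + bytes([version]) + marshal.dumps(obj)
```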
> Moving the version number into the marshal doesn't help to resolve
> this conflict. So either you have to accept a new magic on 64-bit
> systems or you have to enlarge the header.
No, you don't... please read the code: marshal writes 8 bytes
only when 4 bytes aren't enough to hold the value.
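The "4 bytes unless more are needed" rule can be illustrated with a small stand-alone encoder. This is a sketch built on `struct`, not the marshal module itself; the one-byte type codes `'i'` (4-byte record) and `'I'` (8-byte record) are chosen for illustration and should be treated as assumptions.

```python
import struct

TYPE_INT = b"i"    # assumed code: 4-byte signed integer record
TYPE_INT64 = b"I"  # assumed code: 8-byte signed integer record

def dump_int(x: int) -> bytes:
    """Encode x in 4 bytes if it fits a signed 32-bit field, else in 8."""
    if -2**31 <= x < 2**31:
        return TYPE_INT + struct.pack("<i", x)
    # struct raises struct.error if x does not even fit in 64 bits.
    return TYPE_INT64 + struct.pack("<q", x)

def load_int(data: bytes) -> int:
    """Decode a record produced by dump_int."""
    tag = data[:1]
    if tag == TYPE_INT:
        return struct.unpack("<i", data[1:5])[0]
    if tag == TYPE_INT64:
        return struct.unpack("<q", data[1:9])[0]
    raise ValueError(f"unknown type code {tag!r}")
```

Small values keep the compact 5-byte record, so data written for values below the 32-bit limit looks the same as before; only larger values pay for the wider field.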
> To come up with a new proposal, the following questions should be answered:
> 1. Is there really too much code out there which depends on
> the hard-coded assumption that the marshal part of a .pyc file
> starts at byte 8? I see no further evidence for or against this.
> MAL pointed this out in
> <http://www.python.org/pipermail/python-dev/2000-May/005756.html>
I have several references to it in my tool collection; the import
machinery uses it, old import hooks (remember ihooks?) do too, etc.
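Code of the kind referred to here typically reads a fixed 8-byte header and hands the rest of the file to marshal. The sketch below shows that pattern on an in-memory file with the 4-byte magic + 4-byte timestamp layout under discussion; `read_pyc_like` and the sample magic bytes are hypothetical. Any change to the header length silently breaks such a reader, which is the compatibility concern; it does not match the header of .pyc files written by later Python versions.

```python
import io
import marshal
import struct

HEADER_SIZE = 8  # 4-byte magic + 4-byte timestamp, as discussed in this thread

def read_pyc_like(f):
    """Read a file laid out as magic(4) + mtime(4) + marshal data.

    This is the hard-coded assumption in question: the marshalled
    object is taken to start at byte 8, whatever the header contains.
    """
    header = f.read(HEADER_SIZE)
    if len(header) != HEADER_SIZE:
        raise EOFError("truncated header")
    magic = header[:4]
    (mtime,) = struct.unpack("<i", header[4:8])
    return magic, mtime, marshal.load(f)

# Example: build an in-memory file with that layout and read it back.
sample = io.BytesIO(b"\x01\x02\r\n" + struct.pack("<i", 123456789)
                    + marshal.dumps(("x", 1)))
print(read_pyc_like(sample))
```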
> 2. If we decide to enlarge the header, do we really need a new
> header field defining the length of the header ?
> This was proposed by Christian Tismer in
> <http://www.python.org/pipermail/python-dev/2000-May/005792.html>
In Py3K we can do this right (breaking things is allowed)...
and I agree with Christian that a proper file format needs
a header length field too. Basically, these values have to
be present, IMHO:
1. Magic
2. Version
3. Length of Header
4. (Header Attribute)*n
--- Start of Data ---
Header Attribute can be pretty much anything -- timestamps,
names of files or other entities, bit sizes, architecture
flags, optimization settings, etc.
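A header laid out as magic, version, header length and then a list of attributes could be sketched as follows. Everything here is an assumption for illustration: the 4-byte magic value, the field widths, and encoding each attribute as tag (1 byte) + length (2 bytes) + value. The key property is that a reader can always jump to the start of the data using the header length, even when it does not understand some of the attributes.

```python
import struct
from typing import Dict, Tuple

MAGIC = b"PYC\x00"              # hypothetical 4-byte magic for this sketch
FIXED = struct.Struct("<4sBH")  # magic, version, total header length
ATTR = struct.Struct("<BH")     # attribute tag, attribute value length

def pack_header(version: int, attrs: Dict[int, bytes]) -> bytes:
    """Build magic + version + header length + (attribute)*n."""
    body = b"".join(ATTR.pack(tag, len(val)) + val
                    for tag, val in sorted(attrs.items()))
    return FIXED.pack(MAGIC, version, FIXED.size + len(body)) + body

def unpack_header(data: bytes) -> Tuple[int, Dict[int, bytes], int]:
    """Return (version, attributes, offset where the data starts)."""
    magic, version, hlen = FIXED.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    attrs: Dict[int, bytes] = {}
    pos = FIXED.size
    while pos < hlen:
        tag, size = ATTR.unpack_from(data, pos)
        pos += ATTR.size
        if pos + size > hlen:
            raise ValueError("attribute runs past end of header")
        attrs[tag] = data[pos:pos + size]
        pos += size
    # The data section always starts at hlen, even if some attributes
    # are unknown to this reader.
    return version, attrs, hlen
```

Usage: `pack_header(1, {1: struct.pack("<q", mtime), 2: b"O1"})` would store an 8-byte timestamp under hypothetical tag 1 and optimization flags under hypothetical tag 2.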
> 3. The 'imp' module somewhat exposes the structure of a .pyc file
> through the function 'get_magic()'. I proposed changing the signature of
> 'imp.get_magic()' in an upward compatible way. I also proposed
> adding a new function 'imp.get_version()'. What do you think about
> this idea?
imp.get_magic() would have to return the proposed 32-bit value
('PY' + version byte + option byte).
I'd suggest adding functions which can read and write the
header given a PYCHeader object holding the version and
options values.
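The proposed 32-bit magic ('PY' + version byte + option byte) and a PYCHeader holder might look like this. The class layout and the names `read_pyc_header` and `write_pyc_header` are hypothetical; nothing here is an existing `imp` API.

```python
import io
from dataclasses import dataclass

PREFIX = b"PY"  # fixed 16-bit part of the proposed magic

@dataclass
class PYCHeader:
    """Hypothetical holder for the variable part of the proposed magic."""
    version: int  # 0..255
    options: int  # 0..255, e.g. flags for optimization settings

    def magic(self) -> bytes:
        # 'PY' + version byte + option byte = 32 bits in total;
        # bytes() raises ValueError if either value exceeds one byte.
        return PREFIX + bytes([self.version, self.options])

def write_pyc_header(f, header: PYCHeader) -> None:
    f.write(header.magic())

def read_pyc_header(f) -> PYCHeader:
    raw = f.read(4)
    if len(raw) != 4 or raw[:2] != PREFIX:
        raise ValueError("not a PY-prefixed header")
    return PYCHeader(version=raw[2], options=raw[3])

# Example round trip through an in-memory file.
buf = io.BytesIO()
write_pyc_header(buf, PYCHeader(version=16, options=1))
buf.seek(0)
print(read_pyc_header(buf))
```

Because the first two bytes never change, a single file(1) rule on the 'PY' prefix would be enough to recognize every version.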
> 4. Greg proposed prepending the version number to the marshal
> format. If we do this, we definitely need a frozen way to find
> out where the marshalled code object actually starts. This also
> has the disadvantage of making it slightly harder to come up with
> an /etc/magic definition which displays the version number of a
> .pyc file.
>
> If we decide to move the version number into the marshal, we can
> also move the .py timestamp there. This way the timestamp will be handled
> in the same way as large integer literals. Quoting from the docs:
>
> """Caveat: On machines where C's long int type has more than 32 bits
> (such as the DEC Alpha), it is possible to create plain Python
> integers that are longer than 32 bits. Since the current marshal
> module uses 32 bits to transfer plain Python integers, such values
> are silently truncated. This particularly affects the use of very
> long integer literals in Python modules -- these will be accepted
> by the parser on such machines, but will silently be truncated
> when the module is read from the .pyc instead.
> [...]
> A solution would be to refuse such literals in the parser, since
> they are inherently non-portable. Another solution would be to let
> the marshal module raise an exception when an integer value would
> be truncated. At least one of these solutions will be implemented
> in a future version."""
>
> Should this be 1.6? Changing the format of .pyc files over and over
> again in the 1.x series doesn't look very attractive.
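The two behaviours contrasted in the quoted docs (silent truncation versus raising an exception) can be simulated with a 32-bit field built from `struct`. This is an illustration only, not the marshal module's code; the function names are hypothetical.

```python
import struct

def marshal32_truncating(x: int) -> bytes:
    # Behaviour described in the docs: keep only the low 32 bits.
    return struct.pack("<I", x & 0xFFFFFFFF)

def marshal32_checked(x: int) -> bytes:
    # Alternative suggested in the docs: refuse values that would be truncated.
    if not -2**31 <= x < 2**31:
        raise OverflowError(f"{x} does not fit in 32 bits")
    return struct.pack("<i", x)

def unmarshal32(data: bytes) -> int:
    # Read the field back as a signed 32-bit integer.
    return struct.unpack("<i", data)[0]
```

For example, `unmarshal32(marshal32_truncating(2**35 + 7))` yields 7: the high bits are lost without any warning, while `marshal32_checked` raises `OverflowError` for the same value.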
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/