
Peter Funk wrote:
Greg Stein:
I don't think we should have a two-byte magic value. Especially where those two bytes are printable, 7-bit ASCII. [...] To ensure uniqueness, I think a four-byte magic should stay.
Looking at /etc/magic I see many 16-bit magic numbers kept around from the good old days. But you are right: choosing a four-byte magic value would make a clash with some other file format much less likely.
Just for the record: the current /etc/magic I have on my Linux machine doesn't know anything about PYC or PYO files, so I don't really see much of a problem here -- no one seems to be interested in finding out the file type for these files anyway ;-) Also, I don't really get the 16-bit magic argument: we still have a 32-bit magic number -- one with a 16-bit fixed value and predefined ranges for the remaining 16 bits. This is already much better than what we have now with respect to making file(1) work on PYC files.
I would recommend the approach of adding opcodes into the marshal format. Specifically, 'V' followed by a single byte. That can only occur at the beginning. If it is not present, then you know that you have an old marshal value.
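A minimal sketch of how a reader could detect such a version prefix, assuming the marker is the single byte 'V' followed by one version byte and that no old-style marshal stream starts with 'V'. The function name and the default version of 0 are hypothetical; none of this is part of the marshal module.

```python
# Hypothetical helper: split an optional 'V' + version-byte prefix off a
# marshal stream. Not an existing marshal API.
VERSION_MARKER = b"V"


def split_version(data, default_version=0):
    """Return (version, remaining_bytes) for a marshalled byte string.

    If the stream starts with the marker, the next byte is taken as the
    version. Otherwise the data is treated as an old, unversioned stream.
    """
    if len(data) >= 2 and data[:1] == VERSION_MARKER:
        return data[1], data[2:]
    return default_version, data


# A new-style stream carries the marker; an old one is passed through as-is.
new_style = VERSION_MARKER + bytes([2]) + b"payload"
old_style = b"payload"
print(split_version(new_style))
print(split_version(old_style))
```

Because the marker may only occur at the very beginning, the check costs one comparison and old streams need no conversion.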
But this would not solve the problem with 8-byte versus 4-byte timestamps in the header on 64-bit OSes. Trent Mick pointed this out.
The switch to 8 byte timestamps is only needed when the current 4 bytes can no longer hold the timestamp value. That will happen in 2038... Note that import.c writes the timestamp in 4 bytes until it reaches an overflow situation.
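The 2038 figure can be checked with a little arithmetic. This is only a sketch, assuming the timestamp is stored as a signed 32-bit count of seconds since 1970-01-01 UTC; the helper names are made up for illustration and are not taken from import.c.

```python
import struct
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit field can hold (assumed storage format).
MAX_INT32 = 2**31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fits_in_4_bytes(mtime):
    """True if the timestamp still fits into a signed 4-byte field."""
    return -2**31 <= mtime <= MAX_INT32


def pack_mtime(mtime):
    """Pack the timestamp as 4 little-endian bytes.

    struct.error is raised once the value no longer fits.
    """
    return struct.pack("<l", mtime)


# The last second that can be represented in 4 bytes.
last_second = EPOCH + timedelta(seconds=MAX_INT32)
print(last_second.isoformat())  # 2038-01-19T03:14:07+00:00
```

Any timestamp up to that moment fits the existing 4-byte field, which is why the header layout does not need to change before then.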
I think the situation we have now is very unsatisfactory: I don't see a reasonable solution that allows us to keep the header before the marshal block at a fixed length of 8 bytes together with a frozen 4-byte magic number.
Adding a version to the marshal format is a Good Thing -- independent of this discussion.
Moving the version number into the marshal doesn't help to resolve this conflict. So either you have to accept a new magic on 64-bit systems or you have to enlarge the header.
No you don't... please read the code: marshal only writes 8 bytes in case 4 bytes aren't enough to hold the value.
To come up with a new proposal, the following questions should be answered: 1. Is there really too much code out there that depends on the hardcoded assumption that the marshal part of a .pyc file starts at byte 8? I see no further evidence for or against this. MAL pointed this out in <http://www.python.org/pipermail/python-dev/2000-May/005756.html>
I have several references in my tool collection, the import stuff uses it, old import hooks (remember ihooks ?) also do, etc.
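For illustration, this is roughly what such code looks like. It is a sketch that hardcodes the layout under discussion (4-byte magic, 4-byte timestamp, marshal data from byte 8) and runs on a byte string built in memory; the magic value and the function name are placeholders, and real .pyc files from other Python versions may use a different layout.

```python
import marshal
import struct

# The assumption under discussion: marshal data starts at byte 8.
HEADER_SIZE = 8


def split_pyc(data):
    """Split raw .pyc-style bytes into (magic, mtime, payload object)."""
    magic = data[:4]
    (mtime,) = struct.unpack("<l", data[4:8])
    payload = marshal.loads(data[HEADER_SIZE:])
    return magic, mtime, payload


# Build a fake file image in memory; the magic is a made-up placeholder.
fake_magic = b"\x00\x00\r\n"
sample = fake_magic + struct.pack("<l", 958000000) + marshal.dumps({"answer": 42})

magic, mtime, payload = split_pyc(sample)
print(magic, mtime, payload)
```

Every tool written this way breaks as soon as the header grows, which is the compatibility concern here.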
2. If we decide to enlarge the header, do we really need a new header field defining the length of the header ? This was proposed by Christian Tismer in <http://www.python.org/pipermail/python-dev/2000-May/005792.html>
In Py3K we can do this right (breaking things is allowed)... and I agree with Christian that a proper file format needs a header length field too. Basically, these values have to be present, IMHO:

1. Magic
2. Version
3. Length of Header
4. (Header Attribute)*n
-- Start of Data ---

Header Attribute can be pretty much anything -- timestamps, names of files or other entities, bit sizes, architecture flags, optimization settings, etc.
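To make the layout concrete, here is a sketch of one possible encoding. Everything about the byte layout is an assumption made for illustration: the magic value, the 16-bit version and header length fields, and the tag/size/value encoding of the attributes are not part of any proposal in this thread.

```python
import struct

# Hypothetical layout: 4-byte magic, 16-bit version, 16-bit header length,
# then attributes encoded as (tag byte, size byte, value bytes).
MAGIC = b"PYC0"
FIXED = struct.Struct("<4sHH")


def build_header(version, attributes):
    """Serialize the header; attributes is a list of (tag, bytes) pairs."""
    body = b""
    for tag, value in attributes:
        body += struct.pack("<BB", tag, len(value)) + value
    header_len = FIXED.size + len(body)
    return FIXED.pack(MAGIC, version, header_len) + body


def parse_header(data):
    """Return (version, attributes, payload) from header + data bytes."""
    magic, version, header_len = FIXED.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    attributes = []
    pos = FIXED.size
    while pos < header_len:
        tag, size = struct.unpack_from("<BB", data, pos)
        pos += 2
        attributes.append((tag, data[pos:pos + size]))
        pos += size
    # The length field tells the reader where the data starts.
    return version, attributes, data[header_len:]


# Tag 1 is used here for an 8-byte timestamp (again, an assumption).
timestamp_attr = (1, struct.pack("<q", 2**40))
header = build_header(1, [timestamp_attr])
version, attributes, payload = parse_header(header + b"DATA")
print(version, attributes, payload)
```

The point of the length field is visible in the last line of parse_header: a reader that does not understand some attribute can still find the start of the data.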
3. The 'imp' module somewhat exposes the structure of a .pyc file through the function 'get_magic()'. I proposed changing the signature of 'imp.get_magic()' in an upward compatible way. I also proposed adding a new function 'imp.get_version()'. What do you think about this idea?
imp.get_magic() would have to return the proposed 32-bit value ('PY' + version byte + option byte). I'd suggest adding additional functions which can read and write the header given a PYCHeader object which would hold the values version and options.
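A rough sketch of what packing and unpacking such a magic value could look like. The PYCHeader name comes from the paragraph above, but its shape (a simple tuple of two byte-sized integers) and the two function names are assumptions; nothing here exists in the imp module.

```python
import struct
from typing import NamedTuple


class PYCHeader(NamedTuple):
    """Hypothetical container for the two variable bytes of the magic."""
    version: int
    options: int


def pack_magic(header):
    """Build the 32-bit magic: 'PY' + version byte + option byte."""
    return b"PY" + struct.pack("BB", header.version, header.options)


def unpack_magic(magic):
    """Parse a 4-byte magic back into a PYCHeader."""
    if len(magic) != 4 or magic[:2] != b"PY":
        raise ValueError("not a PY magic value")
    version, options = struct.unpack("BB", magic[2:])
    return PYCHeader(version, options)


# Example values only; no real version or option numbering is implied.
magic = pack_magic(PYCHeader(version=16, options=1))
print(magic, unpack_magic(magic))
```

With a fixed 'PY' prefix, a file(1) rule only needs to match two constant bytes and can print the remaining two as version and options.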
4. Greg proposed prepending the version number to the marshal format. If we do this, we definitely need a frozen way to find out where the marshalled code object actually starts. This also has the disadvantage of making it slightly harder to come up with an /etc/magic definition which displays the version number of a .pyc file.
If we decide to move the version number into the marshal, we can also move the .py timestamp there. This way the timestamp will be handled in the same way as large integer literals. Quoting from the docs:
"""Caveat: On machines where C's long int type has more than 32 bits (such as the DEC Alpha), it is possible to create plain Python integers that are longer than 32 bits. Since the current marshal module uses 32 bits to transfer plain Python integers, such values are silently truncated. This particularly affects the use of very long integer literals in Python modules -- these will be accepted by the parser on such machines, but will be silently be truncated when the module is read from the .pyc instead. [...] A solution would be to refuse such literals in the parser, since they are inherently non-portable. Another solution would be to let the marshal module raise an exception when an integer value would be truncated. At least one of these solutions will be implemented in a future version."""
Should this be 1.6? Changing the format of .pyc files over and over again in the 1.x series doesn't look very attractive.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/