importing .pyc-files generated by Python 1.5.2 in Python 1.6. Why not?
Python 1.6 reports a bad magic error when someone tries to import a .pyc file compiled by Python 1.5.2. AFAIK only new features have been added. So why isn't it possible to use these old files in Python 1.6? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)
Because of the Unicode changes, AFAIK. Or was it the multi-arg vs. single-arg append and friends? Anyway, the point is that there were incompatible changes made, and thus the magic was changed. --Dan
On Tue, 23 May 2000, Peter Funk wrote:
Python 1.6 reports a bad magic error when someone tries to import a .pyc file compiled by Python 1.5.2. AFAIK only new features have been added. So why isn't it possible to use these old files in Python 1.6?
Peter, In theory, perhaps it could; I don't know if the extra work is worth it, however. What's happening is that the .pyc magic number changed because the marshal format has been extended to support Unicode string objects. The old format should still be readable, but there's nothing in the .pyc loader that supports the acceptance of multiple versions of the marshal format. Is there reason to think that's a substantial problem for users, given the automatic recompilation of bytecode from source? The only serious problems I can see are when multiple versions of the interpreter are being used on the same collection of source files (because the re-compilation occurs more often and affects performance), and when *only* .pyc/.pyo files are available. Do you have reason to suspect that either case is sufficiently common to complicate the .pyc loader, or is there another reason that I've missed (very possible, I admit)? -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
Fred, Thank you for your quick response. Fred L. Drake:
Peter, In theory, perhaps it could; I don't know if the extra work is worth it, however. [...] Do you have reason to suspect that either case is sufficiently common to complicate the .pyc loader, or is there another reason that I've missed (very possible, I admit)?
Well, currently we (our company) deliver no source code to our customers. I don't want to discuss this policy and the reasoning behind it here. But this situation may also apply to other commercial software vendors using Python. During late 2000 there may be several customers out there running Python 1.6 and others still running Python 1.5.2. So we will have several choices to deal with this situation:

1. Supply two different binary distribution packages: one containing 1.5.2 .pyc files and one containing 1.6 .pyc files. This will introduce some new logistic problems.
2. Upgrade to Python 1.6 at each customer site at once. This will be difficult.
3. Patch the 1.6 unmarshaller to support loading 1.5.2 .pyc files and supply our own patched Python distribution. (and this would also be "carrying owls to Athens" for Linux systems) [I don't know whether this carelessly translated German idiom makes any sense in English ;-) ] I personally don't like this.
4. Change our policy and also distribute the .py sources. Besides the difficulty of convincing the management about this one, this also introduces new technical "challenges". The Unix text files have to be converted from LF line ends to CR line ends or MacPython wouldn't be able to parse the files. So the Mac source distributions must be built from a different directory tree.

No choice looks very attractive. Adding a '|| (magic == 0x994e)' or some such somewhere in the 1.6 unmarshaller should do the trick. But I don't want to submit a patch if God^H^HGuido thinks this isn't worth the effort. <wink>

Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)
Peter Funk <pf@artcom-gmbh.de>:
(and this would also be "carrying owls to Athens" for Linux systems) [I don't know whether this carelessly translated German idiom makes any sense in English ;-) ]
There is a precise equivalent: "carrying coals to Newcastle". -- Eric S. Raymond <http://www.tuxedo.org/~esr> "Are we to understand," asked the judge, "that you hold your own interests above the interests of the public?" "I hold that such a question can never arise except in a society of cannibals." -- Ayn Rand
On Tue, 23 May 2000, Eric S. Raymond wrote:
Peter Funk <pf@artcom-gmbh.de>:
(and this would also be "carrying owls to Athens" for Linux systems) [I don't know whether this carelessly translated German idiom makes any sense in English ;-) ]
There is a precise equivalent: "carrying coals to Newcastle".
That's interesting... I've never heard either, but I think I can guess the meaning now. I agree; it looks like there's some work to do in getting the .pyc loader to be a little more concerned about importing compatible marshal formats. I have an idea about how I'd like to see it done which may be a little less magical. I'll work up a patch later this week. I won't check in any changes for this until we've heard from Guido on the matter, and he'll probably be unavailable for the next couple of days. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
3. Patch the 1.6 unmarshaller to support loading 1.5.2 .pyc files
I agree that this is the correct solution.
No choice looks very attractive. Adding a '|| (magic == 0x994e)' or some such somewhere in the 1.6 unmarshaller should do the trick. But I don't want to submit a patch, if God^H^HGuido thinks, this isn't worth the effort. <wink>
That's BDFL for you, thank you. ;-) Before accepting the trivial patch, I would like to see some analysis that shows that in fact all 1.5.2 .pyc files work correctly with 1.6. This means you have to prove that (a) the 1.5.2 marshal format is a subset of the 1.6 marshal format (easy enough probably) and (b) the 1.5.2 bytecode opcodes are a subset of the 1.6 bytecode opcodes. That one seems a little trickier; I don't remember if we moved opcodes or changed existing opcodes' semantics. You may be lucky, but it will cause an extra constraint on the evolution of the bytecode, so I'm somewhat reluctant. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
3. Patch the 1.6 unmarshaller to support loading 1.5.2 .pyc files
I agree that this is the correct solution.
No choice looks very attractive. Adding a '|| (magic == 0x994e)' or some such somewhere in the 1.6 unmarshaller should do the trick. But I don't want to submit a patch, if God^H^HGuido thinks, this isn't worth the effort. <wink>
That's BDFL for you, thank you. ;-)
Before accepting the trivial patch, I would like to see some analysis that shows that in fact all 1.5.2 .pyc files work correctly with 1.6. This means you have to prove that (a) the 1.5.2 marshal format is a subset of the 1.6 marshal format (easy enough probably) and (b) the 1.5.2 bytecode opcodes are a subset of the 1.6 bytecode opcodes. That one seems a little trickier; I don't remember if we moved opcodes or changed existing opcodes' semantics. You may be lucky, but it will cause an extra constraint on the evolution of the bytecode, so I'm somewhat reluctant.
Be assured, I know the opcodes by heart. We only appended to the end of opcode space, there are no changes. But I can't tell about marshal. ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com
[...about accepting 1.5.2 generated .pyc files...] Guido van Rossum:
Before accepting the trivial patch, I would like to see some analysis that shows that in fact all 1.5.2 .pyc files work correctly with 1.6.
Would it be sufficient, if a Python 1.6a2 interpreter executable containing such a trivial patch is able to process the test suite in a 1.5.2 tree with all the .py-files removed? But some list.append calls with multiple args might cause errors.
This means you have to prove that (a) the 1.5.2 marshal format is a subset of the 1.6 marshal format (easy enough probably) and (b) the 1.5.2 bytecode opcodes are a subset of the 1.6 bytecode opcodes. That one seems a little trickier; I don't remember if we moved opcodes or changed existing opcodes' semantics. You may be lucky, but it will cause an extra constraint on the evolution of the bytecode, so I'm somewhat reluctant.
I feel the byte code format is rather mature and future evolution is unlikely to remove or move opcodes to new values or change the semantics of existing opcodes in an incompatible way. As has been shown, it is even possible to solve the 1/2 == 0.5 issue with upward compatible extension of the format. But I feel unable to provide a formal proof other than comparing 1.5.2/Include/opcode.h, 1.5.2/Python/marshal.c and import.c with the 1.6 ones. There are certainly others here on python-dev who can do better. Christian? BTW: import.c contains the following comment: /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */ Judging from my decade long experience with exotic image and CAD data formats I think this is always the way to go for binary data files. Using this method newer versions of a program can always recognize the file format version and convert files generated by older versions in an appropriate way. Regards, Peter
Peter Funk <pf@artcom-gmbh.de>:
BTW: import.c contains the following comment: /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */
Judging from my decade long experience with exotic image and CAD data formats I think this is always the way to go for binary data files. Using this method newer versions of a program can always recognize the file format version and convert files generated by older versions in an appropriate way.
I have similar experience, notably with hacking graphics file formats. I concur with this recommendation. -- Eric S. Raymond <http://www.tuxedo.org/~esr> The end move in politics is always to pick up a gun. -- R. Buckminster Fuller
On Wed, 24 May 2000, Eric S. Raymond wrote:
Peter Funk <pf@artcom-gmbh.de>:
BTW: import.c contains the following comment: /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */
Judging from my decade long experience with exotic image and CAD data formats I think this is always the way to go for binary data files. Using this method newer versions of a program can always recognize the file format version and convert files generated by older versions in an appropriate way.
I have similar experience, notably with hacking graphics file formats. I concur with this recommendation.
One more +1 here. In another thread (right now, actually), I'm discussing how you can hook up Linux to recognize .pyc files and directly execute them with the Python interpreter (e.g. no need for #!/usr/bin/env python at the head of the file). But if that magic number keeps changing, then it makes it a bit harder to set this up. Cheers, -g -- Greg Stein, http://www.lyra.org/
Given Christian Tismer's testimonial and inspection of marshal.c, I think Peter's small patch is acceptable.

A bigger question is whether we should freeze the magic number and add a version number. In theory I'm all for that, but it means more changes; there are several tools (e.g. Lib/py_compile.py, Tools/freeze/modulefinder.py and Tools/scripts/checkpyc.py) that have intimate knowledge of the .pyc file format that would have to be modified to match.

The current format of a .pyc file is as follows:

    bytes 0-3   magic number
    bytes 4-7   timestamp (mtime of .py file)
    bytes 8-*   marshalled code object

The magic number itself is used to convey various bits of information, all implicit:

    - the Python version
    - whether \r and \n are swapped (some old Mac compilers did this)
    - whether all string literals are Unicode (experimental -U flag)

The current (1.6) value of the magic number (as a string -- the .pyc file format is byte order independent) is '\374\304\015\012' on most platforms; it's '\374\304\012\015' for the old Mac compilers mentioned; and it's '\375\304\015\012' with -U.

Can anyone come up with a proposal? I'm swamped!

--Guido van Rossum (home page: http://www.python.org/~guido/)
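The layout described above can be illustrated with a minimal Python sketch. It is not the interpreter's loader: the function names are hypothetical, the blob is built in memory rather than read from a real 1.6 file, and the little-endian byte order of the timestamp is an assumption based on the remark that the format is byte order independent. The three magic strings are the ones quoted in the message.

```python
import marshal
import struct

# Magic strings quoted in the message above (octal escapes kept as written).
MAGIC_1_6 = b"\374\304\015\012"              # most platforms
MAGIC_1_6_MAC_SWAPPED = b"\374\304\012\015"  # old Mac compilers (\r and \n swapped)
MAGIC_1_6_UNICODE = b"\375\304\015\012"      # experimental -U flag


def split_legacy_pyc(data):
    """Split a blob laid out as magic (4) + mtime (4) + marshalled code object."""
    if len(data) < 8:
        raise ValueError("too short for an 8-byte header")
    magic = data[0:4]
    # Assumption: the timestamp is stored as a little-endian unsigned 32-bit value.
    (mtime,) = struct.unpack("<L", data[4:8])
    return magic, mtime, data[8:]


def build_demo_blob(magic, mtime, source):
    """Hypothetical helper: build an in-memory blob using the same layout."""
    code = compile(source, "<demo>", "exec")
    return magic + struct.pack("<L", mtime) + marshal.dumps(code)


blob = build_demo_blob(MAGIC_1_6, 959126400, "answer = 6 * 7")
magic, mtime, payload = split_legacy_pyc(blob)

# The payload is marshalled by the running interpreter, so it can be loaded here.
namespace = {}
exec(marshal.loads(payload), namespace)
print(magic == MAGIC_1_6, mtime, namespace["answer"])
```

Tools that slice the file at byte 8, as `split_legacy_pyc` does, are exactly the ones that would need changing if the header grew.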
[Guido van Rossum]:
Given Christian Tismer's testimonial and inspection of marshal.c, I think Peter's small patch is acceptable.
A bigger question is whether we should freeze the magic number and add a version number. In theory I'm all for that, but it means more changes; there are several tools (e.g. Lib/py_compile.py, Tools/freeze/modulefinder.py and Tools/scripts/checkpyc.py) that have intimate knowledge of the .pyc file format that would have to be modified to match.
The current format of a .pyc file is as follows:

    bytes 0-3   magic number
    bytes 4-7   timestamp (mtime of .py file)
    bytes 8-*   marshalled code object

Proposal: The future format (Python 1.6 and newer) of a .pyc file should be as follows:

    bytes 0-3   a new magic number, which should be definitely frozen in 1.6.
    bytes 4-7   a version number (which should be == 1 in Python 1.6)
    bytes 8-11  timestamp (mtime of .py file) (same as earlier)
    bytes 12-*  marshalled code object (same as earlier)
The magic number itself is used to convey various bits of information, all implicit: [...]
This mechanism to construct the magic number should not be changed.

But now once again a new value must be chosen to prevent havoc with .pyc files floating around from people who have already played with the Python 1.6 alpha releases. But this change should definitely be the last one that will ever happen during the future lifetime of Python.

The unmarshaller should do the following with the magic number read: If the read magic is the old magic number from 1.5.2, skip reading a version number and assume 0 as the version number. If the read magic is this new value instead, it should also read the version number and raise a new 'ByteCodeToNew' exception if the read version number is greater than a #defined version number of this Python interpreter. If future incompatible extensions to the byte code format happen, then this number should be incremented to 2, 3 and so on.

For safety, 'imp.get_magic()' should return the old 1.5.2 magic number and only 'imp.get_magic(imp.PYC_FINAL)' should return the new final magic number. A new function 'imp.get_version()' should be introduced, which will return the current compiled-in version number of this Python interpreter.

Of course all Python modules reading .pyc files must be changed accordingly, so that they are able to deal with new .pyc files. This shouldn't be too hard. This proposed change of the .pyc file format must be described in the final Python 1.6 announcement, in case there are people out there who borrowed code from 'Tools/scripts/checkpyc.py' or some such. Regards, Peter
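The header handling proposed above could look roughly like the following Python sketch. Several things in it are assumptions rather than part of the proposal: the byte value used for the old 1.5.2 magic (derived from the '0x994e' mentioned earlier in the thread), the placeholder for the new frozen magic (no value was proposed), the little-endian encoding of the version field, and the function name. The exception name is taken from the proposal as written.

```python
import struct

# Assumption: byte form of the 1.5.2 magic, cf. the 0x994e mentioned in the thread.
OLD_MAGIC_1_5_2 = b"\x99\x4e\x0d\x0a"
# Hypothetical placeholder; the proposal does not name a concrete value.
NEW_FROZEN_MAGIC = b"\x00PYC"
# "should be == 1 in Python 1.6" per the proposal.
INTERPRETER_VERSION = 1


class ByteCodeToNew(Exception):
    """Exception name as spelled in the proposal."""


def read_header(data):
    """Return (version, mtime, marshalled_payload) for old and new layouts."""
    magic = data[:4]
    if magic == OLD_MAGIC_1_5_2:
        # Old files carry no version field; treat them as version 0.
        version = 0
        offset = 4
    elif magic == NEW_FROZEN_MAGIC:
        # Assumption: the version is a little-endian unsigned 32-bit value.
        (version,) = struct.unpack("<L", data[4:8])
        if version > INTERPRETER_VERSION:
            raise ByteCodeToNew("file version %d is newer than %d"
                                % (version, INTERPRETER_VERSION))
        offset = 8
    else:
        raise ValueError("bad magic number")
    (mtime,) = struct.unpack("<L", data[offset:offset + 4])
    return version, mtime, data[offset + 4:]


old_file = OLD_MAGIC_1_5_2 + struct.pack("<L", 1000) + b"payload"
new_file = NEW_FROZEN_MAGIC + struct.pack("<LL", 1, 2000) + b"payload"
print(read_header(old_file))
print(read_header(new_file))
```

Note that the payload starts at byte 8 for old files and at byte 12 for new ones, which is the incompatibility the rest of the thread argues about.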
Peter Funk wrote:
[Guido van Rossum]:
Given Christian Tismer's testimonial and inspection of marshal.c, I think Peter's small patch is acceptable.
A bigger question is whether we should freeze the magic number and add a version number. In theory I'm all for that, but it means more changes; there are several tools (e.g. Lib/py_compile.py, Tools/freeze/modulefinder.py and Tools/scripts/checkpyc.py) that have intimate knowledge of the .pyc file format that would have to be modified to match.
The current format of a .pyc file is as follows:

    bytes 0-3   magic number
    bytes 4-7   timestamp (mtime of .py file)
    bytes 8-*   marshalled code object

Proposal: The future format (Python 1.6 and newer) of a .pyc file should be as follows:

    bytes 0-3   a new magic number, which should be definitely frozen in 1.6.
    bytes 4-7   a version number (which should be == 1 in Python 1.6)
    bytes 8-11  timestamp (mtime of .py file) (same as earlier)
    bytes 12-*  marshalled code object (same as earlier)
This will break all tools relying on having the code object available in bytes[8:] and believe me: there are lots of those around ;-) You cannot really change the file header, only add things to the end of the PYC file... Hmm, or perhaps we should move the version number to the code object itself... after all, the changes we want to refer to using the version number are located in the code object and not the PYC file layout. Unmarshalling it would then raise the error. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
[M.-A. Lemburg]:
Proposal: The future format (Python 1.6 and newer) of a .pyc file should be as follows:
    bytes 0-3   a new magic number, which should be definitely frozen in 1.6.
    bytes 4-7   a version number (which should be == 1 in Python 1.6)
    bytes 8-11  timestamp (mtime of .py file) (same as earlier)
    bytes 12-*  marshalled code object (same as earlier)
This will break all tools relying on having the code object available in bytes[8:] and believe me: there are lots of those around ;-)
In some way, this is intentional: If these tools (are there really that many out there that munge .pyc byte code files?) simply use 'imp.get_magic()' and then silently assume a specific content of the marshalled code object, they probably need changes anyway, since the code needed to deal with the new Unicode object is missing from them.
You cannot really change the file header, only add things to the end of the PYC file...
Why? Will this idea really cause such earth quaking grumbling? Please review this in the context of my proposal to change 'imp.get_magic()' to return the old 1.5.2 MAGIC, when called without parameter.
Hmm, or perhaps we should move the version number to the code object itself... after all, the changes we want to refer to using the version number are located in the code object and not the PYC file layout. Unmarshalling it would then raise the error.
Since the file layout is a very thin layer around the marshalled code object, this really makes no big difference to me. But it will be harder to come up with reasonable entries for /etc/magic [1] and similar mechanisms. Putting the version number at the end of the file is possible. But such a solution is somewhat "dirty" and only gives the false impression that the general file layout (pyc[8:] instead of pyc[12:]) is something you can rely on until the end of time. Hardcoding the size of an unpadded header (something like using buffer[8:]) is IMO bad style anyway. Regards, Peter

[1]: /etc/magic on Unices is a small textual database used by the 'file' command to identify the type of a file by looking at the first few bytes. Unix file managers may either use /etc/magic directly or a similar scheme to associate files with mimetypes and/or default applications.
Peter Funk wrote:
[M.-A. Lemburg]:
Proposal: The future format (Python 1.6 and newer) of a .pyc file should be as follows:
    bytes 0-3   a new magic number, which should be definitely frozen in 1.6.
    bytes 4-7   a version number (which should be == 1 in Python 1.6)
    bytes 8-11  timestamp (mtime of .py file) (same as earlier)
    bytes 12-*  marshalled code object (same as earlier)
This will break all tools relying on having the code object available in bytes[8:] and believe me: there are lots of those around ;-)
In some way, this is intentional: If these tools (are there really that many out there that munge .pyc byte code files?) simply use 'imp.get_magic()' and then silently assume a specific content of the marshalled code object, they probably need changes anyway, since the code needed to deal with the new Unicode object is missing from them.
That's why I proposed to change the marshalled code object and not the PYC file: the problem is not only related to PYC files, it touches all areas where marshal is used. If you try to load a code object using Unicode in Python 1.5 you'll get all sorts of errors, e.g. EOFError, SystemError. Since marshal uses a specific format, that format should receive the version number. Ideally that version would be prepended to the format (not sure whether this is possible), so that the PYC file layout would then look like this:

    word 0:   magic
    word 1:   timestamp
    word 2:   version in the marshalled code object
    word 3-*: rest of the marshalled code object

Please make sure that options such as the -U option are also respected...

--

A different approach to all this would be fixing only the first two bytes of the magic word, e.g.

    byte 0: 'P'
    byte 1: 'Y'
    byte 2: version number (counting from 1)
    byte 3: option byte (8 bits: one for each option; bit 0: -U cmd switch)

This would be b/w compatible and still provide file(1) with enough information to be able to tell the file type.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
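The second approach (a fixed 'PY' prefix, a version byte, and an option byte) is small enough to sketch in Python. The function and constant names below are hypothetical, and only the -U bit is defined because it is the only option the message assigns a bit to.

```python
# Bit 0 of the option byte: the -U command-line switch, as in the proposal.
OPTION_UNICODE = 0x01


def make_magic(version, options=0):
    """Build the 4-byte magic: 'P', 'Y', version (from 1), option bits."""
    if not 1 <= version <= 255:
        raise ValueError("version must fit in one byte and count from 1")
    if not 0 <= options <= 255:
        raise ValueError("options must fit in one byte")
    return b"PY" + bytes([version, options])


def parse_magic(magic):
    """Return (version, unicode_flag) or raise ValueError for a foreign file."""
    if len(magic) != 4 or magic[:2] != b"PY":
        raise ValueError("not a 'PY' magic word")
    version = magic[2]
    options = magic[3]
    return version, bool(options & OPTION_UNICODE)


print(make_magic(1, OPTION_UNICODE))
print(parse_magic(make_magic(2)))
```

Because the header stays at 8 bytes, tools that slice at byte 8 keep working under this scheme, which is the backward compatibility the message refers to.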
"M.-A. Lemburg" wrote:
Peter Funk wrote:
[M.-A. Lemburg]:
Proposal: The future format (Python 1.6 and newer) of a .pyc file should be as follows:
    bytes 0-3   a new magic number, which should be definitely frozen in 1.6.
    bytes 4-7   a version number (which should be == 1 in Python 1.6)
    bytes 8-11  timestamp (mtime of .py file) (same as earlier)
    bytes 12-*  marshalled code object (same as earlier)
<snip/>
A different approach to all this would be fixing only the first two bytes of the magic word, e.g.
    byte 0: 'P'
    byte 1: 'Y'
    byte 2: version number (counting from 1)
    byte 3: option byte (8 bits: one for each option; bit 0: -U cmd switch)
This would be b/w compatible and still provide file(1) with enough information to be able to tell the file type.
I think this approach is simple and powerful enough to survive Py3000. Peter's approach is of course nicer and cleaner from a "redo from scratch" point of view. But then, I'd even vote for a better format that includes another field which names the header size explicitly. For simplicity, compatibility and ease of change, I vote with +1 for adopting the solution of

    byte 0: 'P'
    byte 1: 'Y'
    byte 2: version number (counting from 1)
    byte 3: option byte (8 bits: one for each option; bit 0: -U cmd switch)

If that turns out to be insufficient in some future, do a complete redesign. ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com
[...]
For simplicity, compatibility and ease of change, I vote with +1 for adopting the solution of

    byte 0: 'P'
    byte 1: 'Y'
    byte 2: version number (counting from 1)
    byte 3: option byte (8 bits: one for each option; bit 0: -U cmd switch)
If that turns out to be insufficient in some future, do a complete redesign.
What about the CR/LF issue with some Mac Compilers (see Guido's mail for details)? Can we simply drop this? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)
Peter Funk wrote:
[...]
For simplicity, compatibility and ease of change, I vote with +1 for adopting the solution of

    byte 0: 'P'
    byte 1: 'Y'
    byte 2: version number (counting from 1)
    byte 3: option byte (8 bits: one for each option; bit 0: -U cmd switch)
If that turns out to be insufficient in some future, do a complete redesign.
What about the CR/LF issue with some Mac Compilers (see Guido's mail for details)? Can we simply drop this?
Well, forgot about that. How about swapping bytes 0 and 1? -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com
I don't think we should have a two-byte magic value. Especially where those two bytes are printable, 7-bit ASCII. "But it is four bytes," you say. Nope. It is two plus a couple parameters that can now change over time. To ensure uniqueness, I think a four-byte magic should stay. I would recommend the approach of adding opcodes into the marshal format. Specifically, 'V' followed by a single byte. That can only occur at the beginning. If it is not present, then you know that you have an old marshal value. Cheers, -g On Sun, 28 May 2000, Christian Tismer wrote:
Peter Funk wrote:
[...]
For simplicity, compatibility and ease of change, I vote with +1 for adopting the solution of

    byte 0: 'P'
    byte 1: 'Y'
    byte 2: version number (counting from 1)
    byte 3: option byte (8 bits: one for each option; bit 0: -U cmd switch)
If that turns out to be insufficient in some future, do a complete redesign.
What about the CR/LF issue with some Mac Compilers (see Guido's mail for details)? Can we simply drop this?
Well, forgot about that. How about swapping bytes 0 and 1?
-- Greg Stein, http://www.lyra.org/
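Greg's suggestion of a leading 'V' marker followed by a single version byte can be sketched in Python as follows. This is only an illustration on raw byte strings, not a change to the marshal module; the function names are hypothetical, and it assumes that no old marshal value begins with the byte 'V', which is what would make the marker unambiguous.

```python
VERSION_MARKER = b"V"  # marker byte from the suggestion above


def add_marshal_version(version, payload):
    """Prefix an already-marshalled payload with 'V' and a one-byte version."""
    if not 0 <= version <= 255:
        raise ValueError("version must fit in a single byte")
    return VERSION_MARKER + bytes([version]) + payload


def split_marshal_version(data):
    """Return (version, payload); data without the marker counts as version 0."""
    if data[:1] == VERSION_MARKER:
        if len(data) < 2:
            raise ValueError("truncated version marker")
        return data[1], data[2:]
    # No marker: an old marshal value, returned unchanged.
    return 0, data


old_value = b"c-rest-of-old-marshal-data"  # placeholder bytes, not real marshal output
print(split_marshal_version(old_value))
print(split_marshal_version(add_marshal_version(1, old_value)))
```

With this scheme the 8-byte file header is untouched; only readers of the marshalled stream need to know about the marker.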
Greg Stein:
I don't think we should have a two-byte magic value. Especially where those two bytes are printable, 7-bit ASCII. [...] To ensure uniqueness, I think a four-byte magic should stay.
Looking at /etc/magic I see many 16-bit magic numbers kept around from the good old days. But you are right: Choosing a four-byte magic value would make the chance of a clash with some other file format much less likely.
I would recommend the approach of adding opcodes into the marshal format. Specifically, 'V' followed by a single byte. That can only occur at the beginning. If it is not present, then you know that you have an old marshal value.
But this would not solve the problem with 8 byte versus 4 byte timestamps in the header on 64-bit OSes. Trent Mick pointed this out. I think the situation we have now is very unsatisfactory: I don't see a reasonable solution which allows us to keep the length of the header before the marshal block at a fixed length of 8 bytes together with a frozen 4 byte magic number. Moving the version number into the marshal doesn't help to resolve this conflict. So either you have to accept a new magic on 64 bit systems or you have to enlarge the header.

To come up with a new proposal, the following questions should be answered:

1. Is there really too much code out there which depends on the hardcoded assumption that the marshal part of a .pyc file starts at byte 8? I see no further evidence for or against this. MAL pointed this out in <http://www.python.org/pipermail/python-dev/2000-May/005756.html>
2. If we decide to enlarge the header, do we really need a new header field defining the length of the header? This was proposed by Christian Tismer in <http://www.python.org/pipermail/python-dev/2000-May/005792.html>
3. The 'imp' module somewhat exposes the structure of a .pyc file through the function 'get_magic()'. I proposed changing the signature of 'imp.get_magic()' in an upward compatible way. I also proposed adding a new function 'imp.get_version()'. What do you think about this idea?
4. Greg proposed prepending the version number to the marshal format. If we do this, we definitely need a frozen way to find out where the marshalled code object actually starts. This also has the disadvantage of making the task of coming up with an /etc/magic definition which displays the version number of a .pyc file slightly harder.

If we decide to move the version number into the marshal, we can also move the .py-timestamp there. This way the timestamp will be handled in the same way as large integer literals.
Quoting from the docs: """Caveat: On machines where C's long int type has more than 32 bits (such as the DEC Alpha), it is possible to create plain Python integers that are longer than 32 bits. Since the current marshal module uses 32 bits to transfer plain Python integers, such values are silently truncated. This particularly affects the use of very long integer literals in Python modules -- these will be accepted by the parser on such machines, but will be silently be truncated when the module is read from the .pyc instead. [...] A solution would be to refuse such literals in the parser, since they are inherently non-portable. Another solution would be to let the marshal module raise an exception when an integer value would be truncated. At least one of these solutions will be implemented in a future version.""" Should this be 1.6? Changing the format of .pyc files over and over again in the 1.x series doesn't look very attractive. Regards, Peter
On Tue, May 30, 2000 at 09:08:15AM +0200, Peter Funk wrote:
I would recommend the approach of adding opcodes into the marshal format. Specifically, 'V' followed by a single byte. That can only occur at the beginning. If it is not present, then you know that you have an old marshal value.
But this would not solve the problem with 8 byte versus 4 byte timestamps in the header on 64-bit OSes. Trent Mick pointed this out.
I kind of intimated but did not make it clear: I wouldn't worry about the limitations of a 4 byte timestamp too much. That value is not going to overflow for another 38 years. Presumably the .pyc header (if such a thing even still exists then) will change by then. [peter summarizes .pyc header format options]
If we decide to move the version number into the marshal, we can also move the .py-timestamp there. This way the timestamp will be handled in the same way as large integer literals. Quoting from the docs:
"""Caveat: On machines where C's long int type has more than 32 bits (such as the DEC Alpha), it is possible to create plain Python integers that are longer than 32 bits. Since the current marshal module uses 32 bits to transfer plain Python integers, such values are silently truncated. This particularly affects the use of very long integer literals in Python modules -- these will be accepted by the parser on such machines, but will be silently be truncated when the module is read from the .pyc instead. [...] A solution would be to refuse such literals in the parser, since they are inherently non-portable. Another solution would be to let the marshal module raise an exception when an integer value would be truncated. At least one of these solutions will be implemented in a future version."""
Should this be 1.6? Changing the format of .pyc files over and over again in the 1.x series doesn't look very attractive.
I *hope* it gets into 1.6, because I have implemented the latter suggestion in the docs that you quoted (raise an exception if truncation of a PyInt to 32 bits will cause data loss) and will be submitting a patch for it on Wed or Thurs. Ciao, Trent -- Trent Mick trentm@activestate.com
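The difference between silent truncation and the exception-raising behaviour described above can be shown with a short Python sketch. It is not Trent's patch (which is not shown in the thread) and not the marshal module's code; both function names are hypothetical and the 32-bit field is modelled with struct.

```python
import struct

INT32_MIN = -2 ** 31
INT32_MAX = 2 ** 31 - 1


def pack_int32_truncating(value):
    """Model of the old behaviour: keep only the low 32 bits, silently."""
    return struct.pack("<L", value & 0xFFFFFFFF)


def pack_int32_checked(value):
    """Model of the suggested behaviour: refuse values that would lose data."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError("integer %d does not fit in 32 bits" % value)
    return struct.pack("<l", value)


big = 2 ** 32 + 5  # a literal that only fits where C long has more than 32 bits
(round_tripped,) = struct.unpack("<l", pack_int32_truncating(big))
print(round_tripped)  # the high bits are gone

try:
    pack_int32_checked(big)
except OverflowError as exc:
    print("refused:", exc)
```

Raising at write time means the problem surfaces when the .pyc is created rather than as a wrong constant when it is later imported.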
Trent Mick wrote:
But this would not solve the problem with 8 byte versus 4 byte timestamps in the header on 64-bit OSes. Trent Mick pointed this out.
I kind of intimated but did not make it clear: I wouldn't worry about the limitations of a 4 byte timestamp too much. That value is not going to overflow for another 38 years. Presumably the .pyc header (if such a thing even still exists then) will change by then.
note that py_compile (which is used to create PYC files after installation, among other things) treats the time as an unsigned integer. so in other words, if we fix the built-in "PYC compiler" so it does the same thing before 2038, we can spend another 68 years on coming up with a really future proof design... ;-) I really hope Py3K will be out before 2106. as for the other changes: *please* don't break the header layout in the 1.X series. and *please* don't break the "if the magic is the same, I can unmarshal and run this code blob without crashing the interpreter" rule (raising an exception would be okay, though). </F>
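The 2038 and 2106 figures above follow from reading the same 4-byte field as signed or unsigned. A small sketch, assuming seconds since the Unix epoch in UTC; the helper name last_representable is hypothetical:

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable(fmt):
    """Latest UTC time that an integer field with this struct format can hold."""
    bits = struct.calcsize(fmt) * 8
    signed = fmt[-1].islower()  # struct uses lowercase codes for signed types
    max_value = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return EPOCH + timedelta(seconds=max_value)


signed_limit = last_representable("<i")    # timestamp read as signed 32-bit
unsigned_limit = last_representable("<I")  # timestamp read as unsigned 32-bit
print(signed_limit.isoformat())
print(unsigned_limit.isoformat())
print(unsigned_limit.year - signed_limit.year, "extra years")
```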
Peter Funk wrote:
Greg Stein:
I don't think we should have a two-byte magic value. Especially where those two bytes are printable, 7-bit ASCII. [...] To ensure uniqueness, I think a four-byte magic should stay.
Looking at /etc/magic I see many 16-bit magic numbers kept around from the good old days. But you are right: Choosing a four-byte magic value would make the chance of a clash with some other file format much less likely.
Just for quotes: the current /etc/magic I have on my Linux machine doesn't know anything about PYC or PYO files, so I don't really see much of a problem here -- no one seems to be interested in finding out the file type for these files anyway ;-) Also, I don't really get the 16-bit magic argument: we still have a 32-bit magic number -- one with a 16-bit fixed value and predefined ranges for the remaining 16 bits. This already is much better than what we have now with respect to making file(1) work on PYC files.
I would recommend the approach of adding opcodes into the marshal format. Specifically, 'V' followed by a single byte. That can only occur at the beginning. If it is not present, then you know that you have an old marshal value.
But this would not solve the problem with 8 byte versus 4 byte timestamps in the header on 64-bit OSes. Trent Mick pointed this out.
The switch to 8 byte timestamps is only needed when the current 4 bytes can no longer hold the timestamp value. That will happen in 2038... Note that import.c writes the timestamp in 4 bytes until it reaches an overflow situation.
I think the situation we have now is very unsatisfactory: I don't see a reasonable solution that allows us to keep the header before the marshal block at a fixed length of 8 bytes together with a frozen 4 byte magic number.
Adding a version to the marshal format is a Good Thing -- independent of this discussion.
Moving the version number into the marshal doesn't help to resolve this conflict. So either you have to accept a new magic on 64 bit systems or you have to enlarge the header.
No you don't... please read the code: marshal only writes 8 bytes in case 4 bytes aren't enough to hold the value.
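A sketch of the "4 bytes unless 8 are needed" rule, in Python rather than C. The one-byte type tags 'i' and 'I' and the helper names write_int and read_int are assumptions made for illustration; check marshal.c before relying on the exact codes.

```python
import struct

TYPE_INT = b"i"    # assumed tag for a 4-byte payload
TYPE_INT64 = b"I"  # assumed tag for an 8-byte payload


def write_int(value):
    """Use the 4-byte form whenever the value fits, else fall back to 8 bytes."""
    if -(2 ** 31) <= value < 2 ** 31:
        return TYPE_INT + struct.pack("<i", value)
    return TYPE_INT64 + struct.pack("<q", value)


def read_int(data):
    """Dispatch on the tag so old 4-byte data stays readable."""
    tag, body = data[:1], data[1:]
    if tag == TYPE_INT:
        return struct.unpack("<i", body[:4])[0]
    if tag == TYPE_INT64:
        return struct.unpack("<q", body[:8])[0]
    raise ValueError("unknown integer tag %r" % tag)


for value in (959040000, 2 ** 31 - 1, 2 ** 31, 2 ** 40):
    encoded = write_int(value)
    print(value, len(encoded), read_int(encoded) == value)
```

Because the width is chosen per value, data written today stays byte-for-byte identical on 32-bit and 64-bit machines until a value actually exceeds 32 bits.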
To come up with a new proposal, the following questions should be answered: 1. Is there really too much code out there that depends on the hardcoded assumption that the marshal part of a .pyc file starts at byte 8? I see no further evidence for or against this. MAL pointed this out in <http://www.python.org/pipermail/python-dev/2000-May/005756.html>
I have several references in my tool collection, the import stuff uses it, old import hooks (remember ihooks ?) also do, etc.
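For reference, the kind of code in question typically looks like the sketch below: it hardcodes the 8-byte layout (4-byte magic, 4-byte source mtime, marshal data from byte 8) discussed in this thread. The names build_pyc, split_pyc and FAKE_MAGIC are hypothetical, and the magic value is a placeholder. Current CPython releases use a longer header, so this models only the 1.x layout and should not be pointed at real .pyc files from a modern interpreter.

```python
import marshal
import struct

HEADER_SIZE = 8  # 4-byte magic + 4-byte mtime, the layout assumed in this thread


def build_pyc(magic, mtime, code):
    """Assemble an in-memory blob with the old fixed-size header."""
    return magic + struct.pack("<I", mtime) + marshal.dumps(code)


def split_pyc(data):
    """Split a blob into magic, mtime and the unmarshalled code object."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic = data[:4]
    (mtime,) = struct.unpack("<I", data[4:8])
    code = marshal.loads(data[HEADER_SIZE:])  # the hardcoded "byte 8" assumption
    return magic, mtime, code


FAKE_MAGIC = b"\x01\x02\r\n"  # placeholder, not any real interpreter's magic
source = compile("answer = 6 * 7", "<demo>", "exec")
blob = build_pyc(FAKE_MAGIC, 959040000, source)

magic, mtime, code = split_pyc(blob)
namespace = {}
exec(code, namespace)
print(magic, mtime, namespace["answer"])
```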
2. If we decide to enlarge the header, do we really need a new header field defining the length of the header ? This was proposed by Christian Tismer in <http://www.python.org/pipermail/python-dev/2000-May/005792.html>
In Py3K we can do this right (breaking things is allowed)... and I agree with Christian that a proper file format needs a header length field too. Basically, these values have to be present, IMHO: 1. Magic 2. Version 3. Length of Header 4. (Header Attribute)*n -- Start of Data --- Header Attribute can be pretty much anything -- timestamps, names of files or other entities, bit sizes, architecture flags, optimization settings, etc.
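One way to read the four-part layout above (magic, version, header length, then attributes) is as a small tag-length-value format. The sketch below is only an illustration: the field widths, the 'PYC0' magic, the attribute tags and the function names pack_header and unpack_header are all hypothetical, not a proposal from the thread.

```python
import struct

MAGIC = b"PYC0"   # hypothetical 4-byte magic
FIXED = "<4sHH"   # magic, version, total header length (widths are assumptions)
ATTR = "<BH"      # attribute tag, attribute payload length


def pack_header(version, attributes):
    """attributes is a list of (tag, payload_bytes) pairs."""
    body = b"".join(
        struct.pack(ATTR, tag, len(payload)) + payload
        for tag, payload in attributes
    )
    header_len = struct.calcsize(FIXED) + len(body)
    return struct.pack(FIXED, MAGIC, version, header_len) + body


def unpack_header(data):
    """Return (version, attributes, payload after the header)."""
    magic, version, header_len = struct.unpack_from(FIXED, data)
    if magic != MAGIC:
        raise ValueError("bad magic")
    attributes = []
    pos = struct.calcsize(FIXED)
    while pos < header_len:
        tag, size = struct.unpack_from(ATTR, data, pos)
        pos += struct.calcsize(ATTR)
        attributes.append((tag, data[pos:pos + size]))
        pos += size
    return version, attributes, data[header_len:]


attrs = [(1, struct.pack("<Q", 959040000)), (2, b"spam.py")]
blob = pack_header(3, attrs) + b"DATA"
print(unpack_header(blob))
```

The header length field is what lets a reader skip attributes it does not understand and still find the start of the data.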
3. The 'imp' module somewhat exposes the structure of a .pyc file through the function 'get_magic()'. I proposed changing the signature of 'imp.get_magic()' in an upward compatible way. I also proposed adding a new function 'imp.get_version()'. What do you think about this idea?
imp.get_magic() would have to return the proposed 32-bit value ('PY' + version byte + option byte). I'd suggest adding additional functions which can read and write the header given a PYCHeader object which would hold the values version and options.
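A sketch of what such helpers could look like. PYCHeader is the name suggested above; make_magic and parse_magic are hypothetical names, and nothing here is an existing 'imp' API.

```python
from collections import namedtuple

# Holds the two variable bytes of the proposed 'PY' + version + option magic.
PYCHeader = namedtuple("PYCHeader", "version options")

PREFIX = b"PY"


def make_magic(header):
    """Build the 4-byte magic; bytes() rejects values outside 0..255."""
    return PREFIX + bytes([header.version, header.options])


def parse_magic(magic):
    """Recover version and options, rejecting anything without the 'PY' prefix."""
    if len(magic) != 4 or magic[:2] != PREFIX:
        raise ValueError("not a 'PY' magic: %r" % (magic,))
    return PYCHeader(version=magic[2], options=magic[3])


magic = make_magic(PYCHeader(version=16, options=1))
print(magic, parse_magic(magic))
```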
4. Greg proposed prepending the version number to the marshal format. If we do this, we definitely need a frozen way to find out where the marshalled code object actually starts. This also has the disadvantage of making it slightly harder to come up with an /etc/magic definition which displays the version number of a .pyc file.
If we decide to move the version number into the marshal, we can also move the .py-timestamp there. This way the timestamp will be handled in the same way as large integer literals. Quoting from the docs:
"""Caveat: On machines where C's long int type has more than 32 bits (such as the DEC Alpha), it is possible to create plain Python integers that are longer than 32 bits. Since the current marshal module uses 32 bits to transfer plain Python integers, such values are silently truncated. This particularly affects the use of very long integer literals in Python modules -- these will be accepted by the parser on such machines, but will be silently be truncated when the module is read from the .pyc instead. [...] A solution would be to refuse such literals in the parser, since they are inherently non-portable. Another solution would be to let the marshal module raise an exception when an integer value would be truncated. At least one of these solutions will be implemented in a future version."""
Should this be 1.6? Changing the format of .pyc files over and over again in the 1.x series doesn't look very attractive.
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
Greg Stein wrote:
I don't think we should have a two-byte magic value. Especially where those two bytes are printable, 7-bit ASCII.
"But it is four bytes," you say. Nope. It is two plus a couple parameters that can now change over time.
To ensure uniqueness, I think a four-byte magic should stay.
I would recommend the approach of adding opcodes into the marshal format. Specifically, 'V' followed by a single byte. That can only occur at the beginning. If it is not present, then you know that you have an old marshal value.
Fine with me, too! Everything that keeps the current 8 byte header intact and doesn't break much code is fine with me. Moving additional info into the marshalled objects themselves gives even more flexibility than any header extension. Yes I'm all for it. ciao - chris++ -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com
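Greg's 'V' opcode idea, quoted above, can be sketched as a thin wrapper around the marshal module. This assumes that 'V' is not already used as a type code by the underlying marshal format, and the names dumps_versioned and loads_versioned are hypothetical. A stream without the prefix is treated as an old, unversioned value and reported as version 0.

```python
import marshal

VERSION_TAG = b"V"  # assumed to be unused by the marshal format itself


def dumps_versioned(obj, version):
    """Prefix the marshal stream with 'V' and a single version byte."""
    return VERSION_TAG + bytes([version]) + marshal.dumps(obj)


def loads_versioned(data):
    """Return (version, object); streams without the prefix count as version 0."""
    if data[:1] == VERSION_TAG:
        return data[1], marshal.loads(data[2:])
    return 0, marshal.loads(data)


new_style = dumps_versioned([1, 2, 3], 2)
old_style = marshal.dumps("old")
print(loads_versioned(new_style))
print(loads_versioned(old_style))
```

Since the tag can only appear at the very start, the 8-byte file header stays untouched and existing readers of old data keep working.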
participants (10)
- Christian Tismer
- Daniel Berlin
- Eric S. Raymond
- Fred L. Drake
- Fredrik Lundh
- Greg Stein
- Guido van Rossum
- M.-A. Lemburg
- pf@artcom-gmbh.de
- Trent Mick