[Python-Dev] Identifying magic prefix on Python files?
Ka-Ping Yee
ping@lfw.org
Sun, 4 Feb 2001 18:21:40 -0800 (PST)
On Sun, 4 Feb 2001, Tim Peters wrote:
> OK, I suggest (decimal) 143 for Python's first byte. That's a "control
> code" in Latin-1, and (unlike PNG's 137) not even Windows assigns it to a
> character in their Latin-1 superset (yet).
>
> (decimal) 143 80 89 84 13 10 26 10
> (hexadecimal) 8f 50 59 54 0d 0a 1a 0a
> (ASCII C notation) \217 P Y T \r \n \032 \n
Pyt? What's a "pyt"? How about something we can all recognize:
(decimal) 143 83 112 97 109 10 13 10
(hexadecimal) 8f 53 70 61 6d 0a 0d 0a
(ASCII C notation) \217 S p a m \n \r \n
...to be followed by:
    date of last incompatible VM change (4 bytes: year, year, month, day)
    Python version, as in sys.hexversion (4 bytes)
    mtime of source .py file (4 bytes)
    reserved for option flags and future expansion (8 bytes)
    size of marshalled code data (4 bytes)
    marshalled code
That's a nice, geeky 32 bytes of header info.
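To check that the fields above really do add up to 32 bytes, here is a rough sketch using the struct module. The byte order (big-endian), the reading of "year, year" as a century byte followed by a year-within-century byte, and the names pack_header and unpack_header are all assumptions made for illustration; the proposal itself does not pin any of these down.

```python
import struct

# Magic prefix from the proposal: \217 S p a m \n \r \n
MAGIC = bytes([143, 83, 112, 97, 109, 10, 13, 10])

# Assumed layout, big-endian with no padding:
#   8s  magic prefix
#   4B  date of last incompatible VM change (century, year, month, day)
#   I   Python version, as in sys.hexversion
#   I   mtime of the source .py file
#   8s  reserved flags / future expansion
#   I   size of the marshalled code data
HEADER = struct.Struct(">8s4BII8sI")


def pack_header(vm_date, hexversion, mtime, code_size, flags=b"\0" * 8):
    """Build the 32-byte header; vm_date is a (year, month, day) tuple."""
    year, month, day = vm_date
    return HEADER.pack(MAGIC, year // 100, year % 100, month, day,
                       hexversion, mtime, flags, code_size)


def unpack_header(data):
    """Parse the first 32 bytes of data, rejecting an unknown magic prefix."""
    (magic, century, year, month, day,
     hexversion, mtime, flags, code_size) = HEADER.unpack(data[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("bad magic prefix")
    return {
        "vm_date": (century * 100 + year, month, day),
        "hexversion": hexversion,
        "mtime": mtime,
        "flags": flags,
        "code_size": code_size,
    }
```

With this layout HEADER.size comes out to 32, and the marshalled code would start at offset 32 of the file.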
(The "Spam" part is not so serious; the rest is serious. But
i do think "Spam" is more fun than "Pyt"! :) And the Ctrl-Z char
is pointless; no other binary format does this or needs it.)
Hmm. Questions:
- Should we include the path to the original .py file?
  (so Python can automatically recompile an out-of-date file)
- How about the name of the module? (so that renaming the file
  doesn't kill it; possible answer to the case-sensitivity issue?)
- If the purpose of the code-size field is to protect against
  incomplete file transfers, would a hash be worth considering here?
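
On the last question, a small sketch of what a hash would buy over the size field. The choice of SHA-256 from hashlib and the helper names size_matches and digest_matches are assumptions for illustration only; nothing here says which hash would be used or where in the header it would live.

```python
import hashlib


def size_matches(code, declared_size):
    """Size-only check: catches truncation, not same-length corruption."""
    return len(code) == declared_size


def digest_matches(code, declared_digest):
    """Hash check: catches truncation and same-length corruption alike."""
    return hashlib.sha256(code).digest() == declared_digest


# Stand-in for the marshalled code section of a compiled file.
code = b"marshalled code would go here"
declared_size = len(code)
declared_digest = hashlib.sha256(code).digest()

truncated = code[:-5]          # incomplete transfer
corrupted = b"X" + code[1:]    # same length, one byte changed
```

Here the truncated copy fails both checks, while the corrupted copy passes size_matches and is only caught by digest_matches. The cost is a larger header: a full SHA-256 digest is 32 bytes, which would not fit in the 8 reserved bytes.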
-- ?!ng
"Old code doesn't die -- it just smells that way."
-- Bill Frantz