[Python-Dev] Identifying magic prefix on Python files?

Ka-Ping Yee ping@lfw.org
Sun, 4 Feb 2001 18:21:40 -0800 (PST)


On Sun, 4 Feb 2001, Tim Peters wrote:
> OK, I suggest (decimal) 143 for Python's first byte.  That's a "control
> code" in Latin-1, and (unlike PNG's 137) not even Windows assigns it to a
> character in their Latin-1 superset (yet).
> 
>     (decimal)              143  80  89  84  13  10  26  10
>     (hexadecimal)           8f  50  59  54  0d  0a  1a  0a
>     (ASCII C notation)    \217   P   Y   T  \r  \n \032 \n

Pyt?  What's a "pyt"?  How about something we can all recognize:

    (decimal)              143  83 112  97 109  10  13  10
    (hexadecimal)           8f  53  70  61  6d  0a  0d  0a
    (ASCII C notation)    \217   S   p   a   m  \n  \r  \n
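
A minimal sketch of how a reader could test for this prefix, and why the newline bytes are worth having: a transfer that rewrites line endings damages the signature, so the damage shows up at once. The names MAGIC, has_magic and text_mode_mangle are made up for illustration, and the signature is only the proposal above, not an actual Python file format.

```python
# Hypothetical 8-byte signature copied from the table above (an assumption,
# not a format that any Python release actually uses).
MAGIC = bytes([143, 83, 112, 97, 109, 10, 13, 10])


def has_magic(data: bytes) -> bool:
    """Return True if data starts with the 8-byte signature."""
    return data[:len(MAGIC)] == MAGIC


def text_mode_mangle(data: bytes) -> bytes:
    """Simulate a text-mode transfer that rewrites every LF as CR LF."""
    return data.replace(b"\n", b"\r\n")


payload = MAGIC + b"rest of file"
print(has_magic(payload))                    # intact file
print(has_magic(text_mode_mangle(payload)))  # signature no longer matches
```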

...to be followed by:

    date of last incompatible VM change (4 bytes: year, year, month, day)
    Python version, as in sys.hexversion (4 bytes)
    mtime of source .py file (4 bytes)
    reserved for option flags and future expansion (8 bytes)
    size of marshalled code data (4 bytes)
    marshalled code

That's a nice, geeky 32 bytes of header info.
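
To check the arithmetic, here is a rough sketch of that layout using the struct module. Several things are assumptions of mine rather than part of the proposal: big-endian byte order, a 16-bit year followed by one byte each for month and day, an all-zero reserved field, and the helper names pack_header and unpack_header.

```python
import marshal
import struct
import sys

# Hypothetical signature from the proposal (assumption, not a real format).
MAGIC = bytes([143, 83, 112, 97, 109, 10, 13, 10])

# Assumed layout, big-endian with no padding:
#   8s  magic prefix
#   H   year, B month, B day   (date of last incompatible VM change)
#   I   Python version as in sys.hexversion
#   I   mtime of the source .py file
#   8s  reserved flags
#   I   size of the marshalled code
HEADER = struct.Struct(">8sHBBII8sI")


def pack_header(vm_date, source_mtime, code_bytes):
    """Build the 32-byte header for the given marshalled code."""
    year, month, day = vm_date
    return HEADER.pack(MAGIC, year, month, day, sys.hexversion,
                       int(source_mtime) & 0xFFFFFFFF, bytes(8),
                       len(code_bytes))


def unpack_header(blob):
    """Parse the header at the start of blob and return its fields."""
    fields = HEADER.unpack(blob[:HEADER.size])
    magic, year, month, day, version, mtime, flags, size = fields
    if magic != MAGIC:
        raise ValueError("bad magic prefix")
    return {"vm_date": (year, month, day), "version": version,
            "mtime": mtime, "flags": flags, "size": size}


code = marshal.dumps(compile("x = 1", "<demo>", "exec"))
header = pack_header((2001, 2, 4), 981339700, code)
info = unpack_header(header + code)
print(HEADER.size, info["vm_date"], info["size"] == len(code))
```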

(The "Spam" part is not so serious; the rest is serious.  But
i do think "Spam" is more fun than "Pyt"!  :)  And the Ctrl-Z char
is pointless; no other binary format does this or needs it.)

Hmm.  Questions:

    - Should we include the path to the original .py file?
          (so Python can automatically recompile an out-of-date file)

    - How about the name of the module?  (so that renaming the file
          doesn't kill it; possible answer to the case-sensitivity issue?)

    - If the purpose of the code-size field is to protect against
          incomplete file transfers, would a hash be worth considering here?
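
On the last question, a small sketch of the difference: a size field notices a truncated file but not one whose bytes were altered in place, while a checksum notices both. CRC-32 from zlib is used here only because it fits in 4 bytes; that choice, and the names size_ok and crc_ok, are assumptions for illustration and not part of the proposal.

```python
import zlib


def size_ok(stored_size, data):
    """Check the payload against a stored length field."""
    return len(data) == stored_size


def crc_ok(stored_crc, data):
    """Check the payload against a stored CRC-32 (stand-in for 'a hash')."""
    return (zlib.crc32(data) & 0xFFFFFFFF) == stored_crc


original = b"marshalled code would go here"
stored_size = len(original)
stored_crc = zlib.crc32(original) & 0xFFFFFFFF

truncated = original[:-5]        # incomplete transfer
altered = b"M" + original[1:]    # same length, one byte changed

print(size_ok(stored_size, truncated))  # length check catches truncation
print(size_ok(stored_size, altered))    # but passes an altered payload
print(crc_ok(stored_crc, altered))      # the checksum catches the change
```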


-- ?!ng

"Old code doesn't die -- it just smells that way."
    -- Bill Frantz