[Python-Dev] Identifying magic prefix on Python files?

Guido van Rossum guido@digicool.com
Sun, 04 Feb 2001 23:10:20 -0500


>     exactly 4 bytes magic number (but doesn't care about content)
> then
>     exactly 4 bytes file timestamp
> then
>     a blob that marshal believes is a single object
> then
>     that's it

That's also what I would call backward compatible here.  It's the
obvious baseline.  (With the addition that the timestamp uses
little-endian byte order -- like marshal.)
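As an illustration of that baseline, here is a minimal sketch of writing and reading a file with exactly this layout: 4 bytes of magic (not examined), a 4-byte little-endian timestamp, then one marshalled object. The helper names `build_legacy_pyc` and `parse_legacy_pyc`, the dummy magic, and the timestamp value are hypothetical. Note that `marshal.loads` ignores trailing bytes, so it cannot enforce the "that's it" part. Later CPython releases grew this header (3.3 added a source-size field, and 3.7 added a flags word via PEP 552), so this sketch builds a synthetic blob in memory rather than parsing a real modern .pyc.

```python
import marshal
import struct


def build_legacy_pyc(magic: bytes, timestamp: int, obj) -> bytes:
    """Hypothetical helper: magic + little-endian timestamp + one marshal blob."""
    if len(magic) != 4:
        raise ValueError("magic must be exactly 4 bytes")
    return magic + struct.pack("<I", timestamp) + marshal.dumps(obj)


def parse_legacy_pyc(data: bytes):
    """Split a legacy-layout blob into (magic, timestamp, object).

    The magic is returned without inspecting its content; the timestamp is
    decoded as an unsigned 32-bit little-endian integer; everything after
    byte 8 is handed to marshal, which loads a single object from it.
    """
    if len(data) < 8:
        raise ValueError("too short for an 8-byte header")
    magic = data[:4]
    (timestamp,) = struct.unpack("<I", data[4:8])
    obj = marshal.loads(data[8:])
    return magic, timestamp, obj


# Demo: store a compiled code object, read it back, and run it.
code = compile("answer = 6 * 7", "<demo>", "exec")
blob = build_legacy_pyc(b"\x00\x00\r\n", 981346220, code)  # dummy magic, arbitrary timestamp
magic, timestamp, obj = parse_legacy_pyc(blob)
namespace = {}
exec(obj, namespace)
print(magic, timestamp, namespace["answer"])
```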

> but doesn't care that, e.g., checking the 4-byte magic number alone is
> sufficient to catch binary files opened in text mode (but somebody else will
> care about that!)).

Hm, that's not the reason the magic number ends in \r\n.  The reason
was that on the Mac, long ago, the MPW compiler actually swapped the
meaning of \r and \n!  So that '\r' in C meant '\012' and '\n' meant
'\015'.  This was intended to make C programs that were parsing text
files looking for \n work on Mac text files which use \r.  (Why does
the Mac use \r?  AFAICT, for the same reason that DOS chose \ instead
of / -- to be different from Unix, possibly to avoid patent
infringement.  Silly.)

Later compilers on the Mac weren't so stupid, and now the fact that
this lets you discover text translation errors is just a pleasant
side-effect.
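That side-effect is easy to demonstrate.  A minimal sketch, assuming the current CPython convention that the magic number (exposed in Python as `importlib.util.MAGIC_NUMBER`) still ends in `b"\r\n"`: a text-mode newline translation collapses that trailing `\r\n` to `\n`, so a plain 4-byte comparison fails.  The helper names are hypothetical, and the translation is simulated with `bytes.replace` rather than an actual text-mode read.

```python
import importlib.util

MAGIC = importlib.util.MAGIC_NUMBER  # 4 bytes; the last two are b"\r\n"


def simulate_text_mode_read(data: bytes) -> bytes:
    """Hypothetical helper: mimic a DOS-style text-mode read (CRLF -> LF)."""
    return data.replace(b"\r\n", b"\n")


def header_intact(data: bytes, magic: bytes = MAGIC) -> bool:
    """Hypothetical helper: True if data starts with the expected magic."""
    return data[:len(magic)] == magic


pristine = MAGIC + b"\x00" * 4 + b"payload"
mangled = simulate_text_mode_read(pristine)
print(header_intact(pristine))  # True
print(header_intact(mangled))   # False: the \r\n in the magic became \n
```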

Personally, I don't care about this property any more.

> Since virtually none of this has been formalized via an API, virtually all
> code outside the distribution that deals with this stuff is cheating.  Small
> wonder it's contentious ...

The thing is, it's very useful to have tools that manipulate .pyc
files, and while the format isn't officially documented or
standardized, the presence of the C API to get the magic number at
least suggests that future versions may change the magic number but
not the rest of the file layout.  The py_compile.py standard library
module acts as de facto documentation.
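A tool in that spirit might use py_compile to produce a .pyc and then compare its first four bytes with the running interpreter's magic.  This is a minimal sketch: `pyc_matches_interpreter` is a hypothetical helper, and it assumes `importlib.util.MAGIC_NUMBER` as the Python-level counterpart of the C API `PyImport_GetMagicNumber()`.  Only the magic is checked, because the rest of the header has varied across CPython versions.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_matches_interpreter(pyc_path: str) -> bool:
    """Hypothetical helper: does this .pyc carry this interpreter's magic?

    Only the first 4 bytes are compared; the remaining header fields are
    deliberately ignored because their layout differs between versions.
    """
    with open(pyc_path, "rb") as f:  # binary mode, or the b"\r\n" gets mangled
        return f.read(4) == importlib.util.MAGIC_NUMBER


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as f:
        f.write("answer = 42\n")
    pyc = os.path.join(tmp, "demo.pyc")
    py_compile.compile(src, cfile=pyc, doraise=True)
    print(pyc_matches_interpreter(pyc))  # True
```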

--Guido van Rossum (home page: http://www.python.org/~guido/)