[Python-ideas] Add a cryptographic hash (e.g. SHA1) of source to Python Compiled objects?

Brett Cannon brett at python.org
Fri Feb 6 23:27:36 CET 2009

On Fri, Feb 6, 2009 at 12:10,  <rocky at gnu.org> wrote:
>  > I still don't see the benefit of knowing what version of Python a
>  > magic number matches to. So I know some bytecode was compiled by
>  > Python 2.5 while I am running Python 2.6.
> Yep. Not uncommon for me to have several versions of Python
> available. It so happens that the computer where this email is being
> sent has at least 9 versions, possibly more because I didn't check if
> python.new and python.old are one of those other 9. (I don't maintain
> this box, but pay someone else to; clearly this is a pathological
> case, but it's kind of interesting to me that there are at least 9
> versions installed and I did not contrive this case.)
>  > What benefit do I derive
>  > from knowing that compared to just knowing that it was not compiled by
>  > Python 2.6? I mean are you ultimately planning on launching a
>  > different interpreter based on what generated the bytecode?
> If there's a mismatch in the first place, it means there's confusion
> on someone's part. Don't you want to foster development of programs
> that try to minimize confusion?

Come on, that is such a baiting question. You view adding a dict of
the versions as a way to help deal with confusion in a case where
someone actually cares about which version of bytecode is used. I view
it as another API someone is going to have to maintain for a use case
I do not see as justifying that maintenance. Bytecode is purely a
performance benefit, nothing more. This is why we so readily
reconstruct it. Heck, in Python 3.0 the __file__ attribute *always*
points to the .py file even if the .pyc was used for the load.
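
As a rough illustration of why a mismatch only needs a yes/no answer: the
check comes down to comparing the first four bytes of the .pyc with the
running interpreter's own magic number. The sketch below assumes a modern
CPython 3, where that value is exposed as importlib.util.MAGIC_NUMBER (in
the 2.x line the equivalent was imp.get_magic()); the helper name is made
up for this example.

```python
import importlib.util


def compiled_by_this_interpreter(pyc_bytes):
    """Return True if the leading magic number matches the running interpreter.

    Hypothetical helper for illustration; it only inspects the first 4 bytes
    and says nothing about *which* other version produced a mismatch.
    """
    return pyc_bytes[:4] == importlib.util.MAGIC_NUMBER
```

On a mismatch the bytecode would simply be regenerated from the source,
so the answer to "which other version wrote this?" is never needed.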

> A subsidiary effect when a magic-to-version-string mapping is not
> readily available in situations where it would be helpful is possibly
> back-and-forth dialog in bug reports, where one ends up telling folks
> how to get the version number (because it's not in the error message,
> because it's not readily available to a programmer).

I have never had a bug report come in where I had to care about the
magic number of a .pyc file.

> Never underestimate the level of users,
> especially if you are working on something like a debugger.

I don't, else I would not be a Python developer. But not
underestimating users also means that if you need to worry about
something like which version of Python generates which magic number,
you can look at Python/compile.c just as easily without me adding some
code that has to be maintained.

> If we hope that someone's going to know about and read that comment in
> the C file, turn it into a dictionary, and maintain it anytime the magic
> number gets updated, it's probably not going to happen often.

Nope, it probably won't, and honestly I am fine with that.

> Again, although I see specific uses in a debugger this is really an issue
> regarding code tools or programs that deal with Python code. I know
> there's a disassembler, but you mean there isn't a dump tool which
> shows all of the information inside a compiled object including a
> disassembly, Python version the program was compiled with, mtime in
> human readable format, and whatnot?

Just so there is no confusion: a .pyc is not a compiled object, but a
file containing a magic number, mtime, and a marshaled code object.
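
To make that layout concrete, here is a minimal sketch that builds and then
reads back a byte string in the layout just described: a 4-byte magic
number, a 4-byte little-endian mtime, and a marshaled code object. The
function names are invented for the example, and it uses
importlib.util.MAGIC_NUMBER from a modern CPython 3 as the magic value.
Later CPython releases lengthened the header, so this should not be pointed
at a .pyc written by a current interpreter without adjusting the offsets.

```python
import importlib.util
import marshal
import struct
import time


def build_legacy_pyc(source, filename, mtime):
    """Build bytes laid out as: magic (4) + mtime (4, little-endian) + marshaled code."""
    code = compile(source, filename, "exec")
    header = importlib.util.MAGIC_NUMBER + struct.pack("<I", int(mtime) & 0xFFFFFFFF)
    return header + marshal.dumps(code)


def read_legacy_pyc(data):
    """Split the bytes back into (magic, mtime, code object)."""
    magic = data[:4]
    (mtime,) = struct.unpack("<I", data[4:8])
    code = marshal.loads(data[8:])
    return magic, mtime, code


if __name__ == "__main__":
    # 1233959256 is just an arbitrary example timestamp.
    blob = build_legacy_pyc("x = 1 + 1\n", "example.py", 1233959256)
    magic, mtime, code = read_legacy_pyc(blob)
    print(magic, time.ctime(mtime), code.co_filename)
```

The code object that comes back is an ordinary one, so it can be handed to
dis.dis() or exec() like any other.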

And no, there is nothing in the standard library that dumps all of
this data out for a .pyc. This is somewhat on purpose as we make no
hard guarantees we won't change the format of .pyc files at some point
(see the python-ideas list for a discussion about changing the format
to allow a variable amount of metadata).

