[Python-ideas] Add a cryptographic hash (e.g. SHA1) of source to Python Compiled objects?

Brett Cannon brett at python.org
Tue Feb 3 20:06:15 CET 2009


On Tue, Feb 3, 2009 at 01:56,  <rocky at gnu.org> wrote:
> I've been re-examining from ground up the whole state of affairs in
> writing a debugger. One of the challenges of a debugger or any
> source-code analysis tool is verifying that the source-code that the
> tool is reporting on corresponds to the compiled object under
> execution.
>
> For debuggers, this problem becomes more likely to occur when you are
> debugging on a computer that isn't the same as the computer where the
> code is running.
>
> For this, it would be useful to have a cryptographic hash like a SHA1
> in the compiled object, but hopefully accessible via the module object
> where the file path is stored.
>
> I understand that there is an mtime timestamp in the .pyc but this is
> not as reliable as a cryptographic hash such as SHA1.
>

Well, whatever solution you propose would need to make this hashing
optional, since it is in no way required in day-to-day execution. The
overhead of calculating the hash is not worth the benefit in the
general case.
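
One way to keep that cost out of normal execution is for the tool to
compute the hash itself, and only when it needs one. A minimal sketch,
assuming the source file is still readable where the module says it is;
module_source_sha1 is a hypothetical helper name, not an existing API:

```python
import hashlib
import inspect


def module_source_sha1(module):
    """Return the SHA1 hex digest of a module's source file, or None.

    Hypothetical helper: nothing is stored in the .pyc; the hash is
    computed only when a tool (debugger, coverage, ...) asks for it.
    """
    try:
        path = inspect.getsourcefile(module)
    except TypeError:
        # Built-in modules (e.g. sys) have no source file at all.
        return None
    if path is None:
        return None
    try:
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()
    except OSError:
        # The recorded path exists in metadata but cannot be read.
        return None
```

A debugger could call this once per module when it first displays
source, so ordinary imports pay nothing.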

> There seems to be some confusion in thinking the only use case for
> this is in remote debugging where source code may be on a different
> computer than where the code is running, but I do not believe this is
> so.  Here are two other situations which come up.
>
> First is a code coverage tool like coverage.py which checks coverage
> over several runs. Let's say the source code is erased and checked out
> again; or edited and temporarily changed several times but in the end
> the file stays the same. A SHA1 hash will show that the file hasn't
> changed; mtime won't.
>

That seems somewhat contrived. Assuming you do not have coverage as
part of your continuous build process, is having to re-run coverage on
a couple of files really that expensive? And if you were mucking with
the files, you might want to make sure that you really did not change
something.
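
For reference, the difference under discussion can be reproduced with a
short sketch. It assumes nothing beyond the standard library; the
helper names fingerprint and rewrite_unchanged are made up for the
illustration, and the two-second bump merely stands in for a later
checkout:

```python
import hashlib
import os


def fingerprint(path):
    """Return (mtime in nanoseconds, SHA1 hex digest) for a file."""
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return os.stat(path).st_mtime_ns, digest


def rewrite_unchanged(path):
    """Simulate a fresh checkout: identical bytes, later mtime."""
    with open(path, "rb") as f:
        data = f.read()
    st = os.stat(path)
    with open(path, "wb") as f:
        f.write(data)
    # Push the modification time forward explicitly so the effect does
    # not depend on the filesystem's timestamp resolution.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 2_000_000_000))
```

Calling fingerprint before and after rewrite_unchanged gives a
different mtime but the same digest, which is the case a hash-based
check would treat as "unchanged".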

> A second, more contrived example is in some sort of secure
> environment. Let's say I am using the compiled Python code (say, for
> an embedded device) and someone offers me what's purported to be the
> source code.  How can I easily verify that this is correct?
>

I really do not see that situation ever coming up.

> In theory I suppose if I have enough information about the version of
> Python and which platform, I can compile the purported source ignoring
> some bits of information (like the mtime ;-) in the compiled
> object. But one would have to be careful about getting compilers and
> platforms the same, or understand how this changes compilation.

The only thing you need to compile bytecode is the same Python version
(and thus the same magic number) and whether it is a .pyc or .pyo (and
thus if -O/-OO was used). Your platform has nothing to do with
bytecode compilation.
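
A minimal sketch of that check, under stated assumptions: it targets
CPython 3.7 or later, where the .pyc header is 16 bytes (older
releases used a shorter header), it must run under the same Python
version and optimization level that produced the file, and
pyc_matches_source is a hypothetical helper name:

```python
import importlib.util
import marshal

# Assumption: CPython 3.7+ layout (4-byte magic, 4-byte flags, then
# 8 bytes holding either mtime and size or a source hash).
PYC_HEADER_SIZE = 16


def pyc_matches_source(pyc_path, source_path):
    """Return True if recompiling source_path reproduces the code in pyc_path."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was written by a different Python version")
    # Skip the header entirely, so the stored mtime plays no part.
    stored = marshal.loads(data[PYC_HEADER_SIZE:])
    with open(source_path, "rb") as f:
        fresh = compile(f.read(), stored.co_filename, "exec", dont_inherit=True)
    # Code objects compare by content (bytecode, constants, names, ...).
    return stored == fresh
```

The code objects are compared rather than the raw marshalled bytes,
since marshal output is not guaranteed to be byte-for-byte stable.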

-Brett


