[Python-ideas] Add a cryptographic hash (e.g. SHA1) of source to Python Compiled objects?
rocky at gnu.org
Thu Feb 5 14:40:44 CET 2009
Brett Cannon writes:
> On Wed, Feb 4, 2009 at 01:57, <rocky at gnu.org> wrote:
> > Terry Reedy writes:
> > > rocky at gnu.org wrote:
> > >
> > > > Without a doubt you all are much more familiar with this stuff than I
> > > > am. (In fact I'm a rank novice.) So I'd be grateful if someone would
> > > > post code for a function say:
> > > >
> > > > compare_src_obj(python_src, python_obj)
> > > >
> > > > that takes two strings -- a Python source filename and a Python object
> > > > -- does what's outlined above, and returns a status which indicates
> > > > whether they are the same and, if not, whether the difference is
> > > > because the wrong version of Python was used.
> > >
> > > Interesting question. For equality, I would start with, just guessing
> > > a bit:
> > >
> > > marshal(compile(open(file.py).read())) == open(file.pyc).read()
> > >
> > > Specifically for version clash, I believe the first 4 bytes are a magic
> > > version number. If that is not part of the marshal string, it would
> > > need to be skipped for the equality comparison.
> >
> > There's also the mtime, mentioned in prior posts, that needs to be
> > ignored. And is there a table which converts a magic number version back
> > into a string with the Python version number? Thanks.
>
> marshal.dumps(compile(open('file.py').read(), 'file.py', 'exec')) ==
> open('file.pyc').read()[8:]
Thanks.
Alas, I can't see how in practice this will be generally useful.
Again, here is the problem: I have some sort of compiled Python file
and something which I think is the source code for it. I want to
verify that it is.
(In a debugger it means we can warn that what you are seeing is not
what's being run. However I do not believe this is the only situation
where getting the answer to this question is helpful/important.)
The solution above is very sensitive to knowing the name of the file
(files?) used in compilation because those are stored in the
co_filename portion of the code object.
For example if what's stored in that field is 'foo.py' but I compile
with the name './foo.py' or some other equivalent name, then I get a
false mismatch. Worse, as we've seen before when dealing with zipped
eggs, the name stored in co_filename is a somewhat temporary location
and something very few people are going to guess or recognize as the
location of where they think the file originated.
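To make the false mismatch concrete, here is a minimal sketch. It is
written in Python 3 syntax, which is an assumption on my part, and the
source string and the two file names are made up for illustration. It
compiles the same text under two equivalent names and compares the
results:

```python
import marshal

# Made-up source text; any module body would behave the same way.
source = "def f():\n    return 42\n"

# Compile the identical text under two names that refer to the same file.
code_plain = compile(source, "foo.py", "exec")
code_dotted = compile(source, "./foo.py", "exec")

# The name given to compile() is recorded in co_filename ...
names = (code_plain.co_filename, code_dotted.co_filename)

# ... so the marshalled bytes differ even though the bytecode does not.
marshalled_equal = marshal.dumps(code_plain) == marshal.dumps(code_dotted)
bytecode_equal = code_plain.co_code == code_dotted.co_code

print(names, marshalled_equal, bytecode_equal)
```

The only thing that changed between the two compilations is the file
name, yet a comparison of the marshalled bytes reports a difference.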
What seems to me to be a weakness of this approach is that it requires
that you get two additional pieces of information correct that really
are irrelevant from the standpoint of the problem: the name of the
file and the version of Python used in the compilation process. I just
care about the source text.
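What I have in mind looks roughly like the sketch below. It is only an
illustration in Python 3 syntax: the function names source_digest and
source_matches are hypothetical, and so is the idea that a digest
recorded at compile time is available to compare against, since the
compiled file format discussed above has no such field.

```python
import hashlib

def source_digest(source_bytes):
    """Return the SHA1 hex digest of the raw source text."""
    return hashlib.sha1(source_bytes).hexdigest()

def source_matches(source_bytes, recorded_digest):
    """Check candidate source against a digest recorded at compile time.

    Hypothetical: assumes the compiler stored recorded_digest somewhere
    alongside the compiled object.
    """
    return source_digest(source_bytes) == recorded_digest

# Stand-in for what a compiler could record when it compiles the file.
original = b"print('hello')\n"
recorded = source_digest(original)

# Later: the file name and the Python version play no part in the check.
unchanged = source_matches(original, recorded)
edited = source_matches(original + b"# edited\n", recorded)
print(unchanged, edited)
```

With something like this the check depends only on the source text,
which is the one piece of information the question is actually about.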
As I write this I can't help but be amused, because when I asked
earlier on python-dev how I could get more accurate file names in
co_filename (for zipped eggs), the answer invariably offered was
something along the lines of "why not use the source text?"