How to know that two pyc files contain the same code

Gelonida N gelonida at gmail.com
Sun Mar 11 00:30:00 EST 2012


Hi Steven,

On 03/10/2012 11:52 PM, Steven D'Aprano wrote:
> On Sat, 10 Mar 2012 15:48:48 +0100, Gelonida N wrote:
>
>> Hi,
>>
>> I want to know whether two .pyc files are identical.
>>
>> With identical I mean whether they contain the same byte code.
>
> Define "identical" and "the same".
Indeed! Identical is not that simple to define and depends on the context.

One definition of identical that would suit me at the moment is:

If two .pyc files are the result of compiling two identical .py files,
then I would like to treat these two .pyc files as identical, even if
they were compiled at different times (absolutely necessary) and from
different absolute paths (would be nice).

This definition of identical byte code also implies that any error
message referring to a given line number would be identical for both
.pyc files.
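
For what it's worth, the "compiled at different times" part looks easy
to handle, because the time-dependent data sits in a fixed-size header
in front of the marshalled code object. Below is a minimal sketch,
assuming the 16-byte header of CPython 3.7 and later (older versions
used 8 or 12 bytes). read_pyc is a made-up helper name, not a standard
library function. Note that the timestamp in the header is the
modification time of the source file, and that the file name is stored
inside the code object (co_filename), so skipping the header alone does
not hide a different path.

```python
import marshal
import struct

# Assumption: CPython 3.7+ layout (magic, flags, two 32-bit fields).
HEADER_SIZE = 16


def read_pyc(path):
    """Split a .pyc file into its header fields and its code object."""
    with open(path, "rb") as f:
        data = f.read()
    magic = data[:4]
    # For timestamp-based .pyc files the two fields after the flags are
    # the source mtime and the source size. For hash-based .pyc files
    # (flags != 0) the same 8 bytes hold a hash of the source instead.
    flags, mtime, source_size = struct.unpack("<III", data[4:HEADER_SIZE])
    code = marshal.loads(data[HEADER_SIZE:])
    return {
        "magic": magic,
        "flags": flags,
        "mtime": mtime,
        "source_size": source_size,
        "code": code,
    }
```

Everything needed for a comparison that ignores the timestamp is then
in the "code" entry.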

>
> If I compile these two files:
>
>
> # file ham.py
> x = 23
> def func():
>     a = 23
>     return a + 19
>
>
>
> # file = spam.py
> def func():
>     return 42
>
> tmp = 19
> x = 4 + tmp
> del tmp
>
>
> do you expect spam.pyc and ham.pyc to count as "the same"?
>
For most Pythons I would not expect ham.py and spam.py to result in
the same byte code, and so they would not even have the same
performance.

I agree, though, that an efficient compiler might generate the same byte
code. Still, I wonder whether an optimizing compiler would (or should)
be allowed to optimize away the global variable tmp, as it would be
visible (though only for a short time) in a multithreaded environment.
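
This is easy to check on a given interpreter. A small sketch, assuming
CPython and using the built-in compile() on the two sources quoted
above instead of going through real .pyc files (the HAM and SPAM names
are mine):

```python
HAM = """\
x = 23
def func():
    a = 23
    return a + 19
"""

SPAM = """\
def func():
    return 42

tmp = 19
x = 4 + tmp
del tmp
"""

ham = compile(HAM, "ham.py", "exec")
spam = compile(SPAM, "spam.py", "exec")

# Raw byte code of the two module bodies.
print("same byte code:", ham.co_code == spam.co_code)

# Global names used by each module body.
print("ham names: ", ham.co_names)
print("spam names:", spam.co_names)
```

On the CPython versions I know of, the byte code differs and tmp still
shows up among the names of spam, so it is not optimized away there.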

If the byte code differs between two .pyc files, then I would like to
have them treated as different .pyc files.

If, by coincidence, the generated byte code were the same, then I
wouldn't mind if they were treated as identical, but I wouldn't insist.

To my knowledge Python (or at least CPython) stores line numbers in
the .pyc files, so that it can report exact line numbers referring to
the originating source code in exceptions and tracebacks.
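
A small sketch of where that shows up, assuming CPython: the same
function compiled once as is and once with two blank lines in front of
it. The raw byte code (co_code) comes out the same, while the line
information kept next to it does not. The helper names nested and
line_numbers are mine.

```python
import dis
import types

SRC = "def func():\n    return 42\n"

plain = compile(SRC, "a.py", "exec")
shifted = compile("\n\n" + SRC, "a.py", "exec")


def nested(code):
    """Return the first code object stored among code's constants."""
    return next(c for c in code.co_consts if isinstance(c, types.CodeType))


def line_numbers(code):
    """Sorted source line numbers that a code object refers to."""
    return sorted(
        {line for _, line in dis.findlinestarts(code) if line is not None}
    )


# The byte code itself does not depend on the leading blank lines ...
print(plain.co_code == shifted.co_code)
print(nested(plain).co_code == nested(shifted).co_code)

# ... but the recorded line numbers do.
print(nested(plain).co_firstlineno, nested(shifted).co_firstlineno)
print(line_numbers(nested(plain)), line_numbers(nested(shifted)))
```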

So there is a choice: two .pyc files with exactly the same byte code,
but whose sources differ in whitespace and therefore in line numbers,
can be treated either as identical or as different.

Being conservative, I'd treat them as different.

Ideally I'd like to be able, depending on my use case, to distinguish
the following cases:
a) .pyc files with identical byte code
b) .pyc files with identical byte code AND source code line numbers
c) same as b) AND identical source file names.
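
The three cases could be told apart roughly as follows. This is only a
sketch under assumptions: it works on code objects that have already
been loaded from the .pyc files, same_pyc_code and its level argument
are names I made up, and "identical byte code" is taken to mean equal
co_code, names and constants, recursively for nested functions and
classes.

```python
import dis
import types


def _children(code):
    """Nested code objects (functions, classes, ...) among the constants."""
    return [c for c in code.co_consts if isinstance(c, types.CodeType)]


def _plain_consts(code):
    """Constants other than code objects, tagged with their type."""
    return [
        (type(c), c) for c in code.co_consts
        if not isinstance(c, types.CodeType)
    ]


def same_pyc_code(a, b, level="a"):
    """Compare two code objects.

    level "a": byte code, names and constants
    level "b": as "a", plus source line numbers
    level "c": as "b", plus source file names
    """
    if level not in ("a", "b", "c"):
        raise ValueError("level must be 'a', 'b' or 'c'")
    if (a.co_code, a.co_names, a.co_varnames) != (
        b.co_code, b.co_names, b.co_varnames
    ):
        return False
    if _plain_consts(a) != _plain_consts(b):
        return False
    if level in ("b", "c"):
        if a.co_firstlineno != b.co_firstlineno:
            return False
        if list(dis.findlinestarts(a)) != list(dis.findlinestarts(b)):
            return False
    if level == "c" and a.co_filename != b.co_filename:
        return False
    kids_a, kids_b = _children(a), _children(b)
    return len(kids_a) == len(kids_b) and all(
        same_pyc_code(x, y, level) for x, y in zip(kids_a, kids_b)
    )
```

For example, the same source compiled under two different file names
would match at levels "a" and "b" but not at level "c".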








