Add a generic object comparison utility - Python-ideas


Sébastien Eskenaz

Sept. 25, 2019

9:15 a.m.

Hello,

Several libraries have complex objects but no comparison operators for them other than "is", which checks whether we are comparing an object with itself. It would be quite nice to be able to compare any two objects.

I made this function in Python as a starting point: https://gist.github.com/SebastienEske/5a9c04e718becd93b7928514e80f0211

I know that it needs some improvement to protect against infinite loops, and I don't like the hardcoded check for strings; it should probably be a parameter that allows the function to check certain types directly.

What do you think?

Best regards,
*Sébastien Eskenazi* <http://goog_31297828> <https://www.pixelz.com>
Pixelz
15th Floor, Detech Tower 2 Building, 107 Nguyen Phong Sac, Hanoi, Vietnam
<https://www.linkedin.com/in/sebastieneskenazi/>
Beautiful Images Sell Products
Learn eCommerce Photography and Image Editing Tips on the Pixelz Blog <http://www.pixelz.com/blog/>!
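The gist itself is not reproduced in this thread. The sketch below is a rough, hypothetical illustration of the kind of function being proposed; it is not the code from the gist. It compares two objects attribute by attribute and recurses into lists, tuples and dicts. A memo of `(id(a), id(b))` pairs is one way to address the infinite-loop concern, and a `direct_types` parameter takes the place of the hardcoded string check. The names `deep_compare`, `direct_types` and `DEFAULT_DIRECT_TYPES`, and the use of `vars()` to enumerate attributes, are assumptions made for illustration.

```python
from typing import Any, Optional, Set, Tuple

# Hypothetical default: types compared with plain == instead of being walked.
DEFAULT_DIRECT_TYPES: Tuple[type, ...] = (str, bytes, int, float, complex, type(None))


def deep_compare(a: Any, b: Any,
                 direct_types: Tuple[type, ...] = DEFAULT_DIRECT_TYPES,
                 _memo: Optional[Set[Tuple[int, int]]] = None) -> bool:
    """Hypothetical structural comparison of two objects (illustration only)."""
    if a is b:
        return True
    if isinstance(a, direct_types) or isinstance(b, direct_types):
        return a == b
    # Strict type check, as in the proposal; the replies question this choice.
    if type(a) is not type(b):
        return False
    if _memo is None:
        _memo = set()
    key = (id(a), id(b))
    if key in _memo:
        # This pair is already being compared further up the stack (a reference
        # cycle), so assume equal here; any real difference makes another branch
        # return False, and that False ends the whole comparison.
        return True
    _memo.add(key)
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(
            deep_compare(x, y, direct_types, _memo) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(
            deep_compare(a[k], b[k], direct_types, _memo) for k in a)
    try:
        # Instance attributes via vars(), rather than dir().
        attrs_a, attrs_b = vars(a), vars(b)
    except TypeError:
        # No __dict__ (e.g. __slots__ classes or C types): fall back to ==.
        return a == b
    return deep_compare(attrs_a, attrs_b, direct_types, _memo)


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Node:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent          # back link to the parent: a reference cycle
        self.children = []
        if parent is not None:
            parent.children.append(self)


root1 = Node("root")
Node("leaf", root1)
root2 = Node("root")
Node("leaf", root2)

print(Point(1, 2) == Point(1, 2))              # False: default == is identity
print(deep_compare(Point(1, 2), Point(1, 2)))  # True: same attribute values
print(deep_compare(root1, root2))              # True, despite the parent/child cycle
```

This sketch carries the same open design questions that the replies raise: the strict `type()` check, and what "equal" should mean for objects whose state does not live in `__dict__`.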

Steven D'Aprano

September 2019

7:50 p.m.

Hi Sébastien, and welcome.

On Wed, Sep 25, 2019 at 04:15:58PM +0700, Sébastien Eskenaz wrote:

> Hello,
>
> Several libraries have complex objects but no comparison operators for them other than "is" which checks if we are comparing an object with itself.

Not true: equality ``==`` is also defined for all objects. By default, ``==`` falls back to testing for object identity, but your classes can define ``__eq__`` to override that.
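Here is a minimal illustration of that default behaviour, using two hypothetical classes. `Plain` inherits `object.__eq__`, which compares identity. `WithEq` defines `__eq__` so that instances compare by a stored value.

```python
class Plain:
    """Inherits object.__eq__, which compares identity."""
    def __init__(self, value):
        self.value = value


class WithEq:
    """Overrides __eq__ to compare by value."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, WithEq):
            return NotImplemented     # let the other operand (or the default) decide
        return self.value == other.value

    # Overriding __eq__ alone would implicitly set __hash__ to None;
    # defining __hash__ keeps instances usable in sets and as dict keys.
    def __hash__(self):
        return hash(self.value)


a, b = Plain(1), Plain(1)
print(a == b, a == a)                # False True
print(WithEq(1) == WithEq(1))        # True
print(WithEq(1) == WithEq(2))        # False
```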

> It would be quite nice to be able to compare any two objects together.

Compare in what way?

> I made this function in python to have a starting point https://gist.github.com/SebastienEske/5a9c04e718becd93b7928514e80f0211

I don't have much time today, but I had a quick look. Your function begins with:

    if type(obj) != type(obj1):
        return False

That means that an instance of a subclass and an instance of its parent class will always compare False. So:

    class MyStr(str):
        pass

    x = MyStr("hello")
    compare_objs("hello", x)  # returns False

Why is this useful?

Then your code says ``for a in dir(obj):``. You should not use ``dir`` like that; it is not designed for programmatic use. The docs warn: "Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases."

https://docs.python.org/3/library/functions.html#dir

I don't have time to go through the rest of your function now, but I think you should explain in words what you expect "compare any two objects" to do, rather than expecting us to read the source code and guess.

Thanks,
--
Steven
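The following small example makes the ``dir`` warning concrete by contrasting it with `vars()`, which returns an object's instance `__dict__`. It is an illustration only, not a recommendation made in the thread. `vars()` has its own blind spots: it misses class attributes, properties and `__slots__`, and it raises `TypeError` for objects without a `__dict__`. The class `Sample` is made up for the example.

```python
class Sample:
    kind = "sample"              # class attribute, shared by all instances

    def __init__(self):
        self.x = 1               # instance attributes, stored in self.__dict__
        self.y = 2

    def method(self):
        return self.x


s = Sample()
# dir() mixes in methods, class attributes and inherited dunder names.
print("method" in dir(s), "kind" in dir(s), "__class__" in dir(s))   # True True True
# vars() returns only the instance's own attribute dictionary.
print(vars(s))                                                       # {'x': 1, 'y': 2}

try:
    vars(42)                     # ints have no __dict__
except TypeError as exc:
    print("vars() failed:", exc)
```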

Andrew Barnert

2:58 a.m.

On Sep 25, 2019, at 02:15, Sébastien Eskenaz <s.eskenaz@pixelz.com> wrote:

> Hello,
>
> Several libraries have complex objects but no comparison operators for them other than "is" which checks if we are comparing an object with itself.

Most of them also have ==, which does the same comparison, because you get that by default. Breaking that would be a backward-compatibility nightmare. And the ones that don’t have it have deliberately disabled equality comparison, probably for some good reason, so it’s probably an even worse idea to break them.

> It would be quite nice to be able to compare any two objects together.

Sure, but there’s no rule that makes sense for every type.

There is a large set of classes where “compare for equality by comparing all attributes for equality” makes perfect sense, and even a pretty large subset of that where “compare for order by comparing attributes lexicographically in their natural attribute order” makes sense too.

But it’s definitely not all classes. For example, let’s say you have a DOM node class. Presumably you want == to be true of two nodes that head equal (sub)trees. But if the children are accessed by a method call, or by iterating, not by an attribute, your algorithm does the wrong thing. Or, if there’s a back link to the parent that is an attribute, again it does the wrong thing. Not to mention that, because Python is ridiculously dynamic, you can have classes where there’s not even a way to get a list of all attributes: think of a remote proxy or a bridge to another language.

Also, even for some classes that _could_ be correctly compared this way, it still might not be a good idea. For example, comparing two HTTP responses by their content arguably does the right thing. But do you want r1 == r2 to potentially block for 20 seconds while it downloads two complete resources over the network? To compare both raw content and decoded text because there are properties for both even though they’re redundant? To raise an exception about a network error?

I think “classes that you’d want to compare by lexicographically comparing attributes” overlaps pretty well with “classes that could have been defined with @dataclass”. Classes that actually _are_ defined with @dataclass already get comparison operators for free, but there are tons of classes that maybe would use @dataclass in an ideal world, but they predate it, or need to work with Python 3.5 or 2.7, or need a custom metaclass, or have some behavior that doesn’t play nice with some of the limitations of @dataclass, etc. So those are the ones you care about.
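For comparison, here is what @dataclass provides out of the box (Python 3.7+). With `eq=True`, which is the default, instances compare field by field. With `order=True`, the class also gets `<`, `<=`, `>` and `>=`, which compare the fields as a tuple, that is, lexicographically in declaration order. `Version` is a made-up example class.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int = 0


print(Version(1, 2) == Version(1, 2, 0))    # True: same field values
print(Version(1, 2) < Version(1, 10))       # True: (1, 2, 0) < (1, 10, 0)
print(Version(2, 0) > Version(1, 99, 99))   # True: first field decides
print(sorted([Version(1, 10), Version(1, 2)]))
```

The generated methods only compare instances of exactly the same class. For any other operand they return `NotImplemented`.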
If your algorithm would work for 90% of those classes (which means you don’t need to deal with every weird edge case; just declare that infinite attribute cycles aren’t part of the 90% you handle), and you had an easy way to monkeypatch or wrap the handful of classes you actually need to compare in a given app, would that be good enough? Because I think you could write that pretty easily, and it might give you everything you’re looking for.
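The "monkeypatch or wrap" approach could take the following shape. This is a hypothetical sketch. `add_attribute_eq` installs an `__eq__` that compares instance attributes (`vars()`) on an existing class. Here it is applied to a stand-in `Config` class, which represents a class from a library you cannot edit. The comparison is shallow: nested values are compared with their own `==`.

```python
def attribute_eq(self, other):
    """Equality by instance attributes, for 'could have been a dataclass' classes."""
    if other.__class__ is not self.__class__:
        return NotImplemented
    return vars(self) == vars(other)


def add_attribute_eq(cls):
    """Hypothetical helper: patch attribute-based == onto an existing class.

    Returns cls, so it also works as a class decorator.
    """
    cls.__eq__ = attribute_eq
    # Equal objects must hash equal; with mutable attributes the simplest
    # safe choice is to make instances unhashable.
    cls.__hash__ = None
    return cls


# Stand-in for a third-party class we cannot edit.
class Config:
    def __init__(self, host, port):
        self.host = host
        self.port = port


add_attribute_eq(Config)
print(Config("localhost", 8080) == Config("localhost", 8080))   # True
print(Config("localhost", 8080) == Config("localhost", 9090))   # False
```

Setting `__hash__` to `None` preserves the hash/equality contract at the cost of hashability. If patched instances formed reference cycles, the comparison would recurse until `RecursionError`. That is the kind of edge case the message above suggests declaring out of scope.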

Sébastien Eskenaz

2:34 a.m.

Sorry for not being clear enough. The fallback to object identity is exactly the issue here.

Here is the example that got me to make this function. I have a class `dataset` which can be built as follows:

`mydata1 = dataset('path/to/dataset1')`

Then I can create a second dataset:

`mydata2 = dataset('path/to/dataset1')`

(No typo: it is `path/to/dataset1` both times.) Testing `mydata1 == mydata2` returns False, even though they actually contain the same information and data.

Regarding parent class comparison, I would assume that there is a good reason to have made that class `MyStr`, hence some of its methods or attributes will end up being different from `str`. So testing the class directly is only a faster way to find the difference. I don't really see the problem with saying that objects of two different classes will always be different (unless the object implements its own `__eq__` method, in which case the user can use the `==` operator). It's like comparing apples to oranges: they're bound to be different.

Just in case this is ambiguous too: I am not suggesting changing the behaviour of the equality operator, just providing the user with an extra function for a more comprehensive (and slower) comparison.

I also found another workaround for my case (pass the constructor parameters instead of the object), but I just thought it could be useful to others. Turns out maybe not :-) Sorry for the disturbance then.

I didn't know about @dataclass; I'll look into that. Thanks for the pointer.

Best regards,
*Sébastien Eskenazi*

Andrew Barnert

6:47 a.m.

On Sep 26, 2019, at 19:34, Sébastien Eskenaz <s.eskenaz@pixelz.com> wrote:

> Regarding parent class comparison, I would assume that there is a good reason to have made that class `MyStr` hence either some of its methods or attributes will end up being different from `str`.

It may just override some of the base methods, without adding any new ones, and without adding any attributes. And it will use the native string storage the same way as str (because you can’t really avoid that when subclassing str). So why shouldn’t it be equal? The default == for the str type will say it’s equal, so I think most people would expect the same from a more-lenient/more-wisely-usable equals function.
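The following small example illustrates that point with a `MyStr` that only overrides `__repr__`. The override is an assumption for the example. `str`'s own `==` treats it as equal to the plain string, whereas a strict `type()` check, as in the proposed function, would not.

```python
class MyStr(str):
    """Overrides a method but adds no data: the characters are stored as in str."""
    def __repr__(self):
        return "MyStr(" + str.__repr__(self) + ")"


x = MyStr("hello")
print(x == "hello", "hello" == x)     # True True: str.__eq__ compares characters
print(isinstance(x, str))             # True
print(type(x) == type("hello"))       # False: a strict type() check calls them different
print(repr(x))                        # MyStr('hello')
```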

> So testing directly for the class is only a faster way to find the difference. I don't really see the problem with saying that objects of two different classes will always be different (unless the object implements its own `__eq__` function in which case the user can use the `==` operator). It's like comparing apples to oranges. They're bound to be different.

It’s like comparing Valencia oranges to oranges. All Valencia oranges are oranges. Pick up any Valencia orange, and you’ve got an orange that’s equal to a Valencia orange. That’s exactly the same as MyStr and str, or bool and int, or any other subtype. In fact, isinstance(True, int) is true, and for that matter, True == 1. Plus, even types that aren’t related can be equal: 1 == 1.0 == Fraction(1, 1).
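These relationships are easy to check in the interpreter. Equal numbers of different types also hash the same, which is why they are interchangeable as dict keys.

```python
from fractions import Fraction

print(isinstance(True, int))          # True: bool is a subclass of int
print(True == 1)                      # True
print(1 == 1.0 == Fraction(1, 1))     # True: unrelated types can still be equal
print(type(1) is type(1.0))           # False: a type-equality check rejects this pair
print({1: "a"}[1.0])                  # a: equal numbers hash equally, so 1.0 finds key 1
```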

Sébastien Eskenaz

7:27 a.m.

Ok, got the point. :-)

Best regards,
*Sébastien Eskenazi*

On Fri, 27 Sep 2019 at 13:48, Andrew Barnert <abarnert@yahoo.com> wrote:

...

Steven D'Aprano

9:57 a.m.

On Fri, Sep 27, 2019 at 09:34:05AM +0700, Sébastien Eskenaz wrote:

...

Do they? I have to believe you, because it is your class and you should know, but we can't make that assumption in general. It depends on what the dataset class does. For instance, think of:

    myfile1 = open('path/to/file')
    # some time later...
    myfile2 = open('path/to/file')

Are they the same file? No. They point to the same *pathname*, but they are independent file objects. On Linux systems, they might not even end up reading from the same file on disk, for example:

* path/to/file is a hard link to spam.txt
* open('path/to/file') --> spam.txt
* another process deletes path/to/file
* then creates a new file at path/to/file.

(Untested, but I'm pretty sure that will work.)

Without knowing the internal details of your mydata1 and mydata2, there is no way that I can predict whether or not the two objects should compare as "the same".

Which brings us to another point. What does "the same" mean? In Python, we have two well-defined operators for sameness:

- the ``is`` operator, which tests for object identity (the two operands are the same object)
- the ``==`` operator, which tests for value equality (the two operands have equal value)

Javascript defines at least four different versions of "sameness"

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Equality_comparisons...

none of which are the same as either of Python's standard comparisons.

What does your compare function do? Obviously it doesn't test for object identity, but it doesn't test for equality either, since ``compare_objs(1, 1.0)`` will return False.

I've now taken the time to read the whole function

https://gist.github.com/SebastienEske/5a9c04e718becd93b7928514e80f0211

but the main thing that is missing is an explanation of *what it does*. If I call compare_objs(spam, eggs) and it returns True, what does that tell me about spam and eggs?
Without having a good, solid description of the *meaning* of this comparison, it is difficult to tell whether it is useful in general or not, or whether other people would find it helpful as well as you.
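Below is a runnable version of the file example. A temporary file stands in for `path/to/file`, which is an assumption made so that the snippet can run anywhere. It shows the two notions of sameness that Python builds in (`is` and `==`). It also shows a third, domain-specific notion: `os.path.sameopenfile`, which checks whether two open file descriptors refer to the same file on disk. The example does not reproduce the delete-and-recreate scenario.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")       # stand-in for 'path/to/file'
    with open(path, "w") as f:
        f.write("spam\n")

    with open(path) as myfile1, open(path) as myfile2:
        same_object = myfile1 is myfile2        # identity
        equal = myfile1 == myfile2              # file objects keep identity-based ==
        # Domain-specific sameness: do both descriptors refer to the same file?
        same_file = os.path.sameopenfile(myfile1.fileno(), myfile2.fileno())

print(same_object, equal, same_file)           # False False True
```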

...

Perhaps. But the *values* of the strings are still equal: MyStr("hello") and the built-in string "hello" have the same sequence of characters and should be considered equal. But we've established that your comparison isn't an equality comparison.

...

That depends on what you mean by "different". If the subclass is designed to obey the Liskov Substitution Principle, you ought to be able to substitute the subclass and the behaviour will remain the same. In other words, anything you can do with the string "hello", you should be able to do with MyStr("hello") -- they are identical in the limited sense of the Liskov Substitution Principle.

So not so much like apples and oranges, but more like apples from one tree, and apples from another tree. Possibly from the same variety, or possibly not.

--
Steven
