Thoughts about implementing object-compare in unittest package?

Hey all, What are thoughts about implementing an object-compare function in the unittest package? (Compare two objects recursively, attribute by attribute.) This seems like a common use case in many testing scenarios, and there are many 3rd party solutions solving the same problem. (Maybe we can promote standardization by implementing it in the standard library?) Apologies ahead of time if this idea has already been proposed; I was not able to find similar posts in the archive. Best, -Henry

On Sat, Jul 25, 2020 at 10:15:16PM -0500, Henry Lin wrote:
Why not just ask the objects to compare themselves? assertEqual(actual, expected) will work if actual and expected define a sensible `__eq__` and are the same type. If they aren't the same type, why not?

    actual = MyObject(spam=1, eggs=2, cheese=3)
    expected = DifferentObject(spam=1, eggs=2, cheese=3)
This seems like a common use case in many testing scenarios,
I've never come across it. Can you give an example where defining an `__eq__` method won't be the right solution? -- Steven

Hi Steven, You're right, declaring `__eq__` for the class we want to compare would solve this issue. However, we have the tradeoff that
- All classes need to implement the `__eq__` method to compare two instances;
- Any class implementing the `__eq__` operator is no longer hashable
- Developers might not want to leak the `__eq__` function to other developers; I wouldn't want to invade the implementation of my class just for testing.
In terms of the "popularity" of this potential feature, from what I understand (and through my own development), there are testing libraries built with this feature. For example, testfixtures.compare <https://testfixtures.readthedocs.io/en/latest/api.html#testfixtures.compare> can compare two objects recursively, and I am using it in my development for this purpose. On Sun, Jul 26, 2020 at 4:56 AM Steven D'Aprano <steve@pearwood.info> wrote:

You're quite right, but if you don't implement __eq__, the hash of an object is simply a random integer (I suppose generated from the address of the object). Alternatively, if you want a quick hash, you can use hash(str(obj)) (if you implemented __str__ or __repr__).

On 7/26/20 4:09 PM, Marco Sulla wrote:
And if you don't implement __eq__, I thought that the default equality was the same id() (which is what the hash is based on too). The idea was (I thought) that if you implement an __eq__, so that two different objects could compare equal, then you need to come up with some hash function for that object that matches that equality function, or the object is considered unhashable. -- Richard Damon
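The rule described above can be checked directly. This is a minimal sketch using two hypothetical classes, `Plain` and `WithEq`, which are not from the thread: a class that defines `__eq__` but not `__hash__` gets `__hash__` set to `None`, so its instances cannot go into a set or be used as dict keys.

```python
class Plain:
    """No __eq__ defined: equality and hashing fall back to identity."""

    def __init__(self, spam):
        self.spam = spam


class WithEq:
    """Defines __eq__ only; Python then sets __hash__ to None."""

    def __init__(self, spam):
        self.spam = spam

    def __eq__(self, other):
        if isinstance(other, WithEq):
            return self.spam == other.spam
        return NotImplemented


plain_set = {Plain(1)}                    # fine: identity-based hash
eq_is_unhashable = WithEq.__hash__ is None

try:
    {WithEq(1)}                           # needs hash(), which is disabled
    raised = False
except TypeError:
    raised = True
```

Adding a `__hash__` that agrees with `__eq__` makes `WithEq` hashable again.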

@Steven D'Aprano <steve@pearwood.info>
My thinking is by default, the `==` operator checks whether two objects have the same reference. So implementing `__eq__` is actually a breaking change for developers. It seems by consensus of people here, people do tend to implement `__eq__` anyways, so maybe this point is minor. I do appreciate the suggestion of adding this feature into functools though. Let's assume we commit to doing something like this. Thinking how this feature can be extended, let's suppose for testing purposes, I want to highlight which attributes of two objects are mismatching. Would we have to implement something different to find the delta between two objects, or could components of the functools solution be reused? (Would we want a feature like this to exist in the standard library?) On Sun, Jul 26, 2020 at 8:29 PM Steven D'Aprano <steve@pearwood.info> wrote:

On 7/26/20 10:31 AM, Henry Lin wrote:
I usually implement __eq__ sooner or later anyway -- even if just for testing.
* Any class implementing the `__eq__` operator is no longer hashable
One just needs to define a __hash__ method that behaves properly.
And yet that's exactly what you are proposing with your object compare. If two objects are, in fact, equal, why is it bad for == to say so? -- ~Ethan~

On Sun, Jul 26, 2020 at 11:01 PM Ethan Furman <ethan@stoneleaf.us> wrote:
This is quite a significant change in behaviour which may break compatibility. Equality and hashing based only on identity can be quite a useful property which I often rely on. There's another reason people might find this useful - if the objects have differing attributes, the assertion can show exactly which ones, instead of just saying that the objects are not equal. Even if all the involved classes implement a matching repr, which is yet more work, the reprs will likely be on a single line and the diff will be difficult to read.
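To make the "show exactly which ones" idea concrete, here is a rough sketch of an attribute-level diff. The helper name `attr_diff`, the `MISSING` sentinel and the `Point` class are all hypothetical, and the helper only looks at the instance `__dict__` (no `__slots__`, no recursion).

```python
MISSING = object()  # sentinel meaning "attribute absent on this side"


def attr_diff(actual, expected):
    """Return {name: (actual_value, expected_value)} for differing attributes."""
    a, e = vars(actual), vars(expected)
    return {
        name: (a.get(name, MISSING), e.get(name, MISSING))
        for name in sorted(a.keys() | e.keys())
        if a.get(name, MISSING) != e.get(name, MISSING)
    }


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


diff = attr_diff(Point(1, 2), Point(1, 3))
same = attr_diff(Point(1, 2), Point(1, 2))
```

A test assertion could put the resulting dict in its failure message, so the mismatching field is named instead of two one-line reprs being compared by eye.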

+1 to Alex Hall. In general I think there are a lot of questions regarding whether using the __eq__ operator is sufficient. It seems from people's feedback that it will essentially get the job done, but like Alex says, if we want to understand which field is leading to a test breaking, we wouldn't have the ability to easily check. On Sun, Jul 26, 2020 at 4:13 PM Alex Hall <alex.mojaki@gmail.com> wrote:

On Sun, Jul 26, 2020 at 11:12:39PM +0200, Alex Hall wrote:
That's a good point. I sat down to start an implementation, when a fundamental issue with this came to mind. This proposed comparison is effectively something close to:

    vars(actual) == vars(expected)

only recursively and with provision for objects with `__slots__` and/or no `__dict__`. And that observation led me to the insight that as tests go, this is a risky, unreliable test. A built-in example:

    actual = lambda: 1  # simulate some complex object
    expected = lambda: 2  # another complex object
    vars(actual) == vars(expected)  # returns True

So this is a comparison that needs to be used with care. It is easy for the test to pass while the objects are nevertheless not what you expect. Having said that, another perspective is that unittest already has a smart test for comparing dicts, assertDictEqual, which is automatically called by assertEqual.

https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertDict...

So it may be sufficient to have a utility function that copies an instance's slots and dict into a dict, and then compare dicts. Here's a sketch:

    d1 = vars(actual).copy()
    d1.update({key: getattr(actual, key) for key in actual.__slots__})
    # Likewise for d2 from expected
    self.assertEqual(d1, d2)

Make that handle the corner cases where objects have no instance dict or slots, and we're done. Thinking aloud here.... I see this as a kind of copy operation, and think this would be useful outside of testing. I've written code to copy attributes from instances on multiple occasions. So how about a new function in the `copy` module to do so:

    copy.getattrs(obj, deep=False)

that returns a dict. Then the desired comparison could be a thin wrapper:

    def assertEqualAttrs(self, actual, expected, msg=None):
        self.assertEqual(getattrs(actual), getattrs(expected), msg)

I'm not keen on a specialist test function, but I'm warming to the idea of exposing this functionality in a more general, and hence more useful, form. -- Steven
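One way the "corner cases" might be handled is sketched below. This `getattrs` is a hypothetical stand-in for the proposed `copy.getattrs` (no such function exists in the `copy` module), it is shallow only, and it does not deal with name-mangled private slots. The `Slotted` and `Mixed` classes are made up for illustration.

```python
def getattrs(obj):
    """Collect instance attributes from __slots__ and __dict__ into a new dict."""
    attrs = {}
    # Walk the MRO so slots declared on base classes are included.
    for klass in type(obj).__mro__:
        slots = klass.__dict__.get("__slots__", ())
        if isinstance(slots, str):      # __slots__ = "name" is allowed
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            if hasattr(obj, name):      # a declared slot may still be unset
                attrs[name] = getattr(obj, name)
    # Objects without an instance dict simply contribute nothing here.
    attrs.update(getattr(obj, "__dict__", {}))
    return attrs


class Slotted:
    __slots__ = ("spam", "eggs")

    def __init__(self, spam, eggs):
        self.spam = spam
        self.eggs = eggs


class Mixed(Slotted):
    # No __slots__ here, so instances also get a __dict__.
    def __init__(self, spam, eggs, cheese):
        super().__init__(spam, eggs)
        self.cheese = cheese


slotted_attrs = getattrs(Slotted(1, 2))
mixed_attrs = getattrs(Mixed(1, 2, 3))
bare_attrs = getattrs(object())
```

The lambda caveat still applies: `getattrs(lambda: 1)` and `getattrs(lambda: 2)` are both empty dicts, so a comparison built on this passes for objects that clearly differ.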

@Steven D'Aprano <steve@pearwood.info> All good ideas ☺ I'm in agreement that we should be building solutions which are generalizable. Are there more concerns people would like to bring up when considering the problem of object equality? On Sun, Jul 26, 2020 at 9:25 PM Steven D'Aprano <steve@pearwood.info> wrote:

I am really surprised at the resistance against defining `__eq__` on the target objects. Every time this problem has cropped up in code I was working on (including code part of very large corporate code bases) the obvious solution was to define `__eq__`. The only reason I can think of why you are so resistant to this would be due to poor development practices, e.g. adding tests long after the "main" code has already been deployed, or having a separate team write tests.

Regarding `__hash__`, it is a very bad idea to call `super().__hash__()`! Unless your `__eq__` also just calls `super().__eq__(other)` (and what would be the point of that?), defining `__hash__` that way will cause irreproducible behavior where *sometimes* an object that is equal to a dict key will not be found in the dict even though it is already present, because the two objects have different hash values. Defining `__hash__` as `id(self)` is no better. In fact, defining `__hash__` as returning the constant `42` is better, because it is fine if two objects that *don't* compare equal still have the same hash value (but not the other way around).

The right way to define `__hash__` is to construct a tuple of all the attributes that are considered by `__eq__` and return the `hash()` of that tuple. (In some cases you can make it faster by leaving some expensive attribute out of the tuple -- again, that's fine, but don't consider anything that's not used by `__eq__`.)

Finally, dataclasses get you all this for free, and they are the future. On Sun, Jul 26, 2020 at 7:48 PM Henry Lin <hlin117@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
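A small illustration of both halves of that advice, using hypothetical classes `BrokenKey` and `GoodKey`. The first pairs a value-based `__eq__` with the inherited identity hash; the second hashes a tuple of exactly the attributes that `__eq__` compares.

```python
class BrokenKey:
    """__eq__ compares values, but __hash__ is still identity-based."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, BrokenKey):
            return self.name == other.name
        return NotImplemented

    def __hash__(self):
        return super().__hash__()   # the pattern being warned against


class GoodKey:
    """__hash__ is built from the same attributes that __eq__ uses."""

    def __init__(self, name, size):
        self.name = name
        self.size = size

    def _key(self):
        return (self.name, self.size)

    def __eq__(self, other):
        if isinstance(other, GoodKey):
            return self._key() == other._key()
        return NotImplemented

    def __hash__(self):
        return hash(self._key())


stored = BrokenKey("a")
broken = {stored: 1}
probe = BrokenKey("a")
probe_equals_stored = probe == stored   # equal according to __eq__ ...
probe_found = probe in broken           # ... yet the lookup hashes by identity

good_found = GoodKey("a", 1) in {GoodKey("a", 1): "x"}
```

With `BrokenKey`, the probe and the stored key compare equal but carry different hashes, so the dict never gets as far as calling `__eq__` on them.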

On Sun, Jul 26, 2020 at 8:25 PM Guido van Rossum <guido@python.org> wrote:
and even then, maybe monkey-patch an __eq__ in for your tests? For my part, I have for sure defined __eq__ for no other reason than tests -- but I'm still glad I did. Though perhaps the idea (sorry, not sure who to credit) of providing a utility for object equality in the stdlib, so that in the common case, it would be simple to write a "standard" __eq__ would be nice to have. (note on that -- make sure it handles properties "properly" -- if that's possible) In fact, defining `__hash__` as returning the constant `42` is better,
because it is fine if two objects that *don't* compare equal still have the same hash value (but not the other way around).
Really? Can anyone recommend something to read so I can "get" this -- it's counterintuitive to me. Is __eq__ always checked?!? I recently was faced with dealing with this issue in updating some old code, and I'm still a bit confused about the relationship between __hash__ and __eq__, and the main Python docs did not clarify it for me.
Finally, dataclasses get you all this for free, and they are the future.
That is a great point -- I've learned that the really nice thing about dataclasses is that they keep a separate structure of all the attributes that matter, and some metadata about them -- type, etc. This is really useful, and better (or at least more stable) than simply relying on __dict__ and friends. I'm thinking that a "dataclasstools" package that builds on dataclasses would be really nice -- clearly something to start on PyPI, but as a unified effort, we could get something cleaner than everyone building their own little bit on their own. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
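As a sketch of what "for free" and the per-field metadata look like in practice: `Order` and `field_diff` below are hypothetical names, and `field_diff` assumes both arguments are instances of the same dataclass. `frozen=True` together with the default `eq=True` makes the dataclass generate both `__eq__` and a matching `__hash__`.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Order:
    item: str
    quantity: int
    note: str = ""


def field_diff(actual, expected):
    """Map field name -> (actual, expected) for fields whose values differ."""
    return {
        f.name: (getattr(actual, f.name), getattr(expected, f.name))
        for f in fields(actual)
        if getattr(actual, f.name) != getattr(expected, f.name)
    }


a = Order("tea", 2)
b = Order("tea", 2)
c = Order("tea", 3)

equal_by_value = a == b
usable_as_key = {a: "first"}[b]     # b finds the entry stored under a
quantity_diff = field_diff(a, c)
```

`dataclasses.fields()` is the "separate structure" mentioned above: it lists the declared fields, so the helper does not need to inspect `__dict__` or `__slots__`.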

On 7/27/20 11:15 AM, Christopher Barker wrote:
On Sun, Jul 26, 2020 at 8:25 PM Guido van Rossum wrote:
Equal objects must have equal hashes. Objects that compare equal must have hashes that compare equal. However, not all objects with equal hashes compare equal themselves. From a practical standpoint, think of dictionaries:

adding
------
- objects are sorted into buckets based on their hash
- any one bucket can have several items with equal hashes
- those several items (obviously) will not compare equal

retrieving
----------
- get the hash of the object
- find the bucket that would hold that hash
- find the already stored objects with the same hash
- use __eq__ on each one to find the match

So, if an object's hash changes, then it will no longer be findable in any hash table (dict, set, etc.). -- ~Ethan~
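The adding and retrieving steps above can be sketched as a toy table. `ToyTable` is a hypothetical teaching aid that uses one list per bucket; CPython's real dict is laid out differently (open addressing), but the division of labour between `hash()` and `==` is the same idea.

```python
class ToyTable:
    """Toy bucketed mapping: hash() picks the bucket, == picks the entry."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Many possible hash values share a small number of buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:             # same key: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self._bucket(key):
            if existing == key:             # __eq__ has the final say
                return value
        raise KeyError(key)


table = ToyTable()
table.add(1, "one")
table.add(9, "nine")    # hash(9) % 8 == hash(1) % 8, so both share a bucket
shared_bucket_size = len(table.buckets[1])
value_for_nine = table.get(9)
```

If a key's hash changed after insertion, `_bucket` would look in a different list and `get` would raise `KeyError`, which is the "no longer findable" case.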

I guess this is the part I find confusing: when (and why) does __eq__ play a role? On Mon, Jul 27, 2020 at 12:01 PM Ethan Furman <ethan@stoneleaf.us> wrote:
Equal objects must have equal hashes. Objects that compare equal must have hashes that compare equal.
OK got it. However, not all objects with the equal hashes compare equal themselves.
That's the one I find confusing -- why is it not "bad" for two objects with the same hash (the 42 example above) to not be equal? That seems like it would be very dangerous. Is this because it's possible, if very unlikely, for ANY hash algorithm to create the same hash for two different inputs? So equality always has to be checked anyway?
From a practical standpoint, think of dictionaries:
(that's the trick here -- you can't "get" this without knowing something about the implementation details of dicts.)
is this mostly because there are many more possible hashes than buckets? - those several items (obviously) will not compare equal
So the hash is a fast way to put stuff in buckets, so you only need to compare with the others that end up in the same bucket? retrieving
So here's my question: if there is only one object in that bucket, is __eq__ checked anyway? If so, then yes, can see why it's not dangerous (if potentially slow) to have a bunch of unequal objects with the same hash.
So, if an object's hash changes, then it will no longer be findable in any hash table (dict, set, etc.).
That part, I think I got. So what happens when there is no __eq__? The object can still be hashable -- I guess that's because there IS an __eq__ -- it defaults to an id check, yes? -CHB

On 7/27/20 5:00 PM, Christopher Barker wrote:
I guess this is the part I find confusing:
when (and why) does __eq__ play a role?
__eq__ is the final authority on whether two objects are equal. The default __eq__ punts and uses identity.
On Mon, Jul 27, 2020 at 12:01 PM Ethan Furman wrote:
Well, there are a finite number of integers to be used as hashes, and potentially many more than that number of objects needing to be hashed. So, yes, hashes can (and will) be shared, and equality must be checked also. For example, if a hash algorithm decided to use short names, then a group of people might be sorted like this:

    Bob: Bob, Robert
    Chris: Christopher, Christine, Christian, Christina
    Ed: Edmund, Edward, Edwin, Edwina

So if somebody draws a name from a hat:

    Christina

You apply the hash to it:

    Chris

Ignore the Bob and Ed buckets, then use equality checks on the Chris names to find the right one.
Depends on the person -- I always do better with a concrete application.
Yes.
Yes.
Yes -- just because it has the same hash does not mean it's equal.
Yes. The default hash, I believe, also defaults to the object id -- so, by default, objects are hashable and compare equal only to themselves. -- ~Ethan~

On Mon, Jul 27, 2020 at 5:42 PM Ethan Furman <ethan@stoneleaf.us> wrote:
Chris Barker wrote:
For example, if a hash algorithm decided to use short names, then a
Sure, but I know (or assume anyway) that Python dicts and sets don't use such a simple, naive hash algorithm, so in fact, non-equal strings are very unlikely to have the same hash:

    In [42]: hash("Christina")
    Out[42]: -8424898463413304204

    In [43]: hash("Christopher")
    Out[43]: 4404166401429815751

    In [44]: hash("Christian")
    Out[44]: 1032502133450913307

But a dict always has a LOT fewer buckets than possible hash values, so clashes within a bucket are not so rare, so equality needs to be checked always -- which is what I was missing. And while having a bunch of non-equal objects produce the same hash wouldn't break anything, it would break the O(1) performance of dicts. Have I got that right? -CHB
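The "correct but slow" point can be observed by counting equality checks. In this sketch the class `Const42` and its `eq_calls` counter are hypothetical: every instance hashes to 42, so a lookup has to fall back on `__eq__` again and again.

```python
class Const42:
    """Every instance shares the hash 42; only __eq__ tells them apart."""

    eq_calls = 0    # class-level counter of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Const42.eq_calls += 1
        if isinstance(other, Const42):
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        return 42


keys = [Const42(i) for i in range(50)]
table = {key: key.value for key in keys}    # still behaves correctly

Const42.eq_calls = 0
found = table[Const42(49)]                  # fresh object, equal to the last key
calls_for_one_lookup = Const42.eq_calls
```

The lookup returns the right value, but `calls_for_one_lookup` grows with the number of colliding keys rather than staying constant.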

On 2020-07-28 at 15:58:58 -0700, Christopher Barker <pythonchb@gmail.com> wrote:
Have I got that right?
Yes. Breaking O(1) performance was actually the root of possible Denial of Service attacks: if an attacker knows the algorithms, that attacker could specifically create keys (e.g., user names) whose hash values are the same, and then searching a dict degenerates to O(N), and then your server falls to its knees. At some point, Python added some randomization to the way dictionaries work in order to foil such attacks.
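For reference, the randomization applies to the hashes of `str` and `bytes` objects and is controlled by the `PYTHONHASHSEED` environment variable. The sketch below starts fresh interpreters with a chosen seed; the helper name `str_hash_with_seed` is hypothetical, and it assumes `sys.executable` points at a usable Python.

```python
import os
import subprocess
import sys


def str_hash_with_seed(text, seed):
    """Return hash(text) as computed by a new interpreter run with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed: the hash is reproducible across interpreter runs.
first = str_hash_with_seed("Christina", 1)
second = str_hash_with_seed("Christina", 1)
# Different seed: the value is generally different, which is what keeps an
# attacker from precomputing colliding keys.
other_seed = str_hash_with_seed("Christina", 2)
```

Without `PYTHONHASHSEED` set, each interpreter start picks a random seed, which is why the `hash("Christina")` values quoted earlier in the thread will not match on another machine or another run.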

On Sun, Jul 26, 2020 at 12:31:17PM -0500, Henry Lin wrote:
One argument in favour of a standard solution would be to avoid duplicated implementations. Perhaps we should add something, not as a unittest method, but in functools:

    def compare(a, b):
        if a is b:
            return True
        # Simplified version.
        return vars(a) == vars(b)

The actual implementation would be more complex, of course. Then classes could optionally implement equality:

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return functools.compare(self, other)
        return NotImplemented

or if you prefer, you could call the function directly in your unit tests:

    self.assertTrue(functools.compare(actual, other))
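A runnable version of that idea might look like the following. The module-level `compare` here is a hypothetical stand-in for the proposed `functools.compare`, which does not exist in the standard library, and the `Spam` class is made up for the example. The only addition to the simplified version is that objects without a `__dict__` compare unequal instead of raising.

```python
def compare(a, b):
    """Simplified attribute comparison (stand-in for the proposed function)."""
    if a is b:
        return True
    try:
        return vars(a) == vars(b)
    except TypeError:       # vars() fails for objects without a __dict__
        return False


class Spam:
    def __init__(self, spam, eggs):
        self.spam = spam
        self.eggs = eggs

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return compare(self, other)
        return NotImplemented


same = Spam(1, 2) == Spam(1, 2)
different = Spam(1, 2) == Spam(1, 3)
nested = Spam(Spam(1, 2), None) == Spam(Spam(1, 2), None)
```

Because the attribute dicts are compared with `==`, attribute values that define `__eq__` the same way are compared recursively, as in the `nested` case.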
- Any class implementing the `__eq__` operator is no longer hashable
Easy enough to add back in:

    def __hash__(self):
        return super().__hash__()
That seems odd to me. You are *literally* comparing two instances for equality, just calling it something different from `==`. Why would you not be happy to expose it?
That's a good example of what we should *not* do, and why trying to create a single standard solution for every imaginable scenario can only end up with an over-engineered, complex, complicated, confusing API:

    testfixtures.compare(
        x, y, prefix=None, suffix=None, raises=True,
        recursive=True, strict=False, comparers=None, **kw)

Not shown in the function signature are additional keyword arguments:

    actual, expected  # alternative spelling for x, y
    x_label, y_label, ignore_eq

That is literally thirteen optional parameters, plus arbitrary keyword parameters, for something that just compares two objects. But a simple comparison function, possibly in functools, that simply compares attributes, might be worthwhile. -- Steven

participants (10)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Alex Hall
- Christopher Barker
- Ethan Furman
- Guido van Rossum
- Henry Lin
- Marco Sulla
- Richard Damon
- Stephen J. Turnbull
- Steven D'Aprano