Thoughts about implementing object-compare in unittest package? - Python-ideas



Henry Lin

25 Jul 2020

10:15 p.m.

Hey all,

What are thoughts about implementing an object-compare function in the unittest package? (Compare two objects recursively, attribute by attribute.)

This seems like a common use case in many testing scenarios, and there are many 3rd party solutions solving the same problem. (Maybe we can promote standardization by implementing it in the standard library?)

Apologies ahead of time if this idea has already been proposed; I was not able to find similar posts in the archive.

Best,
-Henry



Steven D'Aprano

26 Jul

4:54 a.m.

On Sat, Jul 25, 2020 at 10:15:16PM -0500, Henry Lin wrote:

...

Hey all,

What are thoughts about implementing an object-compare function in the unittest package? (Compare two objects recursively, attribute by attribute.)

Why not just ask the objects to compare themselves?

assertEqual(actual, expected)

will work if actual and expected define a sensible `__eq__` and are the same type. If they aren't the same type, why not?

actual = MyObject(spam=1, eggs=2, cheese=3)
expected = DifferentObject(spam=1, eggs=2, cheese=3)

...

This seems like a common use case in many testing scenarios,

I've never come across it. Can you give an example where defining an `__eq__` method won't be the right solution? -- Steven
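A minimal sketch of what Steven is suggesting, using a hypothetical `MyObject` class: once the class compares its own attributes in `__eq__`, a plain `assertEqual` in a `unittest.TestCase` is all the test needs. Returning `NotImplemented` for other types lets Python fall back to its default (identity) comparison, so an object of a different class simply compares unequal.

```python
import unittest


class MyObject:
    """Hypothetical class standing in for the object under test."""

    def __init__(self, spam, eggs, cheese):
        self.spam = spam
        self.eggs = eggs
        self.cheese = cheese

    def __eq__(self, other):
        # Compare only against the same type; defer to Python otherwise.
        if not isinstance(other, MyObject):
            return NotImplemented
        return (self.spam, self.eggs, self.cheese) == (
            other.spam, other.eggs, other.cheese
        )


class MyObjectTest(unittest.TestCase):
    def test_equal(self):
        actual = MyObject(spam=1, eggs=2, cheese=3)
        expected = MyObject(spam=1, eggs=2, cheese=3)
        self.assertEqual(actual, expected)
```

Run it with `python -m unittest` as usual. Note that defining `__eq__` like this also makes Python set `__hash__` to `None` for the class, which is the hashability concern raised in the reply below.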

Henry Lin

12:31 p.m.

Hi Steven,

You're right, declaring `__eq__` for the class we want to compare would solve this issue. However, we have the tradeoff that

- All classes need to implement the `__eq__` method to compare two instances;
- Any class implementing the `__eq__` operator is no longer hashable;
- Developers might not want to leak the `__eq__` function to other developers; I wouldn't want to invade the implementation of my class just for testing.

In terms of the "popularity" of this potential feature, from what I understand (and through my own development), there are testing libraries built with this feature. For example, testfixtures.compare <https://testfixtures.readthedocs.io/en/latest/api.html#testfixtures.compare> can compare two objects recursively, and I am using it in my development for this purpose.

On Sun, Jul 26, 2020 at 4:56 AM Steven D'Aprano <steve@pearwood.info> wrote:

...

-- Steven _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TDLFBU... Code of Conduct: http://python.org/psf/codeofconduct/
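Henry's point about hashability is easy to check. In Python 3, a class that defines `__eq__` without also defining `__hash__` gets `__hash__` set to `None`, so its instances can no longer go into sets or serve as dict keys. A small sketch with two hypothetical classes:

```python
class Plain:
    """No __eq__: inherits identity-based equality and hashing from object."""

    def __init__(self, value):
        self.value = value


class WithEq:
    """Defines __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, WithEq):
            return NotImplemented
        return self.value == other.value


print(Plain.__hash__ is object.__hash__)  # True
print(WithEq.__hash__)                    # None
try:
    {WithEq(1)}                           # set members must be hashable
except TypeError as exc:
    print("TypeError:", exc)
```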

Marco Sulla

12:47 p.m.

On Sun, 26 Jul 2020 at 19:33, Henry Lin <hlin117@gmail.com> wrote:

...

- Any class implementing the `__eq__` operator is no longer hashable

You can use: def __hash__(self): return id(self)

Richard Damon

1:24 p.m.

On 7/26/20 1:47 PM, Marco Sulla wrote:

...

On Sun, 26 Jul 2020 at 19:33, Henry Lin <hlin117@gmail.com <mailto:hlin117@gmail.com>> wrote:

* Any class implementing the `__eq__` operator is no longer hashable

You can use:

def __hash__(self): return id(self)

I thought that there was an assumption that if two objects are equal (via __eq__) then their hashes (via __hash__) should be equal? Which wouldn't hold for this definition, and thus dictionaries wouldn't behave as expected.

-- Richard Damon
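Richard's objection can be demonstrated directly. The sketch below uses a hypothetical `Point` class that pairs a value-based `__eq__` with the `id`-based `__hash__` suggested above: the two points compare equal, but because their hashes differ, a set holding one of them treats the other as a different key and does not find it.

```python
class Point:
    """Hypothetical class with a value-based __eq__ and an id-based __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Violates "equal objects must have equal hashes".
        return id(self)


a = Point(1, 2)
b = Point(1, 2)
print(a == b)              # True: __eq__ says they are equal
print(hash(a) == hash(b))  # False: two live objects never share an id
print(b in {a})            # False: the set cannot find the "equal" point
```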

Marco Sulla

3:09 p.m.

You're quite right, but if you don't implement __eq__, the hash of an object is simply a random integer (I suppose generated from the address of the object). Alternatively, if you want a quick hash, you can use hash(str(obj)) (if you implemented __str__ or __repr__).

Richard Damon

3:49 p.m.

On 7/26/20 4:09 PM, Marco Sulla wrote:

...

You're quite right, but if you don't implement __eq__, the hash of an object is simply a random integer (I suppose generated from the address of the object).

Alternatively, if you want a quick hash, you can use hash(str(obj)) (if you implemented __str__ or __repr__).

And if you don't implement __eq__, I thought that the default equality was the same id() check (which is what the hash is based on too). The idea was (I thought) that if you implement an __eq__ so that two different objects could compare equal, then you needed to come up with some hash function for that object that matched that equality function, or the object is considered unhashable.

-- Richard Damon
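Richard's description of the defaults matches how a plain class behaves: without a custom `__eq__`, an instance compares equal only to itself, and its inherited hash is consistent with that, so it works as a dict key. A short check with a hypothetical `Token` class:

```python
class Token:
    """Hypothetical class relying entirely on object's __eq__ and __hash__."""


t1 = Token()
t2 = Token()

print(t1 == t1, t1 == t2)  # True False: default equality is identity
lookup = {t1: "first"}
print(lookup[t1])          # first
print(t2 in lookup)        # False: another instance is another key
```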

Steven D'Aprano

8:26 p.m.

On Sun, Jul 26, 2020 at 07:47:39PM +0200, Marco Sulla wrote:

...

On Sun, 26 Jul 2020 at 19:33, Henry Lin <hlin117@gmail.com> wrote:

...
- Any class implementing the `__eq__` operator is no longer hashable

You can use:

def __hash__(self): return id(self)

Don't do that. It's a horrible hash function.

The `object` superclass already knows how to do a good, reliable hash function. Use it.

def __hash__(self): return super().__hash__()

-- Steven

Henry Lin

9:21 p.m.

@Steven D'Aprano <steve@pearwood.info>

...

- Developers might not want to leak the `__eq__` function to other developers; I wouldn't want to invade the implementation of my class just for testing.

That seems odd to me. You are *literally* comparing two instances for equality, just calling it something different from `==`. Why would you not be happy to expose it?

My thinking is that, by default, the `==` operator checks whether two names refer to the same object. So implementing `__eq__` is actually a breaking change for developers. It seems that, by consensus here, people do tend to implement `__eq__` anyway, so maybe this point is minor.

I do appreciate the suggestion of adding this feature into functools though. Let's assume we commit to doing something like this. Thinking about how this feature could be extended, suppose that for testing purposes I want to highlight which attributes of two objects are mismatching. Would we have to implement something different to find the delta between two objects, or could components of the functools solution be reused? (Would we want a feature like this to exist in the standard library?)

On Sun, Jul 26, 2020 at 8:29 PM Steven D'Aprano <steve@pearwood.info> wrote:

...


Ethan Furman

3:58 p.m.

On 7/26/20 10:31 AM, Henry Lin wrote:

...

You're right, declaring `__eq__` for the class we want to compare would solve this issue. However, we have the tradeoff that

* All classes need to implement the `__eq__` method to compare two instances;

I usually implement __eq__ sooner or later anyway -- even if just for testing.

...

* Any class implementing the `__eq__` operator is no longer hashable

One just needs to define a __hash__ method that behaves properly.

...

* Developers might not want to leak the `__eq__` function to other developers; I wouldn't want to invade the implementation of my class just for testing.

And yet that's exactly what you are proposing with your object compare. If two objects are, in fact, equal, why is it bad for == to say so? -- ~Ethan~

Alex Hall

4:12 p.m.

On Sun, Jul 26, 2020 at 11:01 PM Ethan Furman <ethan@stoneleaf.us> wrote:

...

On 7/26/20 10:31 AM, Henry Lin wrote:

...
You're right, declaring `__eq__` for the class we want to compare would solve this issue. However, we have the tradeoff that

* All classes need to implement the `__eq__` method to compare two instances;

I usually implement __eq__ sooner or later anyway -- even if just for testing.

...
* Any class implementing the `__eq__` operator is no longer hashable

One just needs to define a __hash__ method that behaves properly.

This is quite a significant change in behaviour which may break compatibility. Equality and hashing based only on identity can be quite a useful property which I often rely on.

There's another reason people might find this useful - if the objects have differing attributes, the assertion can show exactly which ones, instead of just saying that the objects are not equal. Even if all the involved classes implement a matching repr, which is yet more work, the reprs will likely be on a single line and the diff will be difficult to read.
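A minimal sketch of the kind of report Alex describes. `attr_diff` is a hypothetical helper, not an existing unittest API: it compares the `vars()` of two objects (so it assumes ordinary instances with a `__dict__` and does not recurse into nested objects) and returns only the attributes that differ, which is much easier to read than two long single-line reprs.

```python
class _Missing:
    """Sentinel for an attribute present on only one of the two objects."""

    def __repr__(self):
        return "<missing>"


MISSING = _Missing()


def attr_diff(actual, expected):
    """Hypothetical helper: {name: (actual_value, expected_value)} for differing attributes."""
    a, e = vars(actual), vars(expected)
    diff = {}
    for name in sorted(a.keys() | e.keys()):
        a_val, e_val = a.get(name, MISSING), e.get(name, MISSING)
        if a_val != e_val:
            diff[name] = (a_val, e_val)
    return diff


class Config:
    """Hypothetical object under test."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


print(attr_diff(Config(host="a", port=80), Config(host="a", port=8080, debug=True)))
# {'debug': (<missing>, True), 'port': (80, 8080)}
```

Inside a test, `self.assertEqual(attr_diff(actual, expected), {})` would make a failing assertion print just the mismatching attributes.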

Henry Lin

6:55 p.m.

+1 to Alex Hall. In general I think there are a lot of questions regarding whether using the __eq__ operator is sufficient. It seems from people's feedback that it will essentially get the job done, but like Alex says, if we want to understand which field is leading to a test breaking, we wouldn't have the ability to easily check. On Sun, Jul 26, 2020 at 4:13 PM Alex Hall <alex.mojaki@gmail.com> wrote:

...


Steven D'Aprano

9:22 p.m.

On Sun, Jul 26, 2020 at 11:12:39PM +0200, Alex Hall wrote:

...

There's another reason people might find this useful - if the objects have differing attributes, the assertion can show exactly which ones, instead of just saying that the objects are not equal.

That's a good point.

I sat down to start an implementation, when a fundamental issue with this came to mind. This proposed comparison is effectively something close to:

vars(actual) == vars(expected)

only recursively and with provision for objects with `__slots__` and/or no `__dict__`. And that observation led me to the insight that as tests go, this is a risky, unreliable test.

A built-in example:

actual = lambda: 1  # simulate some complex object
expected = lambda: 2  # another complex object
vars(actual) == vars(expected)  # returns True

So this is a comparison that needs to be used with care. It is easy for the test to pass while the objects are nevertheless not what you expect.

Having said that, another perspective is that unittest already has a smart test for comparing dicts, assertDictEqual, which is automatically called by assertEqual.

https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertDict...

So it may be sufficient to have a utility function that copies an instance's slots and dict into a dict, and then compare dicts. Here's a sketch:

d1 = vars(actual).copy()
d1.update({key: getattr(actual, key) for key in actual.__slots__})
# Likewise for d2 from expected
self.assertEqual(d1, d2)

Make that handle the corner cases where objects have no instance dict or slots, and we're done.

Thinking aloud here.... I see this as a kind of copy operation, and think this would be useful outside of testing. I've written code to copy attributes from instances on multiple occasions. So how about a new function in the `copy` module to do so:

copy.getattrs(obj, deep=False)

that returns a dict. Then the desired comparison could be a thin wrapper:

def assertEqualAttrs(self, actual, expected, msg=None):
    self.assertEqual(getattrs(actual), getattrs(expected), msg)

I'm not keen on a specialist test function, but I'm warming to the idea of exposing this functionality in a more general, and hence more useful, form.

-- Steven
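One possible shape for the helper Steven floats, as a minimal sketch. `getattrs` here is a hypothetical stand-alone function (the `copy` module has no such function today), and `AttrsTestCase` is likewise made up. It merges the instance `__dict__`, if there is one, with every `__slots__` entry declared along the class's MRO, skips slots that were never assigned, and only deep-copies when asked. It does not recurse into attribute values, and it keeps the caveat Steven describes: objects with no instance attributes, such as two different lambdas, still come out equal.

```python
import copy
import unittest


def getattrs(obj, deep=False):
    """Hypothetical helper: collect an instance's attributes into a new dict."""
    result = dict(getattr(obj, "__dict__", {}))
    for cls in type(obj).__mro__:
        slots = cls.__dict__.get("__slots__", ())
        if isinstance(slots, str):          # __slots__ = "name" is legal
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            if name.startswith("__") and not name.endswith("__"):
                name = "_" + cls.__name__.lstrip("_") + name  # private-name mangling
            try:
                result[name] = getattr(obj, name)
            except AttributeError:          # slot declared but never assigned
                pass
    return copy.deepcopy(result) if deep else result


class AttrsTestCase(unittest.TestCase):
    def assertEqualAttrs(self, actual, expected, msg=None):
        # Two plain dicts are routed to assertDictEqual, which prints a diff.
        self.assertEqual(getattrs(actual), getattrs(expected), msg)


class Slotted:
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a, self.b = a, b


print(getattrs(Slotted(1, 2)))                     # {'a': 1, 'b': 2}
print(getattrs(lambda: 1) == getattrs(lambda: 2))  # True: Steven's caveat
```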

Henry Lin

9:45 p.m.

@Steven D'Aprano <steve@pearwood.info> All good ideas ☺ I'm in agreement that we should be building solutions which are generalizable. Are there more concerns people would like to bring up when considering the problem of object equality? On Sun, Jul 26, 2020 at 9:25 PM Steven D'Aprano <steve@pearwood.info> wrote:

...


Guido van Rossum

10:22 p.m.

I am really surprised at the resistance against defining `__eq__` on the target objects. Every time this problem has cropped up in code I was working on (including code part of very large corporate code bases) the obvious solution was to define `__eq__`. The only reason I can think of why you are so resistant to this would be due to poor development practices, e.g. adding tests long after the "main" code has already been deployed, or having a separate team write tests.

Regarding `__hash__`, it is a very bad idea to call `super().__hash__()`! Unless your `__eq__` also just calls `super().__eq__(other)` (and what would be the point of that?), defining `__hash__` that way will cause irreproducible behavior where *sometimes* an object that is equal to a dict key will not be found in the dict even though it is already present, because the two objects have different hash values. Defining `__hash__` as `id(self)` is no better. In fact, defining `__hash__` as returning the constant `42` is better, because it is fine if two objects that *don't* compare equal still have the same hash value (but not the other way around).

The right way to define `__hash__` is to construct a tuple of all the attributes that are considered by `__eq__` and return the `hash()` of that tuple. (In some cases you can make it faster by leaving some expensive attribute out of the tuple -- again, that's fine, but don't consider anything that's not used by `__eq__`.)

Finally, dataclasses get you all this for free, and they are the future.

On Sun, Jul 26, 2020 at 7:48 PM Henry Lin <hlin117@gmail.com> wrote:

...


-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
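A sketch of the recipe Guido describes, with hypothetical `Point` and `FrozenPoint` classes. The hand-written version hashes a tuple of exactly the attributes `__eq__` compares, so equal points always get equal hashes; the dataclass version generates the same pair of methods from the declared fields, and `dataclasses.fields()` lists those fields.

```python
from dataclasses import dataclass, fields


class Point:
    """Hand-written version of the recipe."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _key(self):
        # Exactly the attributes __eq__ considers, and nothing else.
        return (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


@dataclass(frozen=True)
class FrozenPoint:
    """Dataclass version: eq=True (the default) plus frozen=True adds __hash__."""

    x: int
    y: int


print(Point(1, 2) in {Point(1, 2)})              # True
print(FrozenPoint(1, 2) in {FrozenPoint(1, 2)})  # True
print([f.name for f in fields(FrozenPoint)])     # ['x', 'y']
```

With the default `eq=True` and `frozen=False`, `@dataclass` sets `__hash__` to `None` instead, because a mutable object whose equality depends on its fields is unsafe as a dict key; `frozen=True` makes the instances immutable, so a generated hash is safe.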

Steven D'Aprano

27 Jul

12:12 a.m.

On Sun, Jul 26, 2020 at 08:22:47PM -0700, Guido van Rossum wrote:

...

Regarding `__hash__`, it is a very bad idea to call `super().__hash__()`!

Today I learned. Thank you. -- Steven

Christopher Barker

1:15 p.m.

On Sun, Jul 26, 2020 at 8:25 PM Guido van Rossum <guido@python.org> wrote:

...

The only reason I can think of why you are so resistant to this would be due to poor development practices, e.g. adding tests long after the "main" code has already been deployed, or having a separate team write tests.

and even then, maybe monkey-patch an __eq__ in for your tests?

For my part, I have for sure defined __eq__ for no other reason than tests -- but I'm still glad I did.

Though perhaps the idea (sorry, not sure who to credit) of providing a utility for object equality in the stdlib, so that in the common case it would be simple to write a "standard" __eq__, would be nice to have. (Note on that -- make sure it handles properties "properly" -- if that's possible.)

...

In fact, defining `__hash__` as returning the constant `42` is better, because it is fine if two objects that *don't* compare equal still have the same hash value (but not the other way around).

Really? Can anyone recommend something to read so I can "get" this? It's counterintuitive to me. Is __eq__ always checked?!? I recently was faced with dealing with this issue in updating some old code, and I'm still a bit confused about the relationship between __hash__ and __eq__, and the main Python docs did not clarify it for me.

...

Finally, dataclasses get you all this for free, and they are the future.

That is a great point -- I've learned that the really nice thing about dataclasses is that they keep a separate structure of all the attributes that matter, and some metadata about them -- type, etc. This is really useful, and better (or at least more stable) than simply relying on __dict__ and friends. I'm thinking that a "dataclasstools" package that builds on dataclasses, would be really nice -- clearly something to start on PyPi, but as a unified effort, we could get something cleaner than everyone building their own little bit on their own. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
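Christopher's first suggestion, monkey-patching an `__eq__` in just for the tests, can be sketched with `unittest.mock.patch.object`, which sets a class attribute for the duration of a `with` block and restores the original afterwards. `LegacyRecord` and `attrs_eq` are hypothetical names. The patched `__eq__` only exists inside the block, and since `__hash__` is left untouched, the objects should not be used as dict keys while the patch is active.

```python
import unittest
from unittest import mock


class LegacyRecord:
    """Hypothetical production class that deliberately has no __eq__."""

    def __init__(self, name, size):
        self.name = name
        self.size = size


def attrs_eq(self, other):
    # Test-only equality: same type and same instance attributes.
    if type(other) is not type(self):
        return NotImplemented
    return vars(self) == vars(other)


class LegacyRecordTest(unittest.TestCase):
    def test_records_match(self):
        with mock.patch.object(LegacyRecord, "__eq__", attrs_eq):
            self.assertEqual(LegacyRecord("a", 1), LegacyRecord("a", 1))
        # Outside the block, equality is back to identity.
        self.assertNotEqual(LegacyRecord("a", 1), LegacyRecord("a", 1))
```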

Ethan Furman

2 p.m.

On 7/27/20 11:15 AM, Christopher Barker wrote:

...

On Sun, Jul 26, 2020 at 8:25 PM Guido van Rossum wrote:

...

...
In fact, defining `__hash__` as returning the constant `42` is better, because it is fine if two objects that *don't* compare equal still have the same hash value (but not the other way around).

Really? can anyone recommend something to read so I can "get" this -- it's counter intuitive to me. Is __eq__ always checked?!? I recently was faced with dealing with this issue in updating some old code, and I'm still a bit confused about the relationship between __hash__ and __eq__, and main Python docs did not clarify it for me.

Equal objects must have equal hashes. Objects that compare equal must have hashes that compare equal. However, not all objects with equal hashes compare equal themselves.

From a practical standpoint, think of dictionaries:

adding
------
- objects are sorted into buckets based on their hash
- any one bucket can have several items with equal hashes
- those several items (obviously) will not compare equal

retrieving
----------
- get the hash of the object
- find the bucket that would hold that hash
- find the already stored objects with the same hash
- use __eq__ on each one to find the match

So, if an object's hash changes, then it will no longer be findable in any hash table (dict, set, etc.).

-- ~Ethan~
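Ethan's two lists can be turned into a toy hash table. This is a simplified sketch, not how CPython's dict is actually laid out (it uses open addressing rather than a list per bucket), and `ToyHashTable` is a hypothetical name; but the order of operations follows the steps above: hash first to pick a bucket, compare stored hashes, and let `__eq__` (after a quick identity check, which CPython also does) make the final decision.

```python
class ToyHashTable:
    """Toy separate-chaining table mirroring the adding/retrieving steps."""

    def __init__(self, n_buckets=8):
        # Far fewer buckets than possible hash values.
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket_for(self, h):
        return self._buckets[h % len(self._buckets)]

    def __setitem__(self, key, value):
        h = hash(key)
        bucket = self._bucket_for(h)
        for i, (stored_h, stored_key, _) in enumerate(bucket):
            if stored_h == h and (stored_key is key or stored_key == key):
                bucket[i] = (h, key, value)   # replace the existing entry
                return
        bucket.append((h, key, value))

    def __getitem__(self, key):
        h = hash(key)                          # get the hash of the object
        bucket = self._bucket_for(h)           # find the bucket for that hash
        for stored_h, stored_key, value in bucket:
            if stored_h != h:                  # only stored objects with the same hash
                continue
            if stored_key is key or stored_key == key:  # __eq__ decides
                return value
        raise KeyError(key)


table = ToyHashTable()
table["spam"] = 1
table["eggs"] = 2
table["spam"] = 3
print(table["spam"], table["eggs"])  # 3 2

# Even with a single bucket (every key collides), lookups stay correct.
tiny = ToyHashTable(n_buckets=1)
for n in range(20):
    tiny[n] = n * n
print(tiny[7])  # 49
```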

Christopher Barker

7 p.m.

I guess this is the part I find confusing: when (and why) does __eq__ play a role?

On Mon, Jul 27, 2020 at 12:01 PM Ethan Furman <ethan@stoneleaf.us> wrote:

...

Equal objects must have equal hashes. Objects that compare equal must have hashes that compare equal.

OK got it.

However, not all objects with equal hashes compare equal themselves.

...

That's the one I find confusing -- why is it not "bad" for two objects with the same hash (the 42 example above) to not be equal? That seems like it would be very dangerous. Is this because it's possible, if very unlikely, for ANY hash algorithm to create the same hash for two different inputs? So equality always has to be checked anyway?

...

From a practical standpoint, think of dictionaries:

(that's the trick here -- you can't "get" this without knowing something about the implementation details of dicts.)

...

adding
------
- objects are sorted into buckets based on their hash
- any one bucket can have several items with equal hashes

is this mostly because there are many more possible hashes than buckets?

- those several items (obviously) will not compare equal

...

So the hash is a fast way to put stuff in buckets, so you only need to compare with the others that end up in the same bucket?

retrieving
----------
- get the hash of the object
- find the bucket that would hold that hash
- find the already stored objects with the same hash
- use __eq__ on each one to find the match

So here's my question: if there is only one object in that bucket, is __eq__ checked anyway? If so, then yes, can see why it's not dangerous (if potentially slow) to have a bunch of unequal objects with the same hash.

...

So, if an object's hash changes, then it will no longer be findable in any hash table (dict, set, etc.).

That part, I think I got.

So what happens when there is no __eq__? The object can still be hashable -- I guess that's because there IS an __eq__ -- it defaults to an id check, yes?

-CHB

Ethan Furman

7:40 p.m.

On 7/27/20 5:00 PM, Christopher Barker wrote:

...

I guess this is the part I find confusing:

when (and why) does __eq__ play a role?

__eq__ is the final authority on whether two objects are equal. The default __eq__ punts and uses identity.

...

On Mon, Jul 27, 2020 at 12:01 PM Ethan Furman wrote:

...

...
However, not all objects with equal hashes compare equal themselves.

That's the one I find confusing -- why is it not "bad" for two objects with the same hash (the 42 example above) to not be equal? That seems like it would be very dangerous. Is this because it's possible, if very unlikely, for ANY hash algorithm to create the same hash for two different inputs? So equality always has to be checked anyway?

Well, there are a finite number of integers to be used as hashes, and potentially many more than that number of objects needing to be hashed. So, yes, hashes can (and will) be shared, and equality must be checked also.

For example, if a hash algorithm decided to use short names, then a group of people might be sorted like this:

Bob: Bob, Robert
Chris: Christopher, Christine, Christian, Christina
Ed: Edmund, Edward, Edwin, Edwina

So if somebody draws a name from a hat:

Christina

You apply the hash to it:

Chris

Ignore the Bob and Ed buckets, then use equality checks on the Chris names to find the right one.
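Ethan's short-name analogy translates directly into code. The `Person` class and `NICKNAMES` table below are hypothetical: the hash is deliberately coarse (all four Chris names share one hash), but because equal people always get equal hashes, a set still stores and finds every one of them correctly, with `__eq__` on the full name resolving the collisions.

```python
# Hypothetical nickname table playing the role of the "short name" hash.
NICKNAMES = {
    "Bob": "Bob", "Robert": "Bob",
    "Christopher": "Chris", "Christine": "Chris",
    "Christian": "Chris", "Christina": "Chris",
    "Edmund": "Ed", "Edward": "Ed", "Edwin": "Ed", "Edwina": "Ed",
}


class Person:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name        # full name decides equality

    def __hash__(self):
        return hash(NICKNAMES[self.name])     # coarse on purpose: many collisions

    def __repr__(self):
        return f"Person({self.name!r})"


people = {Person(name) for name in NICKNAMES}
print(len(people))                    # 10: shared hashes do not merge entries
print(Person("Christina") in people)  # True: found via the Chris hash, then __eq__
print(hash(Person("Christina")) == hash(Person("Christian")))  # True, yet not equal
```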

...

...
From a practical standpoint, think of dictionaries:

(that's the trick here -- you can't "get" this without knowing something about the implementation details of dicts.)

Depends on the person -- I always do better with a concrete application.

...

...
adding
------
- objects are sorted into buckets based on their hash
- any one bucket can have several items with equal hashes

is this mostly because there are many more possible hashes than buckets?

Yes.

...

...
- those several items (obviously) will not compare equal

So the hash is a fast way to put stuff in buckets, so you only need to compare with the others that end up in the same bucket?

Yes.

...

...
retrieving
----------
- get the hash of the object
- find the bucket that would hold that hash
- find the already stored objects with the same hash
- use __eq__ on each one to find the match

So here's my question: if there is only one object in that bucket, is __eq__ checked anyway?

Yes -- just because it has the same hash does not mean it's equal.

...

So what happens when there is no __eq__? The object can still be hashable -- I guess that's because there IS an __eq__ -- it defaults to an id check, yes?

Yes. The default hash, I believe, also defaults to the object id -- so, by default, objects are hashable and compare equal only to themselves. -- ~Ethan~

Christopher Barker

28 Jul

5:58 p.m.

On Mon, Jul 27, 2020 at 5:42 PM Ethan Furman <ethan@stoneleaf.us> wrote:

...

Chris Barker wrote:

...

...
Is this because it's possible, if very unlikely, for ANY hash algorithm to create the same hash for two different inputs? So equality always has to be checked anyway?

snip

For example, if a hash algorithm decided to use short names, then a group of people might be sorted like this:

Bob: Bob, Robert
Chris: Christopher, Christine, Christian, Christina
Ed: Edmund, Edward, Edwin, Edwina

So if somebody draws a name from a hat:

Christina

You apply the hash to it:

Chris

Ignore the Bob and Ed buckets, then use equality checks on the Chris names to find the right one.

sure, but I know (or assume anyway) that Python dicts and sets don't use such a simple, naive hash algorithm, so in fact, non-equal strings are very unlikely to have the same hash:

In [42]: hash("Christina")
Out[42]: -8424898463413304204

In [43]: hash("Christopher")
Out[43]: 4404166401429815751

In [44]: hash("Christian")
Out[44]: 1032502133450913307

But a dict always has a LOT fewer buckets than possible hash values, so clashes within a bucket are not so rare, and equality always needs to be checked -- which is what I was missing.

And while having a bunch of non-equal objects produce the same hash wouldn't break anything, it would break the O(1) performance of dicts.

Have I got that right?

-CHB
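Christopher's summary can be measured. In this sketch a hypothetical `Key` class counts its `__eq__` calls: with Guido's constant hash, every key shares one hash and one probe sequence, so each insertion and lookup has to call `__eq__` on many earlier keys, while a well-spread hash needs roughly one call per lookup. The exact counts depend on the CPython dict implementation, so none are claimed here; only the ratio matters, and every lookup still returns the right value.

```python
class Key:
    """Hypothetical key type that counts how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, value, constant_hash=False):
        self.value = value
        self.constant_hash = constant_hash

    def __eq__(self, other):
        Key.eq_calls += 1
        if not isinstance(other, Key):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # The constant hash is still correct (equal keys, equal hashes), just slow.
        return 42 if self.constant_hash else hash(self.value)


def count_eq_calls(n, constant_hash):
    Key.eq_calls = 0
    table = {Key(i, constant_hash): i for i in range(n)}
    # Look each key up with a fresh, equal (but not identical) object.
    found = all(table[Key(i, constant_hash)] == i for i in range(n))
    return found, Key.eq_calls


good_ok, good = count_eq_calls(500, constant_hash=False)
bad_ok, bad = count_eq_calls(500, constant_hash=True)
print(good_ok, bad_ok)  # True True: both hashes give correct results
print(f"good hash: {good} __eq__ calls, constant hash: {bad} __eq__ calls")
```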

...

...
...
From a practical standpoint, think of dictionaries:

(that's the trick here -- you can't "get" this without knowing something about the implementation details of dicts.)

Depends on the person -- I always do better with a concrete application.

...
...
adding
------
- objects are sorted into buckets based on their hash
- any one bucket can have several items with equal hashes

is this mostly because there are many more possible hashes than buckets?

Yes.

...
...
- those several items (obviously) will not compare equal

So the hash is a fast way to put stuff in buckets, so you only need to compare with the others that end up in the same bucket?

Yes.

...
...
retrieving
----------
- get the hash of the object
- find the bucket that would hold that hash
- find the already stored objects with the same hash
- use __eq__ on each one to find the match

So here's my question: if there is only one object in that bucket, is __eq__ checked anyway?

Yes -- just because it has the same hash does not mean it's equal.

...
So what happens when there is no __eq__? The object can still be hashable -- I guess that's because there IS an __eq__ -- it defaults to an id check, yes?

Yes.

The default hash, I believe, also defaults to the object id -- so, by default, objects are hashable and compare equal only to themselves.

-- ~Ethan~ _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XPUXOS... Code of Conduct: http://python.org/psf/codeofconduct/

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

2QdxY4RzWzUUiLuE＠potatochowder.com

8:40 p.m.

On 2020-07-28 at 15:58:58 -0700, Christopher Barker <pythonchb@gmail.com> wrote:

...

But a dict always has a LOT fewer buckets than possible hash values, so clashes within a bucket are not so rare, so equality needs to be checked always -- which is what I was missing.

...

And while having a bunch of non-equal objects produce the same hash wouldn't break anything, it would break the O(1) performance of dicts.

...

Have I got that right?

Yes. Breaking O(1) performance was actually the root of possible Denial of Service attacks: if an attacker knows the algorithms, that attacker could specifically create keys (e.g., user names) whose hash values are the same, and then searching a dict degenerates to O(N), and then your server falls to its knees. At some point, Python added some randomization to the way dictionaries work in order to foil suck attacks.
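The randomization mentioned here is hash randomization for `str` and `bytes`, on by default since Python 3.3 and controlled by the `PYTHONHASHSEED` environment variable. A minimal sketch of its effect, hashing the same string in fresh interpreter processes (the helper name `hash_in_subprocess` is hypothetical): the same seed reproduces the same hash, while different seeds give unrelated values, which makes it much harder for an attacker to precompute colliding keys.

```python
import os
import subprocess
import sys


def hash_in_subprocess(text, seed):
    """Return hash(text) as computed by a fresh interpreter using `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


same_a = hash_in_subprocess("Christina", seed=1)
same_b = hash_in_subprocess("Christina", seed=1)
other = hash_in_subprocess("Christina", seed=2)

print(same_a == same_b)  # True: same seed, same hash
print(same_a == other)   # almost certainly False: a different seed
```

This is also why the `hash()` values quoted earlier in the thread will differ from run to run unless `PYTHONHASHSEED` is fixed.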

Stephen J. Turnbull

29 Jul 29 Jul

12:26 a.m.

2QdxY4RzWzUUiLuE@potatochowder.com writes:

...

in order to foil suck attacks.

Typo of the Year candidate! (It was a typo, right?)

2QdxY4RzWzUUiLuE＠potatochowder.com

5:44 a.m.

On 2020-07-29 at 14:26:25 +0900, "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

2QdxY4RzWzUUiLuE@potatochowder.com writes:

...
in order to foil suck attacks.

Typo of the Year candidate! (It was a typo, right?)

Call it a Freudian slip of the fingers.

Steven D'Aprano

26 Jul 26 Jul

8:24 p.m.

On Sun, Jul 26, 2020 at 12:31:17PM -0500, Henry Lin wrote:

...

Hi Steven,

You're right, declaring `__eq__` for the class we want to compare would solve this issue. However, we have the tradeoff that

- All classes need to implement the `__eq__` method to compare two instances;

One argument in favour of a standard solution would be to avoid duplicated implementations. Perhaps we should add something, not as a unittest method, but in functools:

    def compare(a, b):
        # Simplified version.
        if a is b:
            return True
        return vars(a) == vars(b)

The actual implementation would be more complex, of course. Then classes could optionally implement equality:

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return functools.compare(self, other)
        return NotImplemented

or, if you prefer, you could call the function directly in your unit tests:

    self.assertTrue(functools.compare(actual, expected))
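There is no `functools.compare` today, so here is one guess at what "more complex" might mean for the recursive, attribute-by-attribute comparison Henry asked about. It is a sketch under stated assumptions, not a proposed API: the name `attrs_equal` and the classes `Point`, `Line` and `ExampleTest` are hypothetical, both objects must have exactly the same type, only `__dict__` attributes plus lists, tuples and dicts are walked, everything else falls back to `==`, and reference cycles are not handled.

```python
import unittest


def attrs_equal(a, b):
    """Recursively compare two objects attribute by attribute (sketch)."""
    if a is b:
        return True
    if type(a) is not type(b):
        return False
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(attrs_equal(a[k], b[k]) for k in a)
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(attrs_equal(x, y) for x, y in zip(a, b))
    if hasattr(a, "__dict__"):
        # Compare the instance attributes, recursing into their values.
        return attrs_equal(vars(a), vars(b))
    return a == b  # ints, strings, objects without __dict__, ...


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Line:
    def __init__(self, start, end):
        self.start, self.end = start, end


class ExampleTest(unittest.TestCase):
    def test_lines_match(self):
        actual = Line(Point(0, 0), Point(1, 2))
        expected = Line(Point(0, 0), Point(1, 2))
        self.assertTrue(attrs_equal(actual, expected))
        self.assertNotEqual(actual, expected)  # default == is identity
```

Requiring identical types is a deliberate choice here, in line with Steven's question about why `actual` and `expected` would differ in type; a real proposal would have to decide that, and how to report *where* two objects differ.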

...

- Any class implementing the `__eq__` operator is no longer hashable

Easy enough to add back in:

    def __hash__(self):
        return super().__hash__()
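One caution about that fix. In a class whose base is `object`, `super().__hash__()` is `object.__hash__`, which is identity-based, so two instances that compare equal by attributes will usually hash differently. That breaks the rule that objects which compare equal must have the same hash, and a set or dict can then keep both "equal" objects. The sketch below (classes `EqOnly`, `IdentityHash` and `ConsistentHash` are hypothetical) shows Henry's point that defining `__eq__` alone sets `__hash__` to `None`, the identity-hash pitfall, and a hash built from the same values `__eq__` compares, assuming those attributes are immutable and hashable.

```python
class EqOnly:
    """Defines __eq__ only, so Python sets __hash__ to None."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented


class IdentityHash(EqOnly):
    # What super().__hash__() gives in a class based directly on object:
    # a hash that ignores x and y entirely.
    __hash__ = object.__hash__


class ConsistentHash(EqOnly):
    def __hash__(self):
        # Hash exactly the values that __eq__ compares.
        return hash((self.x, self.y))


print(EqOnly.__hash__ is None)  # True: instances are unhashable

p, q = IdentityHash(1, 2), IdentityHash(1, 2)
print(p == q, len({p, q}))  # True 2: "equal", yet both kept in the set

r, s = ConsistentHash(1, 2), ConsistentHash(1, 2)
print(r == s, len({r, s}))  # True 1
```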

...

- Developers might not want to leak the `__eq__` function to other developers; I wouldn't want to invade the implementation of my class just for testing.

That seems odd to me. You are *literally* comparing two instances for equality, just calling it something different from `==`. Why would you not be happy to expose it?

...

In terms of the "popularity" of this potential feature, from what I understand (and through my own development), there are testing libraries built with this feature. For example, testfixtures.compare <https://testfixtures.readthedocs.io/en/latest/api.html#testfixtures.compare> can compare two objects recursively, and I am using it in my development for this purpose.

That's a good example of what we should *not* do, and why trying to create a single standard solution for every imaginable scenario can only end up with an over-engineered, complex, complicated, confusing API:

    testfixtures.compare(
        x, y, prefix=None, suffix=None, raises=True, recursive=True,
        strict=False, comparers=None, **kw)

Not shown in the function signature are additional keyword arguments:

    actual, expected  # alternative spelling for x, y
    x_label, y_label, ignore_eq

That is literally thirteen optional parameters, plus arbitrary keyword parameters, for something that just compares two objects.

But a simple comparison function, possibly in functools, that simply compares attributes, might be worthwhile.

-- Steven
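For contrast with that signature, here is a guess at how small the "simple comparison function" could stay while still giving useful test failures. The name `attr_diff` and the classes `Order` and `OrderTest` are hypothetical; the function is shallow, looks only at `vars()`, and does not check types. It returns the differing attributes, so asserting the result equals `{}` lets unittest's ordinary dict diff show which attribute is wrong.

```python
import unittest

_MISSING = object()  # sentinel for "attribute not present"


def attr_diff(a, b):
    """Return {name: (a_value, b_value)} for every attribute that differs."""
    va, vb = vars(a), vars(b)
    diff = {}
    for name in sorted(va.keys() | vb.keys()):
        x, y = va.get(name, _MISSING), vb.get(name, _MISSING)
        if x != y:
            diff[name] = (x, y)
    return diff


class Order:
    def __init__(self, item, qty):
        self.item, self.qty = item, qty


class OrderTest(unittest.TestCase):
    def test_order(self):
        # An empty diff means every attribute matched; on failure,
        # assertEqual prints the offending names and values.
        self.assertEqual(attr_diff(Order("spam", 2), Order("spam", 2)), {})
```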


24 comments

10 participants

participants (10)

2QdxY4RzWzUUiLuE＠potatochowder.com
Alex Hall
Christopher Barker
Ethan Furman
Guido van Rossum
Henry Lin
Marco Sulla
Richard Damon
Stephen J. Turnbull
Steven D'Aprano
