Dataclasses and correct hashability
On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum
https://mail.python.org/mailman/listinfo/python-dev> wrote: Looks like this is turning into a major flamewar regardless of what I say. :-( I really don't want to lose the ability to add a hash function to a mutable dataclass by flipping a flag in the decorator. I'll explain below. But I am fine if this flag has a name that clearly signals it's an unsafe thing to do.
I propose to replace the existing (as of 3.7.0b1) hash= keyword for the @dataclass decorator with a simpler flag named unsafe_hash=. This would be a simple bool (not a tri-state flag like the current hash=None|False|True). The default would be False, and the behavior then would be to add a hash function automatically only if it's safe (using the same rules as for hash=None currently). With unsafe_hash=True, a hash function would always be generated that takes all fields into account except those declared using field(hash=False). If there's already a `def __hash__` in the function I don't care what it does, maybe it should raise rather than quietly doing nothing or quietly overwriting it.
Here's my use case.
May be it is better to provide a special purpose function `make_unsafe_hash` in dataclass module which will patch a dataclass, instead of to clutter @dataclass API with arguments which are rather special case. This `unsafe_hash` argument will constantly raise questions among ordinary users like me, and will be possibly considered as a non-obvious design - there is a public API but it is somehow unsafe. On the other hand, with a function, when the user asks how to make a `frozen=False` dataclass hashable, you can suggest to use this `make_unsafe_hash` function with all its cautions in its docs or to try to implement __hash__ by yourself. Also taking into account the Python approach for backward compatibility it is better to stick with function and if it will be usefull to add a `unsafe_hash` argument in Python 3.8. It is easier to add later than to deprecate in the future. With kind regards, -gdg
If there's going to be an API for it, it should be in the class, not
something that mutates the class afterwards.
On Mon, Feb 5, 2018 at 1:59 AM, Kirill Balunov
On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum
https://mail.python.org/mailman/listinfo/python-dev> wrote: Looks like this is turning into a major flamewar regardless of what I say. :-( I really don't want to lose the ability to add a hash function to a mutable dataclass by flipping a flag in the decorator. I'll explain below. But I am fine if this flag has a name that clearly signals it's an unsafe thing to do.
I propose to replace the existing (as of 3.7.0b1) hash= keyword for the @dataclass decorator with a simpler flag named unsafe_hash=. This would be a simple bool (not a tri-state flag like the current hash=None|False|True). The default would be False, and the behavior then would be to add a hash function automatically only if it's safe (using the same rules as for hash=None currently). With unsafe_hash=True, a hash function would always be generated that takes all fields into account except those declared using field(hash=False). If there's already a `def __hash__` in the function I don't care what it does, maybe it should raise rather than quietly doing nothing or quietly overwriting it.
Here's my use case.
May be it is better to provide a special purpose function `make_unsafe_hash` in dataclass module which will patch a dataclass, instead of to clutter @dataclass API with arguments which are rather special case.
This `unsafe_hash` argument will constantly raise questions among ordinary users like me, and will be possibly considered as a non-obvious design - there is a public API but it is somehow unsafe. On the other hand, with a function, when the user asks how to make a `frozen=False` dataclass hashable, you can suggest to use this `make_unsafe_hash` function with all its cautions in its docs or to try to implement __hash__ by yourself.
Also taking into account the Python approach for backward compatibility it is better to stick with function and if it will be usefull to add a `unsafe_hash` argument in Python 3.8. It is easier to add later than to deprecate in the future.
With kind regards, -gdg
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ guido%40python.org
-- --Guido van Rossum (python.org/~guido)
2018-02-05 20:47 GMT+03:00 Guido van Rossum
If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.
I apologize and don't want to make unnecessary noise. But the already selected design with decorator @dataclass implies that it will mutate the freshly created class (which in its turn already limits some possibilities), or I've missed something? If you meant that everything should be defined in one place, then I basically understand your desire as the least of two evils. With kind regards, -gdg
Yes, that's what I meant -- "afterwards" meaning after the @dataclass
decorator is applied.
On Mon, Feb 5, 2018 at 11:09 AM, Kirill Balunov
2018-02-05 20:47 GMT+03:00 Guido van Rossum
: If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.
I apologize and don't want to make unnecessary noise. But the already selected design with decorator @dataclass implies that it will mutate the freshly created class (which in its turn already limits some possibilities), or I've missed something? If you meant that everything should be defined in one place, then I basically understand your desire as the least of two evils.
With kind regards, -gdg
-- --Guido van Rossum (python.org/~guido)
I don't think it matters so much whether you are stacking two decorators or a single decorator, but would an @add_unsafe_hash decorator be useful for anything *except* data classes? If not, then there's no point in having a *second* decorator that can *only* modify the first one - particularly considering @dataclass actually takes arguments. On 02/05/2018 02:12 PM, Guido van Rossum wrote:
Yes, that's what I meant -- "afterwards" meaning after the @dataclass decorator is applied.
On Mon, Feb 5, 2018 at 11:09 AM, Kirill Balunov
wrote: 2018-02-05 20:47 GMT+03:00 Guido van Rossum
: If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.
I apologize and don't want to make unnecessary noise. But the already selected design with decorator @dataclass implies that it will mutate the freshly created class (which in its turn already limits some possibilities), or I've missed something? If you meant that everything should be defined in one place, then I basically understand your desire as the least of two evils.
With kind regards, -gdg
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
On 6 February 2018 at 03:47, Guido van Rossum
If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.
Something I realised after posting the __class__ setting idea is that you can actually use a comparable trick to inject an unsafe hash from the frozen version into the mutable version: >>> from dataclasses import dataclass >>> @dataclass ... class Example: ... a: int ... b: int ... >>> c = Example(1, 2) >>> hash(c) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'Example' >>> @dataclass(frozen=True) ... class LockedExample(Example): ... pass ... >>> Example.__hash__ = LockedExample.__hash__ >>> hash(c) 3713081631934410656 So "unsafe_hash=True" would just be a shorthand spelling of that which skips creating the full frozen version of the class (and with the explicit parameter, we can better document the risks of making something hashable without also freezing it post-creation). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
From my experience it is quite normal, I was always thinking about `hash()` as hashing a _value_, and never as hashing _identity_, if I would need the latter, there is a different function for this, `id()`. Moreover, I often did this in situations where dataclasses would be useful: a class with several fields, necessary dunders, and few other methods (record-like classes). My motivation was mostly speed-up by memorization. To be fair my use cases were mostly about some tree-like strictures, but
Just wanted to add my 5 cents here. I am a bit surprised how people are
scared by adding `__hash__` to mutable classes.
this is probably a coincidence.
FWIW it is not a super-safe practice, but neither super-dangerous.
--
Ivan
On 5 February 2018 at 22:56, Nick Coghlan
On 6 February 2018 at 03:47, Guido van Rossum
wrote: If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.
Something I realised after posting the __class__ setting idea is that you can actually use a comparable trick to inject an unsafe hash from the frozen version into the mutable version:
>>> from dataclasses import dataclass >>> @dataclass ... class Example: ... a: int ... b: int ... >>> c = Example(1, 2) >>> hash(c) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'Example'
>>> @dataclass(frozen=True) ... class LockedExample(Example): ... pass ... >>> Example.__hash__ = LockedExample.__hash__ >>> hash(c) 3713081631934410656
So "unsafe_hash=True" would just be a shorthand spelling of that which skips creating the full frozen version of the class (and with the explicit parameter, we can better document the risks of making something hashable without also freezing it post-creation).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ levkivskyi%40gmail.com
To add a counter-example that I'm painfully familiar with: the old Thrift for Python makes its (mutable) structures hashable. This is "useful" because you can memoize functions that take Thrift structures as arguments. You can key dictionaries with them. And so on. Unfortunately, more often then not this same structure is passed around and inevitably gets mutated in place. Everything breaks. What should be memoized isn't (and in pathological cases, what shouldn't be memoized is), dictionary access *using the same object* raises unexpected key errors but iterating on the dictionary reveals the key. It's clearly clowntown and in hindsight we shouldn't have enabled this behavior. But we can't go back since too much code relies on hashability now and it's "mostly" fine. As a solution, the new asyncio-only Thrift implementation for Python uses C-level structures to make sure they are truly immutable in Python. We hash them and cache them like there's no tomorrow. - Ł
On Feb 5, 2018, at 11:54 PM, Ivan Levkivskyi
wrote: Just wanted to add my 5 cents here. I am a bit surprised how people are scared by adding `__hash__` to mutable classes. From my experience it is quite normal, I was always thinking about `hash()` as hashing a _value_, and never as hashing _identity_, if I would need the latter, there is a different function for this, `id()`. Moreover, I often did this in situations where dataclasses would be useful: a class with several fields, necessary dunders, and few other methods (record-like classes). My motivation was mostly speed-up by memorization. To be fair my use cases were mostly about some tree-like strictures, but this is probably a coincidence.
FWIW it is not a super-safe practice, but neither super-dangerous.
-- Ivan
On 5 February 2018 at 22:56, Nick Coghlan
mailto:ncoghlan@gmail.com> wrote: On 6 February 2018 at 03:47, Guido van Rossum mailto:guido@python.org> wrote: If there's going to be an API for it, it should be in the class, not something that mutates the class afterwards.
Something I realised after posting the __class__ setting idea is that you can actually use a comparable trick to inject an unsafe hash from the frozen version into the mutable version:
>>> from dataclasses import dataclass >>> @dataclass ... class Example: ... a: int ... b: int ... >>> c = Example(1, 2) >>> hash(c) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'Example'
>>> @dataclass(frozen=True) ... class LockedExample(Example): ... pass ... >>> Example.__hash__ = LockedExample.__hash__ >>> hash(c) 3713081631934410656
So "unsafe_hash=True" would just be a shorthand spelling of that which skips creating the full frozen version of the class (and with the explicit parameter, we can better document the risks of making something hashable without also freezing it post-creation).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com mailto:ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-Dev mailing list Python-Dev@python.org mailto:Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com
_______________________________________________ Python-Dev mailing list Python-Dev@python.org mailto:Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/lukasz%40langa.pl https://mail.python.org/mailman/options/python-dev/lukasz%40langa.pl
participants (6)
-
Guido van Rossum
-
Ivan Levkivskyi
-
Kirill Balunov
-
Lukasz Langa
-
Nick Coghlan
-
Paul G