Should the dataclass frozen property apply to subclasses?

When working on the docs for dataclasses, something unexpected came up. If a dataclass is specified to be frozen, that characteristic is inherited by subclasses which prevents them from assigning additional attributes: >>> @dataclass(frozen=True) class D: x: int = 10 >>> class S(D): pass >>> s = S() >>> s.cached = True Traceback (most recent call last): File "<pyshell#49>", line 1, in <module> s.cached = True File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/dataclasses.py", line 448, in _frozen_setattr raise FrozenInstanceError(f'cannot assign to field {name!r}') dataclasses.FrozenInstanceError: cannot assign to field 'cached' Other immutable classes in Python don't behave the same way: >>> class T(tuple): pass >>> t = T([10, 20, 30]) >>> t.cached = True >>> class F(frozenset): pass >>> f = F([10, 20, 30]) >>> f.cached = True >>> class B(bytes): pass >>> b = B() >>> b.cached = True Raymond

On 2/22/2018 1:56 AM, Raymond Hettinger wrote:
This is because "frozen-ness" is implemented by adding __setattr__ and __delattr__ methods in D, which get inherited by S.
The only way I can think of emulating this is checking in __setattr__ to see if the field name is a field of the frozen class, and only raising an error in that case. A related issue is that dataclasses derived from frozen dataclasses are automatically "promoted" to being frozen.
Maybe it should be an error to declare B as non-frozen? Eric.

On 22 February 2018 at 10:55, Eric V. Smith <eric@trueblade.com> wrote:
How about checking that the type of self is the type where decorator was applied? For example (pseudocode): def dataclass(cls, ...): def _set_attr(self, attr, value): if type(self) is not cls: use super() else: raise AttributeError cls.__setattr__ = _set_attr It can be also more sophisticated, for example raising for all fields on class where frozen=True was used, while only on frozen fields for subclasses. -- Ivan

On 22 February 2018 at 20:55, Eric V. Smith <eric@trueblade.com> wrote:
If you were going to do that then it would likely make more sense to convert the frozen fields to data descriptors, so __setattr__ only gets called for attempts to add new attributes. Then for the `frozen=False` case, the decorator could force __setattr__ and __delattr__ back to the default implementations from object, rather than relying on the behaviour inherited from base classes.
It would be nice to avoid that, as a mutable subclass of a frozen base class could be a nice way to model hashable-but-mutable types: >>> @dataclass(frozen=True) # This is the immutable/hashable bit ... class A: ... i: int ... >>> @dataclass # This adds the mutable-but-comparable parts ... class B(A): ... j: int ... __hash__ = A.__hash__ However, disallowing this case for now *would* be a reasonable way to postpone making a decision until 3.8 - in the meantime, folks would still be able to experiment by overriding __setattr__ and __delattr__ after the dataclass decorator sets them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2/22/18 9:43 PM, Nick Coghlan wrote:
I guess it depends on what we're trying to do. By using data descriptors, we'd be saying "the fields can't be modified, but other parts of the class can". For example, what should happen here: @dataclass(frozen=True) class F: i: int f = F(10) f.i = 0 # should be an error f.j = 0 # should this be an error? It's not a field. I was hoping to get all of this finished today, by a2, but it looks like that's not going to happen.
For today, I'm going to make it an error to derive a frozen dataclass from a non-frozen dataclass, and also an error to derive a non-frozen dataclass from a frozen dataclass. With any luck (and Guido and Ned willing), we can address this by a3. Eric

I've opened https://bugs.python.org/issue32953 to track this. On 2/22/18 5:55 AM, Eric V. Smith wrote:
...
This is tricky to fix. Here's the problem with a inheriting a non-frozen dataclass from a frozen one. Consider class Y in this example: @dataclass class X: x: int @dataclass class Y(X): y: int Y's __init__ looks like: def __init__(self, x, y): self.x = x self.y = y That is, all of the initializing for Y and its base classes which are dataclasses is done in Y's __init__. There are a number of reasons for this, including performance and not knowing how to call the base classes' __init__ methods. If Y is frozen, then the __init__ currently looks like: def __init__(self, x, y): object.__setattr__(self, 'x', x) object.__setattr__(self, 'y', y) If X is frozen but Y isn't, then it should look like: def __init__(self, x, y): object.__setattr__(self, 'x', x) self.y = y But I currently can't generate the code that way, because I don't know if a base dataclass is frozen. That information is not saved on a dataclass. I think the right thing to do is to record with each dataclass if it is frozen or not, so that derived classes can generate the correct code. Another option would be to always use object.__setattr__, but that will hurt performance in the common case. As long as I'm saving if a dataclass is frozen, I should save all of the dataclass parameters on the class. Since it's per-class, it's not a lot of overhead. I should probably separate the two issues here: 1) deriving a non-dataclass from a frozen dataclass and 2) deriving a non-frozen dataclass from a frozen dataclass, but since they're somewhat related I'm going to keep them together for the time being. #1 was Raymond's initial report. Eric

On 02/26/2018 05:40 AM, Eric V. Smith wrote:
As long as I'm saving if a dataclass is frozen, I should save all of the dataclass parameters on the class. Since it's per-class, it's not a lot of overhead.
That's what Enum does. A bit of pain, but makes so many other things easier. -- ~Ethan~

On 2/22/2018 1:56 AM, Raymond Hettinger wrote:
I'll provide some background, then get in to the dataclasses design issues. Note that I'm using "field" here in the PEP 557 sense. There are some questions to resolve: 1. What happens when a frozen dataclass inherits from a non-frozen dataclass? 2. What happens when a non-frozen dataclass inherits from a frozen dataclass? 3. What happens when a non-dataclass inherits from a frozen dataclass? 4. Can new non-field attributes be created for frozen dataclasses? I think it's useful to look at what attrs does. Unsurprisingly, attrs works the way the dataclasses implementation in 3.7.0a1 works: - If a frozen attrs class inherits from a non-frozen attrs class, the result is a frozen attrs class. - If a non-frozen attrs class inherits from a frozen attrs class, the result is a frozen attrs class. - For a frozen attrs class, you may not assign to any field, nor create new non-field instance attributes. - If a non-attrs class derives from a frozen attrs class, then you cannot assign to or create any non-field instance attributes. This is because they override the class's __setattr__ to always raise. This is the case that Raymond initially brought up on this thread (but for dataclasses, of course). As I said, this is also how 3.7.0a1 dataclasses also works. The only difference between this and 3.7.0.a2 is that I prohibited inheriting a non-frozen dataclass from a frozen one, and also prohibited the opposite: you can't inherit a frozen dataclass from a non-frozen dataclass. This was just a stop-gap measure to give us more wiggle room for future changes. But this does nothing to address Raymond's concern about non-dataclasses deriving from frozen dataclasses. A last piece of background info on how dataclasses and attrs work: the most derived class implements all of the functionality. They never call in to the base class to do anything. The base classes just exist to provide the list of fields. If frozen dataclasses only exist to protect fields that belong to the hash, then my suggestion is to change the implementation of frozen class to use properties instead of overwriting __setattr__ (Nick also suggested this). This would allow you to create non-field attributes. If the purpose is really to prevent any attributes from being added or modified, then I think __setattr__ should stay but we should change it to allow non-dataclass subclasses to add non-field instance attributes (addressing Raymond's concern). I think we shouldn't allow non-field instance attributes to be added to a frozen dataclass, although if anyone has a strong feeling about it, I'd like to hear it. So, given a frozen dataclass "C" with field names in "field_names", I propose changing __setattr__ to be: def __setattr__(self, name, value): if type(self) is C or name in field_names: raise FrozenInstanceError(f'cannot assign to field {name!r}') super(cls, self).__setattr__(name, value) In the current 3.7.0a2 implementation of frozen dataclasses, __setattr__ always raises. The change is the test and then call to super().__setattr__ if it's a derived class. The result is an exception if either self is an instance of C, or if self is an instance of a derived class, but the attribute being set is a field of C. So going back to original questions above, my suggestions are: 1. What happens when a frozen dataclass inherits from a non-frozen dataclass? The result is a frozen dataclass, and all fields are non-writable. No non-fields can be added. This is a reversion to the 3.7.0a1 behavior. 2. What happens when a non-frozen dataclass inherits from a frozen dataclass? The result is a frozen dataclass, and all fields are non-writable. No non-fields can be added. This is a reversion to the 3.7.0a1 behavior. I'd also be okay with this case being an error, and you'd have to explicitly mark the derived class as frozen. This is the 3.7.0a2 behavior. 3. What happens when a non-dataclass inherits from a frozen dataclass? The fields that are in the dataclass are non-writable, but new non-field attributes can be added. This is new behavior. 4. Can new non-field attributes be created for frozen dataclasses? No. This is existing behavior. I'm hoping this change isn't so large that we can't get it in to 3.7.0a3 next month. Eric

On 28 February 2018 at 10:37, Eric V. Smith <eric@trueblade.com> wrote:
I'm assuming you meant "3.7.0b2" here (and similarly alpha->beta for the other version numbers below)
+1 from me for the variant of this where dataclasses inheriting from frozen data classes must explicitly mark themselves as frozen (at least for now). That way we leave the door open to allowing a variant where a non-frozen dataclass that inherits from a frozen dataclass can set "hash=False" on all of the new fields it adds to avoid becoming frozen itself.
I'm hoping this change isn't so large that we can't get it in to 3.7.0a3 next month.
I think this qualifies as the beta period serving its intended purpose (i.e. reviewing and refining the behaviour of already added features, without allowing completely new features). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 3/1/2018 1:02 AM, Nick Coghlan wrote:
I'm assuming you meant "3.7.0b2" here (and similarly alpha->beta for the other version numbers below)
Oops, yes. Thanks.
I tend to agree. It's not like this is a huge burden, and at least the author is acknowledging that the class will end up frozen.
Thanks. Eric

On 2/22/2018 1:56 AM, Raymond Hettinger wrote:
This is because "frozen-ness" is implemented by adding __setattr__ and __delattr__ methods in D, which get inherited by S.
The only way I can think of emulating this is checking in __setattr__ to see if the field name is a field of the frozen class, and only raising an error in that case. A related issue is that dataclasses derived from frozen dataclasses are automatically "promoted" to being frozen.
Maybe it should be an error to declare B as non-frozen? Eric.

On 22 February 2018 at 10:55, Eric V. Smith <eric@trueblade.com> wrote:
How about checking that the type of self is the type where decorator was applied? For example (pseudocode): def dataclass(cls, ...): def _set_attr(self, attr, value): if type(self) is not cls: use super() else: raise AttributeError cls.__setattr__ = _set_attr It can be also more sophisticated, for example raising for all fields on class where frozen=True was used, while only on frozen fields for subclasses. -- Ivan

On 22 February 2018 at 20:55, Eric V. Smith <eric@trueblade.com> wrote:
If you were going to do that then it would likely make more sense to convert the frozen fields to data descriptors, so __setattr__ only gets called for attempts to add new attributes. Then for the `frozen=False` case, the decorator could force __setattr__ and __delattr__ back to the default implementations from object, rather than relying on the behaviour inherited from base classes.
It would be nice to avoid that, as a mutable subclass of a frozen base class could be a nice way to model hashable-but-mutable types: >>> @dataclass(frozen=True) # This is the immutable/hashable bit ... class A: ... i: int ... >>> @dataclass # This adds the mutable-but-comparable parts ... class B(A): ... j: int ... __hash__ = A.__hash__ However, disallowing this case for now *would* be a reasonable way to postpone making a decision until 3.8 - in the meantime, folks would still be able to experiment by overriding __setattr__ and __delattr__ after the dataclass decorator sets them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2/22/18 9:43 PM, Nick Coghlan wrote:
I guess it depends on what we're trying to do. By using data descriptors, we'd be saying "the fields can't be modified, but other parts of the class can". For example, what should happen here: @dataclass(frozen=True) class F: i: int f = F(10) f.i = 0 # should be an error f.j = 0 # should this be an error? It's not a field. I was hoping to get all of this finished today, by a2, but it looks like that's not going to happen.
For today, I'm going to make it an error to derive a frozen dataclass from a non-frozen dataclass, and also an error to derive a non-frozen dataclass from a frozen dataclass. With any luck (and Guido and Ned willing), we can address this by a3. Eric

I've opened https://bugs.python.org/issue32953 to track this. On 2/22/18 5:55 AM, Eric V. Smith wrote:
...
This is tricky to fix. Here's the problem with a inheriting a non-frozen dataclass from a frozen one. Consider class Y in this example: @dataclass class X: x: int @dataclass class Y(X): y: int Y's __init__ looks like: def __init__(self, x, y): self.x = x self.y = y That is, all of the initializing for Y and its base classes which are dataclasses is done in Y's __init__. There are a number of reasons for this, including performance and not knowing how to call the base classes' __init__ methods. If Y is frozen, then the __init__ currently looks like: def __init__(self, x, y): object.__setattr__(self, 'x', x) object.__setattr__(self, 'y', y) If X is frozen but Y isn't, then it should look like: def __init__(self, x, y): object.__setattr__(self, 'x', x) self.y = y But I currently can't generate the code that way, because I don't know if a base dataclass is frozen. That information is not saved on a dataclass. I think the right thing to do is to record with each dataclass if it is frozen or not, so that derived classes can generate the correct code. Another option would be to always use object.__setattr__, but that will hurt performance in the common case. As long as I'm saving if a dataclass is frozen, I should save all of the dataclass parameters on the class. Since it's per-class, it's not a lot of overhead. I should probably separate the two issues here: 1) deriving a non-dataclass from a frozen dataclass and 2) deriving a non-frozen dataclass from a frozen dataclass, but since they're somewhat related I'm going to keep them together for the time being. #1 was Raymond's initial report. Eric

On 02/26/2018 05:40 AM, Eric V. Smith wrote:
As long as I'm saving if a dataclass is frozen, I should save all of the dataclass parameters on the class. Since it's per-class, it's not a lot of overhead.
That's what Enum does. A bit of pain, but makes so many other things easier. -- ~Ethan~

On 2/22/2018 1:56 AM, Raymond Hettinger wrote:
I'll provide some background, then get in to the dataclasses design issues. Note that I'm using "field" here in the PEP 557 sense. There are some questions to resolve: 1. What happens when a frozen dataclass inherits from a non-frozen dataclass? 2. What happens when a non-frozen dataclass inherits from a frozen dataclass? 3. What happens when a non-dataclass inherits from a frozen dataclass? 4. Can new non-field attributes be created for frozen dataclasses? I think it's useful to look at what attrs does. Unsurprisingly, attrs works the way the dataclasses implementation in 3.7.0a1 works: - If a frozen attrs class inherits from a non-frozen attrs class, the result is a frozen attrs class. - If a non-frozen attrs class inherits from a frozen attrs class, the result is a frozen attrs class. - For a frozen attrs class, you may not assign to any field, nor create new non-field instance attributes. - If a non-attrs class derives from a frozen attrs class, then you cannot assign to or create any non-field instance attributes. This is because they override the class's __setattr__ to always raise. This is the case that Raymond initially brought up on this thread (but for dataclasses, of course). As I said, this is also how 3.7.0a1 dataclasses also works. The only difference between this and 3.7.0.a2 is that I prohibited inheriting a non-frozen dataclass from a frozen one, and also prohibited the opposite: you can't inherit a frozen dataclass from a non-frozen dataclass. This was just a stop-gap measure to give us more wiggle room for future changes. But this does nothing to address Raymond's concern about non-dataclasses deriving from frozen dataclasses. A last piece of background info on how dataclasses and attrs work: the most derived class implements all of the functionality. They never call in to the base class to do anything. The base classes just exist to provide the list of fields. If frozen dataclasses only exist to protect fields that belong to the hash, then my suggestion is to change the implementation of frozen class to use properties instead of overwriting __setattr__ (Nick also suggested this). This would allow you to create non-field attributes. If the purpose is really to prevent any attributes from being added or modified, then I think __setattr__ should stay but we should change it to allow non-dataclass subclasses to add non-field instance attributes (addressing Raymond's concern). I think we shouldn't allow non-field instance attributes to be added to a frozen dataclass, although if anyone has a strong feeling about it, I'd like to hear it. So, given a frozen dataclass "C" with field names in "field_names", I propose changing __setattr__ to be: def __setattr__(self, name, value): if type(self) is C or name in field_names: raise FrozenInstanceError(f'cannot assign to field {name!r}') super(cls, self).__setattr__(name, value) In the current 3.7.0a2 implementation of frozen dataclasses, __setattr__ always raises. The change is the test and then call to super().__setattr__ if it's a derived class. The result is an exception if either self is an instance of C, or if self is an instance of a derived class, but the attribute being set is a field of C. So going back to original questions above, my suggestions are: 1. What happens when a frozen dataclass inherits from a non-frozen dataclass? The result is a frozen dataclass, and all fields are non-writable. No non-fields can be added. This is a reversion to the 3.7.0a1 behavior. 2. What happens when a non-frozen dataclass inherits from a frozen dataclass? The result is a frozen dataclass, and all fields are non-writable. No non-fields can be added. This is a reversion to the 3.7.0a1 behavior. I'd also be okay with this case being an error, and you'd have to explicitly mark the derived class as frozen. This is the 3.7.0a2 behavior. 3. What happens when a non-dataclass inherits from a frozen dataclass? The fields that are in the dataclass are non-writable, but new non-field attributes can be added. This is new behavior. 4. Can new non-field attributes be created for frozen dataclasses? No. This is existing behavior. I'm hoping this change isn't so large that we can't get it in to 3.7.0a3 next month. Eric

On 28 February 2018 at 10:37, Eric V. Smith <eric@trueblade.com> wrote:
I'm assuming you meant "3.7.0b2" here (and similarly alpha->beta for the other version numbers below)
+1 from me for the variant of this where dataclasses inheriting from frozen data classes must explicitly mark themselves as frozen (at least for now). That way we leave the door open to allowing a variant where a non-frozen dataclass that inherits from a frozen dataclass can set "hash=False" on all of the new fields it adds to avoid becoming frozen itself.
I'm hoping this change isn't so large that we can't get it in to 3.7.0a3 next month.
I think this qualifies as the beta period serving its intended purpose (i.e. reviewing and refining the behaviour of already added features, without allowing completely new features). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 3/1/2018 1:02 AM, Nick Coghlan wrote:
I'm assuming you meant "3.7.0b2" here (and similarly alpha->beta for the other version numbers below)
Oops, yes. Thanks.
I tend to agree. It's not like this is a huge burden, and at least the author is acknowledging that the class will end up frozen.
Thanks. Eric
participants (5)
-
Eric V. Smith
-
Ethan Furman
-
Ivan Levkivskyi
-
Nick Coghlan
-
Raymond Hettinger