PEP 681: Descriptor fields with dataclass_transform
We're considering an addition to PEP 681 (dataclass_transform) [1] to better support classes with fields that are descriptors. Setting a new dataclass_transform parameter named "transform_descriptor_types" to True would indicate that __init__ parameters corresponding to descriptor fields have the type of the descriptor's setter value parameter rather than the descriptor type.

Here's an example:

```
@dataclass_transform(transform_descriptor_types=True)
def decorator() -> Callable[[Type[T]], Type[T]]: ...

class Descriptor(Generic[_T]):
    def __get__(self, instance: object, owner: Any) -> Any: ...
    def __set__(self, instance: object, value: _T | None): ...

@decorator()
class InventoryItem:
    quantity_on_hand: Descriptor[int]
```

In this case, type checkers would understand that the quantity_on_hand parameter of InventoryItem's __init__ method is of type "int | None" rather than "Descriptor[int]".

This change was driven by a need in SQLAlchemy, and we'd appreciate any thoughts on broader applicability.

Feedback is welcome either via email or on GitHub! [2] A reference implementation is available in Pyright 1.1.222 and Pylance 2022.2.3.

[1] https://www.python.org/dev/peps/pep-0681/
[2] https://github.com/debonte/peps/pull/3/files
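The proposed semantics can be sketched as a runnable (hypothetical) example. The `Descriptor` and `InventoryItem` names come from the post; the `__set_name__`-based storage and the hand-written `__init__` are assumptions, standing in for what a library decorated with `dataclass_transform(transform_descriptor_types=True)` would synthesize:

```python
from typing import Any, Generic, Optional, TypeVar

_T = TypeVar("_T")

class Descriptor(Generic[_T]):
    # Hypothetical storage strategy: keep the value in a private slot
    # on the instance, named after the attribute the descriptor owns.
    def __set_name__(self, owner: type, name: str) -> None:
        self._name = "_" + name

    def __get__(self, instance: object, owner: Any) -> Any:
        if instance is None:
            return self
        return getattr(instance, self._name)

    def __set__(self, instance: object, value: Optional[_T]) -> None:
        setattr(instance, self._name, value)

class InventoryItem:
    quantity_on_hand: Descriptor[int] = Descriptor()

    # Under the proposal, a type checker would see this signature: the
    # parameter has __set__'s value type (int | None), not Descriptor[int].
    def __init__(self, quantity_on_hand: Optional[int] = None) -> None:
        self.quantity_on_hand = quantity_on_hand  # routes through __set__

item = InventoryItem(quantity_on_hand=5)
print(item.quantity_on_hand)  # 5, via Descriptor.__get__
```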
On Tue, Feb 22, 2022 at 11:34 AM, Erik De Bonte (<Erik.DeBonte@microsoft.com>) wrote:
We’re considering an addition to PEP 681 (dataclass_transform) [1] to better support classes with fields that are descriptors. Setting a new dataclass_transform parameter named “transform_descriptor_types” to True would indicate that __init__ parameters corresponding to descriptor fields have the type of the descriptor’s setter value parameter rather than the descriptor type.
I think this is a useful change and I support including it in PEP 681. I like how it integrates nicely with the descriptor protocol, so it's potentially useful for libraries other than SQLAlchemy too.
Hi Erik, On Tue, Feb 22, 2022 at 12:35 PM Erik De Bonte via Typing-sig <typing-sig@python.org> wrote:
We’re considering an addition to PEP 681 (dataclass_transform) [1] to better support classes with fields that are descriptors. Setting a new dataclass_transform parameter named “transform_descriptor_types” to True would indicate that __init__ parameters corresponding to descriptor fields have the type of the descriptor’s setter value parameter rather than the descriptor type.
You only mention the impact on the `__init__` method. Is that the only effect of `transform_descriptor_types`? Or should a typechecker also understand that `instance_of_inventory_item.quantity_on_hand` is of type `Any` (due to the annotation on the `__get__` method of the descriptor type), not of type `Descriptor[int]`? Does that understanding depend on the presence of `transform_descriptor_types=True`? If I write a dataclass without `transform_descriptor_types=True` and give an annotation of `x: SomeTypeImplementingDescriptorProtocol`, can I assign and get instances of that type to the `x` attribute without the typechecker trying to invoke the descriptor protocol?

In general, the annotation `x: Descriptor[int]` should only be understood as invoking the descriptor protocol if it is describing the type of the class attribute `x`, not instance attribute `x`. For a non-dataclass-transform type, clearly that annotation on a class attribute should mean the descriptor protocol is invoked, and typecheckers do this: https://mypy-play.net/?mypy=latest&python=3.10&gist=a79442b99b8f79535748912e128d0d8d

But dataclass-transform generally means the annotation should be understood as annotating the instance attribute type, not a class attribute. E.g. `x: Final[int] = 3` on a dataclass (at least in the current typechecker support for stdlib dataclasses) does not mean "final int class attribute with value 3," it means "instance attribute of type int with default value 3 that cannot be modified after instantiation."

It seems the proposal is for `transform_descriptor_types=True` to modify this such that annotations of descriptor types (only) are understood as class attribute type annotations. I think this is probably pragmatic and useful, but I hope that it does not only impact `__init__`, but also means the descriptor protocol is invoked for attribute gets/sets as well.

And I hope the latter does not happen without `transform_descriptor_types=True`, so we can still have descriptor-protocol-implementing types stored as normal instance attributes. (If one wants both behaviors in a single dataclass, I guess one is just out of luck. Maybe this should be something you specify in a field descriptor rather than globally for the entire dataclass?)

Carl
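The class-attribute versus instance-attribute distinction Carl describes can be checked at runtime with plain (non-dataclass) classes; `Ten` here is a hypothetical minimal descriptor used only for illustration:

```python
class Ten:
    """A minimal data descriptor that always reads as 10."""
    def __get__(self, instance, owner):
        return 10
    def __set__(self, instance, value):
        raise AttributeError("read-only")

class WithClassAttr:
    x = Ten()  # class attribute: the descriptor protocol IS invoked

class WithInstanceAttr:
    def __init__(self):
        self.x = Ten()  # instance attribute: the protocol is NOT invoked

print(WithClassAttr().x)           # 10
print(type(WithInstanceAttr().x))  # the Ten object itself
```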
Thanks Carl. What you're saying makes sense, and I agree that we should make this change.

Eric Traut suggested to me that in addition to your suggestion, we should change transform_descriptor_types to mean that *all* fields are treated as class variables, rather than just having special treatment for descriptor fields. We'd rename transform_descriptor_types to something else if we made this change, of course. Maybe class_variable_fields or class_variables.
Maybe this should be something you specify in a field descriptor rather than globally for the entire dataclass?
That's a reasonable suggestion. However, we're not aware of any libraries requiring per-field control of this behavior. And if we only support this behavior via field descriptors (not class-wide), I believe it will make life more awkward for libraries -- they wouldn't want users to need to set this parameter explicitly, so it would need to be inferred via overloads.

Thoughts?

-Erik
Hi Erik, On Fri, Mar 4, 2022 at 4:04 PM Erik De Bonte <Erik.DeBonte@microsoft.com> wrote:
Eric Traut suggested to me that in addition to your suggestion, we should change transform_descriptor_types to mean that *all* fields are treated as class variables, rather than just having special treatment for descriptor fields. We'd rename transform_descriptor_types to something else if we made this change of course. Maybe class_variable_fields or class_variables.
That's interesting. It seems more consistent/principled, which I like, but I don't have an immediate intuition about the practical usefulness. In the common case of `x: int` this would make no difference; that always describes an instance variable. Similarly for `x: int = 3`, which always describes an instance variable with a default, and `x: ClassVar[int] = 3`, which always describes a class variable (and thus should be excluded from the dataclass transform). It seems like the cases where it would make a difference are descriptor types and the interpretation of `Final` (i.e. whether or not `x: Final[int] = 3` implies `ClassVar`) -- are there other cases where it would matter?

As long as the descriptor behavior is consistent as described in my last mail, I guess I don't have strong feelings for or against this extended proposal. I think it will be hard to name well. A name including `class_variable(s)` is potentially misleading, as it seems to imply all attributes are implicitly ClassVar or something, which is not the case. The real distinction is a bit subtle and we don't have a good term for it. Something like `assume_instance_level_annotations=False` maybe.
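The "no difference" cases Carl lists can be confirmed against stdlib dataclasses (a small check, not from the thread): a plain annotated field appears in `__init__`, while a `ClassVar` annotation is excluded from the transform entirely.

```python
from dataclasses import dataclass, fields
from typing import ClassVar

@dataclass
class D:
    x: int = 3            # instance field with default; appears in __init__
    y: ClassVar[int] = 4  # class variable; not a dataclass field

print([f.name for f in fields(D)])  # only 'x' is a field
print(D(x=7).x, D.y)
```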
Maybe this should be something you specify in a field descriptor rather than globally for the entire dataclass?
That's a reasonable suggestion. However, we're not aware of any libraries requiring per-field control of this behavior. And if we only support this behavior via field descriptors (not class-wide), I believe it will make life more awkward for libraries -- they wouldn't want users to need to set this parameter explicitly, so it would need to be inferred via overloads.
That makes sense. I retract that suggestion; this is something that should be defined by the library providing dataclass-like behavior (since that library will be providing the actual runtime behavior), not by the end user of the library. Carl
...change transform_descriptor_types to mean that *all* fields are treated as class variables...
As long as the descriptor behavior is consistent as described in my last mail, I guess I don't have strong feelings for or against this extended proposal. I think it will be hard to name well.
Thanks Carl. I'm proposing "delete_class_attributes" as the new name. It would default to True (dataclass behavior), but when set to False all class attributes would be retained. There's a pending PR for this change at https://github.com/python/peps/pull/2455. It also clarifies that descriptor fields will support the "standard behavior expected of descriptors". I think that's a sufficient level of detail, but if you disagree I can add some more specifics about getter/setter types.

-Erik
Hi Erik,

Thanks for the reply, and for all your work on the PEP (which I'm strongly in favor of.) Unfortunately I don't think `delete_class_attributes` is a suitable name for this feature, and after exploring a bit more deeply I'm also no longer clear what use case this feature is intended to serve that isn't already served by the current behavior of dataclasses. Observe the current runtime behavior (in Python 3.10) of a dataclass with a descriptor field:

```
>>> class Descriptor:
...     def __get__(self, instance: Any, owner: object) -> int:
...         return instance._x
...     def __set__(self, instance: Any, value: float) -> None:
...         instance._x = int(value)
...
>>> @dataclass
... class F:
...     x: Descriptor = Descriptor()
...
>>> F.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __get__
AttributeError: 'NoneType' object has no attribute '_x'
>>> F().x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: F.__init__() missing 1 required positional argument: 'x'
>>> F(x=3.1).x
3
>>> F(x=5.2).x
5
```
So at runtime, a descriptor-typed field in a dataclass already a) keeps the descriptor around on the class, b) invokes the descriptor protocol on both class-level and instance-level attribute access, and c) takes the value passed to `__init__` for the field and passes it to the descriptor's __set__ method. So it seems like dataclasses already implements precisely the behavior desired by this "new" feature, by default! It's just the type checkers that treat this in a way inconsistent with the runtime behavior. Both mypy and pyright expect the `__init__` argument `x` here to be an instance of the descriptor type itself, which is just wrong, given the actual runtime behavior.
The runtime behavior totally changes if we have `x: Descriptor` (no "default value") instead of `x: Descriptor = Descriptor()`. In the former case, dataclasses doesn't know or care about the annotated type or the fact that it's a descriptor type, and there is no runtime descriptor to attach to the class at all, so we get normal descriptor-less runtime behavior.
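The annotation-only case can be confirmed directly (a small check along the lines of Carl's example): with no descriptor instance on the class, the field behaves like any plain instance attribute and the descriptor protocol never runs.

```python
from dataclasses import dataclass
from typing import Any

class Descriptor:
    def __get__(self, instance: Any, owner: object) -> int:
        return instance._x
    def __set__(self, instance: Any, value: float) -> None:
        instance._x = int(value)

@dataclass
class G:
    x: Descriptor  # annotation only -- no descriptor object on the class

g = G(x=3.7)           # type: ignore[arg-type]
print(g.x)             # 3.7 -- stored and read as a plain attribute,
                       # __set__/__get__ were never invoked
print("x" in vars(g))  # True -- lives in the instance __dict__
```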
My conclusion from this is that most likely nobody ever thought very hard about how dataclasses should work with descriptor-typed fields, and the runtime behavior we get is simply what falls out naturally from the way dataclasses handles field default values (i.e. they are preserved as class attributes, if present.)
From a type-checking perspective, it feels natural that `x: Descriptor` and `x: Descriptor = Descriptor()` ought to specify the same typing for the `x` __init__ arg and attribute, since in typing we are used to the idea that `x: Foo` specifies the same type for `x` regardless of presence or absence of an assigned value. But practically it's hard to see the use case for `x: Descriptor` -- if no actual descriptor object is attached to the class, then where does one expect the runtime descriptor behavior to come from? Did the SQLAlchemy use case that motivated this change require `x: Descriptor` to behave as a descriptor, or only `x: Descriptor = Descriptor()`? If the former, how does that even work at runtime?

If we somehow want to make `x: Descriptor` and `x: Descriptor = Descriptor()` equivalent, then I don't think we can avoid the need to propose and make changes to the runtime dataclasses behavior, and it's not clear to me what kind of change would even be workable: there is no feasible way in the case of `x: Descriptor` for dataclasses at runtime to reliably know that `Descriptor` is a descriptor type, and there's no reasonable way for it to conjure a descriptor object into existence where none was given.
If, on the other hand, the PEP is not intending to propose changes to the runtime behavior of dataclasses, this suggests that a) no new feature or dataclass_transform argument is needed in the PEP to serve the descriptor use case, b) the current runtime behavior of dataclasses (including the difference in behavior between `x: Descriptor` and `x: Descriptor = Descriptor()`) should be specified as the default behavior of dataclass_transform with descriptor field types, and c) type-checkers should fix their dataclass handling to make their typecheck for descriptor fields match the already-existing runtime behavior.
I think my objection to the name `delete_class_attributes` is moot given the above, but just for completeness I'll offer it anyway. The _only_ case in which dataclasses ever deletes any class attribute is in the case where the "default value" for a field is a Field object. In this case the Field class attribute is replaced with the default value specified in the Field object, if any, or is deleted if none. In all other cases, dataclasses doesn't mess with class attributes at all. In the case of `x: Foo = Foo()` it leaves that `Foo()` instance as the class attribute; in the case of `x: Foo` there is no class attribute to begin with, so nothing is deleted. So it doesn't make sense for `delete_class_attributes=False` to imply different treatment of either `x: Foo` or `x: Foo = Foo()`, since in neither of those cases would dataclasses ever do anything that could be described as deleting a class attribute.
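That single deletion case is easy to observe (a minimal check): a `field(...)` with no default leaves nothing on the class, because the Field object is removed after processing.

```python
from dataclasses import dataclass, field

@dataclass
class E:
    x: int = field(compare=False)  # a Field object with no default

# The Field class attribute was deleted since there is no default to
# put in its place...
print(hasattr(E, "x"))  # False
# ...and x is still a required __init__ argument.
print(E(x=1).x)  # 1
```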
Carl
Thanks Carl. After reading your post and writing some sample code to confirm your findings, I think your analysis is spot on. It would appear that "delete_class_attributes" is not needed, which is great news. I've filed the following bug to remind myself to fix pyright's behavior for descriptors in dataclass fields so the type checking behavior matches the runtime behavior: https://github.com/microsoft/pyright/issues/3245. A similar fix will be needed in mypy (and perhaps pyre and pytype too). -- Eric Traut Contributor to Pyright & Pylance Microsoft
Thanks Carl! I'll modify the PEP to move the descriptor-typed field support to Rejected Ideas -- https://github.com/python/peps/pull/2477

FYI, while experimenting with the runtime behavior of dataclass, I noticed something weird about descriptor-typed fields and default values. It's more an issue for dataclass than dataclass_transform, but I thought it was worth mentioning. Here's the sample code you provided:

```
class Descriptor:
    def __get__(self, instance: Any, owner: object) -> int:
        return instance._x

    def __set__(self, instance: Any, value: float) -> None:
        instance._x = int(value)

@dataclass
class F:
    x: Descriptor = Descriptor()
```

Given that code, I believe that a type checker's view of F's __init__ would be:

```
def __init__(self, x: int = Descriptor())
```

But this is incorrect since, as you pointed out, x doesn't have a default value at runtime. F() fails at runtime with "F.__init__() missing 1 required positional argument: 'x'".

I also noticed that when executing the code for F, __get__ is called with a None instance, and surprisingly __get__'s return value in that case is used as the field's default value. So if I change __get__ to the following, F() succeeds and initializes x to 100.

```
def __get__(self, instance: Any, owner: object) -> int:
    if instance is None:
        return 100
    return instance._x
```

I couldn't find this behavior documented anywhere though, and I wouldn't expect a type checker to analyze the code in __get__ to find this default value. So I'm not sure how type checkers should handle this situation.

With dataclass_transform, the library author can make this scenario work by adding a "default" parameter on Descriptor's constructor and adding Descriptor to field_descriptors. But I'm not sure if there's a good solution for dataclass or if it's even worth fixing.

-Erik
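Erik's observation reproduces directly with the modified `__get__` (the classes below combine his two snippets): dataclasses reads the class attribute via `getattr`, which invokes `__get__(None, F)`, and uses the result as the field's default.

```python
from dataclasses import dataclass
from typing import Any

class Descriptor:
    def __get__(self, instance: Any, owner: object) -> int:
        if instance is None:
            return 100  # value dataclasses picks up via getattr(F, "x")
        return instance._x
    def __set__(self, instance: Any, value: float) -> None:
        instance._x = int(value)

@dataclass
class F:
    x: Descriptor = Descriptor()

print(F().x)       # 100 -- the default came from __get__(None, F)
print(F(x=5.2).x)  # 5 -- __set__ coerced the value with int()
```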
On Mon, Mar 28, 2022 at 4:37 PM Erik De Bonte <Erik.DeBonte@microsoft.com> wrote:
I'll modify the PEP to move the descriptor-typed field support to Rejected Ideas -- https://github.com/python/peps/pull/2477
Looks good, thank you!
FYI, while experimenting with the runtime behavior of dataclass, I noticed something weird about descriptor-typed fields and default values. It's more an issue for dataclass than dataclass_transform, but I thought it was worth mentioning.
Here's the sample code you provided:
```
class Descriptor:
    def __get__(self, instance: Any, owner: object) -> int:
        return instance._x

    def __set__(self, instance: Any, value: float) -> None:
        instance._x = int(value)

@dataclass
class F:
    x: Descriptor = Descriptor()
```
Given that code, I believe that a type checker's view of F's __init__ would be:
```
def __init__(self, x: int = Descriptor())
```
But this is incorrect since, as you pointed out, x doesn't have a default value at runtime. F() fails at runtime with "F.__init__() missing 1 required positional argument: 'x'"
I also noticed that when executing the code for F, __get__ is called with a None instance, and surprisingly __get__'s return value in that case is used as the field's default value. So if I change __get__ to the following, F() succeeds and initializes x to 100.
```
def __get__(self, instance: Any, owner: object) -> int:
    if instance is None:
        return 100
    return instance._x
```
I couldn't find this behavior documented anywhere though, and I wouldn't expect a type checker to analyze the code in __get__ to find this default value. So I'm not sure how type checkers should handle this situation.
Yeah, good catch. My example code works in a strange and rather accidental way: my __get__ method raises AttributeError when `instance` is None, and dataclasses uses `getattr(cls, attrname, MISSING)` to get the default value from the class attribute, which silences the AttributeError into `MISSING` -- thus no default. I think we should probably discard this "__get__ raises AttributeError" case as an unusual edge case that type-checkers can't be expected to detect and handle.
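The `getattr` behavior Carl describes is easy to confirm in isolation (hypothetical minimal classes; `MISSING` stands in for the dataclasses sentinel):

```python
class Broken:
    def __get__(self, instance, owner):
        raise AttributeError("no value available")

class C:
    x = Broken()

MISSING = object()  # stand-in for dataclasses' MISSING sentinel

# getattr with a default silences the AttributeError raised inside
# __get__, so dataclasses sees "no default" rather than an error.
print(getattr(C, "x", MISSING) is MISSING)  # True
```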
With dataclass_transform, the library author can make this scenario work by adding a "default" parameter on Descriptor's constructor and adding Descriptor to field_descriptors. But I'm not sure if there's a good solution for dataclass or if it's even worth fixing.
I think the dataclass runtime behavior (despite not having been intentionally designed :) ) is roughly pretty reasonable here: if you put a descriptor object (that is not a "field descriptor") in the "default value" position, that descriptor is attached to the class, and attribute sets should expect the type taken by __set__, and attribute gets should expect the type returned from __get__. I think it's also reasonable that the default value for instances is whatever is returned by __get__ when `instance` is None. This implies typecheckers should assume there is a default value and it is of the type returned by __get__. (That only requires looking at the return annotation of __get__, not at its implementation.) It also implies that it should be a type error if the return type of __get__ is not assignable to the type taken by __set__; I think this is also reasonable. (The example above, for instance, doesn't violate this, since int can be assigned to float.)

I think it would be reasonable for PEP 681 to either specify all of the above, or else explicitly disallow descriptor "default values". (Obviously the latter would not make SQLAlchemy happy.)

This discussion has also raised two other questions/issues for me about PEP 681.

1) Since "descriptor" already has a strong meaning in Python, which is unrelated to what PEP 681 currently calls "field descriptors," I think it might be advisable to rename the PEP 681 "field descriptor" concept to a different name. I suggest "field definition type" or "field specifier."

2) Should PEP 681 explicitly specify that dataclass_transform also has the effect of keeping field default values as class attributes on the class (as dataclass does) and placing them there if the default was specified explicitly via a "field descriptor"? I think the PEP should specify this; typecheckers already support this for dataclasses (see e.g. https://mypy-play.net/?mypy=latest&python=3.10&gist=278cc8cba9bf1473af6c1168cfb2f1bem and https://mypy-play.net/?mypy=latest&python=3.10&gist=827f5de7594243817912709013d3e658) and I'm sure there is code relying on it, so if dataclass_transform does not specify this, it would be a regression for typecheckers to replace their special-cased dataclass support with dataclass_transform. And the behavior specified above for descriptor "default values" really only makes sense if this is the case. This is probably just another bullet point to clarify under "assume dataclass semantics" here: https://peps.python.org/pep-0681/#dataclass-semantics

Carl
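The class-attribute behavior referenced in point (2) matches current stdlib runtime behavior (a quick check): a plain default stays on the class as-is, and a `field(...)` object is replaced by its default.

```python
from dataclasses import dataclass, field

@dataclass
class C:
    a: int = 3                 # plain default stays as the class attribute
    b: int = field(default=4)  # the Field object is replaced by its default

print(C.a, C.b)      # class attributes: 3 4
print(C().a, C().b)  # instance defaults: 3 4
```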
Thanks Carl. FYI, the PEP was submitted to the Steering Council on Friday for consideration in Python 3.11. https://github.com/python/steering-council/issues/117
This implies typecheckers should assume there is a default value and it is of the type returned by __get__. (That only requires looking at the return annotation of __get__, not at its implementation.)
Regarding looking at the implementation of __get__, I misspoke. Rather than type checkers, I was thinking about language servers which would likely want to know not just the default value's type, but the actual value.
I think it would be reasonable for PEP 681 to either specify all of the above... ... Should PEP 681 explicitly specify that dataclass_transform also has the effect of keeping field default values as class attributes on the class (as dataclass does) and placing them there if the default was specified explicitly via a "field descriptor"?
I recently added the following language to the Dataclass Semantics section of the PEP which I think makes explicit mention of these dataclass behaviors unnecessary: "Except where stated otherwise in this PEP, classes impacted by dataclass_transform, either by inheriting from a class that is decorated with dataclass_transform or by being decorated with a function decorated with dataclass_transform, are assumed to behave like stdlib dataclass." Also, FYI, I filed a bugs.python.org issue last week suggesting that we document the behavior of descriptor-typed fields on dataclasses. https://bugs.python.org/issue47174 I think it would be better to document these specifics in dataclass docs and/or unit tests rather than PEP 681.
advisable to rename the PEP 681 "field descriptor" concept...I suggest "field definition type" or "field specifier."
Good point. I emailed Jelle to see if he thinks I should change this while the Steering Council is considering the PEP.
You can't have `Descriptor(default=3)` and list `Descriptor` as a `field_descriptor` and expect to have its __get__ and __set__ respected, because in this case the class attribute is `3` and there is no runtime descriptor object used.
Requiring that field descriptors behave like dataclasses.Field is not something I had considered before, and I don't think the PEP is explicit about it at the moment. It's not even clear to me that the "assumed to behave like stdlib dataclass" sentence in the Dataclass Semantics section implies this.

If we want this to be true, I think we'll need to add a dataclass_transform parameter to allow libraries to disable this "default value" behavior. Otherwise, field descriptor types would need to choose between supporting default values and having their __get__/__set__ methods called reliably.

Btw, I believe you sent your follow-up message to me alone, instead of to typing-sig. I've included it below for posterity.

-Erik

------------------------------------------------------------------

I realized one more thing after sending:

On Mon, Mar 28, 2022 at 4:37 PM Erik De Bonte <Erik.DeBonte@microsoft.com> wrote:
With dataclass_transform, the library author can make this scenario work by adding a "default" parameter on Descriptor's constructor and adding Descriptor to field_descriptors. But I'm not sure if there's a good solution for dataclass or if it's even worth fixing.
Given my point (2) above about field default values being placed onto the class as the class attribute of that name, I think it's important to clarify that what you describe here should _not_ work.

If Descriptor is a "field descriptor" type, then the value passed to its "default" argument should be the thing that ends up as the class attribute. And if this is the case, the Descriptor instance is not attached to the class, and its __get__ and __set__ should be ignored -- the fact that it's a Python descriptor becomes irrelevant.

You should only get real descriptor behavior when using a "field descriptor" if the object passed to the field descriptor's "default" argument is _itself_ a descriptor object, e.g. `field(default=Descriptor())`. You can't have `Descriptor(default=3)` and list `Descriptor` as a `field_descriptor` and expect to have its __get__ and __set__ respected, because in this case the class attribute is `3` and there is no runtime descriptor object used.

Carl
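To make Carl's contrast concrete without depending on any particular library, here is a sketch of the end state each scenario would leave on the class (`Descriptor`, `ItemA`, and `ItemB` are hypothetical names; the class attributes are written out by hand to show what a dataclass-like transform would produce):

```python
# Scenario 1: Descriptor is registered as a field specifier and used as
# `x: int = Descriptor(default=3)`. Under dataclass semantics the
# specifier is consumed and the *default value* becomes the class
# attribute, so no descriptor remains at runtime:
class ItemA:
    x = 3  # Descriptor(default=3) was replaced by its default

# Scenario 2: a descriptor object is passed *as the default*, e.g.
# `x: int = field(default=Descriptor())`. The descriptor itself ends up
# as the class attribute, so its __get__/__set__ are respected:
class Descriptor:
    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        setattr(obj, self._name, value)

class ItemB:
    x = Descriptor()  # end state: descriptor attached to the class

b = ItemB()
b.x = 5
print(b.x)      # 5 -- mediated by Descriptor.__set__/__get__
print(ItemA.x)  # 3 -- plain class attribute, no descriptor involved
```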
Hi Erik, On Mon, Apr 4, 2022 at 6:38 PM Erik De Bonte <Erik.DeBonte@microsoft.com> wrote:
FYI, the PEP was submitted to the Steering Council on Friday for consideration in Python 3.11. https://github.com/python/steering-council/issues/117
Great!
I recently added the following language to the Dataclass Semantics section of the PEP which I think makes explicit mention of these dataclass behaviors unnecessary:
"Except where stated otherwise in this PEP, classes impacted by dataclass_transform, either by inheriting from a class that is decorated with dataclass_transform or by being decorated with a function decorated with dataclass_transform, are assumed to behave like stdlib dataclass."
Yep, I think that covers it.
advisable to rename the PEP 681 "field descriptor" concept...I suggest "field definition type" or "field specifier."
Good point. I emailed Jelle to see if he thinks I should change this while the Steering Council is considering the PEP.
I saw that you went ahead and made this change; looks good!
You can't have `Descriptor(default=3)` and list `Descriptor` as a `field_descriptor` and expect to have its __get__ and __set__ respected, because in this case the class attribute is `3` and there is no runtime descriptor object used.
Requiring that field descriptors behave like dataclass.Field is not something I had considered before
By "behave like dataclasses.field" here you mean specifically the behavior that the field default value is set as the class attribute for the field name? I.e. that when you have

```python
@dataclasses.dataclass
class F:
    x: int = 3
    y: str = dataclasses.field(default="foo")
```

that `F.x` is `3` and `F.y` is `"foo"`? And the alternative behavior for some hypothetical dataclasses alternative would be that `F.y` is instead the `field()` instance?
and I don't think the PEP is explicit about it at the moment. It's not even clear to me that the "assumed to behave like stdlib dataclass" sentence in the Dataclass Semantics section implies this.
I do think this behavior (that field defaults are accessible as class attributes) is important, and dataclasses-using code likely relies on it (the existing type checkers already implement it in their dataclass handling). I don't see any reason why a reader would not assume this behavior to be covered by the "assumed to behave like stdlib dataclass" clause in the PEP, barring explicit treatment of the topic.
If we want this to be true, I think we'll need to add a dataclass_transform parameter to allow libraries to disable this "default value" behavior. Otherwise field descriptor types would need to choose between supporting default values and having their __get__/__set__ methods called reliably.
I guess you're right, it might be attractive for some dataclass-like library to use a field specifier type that is also a descriptor, and in this case they would need the field specifier left as the class attribute, even if a default value were given.

I think the best way to handle this for PEP 681 wouldn't require an additional dataclass_transform parameter. It would be to state the requirement only in terms of the results of class attribute access, not the exact runtime behavior of what goes in the class dict. That is, accessing the class attribute should return the default value, but the details of how this is done are unspecified.

In my example above, that means PEP 681 would require that `F.y` return the default value `"foo"`, but at runtime the library could satisfy this requirement either by having `F.__dict__["y"]` actually be set to `"foo"` (as dataclasses does) OR by having the field specifier be a descriptor whose `__get__` returns `"foo"` when its `instance` argument is `None`, meaning that in either case `F.y == "foo"`.

I think this allows either natural runtime implementation approach, while forbidding the thing we probably don't want to allow from a typing perspective: having `F.y` actually return a field specifier instance instead of the default value. This might be subtle enough to deserve specific treatment in the PEP, but I'll leave that up to you and Jelle and the SC :)
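The second implementation option Carl describes can be sketched as follows: a field specifier that stays attached to the class as a descriptor, yet still makes class attribute access return the default value. All names here (`SpecField`, `F`) are hypothetical, chosen to mirror the example in the message above.

```python
# Hypothetical field specifier that remains on the class as a descriptor
# but satisfies the proposed "F.y == default" requirement.
class SpecField:
    def __init__(self, *, default):
        self.default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Class access returns the default value, mimicking what
            # stdlib dataclasses achieves by storing the default in the
            # class dict directly.
            return self.default
        return getattr(obj, self._name, self.default)

    def __set__(self, obj, value):
        setattr(obj, self._name, value)

class F:
    y = SpecField(default="foo")

print(F.y)    # 'foo' -- even though F.__dict__['y'] is a SpecField
f = F()
print(f.y)    # 'foo' -- default until assigned
f.y = "bar"
print(f.y)    # 'bar'
```

Either this approach or the stdlib one produces the same observable `F.y`, which is exactly why specifying only the attribute-access result leaves both implementations open.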
Btw, I believe you sent your follow-up message to me alone, instead of to typing-sig. I've included it below for future posterity.
Oops, thanks! Carl
And the alternative behavior for some hypothetical dataclasses alternative would be that `F."y"` is instead the `field()` instance?
Right. That's what SQLAlchemy is expecting.
I don't see any reason why a reader would not assume this behavior to be covered by the "assumed to behave like stdlib dataclass" clause in the PEP
I was thinking that readers might believe that the class attribute behavior is specific to Field objects and therefore custom field specifiers would not be required to behave the same way.
That is, accessing the class attribute should return the default value, but the details of how this is done are unspecified
That's clever and seems reasonable. However, I believe that SQLAlchemy relies on accesses to the class attribute returning the actual descriptor. I've reached out to them to confirm.

-Erik
On 3/23/2022 1:13 AM, Carl Meyer via Typing-sig wrote:
My conclusion from this is that most likely nobody ever thought very hard about how dataclasses should work with descriptor-typed fields, and the runtime behavior we get is simply what falls out naturally from the way dataclasses handles field default values (i.e. they are preserved as class attributes, if present.)
I can assure you that this is a true statement! Eric
participants (5)
- Carl Meyer
- Eric Traut
- Eric V. Smith
- Erik De Bonte
- Jelle Zijlstra