Issues with PEP 526 Variable Notation at the class level

Both typing.NamedTuple and dataclasses.dataclass use the somewhat beautiful PEP 526 variable notations at the class level:
    @dataclasses.dataclass
    class Color:
        hue: int
        saturation: float
        lightness: float = 0.5
and
    class Color(typing.NamedTuple):
        hue: int
        saturation: float
        lightness: float = 0.5
I'm looking for guidance or workarounds for two issues that have arisen.
First, the use of default values seems to completely preclude the use of __slots__. For example, this raises a ValueError:
    class A:
        __slots__ = ['x', 'y']
        x: int = 10
        y: int = 20
The second issue is that the different annotations give different signatures than would be produced for manually written classes. It is unclear what the best practice is for where to put the annotations and their associated docstrings.
In Pydoc for example, this class:
    class A:
        'Class docstring. x is distance in miles'
        x: int
        y: int
gives a different signature and docstring than for this class:
    class A:
        'Class docstring'
        def __init__(self, x: int, y: int):
            'x is distance in kilometers'
            pass
or for this class:
    class A:
        'Class docstring'
        def __new__(cls, x: int, y: int) -> A:
            '''x is distance in inches
            A is a singleton (one instance per x,y)
            '''
            if (x, y) in cache:
                return cache[x, y]
            return object.__new__(cls, x, y)
The distinction is important because the dataclass decorator allows you to suppress the generation of __init__ when you need more control than dataclass offers or when you need a __new__ method. I'm unclear on where the docstring and signature for the class are supposed to go so that we get useful signatures and matching docstrings.

On 12/7/17 3:27 PM, Raymond Hettinger wrote: ...
I'm looking for guidance or workarounds for two issues that have arisen.
First, the use of default values seems to completely preclude the use of __slots__. For example, this raises a ValueError:
    class A:
        __slots__ = ['x', 'y']
        x: int = 10
        y: int = 20
Hmm, I wasn't aware of that. I'm not sure I understand why that's an error. Maybe it could be fixed? Otherwise, I have a decorator that takes a dataclass and returns a new class with slots set:
    >>> from dataclasses import dataclass
    >>> from dataclass_tools import add_slots
    >>> @add_slots
    ... @dataclass
    ... class C:
    ...     x: int = 0
    ...     y: int = 0
    ...
    >>> c = C()
    >>> c
    C(x=0, y=0)
    >>> c.z = 3
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'C' object has no attribute 'z'
This doesn't help the general case (your class A), but it does at least solve it for dataclasses. Whether it should be actually included, and what the interface would look like, can be (and I'm sure will be!) argued. The reason I didn't include it (as @dataclass(slots=True)) is because it has to return a new class, and the rest of the dataclass features just modifies the given class in place. I wanted to maintain that conceptual simplicity. But this might be a reason to abandon that. For what it's worth, attrs does have an @attr.s(slots=True) that returns a new class with __slots__ set.
The second issue is that the different annotations give different signatures than would be produced for manually written classes. It is unclear what the best practice is for where to put the annotations and their associated docstrings.
I don't have any suggestions here. Eric.
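
A minimal sketch of the "return a new class with slots set" approach described above, built only on the public dataclasses.fields() API. The name add_slots_sketch is hypothetical and this is not the dataclass_tools implementation; it assumes a dataclass with no slotted parents and no methods that use zero-argument super().

```python
import dataclasses


def add_slots_sketch(cls):
    """Return a NEW class equivalent to dataclass `cls`, but with __slots__.

    Hypothetical helper for illustration only.
    """
    if "__slots__" in cls.__dict__:
        raise TypeError(f"{cls.__name__} already defines __slots__")
    namespace = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    namespace["__slots__"] = field_names
    for name in field_names:
        # Class-level defaults would collide with the slot descriptors.
        # The generated __init__ already carries the default values.
        namespace.pop(name, None)
    # These descriptors belong to the old class; slots make them unnecessary.
    namespace.pop("__dict__", None)
    namespace.pop("__weakref__", None)
    new_cls = type(cls)(cls.__name__, cls.__bases__, namespace)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls


@add_slots_sketch
@dataclasses.dataclass
class C:
    x: int = 0
    y: int = 0


c = C()
try:
    c.z = 3          # no slot named 'z' and no instance __dict__
    z_rejected = False
except AttributeError:
    z_rejected = True
```

The defaults still work because the dataclass-generated __init__ holds them, so removing the class attributes only frees the names for the slot descriptors.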

Yes, I think this is a reasonable argument for adding a 'slots' option (off by default) for @dataclass(). However I don't think we need to rush it in. I'm not very happy with the general idea of slots any more, and I think that it's probably being overused, and at the same time I expect that there are a lot of classes with a slots declaration that still have a dict as well, because they inherit from a class without slots.
I'm not sure what to do about docstrings -- I'm not a big user of pydoc and I find help() often too verbose (I usually read the source). Maybe we could add a 'doc' option to field()? That's similar to what we offer for property().
-- --Guido van Rossum (python.org/~guido)
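
One way to experiment with per-field documentation without any new option is the existing metadata argument of dataclasses.field(). The sketch below is only an approximation of the idea: the "doc" metadata key and the field_docs helper are assumptions made up for illustration, not part of the dataclasses API discussed here.

```python
import dataclasses


@dataclasses.dataclass
class Trip:
    # "doc" is a hypothetical metadata key chosen for this sketch.
    distance: float = dataclasses.field(
        default=0.0, metadata={"doc": "distance in miles"}
    )
    stops: int = dataclasses.field(
        default=0, metadata={"doc": "number of stops"}
    )


def field_docs(cls):
    """Collect the per-field 'doc' strings stored in field metadata."""
    return {
        f.name: f.metadata.get("doc", "")
        for f in dataclasses.fields(cls)
    }


docs = field_docs(Trip)
```

A tool such as pydoc would still need to be taught to look at this mapping; the sketch only shows where the text could live.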

On Dec 7, 2017, at 12:47 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 12/7/17 3:27 PM, Raymond Hettinger wrote: ...
I'm looking for guidance or workarounds for two issues that have arisen.
First, the use of default values seems to completely preclude the use of __slots__. For example, this raises a ValueError:
    class A:
        __slots__ = ['x', 'y']
        x: int = 10
        y: int = 20
Hmm, I wasn't aware of that. I'm not sure I understand why that's an error. Maybe it could be fixed?
The way __slots__ works is that the type() metaclass automatically assigns member-objects to the class variables 'x' and 'y'. Member objects are descriptors that do the actual lookup. So, I don't think the language limitation can be "fixed". Essentially, we're wanting to use the class variables 'x' and 'y' to hold both member objects and a default value.
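
The explanation above can be checked directly. This short sketch shows that each slot name becomes a member descriptor stored on the class, and that assigning a class-level default to the same name is rejected at class creation. The class names are illustrative only.

```python
import types


class Slotted:
    __slots__ = ("x", "y")


# type() stored a member descriptor under each slot name.
slot_x = Slotted.__dict__["x"]
is_member_descriptor = isinstance(slot_x, types.MemberDescriptorType)

# The same class attribute cannot also hold a default value.
try:
    class Conflict:
        __slots__ = ("x", "y")
        x: int = 10
        y: int = 20
except ValueError as exc:
    conflict_error = str(exc)


# For a hand-written class, defaults can live in __init__ instead.
class WithDefaults:
    __slots__ = ("x", "y")

    def __init__(self, x: int = 10, y: int = 20):
        self.x = x
        self.y = y


w = WithDefaults()
```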
This doesn't help the general case (your class A), but it does at least solve it for dataclasses. Whether it should be actually included, and what the interface would look like, can be (and I'm sure will be!) argued.
The reason I didn't include it (as @dataclass(slots=True)) is because it has to return a new class, and the rest of the dataclass features just modifies the given class in place. I wanted to maintain that conceptual simplicity. But this might be a reason to abandon that. For what it's worth, attrs does have an @attr.s(slots=True) that returns a new class with __slots__ set.
I recommend that you follow the path taken by attrs and return a new class. Otherwise, we're leaving users with a devil's choice. You can have default values or you can have slots, but you can't have both. The slots are pretty important. With slots, a three attribute instance is only 64 bytes. Without slots, it is 296 bytes.
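
A rough way to compare the two layouts is sys.getsizeof() on the instance plus its __dict__ when it has one. This is a shallow measurement under assumptions: exact byte counts depend on the Python version and platform, so the figures quoted above are not reproduced here, and the helper name shallow_size is hypothetical.

```python
import sys


class Plain:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class Slotted:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


def shallow_size(obj):
    """Size of the instance plus its attribute dict, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


plain_size = shallow_size(Plain(1, 2, 3))
slotted_size = shallow_size(Slotted(1, 2, 3))
print(f"plain: {plain_size} bytes, slotted: {slotted_size} bytes")
```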
The second issue is that the different annotations give different signatures than would be produced for manually written classes. It is unclear what the best practice is for where to put the annotations and their associated docstrings.
I don't have any suggestions here.
I'm hoping the typing experts will chime in here. The question is straight-forward. Where should we look for the signature and docstring for constructing instances? Should they be attached to the class, to __init__(), or to __new__() when it is used? It would be nice to have an official position on that before it gets set in stone through arbitrary choices made by pycharm, pydoc, mypy, typing.NamedTuple, and dataclasses.dataclass. Raymond
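
To make the question concrete, the sketch below asks inspect.signature() for the call signature of the three styles of class from the original post and records where each docstring ends up. The class names are made up for illustration, and the __new__ variant leaves out the caching from the original example.

```python
import dataclasses
import inspect


@dataclasses.dataclass
class FromAnnotations:
    """Class docstring. x is distance in miles."""
    x: int
    y: int


class FromInit:
    """Class docstring."""

    def __init__(self, x: int, y: int):
        """x is distance in kilometers."""
        self.x = x
        self.y = y


class FromNew:
    """Class docstring."""

    def __new__(cls, x: int, y: int):
        """x is distance in inches."""
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self


signatures = {
    cls.__name__: inspect.signature(cls)
    for cls in (FromAnnotations, FromInit, FromNew)
}
for name, sig in signatures.items():
    print(name, sig)

# The parameter documentation lives in three different places.
doc_locations = {
    "FromAnnotations": FromAnnotations.__doc__,
    "FromInit": FromInit.__init__.__doc__,
    "FromNew": FromNew.__new__.__doc__,
}
```

All three classes accept the same arguments, yet a tool has to look at the class, at __init__(), or at __new__() to find the text that explains them.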

On 12/8/2017 1:28 PM, Raymond Hettinger wrote:
On Dec 7, 2017, at 12:47 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 12/7/17 3:27 PM, Raymond Hettinger wrote: ...
I'm looking for guidance or workarounds for two issues that have arisen.
First, the use of default values seems to completely preclude the use of __slots__. For example, this raises a ValueError:
    class A:
        __slots__ = ['x', 'y']
        x: int = 10
        y: int = 20
Hmm, I wasn't aware of that. I'm not sure I understand why that's an error. Maybe it could be fixed?
The way __slots__ works is that the type() metaclass automatically assigns member-objects to the class variables 'x' and 'y'. Member objects are descriptors that do the actual lookup.
So, I don't think the language limitation can be "fixed". Essentially, we're wanting to use the class variables 'x' and 'y' to hold both member objects and a default value.
Thanks. I figured this out after doing some research. Here's a thread "__slots__ and default values" from 14+ years ago from some guy named Hettinger: https://mail.python.org/pipermail/python-dev/2003-May/035575.html
As to whether we add slots=True to @dataclasses, I'll let Guido decide.
The code already exists as a separate decorator here: https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py#L3, if you want to play with it.
Usage:
    >>> @add_slots
    ... @dataclass
    ... class A:
    ...     x: int = 10
    ...     y: int = 20
    ...
    >>> a = A()
    >>> a
    A(x=10, y=20)
    >>> a.x = 15
    >>> a
    A(x=15, y=20)
    >>> a.z = 30
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'A' object has no attribute 'z'
Folding it in to @dataclass is easy enough. On the other hand, since it just uses the dataclasses public API, it's not strictly required to be in @dataclass.
The second issue is that the different annotations give different signatures than would be produced for manually written classes. It is unclear what the best practice is for where to put the annotations and their associated docstrings.
I don't have any suggestions here.
I'm hoping the typing experts will chime in here. The question is straight-forward. Where should we look for the signature and docstring for constructing instances? Should they be attached to the class, to __init__(), or to __new__() when it is used?
It would be nice to have an official position on that before it gets set in stone through arbitrary choices made by pycharm, pydoc, mypy, typing.NamedTuple, and dataclasses.dataclass.
I'm not sure I see why this would relate specifically to typing, since I don't think they'd inspect docstrings. But yes, it would be good to come to an agreement. Eric.

On Fri, Dec 8, 2017 at 3:44 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 12/8/2017 1:28 PM, Raymond Hettinger wrote:
On Dec 7, 2017, at 12:47 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 12/7/17 3:27 PM, Raymond Hettinger wrote: ...
I'm looking for guidance or workarounds for two issues that have arisen.
First, the use of default values seems to completely preclude the use of __slots__. For example, this raises a ValueError:
    class A:
        __slots__ = ['x', 'y']
        x: int = 10
        y: int = 20
Hmm, I wasn't aware of that. I'm not sure I understand why that's an error. Maybe it could be fixed?
The way __slots__ works is that the type() metaclass automatically assigns member-objects to the class variables 'x' and 'y'. Member objects are descriptors that do the actual lookup.
So, I don't think the language limitation can be "fixed". Essentially, we're wanting to use the class variables 'x' and 'y' to hold both member objects and a default value.
Thanks. I figured this out after doing some research. Here's a thread "__slots__ and default values" from 14+ years ago from some guy named Hettinger: https://mail.python.org/pipermail/python-dev/2003-May/035575.html
As to whether we add slots=True to @dataclasses, I'll let Guido decide.
The code already exists as a separate decorator here: https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py#L3, if you want to play with it.
Usage:
    >>> @add_slots
    ... @dataclass
    ... class A:
    ...     x: int = 10
    ...     y: int = 20
    ...
    >>> a = A()
    >>> a
    A(x=10, y=20)
    >>> a.x = 15
    >>> a
    A(x=15, y=20)
    >>> a.z = 30
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'A' object has no attribute 'z'
Folding it in to @dataclass is easy enough. On the other hand, since it just uses the dataclasses public API, it's not strictly required to be in @dataclass.
Let's do it. For most people the new class is an uninteresting implementation detail; for the rest we can document clearly that it is special.
The second issue is that the different annotations give different signatures than would be produced for manually written classes. It is unclear what the best practice is for where to put the annotations and their associated docstrings.
I don't have any suggestions here.
I'm hoping the typing experts will chime in here. The question is straight-forward. Where should we look for the signature and docstring for constructing instances? Should they be attached to the class, to __init__(), or to __new__() when it is used?
It would be nice to have an official position on that before it gets set in stone through arbitrary choices made by pycharm, pydoc, mypy, typing.NamedTuple, and dataclasses.dataclass.
I'm not sure I see why this would relate specifically to typing, since I don't think they'd inspect docstrings. But yes, it would be good to come to an agreement.
I don't recall in detail what all these tools and classes do with docstrings. Maybe if someone summarizes the status quo and explains how PEP 557 changes that it will be simple to decide. -- --Guido van Rossum (python.org/~guido)

On 8 December 2017 at 19:28, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
I'm hoping the typing experts will chime in here. The question is straight-forward. Where should we look for the signature and docstring for constructing instances? Should they be attached to the class, to __init__(), or to __new__() when it is used?
It would be nice to have an official position on that before it gets set in stone through arbitrary choices made by pycharm, pydoc, mypy, typing.NamedTuple, and dataclasses.dataclass.
Here are some thoughts about this:
1. Instance variables are given very little attention in pydoc. Consider this example:
    >>> class C:
    ...     x: int = 1
    ...     def meth(self, y: int) -> None: ...
    ...
    >>> help(C)
    Help on class C in module __main__:
    class C(builtins.object)
     |  Methods defined here:
     |
     |  meth(self, y: int) -> None
     |
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |
     |  __dict__
     |      dictionary for instance variables (if defined)
     |
     |  __weakref__
     |      list of weak references to the object (if defined)
     |
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |
     |  __annotations__ = {'x': <class 'int'>}
     |
     |  x = 1
The methods defined are listed first and are nicely formatted, while variables together with __annotations__ are left at the very end. I think that a line like x: int = 1 should appear for every instance variable, and should appear first, even before methods, since this is how people write (and read) classes. See also https://bugs.python.org/issue28519 for another problem with pydoc.
2. pydoc already extracts the signature of a class from __init__ and __new__ (giving preference to the latter if both are present), including the type annotations. I think this can be kept as is, but the special constructs like NamedTuple and dataclass that auto-generate methods should add annotations to them. For example, there is an issue to add annotations to __new__ by NamedTuple, see https://bugs.python.org/issue31006 and https://github.com/python/typing/issues/454
-- Ivan
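
As a small illustration of the second point, the sketch below inspects the constructor that typing.NamedTuple generates for the Color class from the original post. It checks only parameter names and defaults; whether annotations are attached to the generated __new__ depends on the Python version, so the sketch makes no claim about them.

```python
import inspect
import typing


class Color(typing.NamedTuple):
    hue: int
    saturation: float
    lightness: float = 0.5


# The call signature is derived from the auto-generated __new__.
color_signature = inspect.signature(Color)
print(color_signature)

parameter_names = list(color_signature.parameters)
lightness_default = color_signature.parameters["lightness"].default
```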

I'm not a typing expert, but I want to second Raymond's concerns, and perhaps I'm qualified to do so as I gave the PyCon USA __slots__ talk this year and I have a highly voted answer describing them on Stack Overflow.
Beautiful thing we're doing here with the dataclasses, by the way. I think addressing the slots issue could be a killer feature of dataclasses. I hope this doesn't muddy the water: If I could change a couple of things about __slots__ it would be
1. to allow multiple inheritance with multiple parents with nonempty slots (raises "TypeError: multiple bases have instance lay-out conflict"), and
2. to avoid creating redundant slots if extant in a parent (but maybe we should do this at the C level for all classes?).
It seems to me that Dataclasses could (and should) help us avoid the second issue regardless (should be trivial to look in the bases for preexisting slots, right?).
My workaround for the first issue is to inherit from ABCs with empty slots, but you need cooperative multiple inheritance for this - and you need to track the expected attributes (easy if you use abstract properties, which slots provide for). Maybe not all users of Dataclasses are advanced enough to do this?
So, maybe this is crazy (please don't call the nice men in white coats on me), came to me as I was responding, and definitely outside the box here, but perhaps we could make decorated dataclass be the abstract parent of the instantiated class?
Thanks, Aaron Hall
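
The layout conflict and the empty-slots workaround mentioned above can be sketched as follows. The class names are illustrative, and plain classes stand in for the ABCs to keep the example short.

```python
class HasX:
    __slots__ = ("x",)


class HasY:
    __slots__ = ("y",)


# Two parents that both define nonempty slots cannot be combined.
try:
    class Both(HasX, HasY):
        __slots__ = ()
except TypeError as exc:
    conflict_message = str(exc)


# Workaround: parents declare empty slots, and the concrete class
# declares every attribute it needs.
class XMixin:
    __slots__ = ()


class YMixin:
    __slots__ = ()


class Point(XMixin, YMixin):
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
```

Because the mixins declare empty slots, Point instances still have no __dict__; a parent without any __slots__ declaration would bring the dict back.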

On Dec 7, 2017 12:49, "Eric V. Smith" <eric@trueblade.com> wrote:
The reason I didn't include it (as @dataclass(slots=True)) is because it has to return a new class, and the rest of the dataclass features just modifies the given class in place. I wanted to maintain that conceptual simplicity. But this might be a reason to abandon that. For what it's worth, attrs does have an @attr.s(slots=True) that returns a new class with __slots__ set.
They actually switched to always returning a new class, regardless of whether slots is set: https://github.com/python-attrs/attrs/pull/260
You'd have to ask Hynek to get the full rationale, but I believe it was both for consistency with slot classes, and for consistency with regular class definition. For example, type.__new__ actually does different things depending on whether it sees an __eq__ method, so adding a method after the fact led to weird bugs with hashing. That class of bug goes away if you always set up the autogenerated methods and then call type.__new__.
-n
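
The hashing difference can be seen with two small classes. This is a sketch of the general Python behaviour, not of the attrs code: defining __eq__ in the class body makes class creation set __hash__ to None, while attaching __eq__ afterwards leaves the inherited __hash__ in place.

```python
class EqInBody:
    def __eq__(self, other):
        return isinstance(other, EqInBody)


class EqAddedLater:
    pass


# Injecting the method after class creation skips the inference step.
EqAddedLater.__eq__ = lambda self, other: isinstance(other, EqAddedLater)

body_hash = EqInBody.__hash__       # None: instances are unhashable
late_hash = EqAddedLater.__hash__   # still inherited from object

try:
    hash(EqInBody())
    body_hashable = True
except TypeError:
    body_hashable = False

late_hashable = isinstance(hash(EqAddedLater()), int)
```

The second class ends up with instances that compare equal to each other but hash by identity, which is the kind of inconsistency the message describes.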

On 12/8/2017 9:14 PM, Nathaniel Smith wrote:
On Dec 7, 2017 12:49, "Eric V. Smith" <eric@trueblade.com> wrote:
The reason I didn't include it (as @dataclass(slots=True)) is because it has to return a new class, and the rest of the dataclass features just modifies the given class in place. I wanted to maintain that conceptual simplicity. But this might be a reason to abandon that. For what it's worth, attrs does have an @attr.s(slots=True) that returns a new class with __slots__ set.
They actually switched to always returning a new class, regardless of whether slots is set:
In the end, it looks like that PR ended up just refactoring things, and the decision to always return a new class was deferred. I still haven't finished evaluating exactly what the refactoring does, though. Eric.
You'd have to ask Hynek to get the full rationale, but I believe it was both for consistency with slot classes, and for consistency with regular class definition. For example, type.__new__ actually does different things depending on whether it sees an __eq__ method, so adding a method after the fact led to weird bugs with hashing. That class of bug goes away if you always set up the autogenerated methods and then call type.__new__.
They have a bunch of test cases that I'll have to review, too. Eric.

On 9 December 2017 at 12:14, Nathaniel Smith <njs@pobox.com> wrote:
You'd have to ask Hynek to get the full rationale, but I believe it was both for consistency with slot classes, and for consistency with regular class definition. For example, type.__new__ actually does different things depending on whether it sees an __eq__ method, so adding a method after the fact led to weird bugs with hashing. That class of bug goes away if you always set up the autogenerated methods and then call type.__new__.
The main case I'm aware of where we do method inference in type.__new__ is setting "__hash__ = None" if "__eq__" is set. The main *problem* that arises with type replacement is that it currently interacts pretty badly with zero-argument super, since we don't make it easy to find and remap all the "__class__" references to the new class object. So right now, I think this trade-off tilts heavily in favour of "Keep the same class, but reimplement any required method inference logic when injecting methods". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
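
The zero-argument super problem can be reproduced with a naive class replacement. In this sketch (class names are illustrative), the method keeps a __class__ reference to the original class, so calling it on an instance of the rebuilt class fails.

```python
class Base:
    def describe(self):
        return "base"


class Original(Base):
    def describe(self):
        # Zero-argument super() relies on a hidden __class__ cell
        # that points at Original.
        return "derived+" + super().describe()


# Rebuild the class from its namespace, as a class-replacing
# decorator would.
namespace = dict(Original.__dict__)
namespace.pop("__dict__", None)
namespace.pop("__weakref__", None)
Replacement = type(Original)(Original.__name__, Original.__bases__, namespace)

original_result = Original().describe()

try:
    Replacement().describe()
    replacement_error = None
except TypeError as exc:
    # The instance is not an instance of the class stored in __class__.
    replacement_error = exc

cell_target = Replacement.describe.__closure__[0].cell_contents
```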
participants (7)
- Aaron Hall
- Eric V. Smith
- Guido van Rossum
- Ivan Levkivskyi
- Nathaniel Smith
- Nick Coghlan
- Raymond Hettinger