
I've written a PEP for what might be thought of as "mutable namedtuples with defaults, but not inheriting tuple's behavior" (a mouthful, but it sounded simpler when I first thought of it). It's heavily influenced by the attrs project. It uses PEP 526 type annotations to define fields. From the overview section:

    @dataclass
    class InventoryItem:
        name: str
        unit_price: float
        quantity_on_hand: int = 0
        def total_cost(self) -> float:
            return self.unit_price * self.quantity_on_hand

Will automatically add these methods:

    def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand
    def __repr__(self):
        return f'InventoryItem(name={self.name!r},unit_price={self.unit_price!r},quantity_on_hand={self.quantity_on_hand!r})'
    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented
    def __ne__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented
    def __lt__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented
    def __le__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented
    def __gt__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented
    def __ge__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

Data Classes saves you from writing and maintaining these functions. The PEP is largely complete, but could use some filling out in places. Comments welcome!

Eric.

P.S. I wrote this PEP when I was in my happy place.
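For readers who want to try the overview example, here is a minimal sketch. It assumes the `dataclasses` module as it later shipped in the standard library (Python 3.7+), or the PyPI backport mentioned in this thread. The sample values are invented for illustration, and the generated methods may differ in detail from the draft text quoted above.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


# Hypothetical sample data, purely for illustration.
widget = InventoryItem("widget", 3.0, 10)
same_widget = InventoryItem("widget", 3.0, 10)
empty_bin = InventoryItem("gadget", 1.5)  # quantity_on_hand falls back to 0
```

The two `widget` instances compare equal because the generated `__eq__` compares the fields as a tuple, as the overview describes.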

Oops, I forgot the link. It should show up shortly at https://www.python.org/dev/peps/pep-0557/. Eric. On 9/8/17 7:57 AM, Eric V. Smith wrote:

Interesting. I note that this under "Specification":

"""
field's may optionally specify a default value, using normal Python syntax:

    @dataclass
    class C:
        int a       # 'a' has no default value
        int b = 0   # assign a default value for 'b'
"""

...does not look like "normal Python syntax".

On Fri, Sep 8, 2017 at 11:44 AM Eric V. Smith <eric@trueblade.com> wrote:

On 9/8/2017 11:01 AM, Eric V. Smith wrote:
Oops, I forgot the link. It should show up shortly at https://www.python.org/dev/peps/pep-0557/.
And now I've pushed a version that works with Python 3.6 to PyPI at https://pypi.python.org/pypi/dataclasses It implements the PEP as it currently stands. I'll be making some tweaks in the coming weeks. Feedback is welcomed. The repo is at https://github.com/ericvsmith/dataclasses Eric.

+1 Overall, this looks very well thought out. Nice work! Once you get agreement on the functionality, name bike-shedding will likely be next. In a way, all classes are data classes so that name doesn't tell me much. Instead, it would be nice to have something suggestive of what it actually does which is automatically adding boilerplate methods to a general purpose class. Perhaps, @boilerplate or @autoinit or some such. Raymond

On 11 September 2017 at 12:27, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Once you get agreement on the functionality, name bike-shedding will likely be next. In a way, all classes are data classes so that name doesn't tell me much. Instead, it would be nice to have something suggestive of what it actually does which is automatically adding boilerplate methods to a general purpose class. Perhaps, @boilerplate or @autoinit or some such.
"data class" is essentially short for "declarative data class" or "data-centric class": as a class author the decorator allows you to focus on declaring the data fields, and *not* on procedurally defining how those fields are initialised (and compared, and displayed, and hashed, ...) the way you do with a traditional imperative class definition. When I changed the name of contextlib.ignored to the more cryptic contextlib.suppress, I made the mistake of letting the folks that knew how the context manager worked dictate the name, rather than allowing it to keep the name that described what it was for. I think the same will apply here: we'll get a better name if we focus on describing the problem the capability solves in the simplest possible terms than we will if we choose something that more accurately describes how it is implemented. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 9/10/17 10:27 PM, Raymond Hettinger wrote:
Thank you.
Once you get agreement on the functionality, name bike-shedding will likely be next. In a way, all classes are data classes so that name doesn't tell me much. Instead, it would be nice to have something suggestive of what it actually does which is automatically adding boilerplate methods to a general purpose class. Perhaps, @boilerplate or @autoinit or some such.
There was some discussion on naming at https://github.com/ericvsmith/dataclasses/issues/12. In that issue, Guido said use Data Classes (the concept), dataclasses (the module) and dataclass (the decorator). I think if someone came up with an awesomely better name, we could all be convinced to use it. But so far, nothing's better. Eric.

On 2017-09-11 05:26, Eric V. Smith wrote:
On 9/10/17 10:27 PM, Raymond Hettinger wrote:
I've typically used these types of objects as records. When in an irreverent mood I've called them bags. The short name is helpful as they get used all over the place. I'll add Nick's "declarative" as it describes the problem well from another angle:

- record
- bag
- declarative

Anyone like these? I find them more intuitive than the existing name. Also, considering their uses, it might make sense to put them in the collections module.

-Mike

- record

+1 This really does match well with the record concept in databases, and most people are familiar with that. Though it will. E a touch confusing until (if ever) most of the database and cab traders, etc start using them. It also matches pretty well with numpy "record arrays": https://docs.scipy.org/doc/numpy-1.13.0/user/basics.rec.html

- bag

I also like this -- not many folks will have a pre-conceived notion of what it is.

- declarative

Yeach -- that's an adjective (at least in most common use) -- and a programming term that means something else.

Also, considering their uses, it might make sense to put them in the collections module.

Yup.

-CHB

On Sep 12, 2017, at 9:01 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
This really does match well with the record concept in databases, and most people are familiar with that. Though it will. E a touch confusing until (if ever) most of the database and cab traders, etc start using them.
I REALLY need to stop quickly posting from my phone... ... Though it will be a touch confusing until (if ever) most of the database and csv readers etc. start using them. -CHB

On 13 September 2017 at 02:01, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
This really does match well with the record concept in databases, and most people are familiar with that.
No, most people aren't familiar with that - they only become familiar with it *after* they've learned to program and learned what a database is.
Though it will. E a touch confusing until (if ever) most of the database and cab traders, etc start using them.
Aside from the potential confusion with other technical uses of "record", a further problem with "record" is that it's ambiguous as to whether it's referring to the noun (wreck-ord) or the verb (ree-cord). Even if folks correctly interpret it as a noun, there are still plenty of opportunities for folks to guess incorrectly about what it means based on the other conventional English uses of the word (e.g. a "personal record" will consist of multiple "records" in the database sense). So in this case, the vagueness of "data class" is considered a feature - since it doesn't inherently mean *anything*, folks are more likely to realise that they need to look up "Python data class", and if I search for that in a private window, the first Google hit is https://stackoverflow.com/questions/3357581/using-python-class-as-a-data-con... and the second is Eric's PEP.
Also, considering their uses, it might make sense to put them in the collections module.
Data classes are things you're likely to put *in* a collection, rather than really being collections themselves (they're only collections in the same sense that all Python classes are collections of attributes, and that's not the way the collections module uses the term). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 12, 2017 at 7:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think "data classes" is a fine moniker for this concept. It's ironic that some people dislike "data classes" because these are regular classes, not just for data, while others are proposing alternative names that emphasize the data container aspect. So "data classes" splits the difference, by referring to both data and classes. Let's bikeshed about something else. -- --Guido van Rossum (python.org/~guido)

On 2017-09-12 21:05, Guido van Rossum wrote:
True that these data-classes will be a superset of a traditional record. But, we already have objects and inheritance for those use cases. The data-class is meant to be used primarily like a record, so why not name it that way? Almost everything is extensible in Python; that shouldn't prevent focused names, should it?
Let's bikeshed about something else.
An elegant name can make the difference between another obscure module thrown in the stdlib to be never seen again and one that gets used every day. Which is more intuitive?

    from collections import record
    from dataclass import dataclass

Would the language be as nice if "object" was named an "instanceclass?" Or perhaps the "requests" module could have been named "httpcall." Much of the reluctance to use the attrs module is about its weird naming. Because this is a simple, potentially ubiquitous enhancement, an elegant name is important. "For humans," or something, haha.

-Mike

On Thu, Sep 14, 2017 at 10:24:52AM -0700, Mike Miller wrote:
I'd expect something like a C struct or an ML record.
from dataclass import dataclass
This is more intuitive, since the PEP example also has attached methods like total_cost(). I don't think this is really common for records. Stefan Krah

On 2017-09-14 10:45, Stefan Krah wrote:
I'd expect something like a C struct or an ML record.
Struct is taken, and your second example is record.
Every class can be extended, does that mean they can't be given appropriate names? (Not to mention dataclass is hardly intuitive for something that can have methods added.) -Mike

On Thu, Sep 14, 2017 at 11:06:15AM -0700, Mike Miller wrote:
*If* the name were collections.record, I'd expect collections.record to be something like a C struct or an ML record. I'm NOT proposing "record".
A class is not a record. This brief conversation already convinced me that "record" is a bad name for the proposed construct. Stefan Krah

On 2017-09-15 05:08, Michel Desmoulin wrote:
Because given how convenient it is, it will most probably become the default way to write classes in Python. Not just for records.
Yes, would have been great if this was how the original object worked and the current barebones object was a base(object) or something like that. Too late however. Another option was "bag" which is more generic and brief, and might seem to fit better, but the discussion went towards record. -Mike

On 2017-09-12 19:09, Nick Coghlan wrote:
Pretty sure he was talking about programmers, and they are introduced to the concept early. Structs, objects with fields, random access files, databases, etc. Lay-folks are familiar with "keeping records" as you mention, but they are not the primary customer it seems. Record is the most common name for this ubiquitous concept.
whether its referring to the noun (wreck-ord) or the verb (ree-cord).
This can be grasped from context quickly, and due to mentioned ubiquity, not likely to be a problem in the real world. "Am I going to ree-cord this class?"
Yes, a collection of attributes, not significantly different than the namedtuple (that began this thread) or the various dictionaries implemented there already. The criteria don't appear to be very strict; should they be? (Also, it could be put into a submodule and imported into it to maintain modularity. Where it lands though isn't so important, just that collections is relatively likely to be imported already on medium sized projects, and I think the definition fits, collections == "bags of stuff".) Cheers, -Mike

On Sep 14, 2017, at 09:56, Mike Miller <python-dev@mgmiller.net> wrote:
Record is the most common name for this ubiquitous concept.
Mind if we call them Eric Classes to keep it clear? Because if its name is not Eric Classes, it will cause a little confusion. g’day-bruce-ly y’rs, -Barry

On 15 September 2017 at 02:56, Mike Miller <python-dev@mgmiller.net> wrote:
Python is an incredibly common first programming language, so we need to keep folks with *zero* knowledge of programming jargon firmly in mind when designing new features. That isn't always the most important consideration, but it's always *a* consideration. And, as Stefan notes in his reply, we also need to keep *misleading* inferences in mind when we consider repurposing existing jargon for a new use case - what seems like an obviously intuitive connection based on our own individual experiences with a term may turn out to be extremely counterintuitive for someone with a different experience of the same term. In such cases, it can make sense to look for new *semantically neutral* terminology as the official glossary entry and API naming scheme, and rely on documentation to indicate that this is a realisation of a feature that goes by other names in other contexts. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Eric,

A few quick comments:

Why do you even have a hash= argument on individual fields? For the whole class, I can imagine you might want to explicitly mark a whole class as unhashable, but it seems like the only thing you can do with the field-level hash= argument is to create a class where the __hash__ and __eq__ take different fields into account, and why would you ever want that?

Though honestly I can see a reasonable argument for removing the class-level hash= option too. And even if you keep it you might want to error on some truly nonsensical options like defining __hash__ without __eq__. (Also watch out that Python's usual rule about defining __eq__ blocking the inheritance of __hash__ does not kick in if __eq__ is added after the class is created.)

I've sometimes wished that attrs let me control whether it generated equality methods (eq/ne/hash) separately from ordering methods (lt/gt/...). Maybe the cmp= argument should take an enum with options none/equality-only/full?

The "why not attrs" section kind of reads like "because it's too popular and useful"?

-n

On Sep 8, 2017 08:44, "Eric V. Smith" <eric@trueblade.com> wrote:
Oops, I forgot the link. It should show up shortly at https://www.python.org/dev/peps/pep-0557/. Eric.
On 9/8/17 7:57 AM, Eric V. Smith wrote:

On 9/10/17 11:08 PM, Nathaniel Smith wrote:
The use case is that you have a cache, or something similar, that doesn't affect the object identity.
Yeah, I've thought about this, too. But I don't have any use case in mind, and if it hasn't come up with attrs, then I'm reluctant to break new ground here.
The "why not attrs" section kind of reads like "because it's too popular and useful"?
I'll add some words to that section, probably focused on typing compatibility. My general feeling is that attrs has some great design decisions, but goes a little too far (e.g., conversions, validations). As with most things we add, I'm trying to be as minimalist as possible, while still being widely useful and allowing 3rd party extensions and future features. Eric.

Hi Eric, I have one question not addressed yet. The implementation is based on "__annotations__" where the type is specified. But "__annotations__" is not always filled. An interpreter version with special optimization could remove all __annotations__ for performance reasons. (Discussed in other threads) In this case, will the dataclass simply not work, or will there be a fallback? I know it is a little bit hypothetical because an interpreter with this optimization is not there yet. I am only looking into the future a bit. Asking this because type annotations are stated as completely optional for Python. And this use case will break this assumption. Personally I am a heavy user of attrs and happy to have a dataclass in the std lib. Regards Wolfgang

On 9/11/17 9:43 AM, tds333@mailbox.org wrote:
Yes, if there are no __annotations__, then Data Classes would break. typing.NamedTuple has the same issue. We discussed it a little bit last week, but I don't think we came to any conclusions. Since @dataclass ignores the value of the annotation (except for typing.ClassVar), it would continue to work if the type was present, but maybe mapped to None or similar. Eric.
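The point that only the presence of an annotation matters can be checked with a small sketch. This assumes the `dataclasses` module as later released in the standard library; the class and attribute names are illustrative only.

```python
from dataclasses import dataclass, fields


@dataclass
class C:
    a: None          # annotated with None: still treated as a field
    b: object = 0    # annotated, with a default value
    c = 5            # no annotation: a plain class attribute, not a field


# Only annotated names are collected as fields.
field_names = [f.name for f in fields(C)]
instance = C(a="anything at all")
```

Nothing checks the value passed for `a` against its annotation, which is consistent with the remark that `@dataclass` ignores the annotation's value apart from special cases such as `typing.ClassVar`.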

On Mon, Sep 11, 2017 at 6:51 AM, Eric V. Smith <eric@trueblade.com> wrote:
Let's not worry about a future where there's no __annotations__. Type annotations will gradually become more mainstream. You won't have to use them, but some newer features of the language will be inaccessible if you don't. This has already started with the pattern based on inheriting from typing.NamedTuple and using field annotations. Dataclasses are simply another example. (That said, I strongly oppose *runtime type checking* based on annotations. It's by and large a mistaken idea. But this has nothing to do with that.) -- --Guido van Rossum (python.org/~guido)
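The `typing.NamedTuple` pattern referred to above looks roughly like this (a minimal sketch; the `Point` class is an invented example). As with Data Classes, the field list is derived from the class-level annotations.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int = 0   # fields with defaults must follow fields without


origin_ish = Point(1)
```

Unlike a data class, the result is still a tuple, so it is immutable and compares equal to a plain tuple with the same contents.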

On Mon, Sep 11, 2017 at 5:32 AM, Eric V. Smith <eric@trueblade.com> wrote:
But wouldn't this just be field(cmp=False), no need to fiddle with hash=?
https://github.com/python-attrs/attrs/issues/170
If the question is "given that we're going to add something to the stdlib, why shouldn't that thing be attrs?" then I guess it's sufficient to say "because the attrs developers didn't want it". But I think the PEP should also address the question "why are we adding something to the stdlib, instead of just recommending people install attrs". -n

On 9/11/2017 12:34 PM, Nathaniel Smith wrote:
Ah, true. You're right, I can't see any good use for setting hash on a field that isn't already controlled by cmp. I think field level hash can go.
At the class level, I think it makes more sense. But I'll write up some motivating examples.
I'll respond to other emails about this, probably tomorrow. Eric.

On Sep 10, 2017, at 20:08, Nathaniel Smith <njs@pobox.com> wrote:
I've sometimes wished that attrs let me control whether it generated equality methods (eq/ne/hash) separately from ordering methods (lt/gt/...). Maybe the cmp= argument should take an enum with options none/equality-only/full?
I have had use cases where I needed equality comparisons but not ordered comparisons, so I’m in favor of the option to split them. (atm, I can’t bring up a specific case, but it’s not uncommon.) Given that you only want to support the three states that Nathaniel describes, I think an enum makes the most sense, and it certainly would read well. I.e. there’s no sense in supporting the ordered comparisons and not equality, so that’s not a state that needs to be represented.

I’d make one other suggestion here: please let’s not call the keyword `cmp`. That’s reminiscent of Python 2’s `cmp` built-in, which of course doesn’t exist in Python 3. Using `cmp` is just an unnecessarily obfuscating abbreviation. I’d suggest just `compare` with an enum like so:

    class Compare(enum.Enum):
        none = 1
        unordered = 2
        ordered = 3

One thing I can’t avoid is DRY:

    from dataclasses import Compare, dataclass

    @dataclass(compare=Compare.unordered)
    class Foo:
        # …

Maybe exposing the enum items in the module namespace?

    ——dataclasses/__init__.py-----
    from enum import Enum

    class Compare(Enum):
        none = 1
        unordered = 2
        ordered = 3

    none = Compare.none
    unordered = Compare.unordered
    ordered = Compare.ordered
    ——dataclasses/__init__.py-----

    from dataclasses import dataclass, unordered

    @dataclass(compare=unordered)
    class Foo:
        # …

Cheers, -Barry

Oddly I don't like the enum (flag names get too long that way), but I do agree with everything else Barry said (it should be a trivalue flag and please don't name it cmp). On Mon, Sep 11, 2017 at 3:16 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
-- --Guido van Rossum (python.org/~guido)

On 9/11/2017 6:28 PM, Guido van Rossum wrote:
So if we don't do enums, I think the choices are ints, strs, or maybe True/False/None. Do you have a preference here? If int or str, I assume we'd want module-level constants. I like the name compare=, and 3 values makes sense: None, Equality, Ordered. Eric.

On Sep 11, 2017, at 18:36, Eric V. Smith <eric@trueblade.com> wrote:
+1 for the name, the 3 values, and making them module constants. After that, I don’t think it really matters what their implementation is. User code will look the same either way. One minor nice effect of using an enum is that the dataclass function can use `is` instead of `==` to compare keyword argument values. -Barry

Or we could just have two arguments, eq=<bool> and order=<bool>, and some rule so that you only need to specify one or the other but not both. (E.g. order=True implies eq=True.) That seems better than needing new constants just for this flag. On Mon, Sep 11, 2017 at 6:49 PM, Barry Warsaw <barry@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

On Sep 11, 2017, at 19:16, Guido van Rossum <guido@python.org> wrote:
Or we could just have two arguments, eq=<bool> and order=<bool>, and some rule so that you only need to specify one or the other but not both. (E.g. order=True implies eq=True.) That seems better than needing new constants just for this flag.
You’d have to disallow the combination `order=True, eq=False` then, right? Or would you ignore eq for any value of order=True? Seems like a clumsier API than a single tri-value parameter. Do the module constants bother you that much? -Barry

On Mon, Sep 11, 2017 at 8:21 PM, Barry Warsaw <barry@python.org> wrote:
Yes they do. You may have to import them, or you have to prefix them with the module name -- whereas keyword args and True/False require neither. We could disallow order=True, eq=False. Or we could have the default being to generate __eq__, __ne__ and __hash__, and a flag to prevent these (since equality by object identity is probably less popular than equality by elementwise comparison). Perhaps:

    order: bool = False
    eq: bool = True

and disallowing order=True, eq=False.

-- --Guido van Rossum (python.org/~guido)
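The two-boolean scheme sketched above can be tried with the `dataclasses` module as it eventually shipped in the standard library, which (as far as I know) adopted `eq=True, order=False` as defaults and rejects `order=True, eq=False`. The class names and the two helper functions below are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass                 # defaults: eq=True, order=False
class Point:
    x: int
    y: int


@dataclass(order=True)     # adds __lt__, __le__, __gt__, __ge__
class Version:
    major: int
    minor: int


def supports_ordering(a, b):
    """Return True if `a < b` is defined for these objects."""
    try:
        a < b
    except TypeError:
        return False
    return True


def decorate_order_without_eq():
    """Return the error raised for order=True, eq=False, or None."""
    try:
        @dataclass(eq=False, order=True)
        class Broken:
            x: int
    except ValueError as exc:
        return exc
    return None
```

Ordering compares the fields as a tuple, so `Version(1, 2) < Version(1, 10)` holds, while `Point` instances support only equality.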

On 09/11/2017 03:28 PM, Guido van Rossum wrote:
Oddly I don't like the enum (flag names get too long that way), but I do agree with everything else Barry said (it should be a trivalue flag and please don't name it cmp).
Hmmm, named constants are one of the motivating factors for having an Enum type. It's easy to keep the name a reasonable length, however: export them into the module-level namespace. re is an excellent example; the existing flags were moved into a FlagEnum, and then (for backwards compatibility) aliased back to the module level:

    class RegexFlag(enum.IntFlag):
        ASCII = sre_compile.SRE_FLAG_ASCII # assume ascii "locale"
        IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
        LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
        UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode "locale"
        MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
        DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
        VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
        A = ASCII
        I = IGNORECASE
        L = LOCALE
        U = UNICODE
        M = MULTILINE
        S = DOTALL
        X = VERBOSE
        # sre extensions (experimental, don't rely on these)
        TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
        T = TEMPLATE
        DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation
    globals().update(RegexFlag.__members__)

So we can still do re.I instead of re.RegexFlag.I. Likewise, if we had:

    class Compare(enum.Enum):
        NONE = 'each instance is an island'
        EQUAL = 'instances can be equal to each other'
        ORDERED = 'instances can be ordered and/or equal'
    globals().update(Compare.__members__)

then we can still use, for example, EQUAL, but get the more informative repr and str when we need to.

-- ~Ethan~
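The export trick described above can be run standalone. This is only a sketch of the proposal: the `Compare` enum was never added to `dataclasses`, and the `exported` dictionary is an invented helper used here to inspect the result.

```python
import enum


class Compare(enum.Enum):
    NONE = 'each instance is an island'
    EQUAL = 'instances can be equal to each other'
    ORDERED = 'instances can be ordered and/or equal'


# Alias every member into the module namespace, as `re` does for its flags.
globals().update(Compare.__members__)

# Collect the module-level aliases to show they are the very same objects.
exported = {name: globals()[name] for name in Compare.__members__}
```

Because the aliases are the enum members themselves, callers can write the short name and the receiving function can still compare with `is`.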

On 9/10/2017 11:08 PM, Nathaniel Smith wrote:
See the discussion at https://github.com/ericvsmith/dataclasses/issues/44 for why we're keeping the field-level hash function. Quick version, from Guido: "There's a legitimate reason for having a field in the eq but not in the hash. After all hash is always followed by an eq check and it is totally legit for the eq check to say two objects are not equal even though their hashes are equal." This would be in the case where a field should be used for equality testing, but computing its hash is expensive. Eric.
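A sketch of both situations discussed in this sub-thread, assuming the `dataclasses` module as later released (where the field option is spelled `compare=` rather than `cmp=`). The `Document` class and its fields are invented: `body` takes part in equality but is left out of the hash, and `cache` affects neither.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)    # eq=True plus frozen=True generates __hash__
class Document:
    doc_id: int
    # Compared in __eq__, but skipped when hashing (e.g. expensive to hash).
    body: str = field(hash=False)
    # Scratch space that does not affect identity at all.
    cache: dict = field(default_factory=dict, compare=False, repr=False)


a = Document(1, "first draft")
b = Document(1, "second draft")
c = Document(1, "first draft")
c.cache["word_count"] = 2   # mutating the dict is allowed; rebinding is not
```

Here `a` and `b` share a hash but are unequal, which is legitimate because a hash match is always followed by an equality check.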

On 8 September 2017 at 15:57, Eric V. Smith <eric@trueblade.com> wrote:
Looks good! One minor point - apparently in your happy place, C and Python have the same syntax :-)

"""
field's may optionally specify a default value, using normal Python syntax:

    @dataclass
    class C:
        int a       # 'a' has no default value
        int b = 0   # assign a default value for 'b'
"""
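For comparison, the quoted snippet written with PEP 526 annotations would presumably read as follows (a sketch that assumes the `dataclasses` module as later released):

```python
from dataclasses import dataclass


@dataclass
class C:
    a: int        # 'a' has no default value
    b: int = 0    # assign a default value for 'b'


only_a = C(1)
both = C(a=1, b=2)
```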

On 8 September 2017 at 07:57, Eric V. Smith <eric@trueblade.com> wrote:
Very nice!
My one technical question about the PEP relates to the use of an exact type check in the comparison methods, rather than "isinstance(other, self.__class__)". I think I agree with that decision, but it isn't immediately obvious that the class identity is considered part of the instance value for a data class, so if you do:

    @dataclass
    class BaseItem:
        value: Any

    class DerivedItem(BaseItem):
        pass

Then instances of DerivedItem *won't* be considered equivalent to instances of BaseItem, and they also won't be orderable relative to each other, even though "DerivedItem" doesn't actually add any new data fields.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
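The consequence of the exact type check can be observed directly. This sketch assumes the `dataclasses` module as later released, with `order=True` requested explicitly so that the ordering methods exist; `is_orderable` is an invented helper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(order=True)
class BaseItem:
    value: Any


class DerivedItem(BaseItem):   # adds no new fields
    pass


def is_orderable(a, b):
    """Return True if `a < b` is defined for these two objects."""
    try:
        a < b
    except TypeError:
        return False
    return True
```

Both comparison methods return `NotImplemented` when the classes differ, so equality falls back to identity and ordering raises `TypeError`.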

On Fri, Sep 08, 2017 at 10:37:12AM -0700, Nick Coghlan wrote:
I haven't read the whole PEP in close detail, but that method stood out for me too. Only, unlike Nick, I don't think I agree with the decision. I'm also not convinced that we should be adding ordered comparisons (__lt__ __gt__ etc) by default, if these DataClasses are considered more like structs/records than tuples. The closest existing equivalent to a struct in the std lib (apart from namedtuple) is, I think, SimpleNamespace, and they are unorderable. -- Steve

On 09/08/2017 11:38 AM, Steven D'Aprano wrote:
I'll split the difference. ;) I agree with D'Aprano that an isinstance check should be used, but I disagree with him about the rich comparison methods -- please include them. I think unordered should only be the default when order is impossible, extremely difficult, or nonsensical. -- ~Ethan~

On 2017-09-08 07:57, Eric V. Smith wrote:
I've written a PEP for…
Apologies for the following list of dumb questions and bikesheds:

- 'Classes can be thought of as "mutable namedtuples with defaults".'
  - A C/C++ (struct)ure sounds like a simpler description that many more would understand.
- dataclass name:
  - class, redundant
  - data, good but very common
  - struct, used?
  - Record? (best I could come up with)
- Source needs blanks between functions, hard to read.
- Are types required? Maybe an example or two with Any?
- Intro discounts inheritance and metaclasses as "potentially interfering", but unclear why that would be the case. Inheritance is easy to override, metaclasses not sure?
- Perhaps mention ORMs/metaclass approach, as prior art: https://docs.djangoproject.com/en/dev/topics/db/models/
- Perhaps mention Kivy Properties, as prior art: https://kivy.org/docs/api-kivy.properties.html
- For mutable default values:

      @dataclass
      class C:
          x: list # = field(default_factory=list)

  Could it detect list as a mutable class "type" and set it as a factory automatically? The PEP/bug #3 mentions using copy, but that's not exactly what I'm asking above.


On 9/8/17 3:20 PM, Mike Miller wrote:
Yes, other people have pointed out that this might not be the best "elevator pitch" example. I'm thinking about it.
There was a bunch of discussions on this. We're delaying the name bikeshedding for later (and maybe never).
- Source needs blanks between functions, hard to read.
It's supposed to be hard to read! You're just supposed to think "am I glad I don't have to read or write that". But I'll look at it.
- Are types required?
Annotations are required, the typing module is not.
Maybe an example or two with Any?
I'd rather leave it like it is: typing is referenced only once, for ClassVar.
I don't really want to get in to the history of why people don't like inheritance, single and multi. Or how metaclass magic can make life difficult. I just want to point out that Data Classes don't interfere at all.
Those are all good. Thanks.
The problem is: how do you know what's a mutable type? There's no general way to know. The behavior in the PEP is just meant to stop the worst of it. I guess we could have an option that says: call the type to create a new, empty instance.

    @dataclass
    class C:
        x: list = field(default_type_is_factory=True)

Thanks for the critical reading and your comments. I'm going to push a new version early next week, when I get back from traveling.

Eric.
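To illustrate the behaviour being discussed, here is a sketch against the `dataclasses` module as later released. It uses `default_factory`, which did ship; the `default_type_is_factory` option floated above is only an idea from this message and is not used here. The helper function name is invented.

```python
from dataclasses import dataclass, field


def declare_with_shared_list():
    """Return the error raised by a literal list default, or None."""
    try:
        @dataclass
        class Bad:
            x: list = []    # one list would be shared by every instance
    except ValueError as exc:
        return exc
    return None


@dataclass
class C:
    # The factory is called once per instance, so each gets its own list.
    x: list = field(default_factory=list)


first, second = C(), C()
first.x.append(1)
```

The check is a heuristic aimed at the most common mistakes, in line with the remark that there is no general way to know which types are mutable.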

I think it would be useful to write 1-2 sentences about the problem with inheritance -- in that case you pretty much have to use a metaclass, and the use of a metaclass makes life harder for people who want to use their own metaclass (since metaclasses don't combine without some manual intervention). On Fri, Sep 8, 2017 at 3:40 PM, Eric V. Smith <eric@trueblade.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 9 September 2017 at 01:00, Guido van Rossum <guido@python.org> wrote:
I think it would be useful to write 1-2 sentences about the problem with inheritance -- in that case you pretty much have to use a metaclass,
It is not the case now. I think __init_subclass__ has almost the same possibilities as a decorator, it just updates an already created class and can add some methods to it. This is a more subtle question, these two for example would be equivalent:

    from dataclass import Data, Frozen

    class Point(Frozen, Data):
        x: int
        y: int

and

    from dataclass import dataclass

    @dataclass(frozen=True)
    class Point:
        x: int
        y: int

But the problem with inheritance based pattern is that it cannot support automatic addition of __slots__. Also I think a decorator will be easier to maintain. But on the other hand I think inheritance based scheme is a bit more readable.

-- Ivan
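A rough sketch of the inheritance-based spelling, built on top of the decorator from the `dataclasses` module as later released. The `Data` base class is hypothetical (nothing like it exists in the module), and the `frozen` class keyword stands in for the `Frozen` mixin mentioned above.

```python
from dataclasses import FrozenInstanceError, dataclass


class Data:
    """Hypothetical base class; not part of the dataclasses module."""

    def __init_subclass__(cls, frozen=False, **kwargs):
        super().__init_subclass__(**kwargs)
        # The class object already exists at this point, so methods can be
        # added in place, but __slots__ can no longer be introduced.
        dataclass(cls, frozen=frozen)


class Point(Data, frozen=True):
    x: int
    y: int


def can_rebind(point):
    """Return True if assigning to an attribute of `point` succeeds."""
    try:
        point.x = 99
    except FrozenInstanceError:
        return False
    return True
```

No metaclass is involved, which matches the claim above; the `__slots__` limitation follows from the hook running after the class has been created.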

Hi list, first, a big thanks to the authors of PEP 557! Great idea! For me, the dataclasses were a typical example for inheritance, to be more precise, for metaclasses. I was astonished to see them implemented using decorators, and I was not the only one, citing Guido:
Python is at a weird point here. With almost every new release of Python, a new idea shows up that could be easily solved using metaclasses, yet every time we hesitate to use them, because of said necessary manual intervention for metaclass combination. So I think we have two options now: We could deprecate metaclasses, going down routes like PEP 487's __init_subclass__. Unfortunately, for data classes, __init_subclass__ runs too late in the class creation process to influence the __slots__ mechanism. A __new_subclass__ that acts earlier could do the job, but to me that simply sounds like reinventing the wheel of metaclasses. The other option would be to simply make metaclasses work properly. We would just have to define a way to automatically combine metaclasses. Guido once mentioned (here: https://mail.python.org/pipermail/python-dev/2017-June/148501.html) that he left out automatic synthesis of combined metaclasses on purpose, but given that this seems to be a major problem, I think it is about time to rethink this decision. So I propose to add such an automatic synthesis. My idea is that a metaclass author can define the __or__ and __ror__ methods for automatic metaclass synthesis. Then if a class C inherits from two classes A and B with metaclasses MetaA and MetaB, the metaclass would be MetaA | MetaB. Greetings Martin
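The "manual intervention" referred to here can be illustrated with a short sketch. The names MetaA, MetaB, A, B and C follow the message; MetaAB is an illustrative name for the hand-written combination. The proposed `MetaA | MetaB` synthesis is not implemented in Python, so it is not shown.

```python
class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


conflict = None
try:
    # Python refuses to guess a metaclass for C when the bases disagree.
    class C(A, B):
        pass
except TypeError as exc:
    conflict = str(exc)


# The manual fix: write a metaclass deriving from both, and name it explicitly.
class MetaAB(MetaA, MetaB):
    pass


class C(A, B, metaclass=MetaAB):
    pass
```

Any library that uses a metaclass forces this step on users who already have one of their own, which is the cost a decorator avoids.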

You're right that if it were easier to combine metaclasses we would not shy away from them so easily. Perhaps you and others interested in this topic can try to prototype an implementation and see how it would work in practice (with some realistic existing metaclasses)? Then the next step would be to write a PEP. But in this case I really recommend trying to implement it first (in pure Python) to see if it can actually work. On Thu, Oct 12, 2017 at 11:21 AM, Martin Teichmann <lkb.teichmann@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

I think we've drifted into a new topic, but... I was astonished to see them
implemented using decorators, and I was not the only one.
...
I was thinking about this last spring, when I tried to cram all sorts of Python metaprogramming into one 3hr class... Trying to come up with an example for metaclasses, I couldn't come up with anything that couldn't be done more clearly (to me) with a class decorator. I also found some commentary on the web (sorry, no links :-( ) indicating that metaclasses were added before class decorators, and that they really don't have a compelling use case any more. Now it seems that not only do they not have a compelling use case, in some (many) instances, there are compelling reasons to NOT use them, and rather use decorators. So why not deprecate them? Or at least discourage their use? The other option would be to simply make metaclasses work properly. We
would just have to define a way to automatically combine metaclasses.
"just"? Anyway, let's say that is doable -- would you then be able to do something with metaclasses that you could not do with decorators? Or do it in a cleaner, easier to write or understand way? There-should-be-one--and-preferably-only-one--obvious-way-to-do-it-ly yours, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 10/12/2017 03:44 PM, Chris Barker wrote:
I think we've drifted into a new topic, but...
The Enum data type requires metaclasses. Any time you want to modify the behavior of a class (not its instances, the class itself) you need a metaclass. Agreed that it's pretty rare, but we need them. -- ~Ethan~
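A small sketch of what "the class itself" means here, using the standard library's enum module. `Color` and `Plain` are illustrative names. Iterating over or taking the length of an Enum class works because those operations are looked up on the class's type, that is, on the metaclass; a method defined in an ordinary class body only affects instances.

```python
import enum


class Color(enum.Enum):
    RED = 1
    GREEN = 2


# These operate on the class object, so they are provided by its metaclass.
members = list(Color)
count = len(Color)
meta = type(Color)


class Plain:
    # Defined in the class body, this only makes *instances* sized.
    def __len__(self):
        return 0


class_is_sized = True
try:
    len(Plain)
except TypeError:
    class_is_sized = False
```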

Thanks for the info. On 2017-09-08 15:40, Eric V. Smith wrote:
Guess I really meant "object" or "type" above, not "typing.Any." For a decade or two, my use of structs/records in Python has been dynamically-typed and that hasn't been an issue. As the problem this PEP is solving is orthogonal to typing improvements, it feels a bit like typing is being coupled and pushed with it, whether wanted or not. Not that I dislike typing mind you (appreciated on projects with large teams), it's just as mentioned not related to the problem of class definition verbosity or lacking functionality. Cheers, -Mike

On 10 September 2017 at 23:05, Mike Miller <python-ideas@mgmiller.net> wrote:
[...] As the problem this PEP is solving is orthogonal to typing improvements
This is not the case; static support for dataclasses is an important point of motivation. It is hard to support static typing for many third party packages like attrs, since they use a lot of "magic". -- Ivan

On 2017-09-10 14:23, Ivan Levkivskyi wrote:
This is not the case, static support for dataclasses is an important point of motivation.
I've needed this functionality a decade before types became cool again. ;-)
As mentioned, nothing against static typing, would simply like an example without, to show that it is not required. -Mike

Using type annotations buys two things. One is the concise syntax: it's how the decorator finds the fields. In the simple case, it's what lets you define a data class without using the call to field(), which is analogous to attrs's attr.ib(). The other thing it buys you is compatibility with type checkers. As Ivan said, the design was very careful to be compatible with type checkers. So other than requiring some type as an annotation, there's no dependency added for typing in the general sense. You can use a type of object if you want, or just lie to it and say None (though I'm not recommending that!). It's completely ignored at runtime (except for the ClassVar case, as the PEP states). I'll think about adding some more language to the PEP about it. Eric.
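A short sketch of the point about annotations, written against the dataclasses module as it later shipped in the standard library. The class `Record` and its attribute names are illustrative. Any annotation marks a field, nothing checks values against it at runtime, and `ClassVar` is the one annotation the decorator inspects.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Record:
    name: object                 # "object" is enough to declare a field
    quantity: int = 0            # the annotation is not enforced at runtime
    instances: ClassVar[int] = 0 # ClassVar: a class attribute, not a field


# Values that contradict the annotations are accepted without complaint.
r = Record(name=3.14, quantity="many")
field_names = [f.name for f in fields(Record)]
```

A static type checker would flag `quantity="many"`; the interpreter does not.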

Hi, first post here. My two cents: Here's a list of "prior arts" that I have collected over the years, besides attrs, that address similar needs (and often, much more): - https://github.com/bluedynamics/plumber - https://github.com/ionelmc/python-fields - https://github.com/frasertweedale/elk - https://github.com/kuujo/yuppy Regarding the name, 'dataclass', I agree that it can be a bit misleading (my first idea of a "dataclass" would be a class with only data and no behaviour, e.g. a 'struct', a 'record', a DTO, 'anemic' class, etc.). Scala has 'case classes' with some similarities ( https://docs.scala-lang.org/tour/case-classes.html). Regards, S. On Fri, Sep 8, 2017 at 4:57 PM, Eric V. Smith <eric@trueblade.com> wrote:
-- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, Free&OSS Group / Systematic Cluster - http://www.gt-logiciel-libre.org/ Co-Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyData Paris - http://pydata.fr/ --- “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.” — R. Buckminster Fuller

Hi, it is not clear whether anything is done to total_cost: def total_cost(self) -> float: Does this become a property automatically, or is it still a method call? To that end, some examples of *using* a data class, not just defining one, would be helpful. If it remains a normal method, why put it in this example at all? Makes little sense... Otherwise I really like this idea, thanks! On 8 September 2017 at 15:57, Eric V. Smith <eric@trueblade.com> wrote:
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert

On 9/9/2017 11:41 AM, Gustavo Carneiro wrote:
Nothing is done with total_cost, it's still a method. It's meant to show that you can use methods in a Data Class. Maybe I should add a method that has a parameter, or at least explain why that method is present in the example. I'm not sure how I'd write an example showing that you can do everything you can with an undecorated class. I think some text explanation would be better. Eric.
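As an illustration of using, rather than defining, the class from the PEP overview, here is a brief sketch run against the dataclasses module as later shipped in the standard library. The values are made up. `total_cost` is called like any other method.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


item = InventoryItem("widget", 2.5, quantity_on_hand=4)
item.quantity_on_hand += 2       # instances are mutable by default
cost = item.total_cost()         # an ordinary method call, not a property
text = repr(item)                # generated __repr__
same = item == InventoryItem("widget", 2.5, 6)  # generated __eq__
```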

The reaction is overwhelmingly positive everywhere: hacker news, reddit, twitter. People have been expecting something like that for a long time. 3 questions:
- Is providing validation/conversion hooks completely out of the question, or still open for debate? I know it's to keep the implementation simple, but having a few callbacks run in __init__ in a for loop is not that much complexity. You don't have to provide validators, but having a validators parameter on field() would be a huge time saver. Just a list of callables called when the value is first set, potentially raising an exception that you don't even need to process in any way. It returns the value converted, and voilà. We all do that every day manually.
- I read Guido talking about some base class as an alternative to the decorator version, but don't see it in the PEP. Is it still considered?
- Any chance it becomes a built-in later? When classes were improved in Python 2, the object built-in was added. Imagine if we had had to import it every time... Or maybe just plug it into object, like @object.dataclass.

On 9/10/2017 10:00 AM, Michel Desmoulin wrote:
The reaction is overwhelmingly positive everywhere: hacker news, reddit, twitter.
Do you have a pointer to the Hacker News discussion? I missed it.
People have been expecting something like that for a long time.
Me, too!
I don't particularly want to add validation specifically. I want to make it possible to add validation yourself, or via a library. What I think I'll do is add a metadata parameter to field(), defaulting to None. Then you could write a post-init hook that does whatever single- and multi-field validations you want (or whatever else you want to do). Although this plays poorly with "frozen" classes: it's always something! I'll think about it. To make this most useful, I need to get the post-init hook to take an optional parameter so you can get data to it. I don't have a good way to do this, yet. Suggestions welcomed. Although if the post-init hook takes a param that you can pass in at object creation time, I guess there's really no need for a per-field metadata parameter: you could use the field name as a key to look up whatever you wanted to know about the field.
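A minimal sketch of the idea, assuming the spellings that later shipped in the standard library: a `metadata` argument to `field()` and a `__post_init__` method called at the end of the generated `__init__`. The `"validate"` key and the `Order` class are hypothetical; dataclasses itself attaches no meaning to the metadata.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Order:
    # "validate" is an illustrative key; the library just stores the mapping.
    quantity: int = field(metadata={"validate": lambda v: v > 0})
    note: str = ""

    def __post_init__(self):
        # Runs after the generated __init__ has assigned every field.
        for f in fields(self):
            check = f.metadata.get("validate")
            if check is not None and not check(getattr(self, f.name)):
                raise ValueError(f"invalid value for {f.name}")


ok = Order(3)

rejected = False
try:
    Order(0)
except ValueError:
    rejected = True
```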
- I read Guido talking about some base class as an alternative to the decorator version, but don't see it in the PEP. Is it still considered?
I'm going to put some words in explaining why I don't want to use base classes (I don't think it buys you anything). Do you have a reason for preferring base classes?
Because of the choice of using module-level functions so as to not introduce conflicts in the object's namespace, it would be difficult to make this a builtin. Although now that I think about it, maybe what are currently module-level functions should instead be methods on the "dataclass" decorator itself:
    @dataclass
    class C:
        i: int = dataclass.field(default=1, init=False)
        j: str
    c = C('hello')
    dataclass.asdict(c)
    {'i': 1, 'j': 'hello'}
Then, "dataclass" would be the only name the module exports, making it easier to someday be a builtin. I'm not sure it's important enough for this to be a builtin, but this would make it easier. Thoughts? I'm usually not a fan of having attributes on a function: it's why itertools.chain.from_iterable() is hard to find. Eric.

On Sun, Sep 10, 2017 at 9:36 AM, Eric V. Smith <eric@trueblade.com> wrote:
The temptation to make everything a builtin should be resisted.
Let's not do that. It would be better to design the module so that people can write `from dataclasses import *` and they will only get things that are clearly part of dataclasses (I guess dataclass, field, asdict, and a few more like that). That way people who really want this to look like a builtin can just violate PEP 8. -- --Guido van Rossum (python.org/~guido)
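For comparison with the attribute-on-decorator spelling proposed earlier in the thread, this is the same example written with module-level names, as the dataclasses module in the standard library ended up providing them. The class `C` is taken from that earlier message.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class C:
    # init=False keeps "i" out of __init__, so "j" may follow without a default.
    i: int = field(default=1, init=False)
    j: str


c = C('hello')
d = asdict(c)
```

Each helper is imported by name, so nothing needs to hang off the decorator.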

On 10/09/2017 at 18:36, Eric V. Smith wrote:
Err... I may have been over enthusiastic and created the hacker news thread in my mind.
It doesn't really allow you to do anything you couldn't do as easily in __init__. Alternatively, you could have an "on_set" hook for field() that just takes the field value and returns the field value. By default, it's an identity function and is always called (minus implementation optimizations):
    from functools import reduce
    self.foo = reduce((lambda data, next: next(*data)), on_set_hooks, (field, val))
And people can do whatever they want: default values, factories, transformers, converters/casters, validation, logging...
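One way to read the "on_set" proposal is as a chain of functions, each taking a value and returning a (possibly converted) value. The sketch below is an interpretation, not an API that exists in dataclasses: `apply_hooks`, the `on_set_hooks` mapping and the three hook functions are all hypothetical names.

```python
from functools import reduce


def apply_hooks(value, hooks):
    """Feed the value through each hook in turn; no hooks means identity."""
    return reduce(lambda current, hook: hook(current), hooks, value)


def strip(value):
    return value.strip()


def to_int(value):
    return int(value)


def positive(value):
    if value <= 0:
        raise ValueError("must be positive")
    return value


class Item:
    # Hypothetical per-field hook lists: conversion first, then validation.
    on_set_hooks = {"quantity": [strip, to_int, positive]}

    def __init__(self, quantity):
        self.quantity = apply_hooks(quantity, self.on_set_hooks["quantity"])


item = Item(" 42 ")
unchanged = apply_hooks("x", [])
```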
Not preferring, but having it as an alternative. Mainly for 2 reasons:
1 - Data classes allow one to type in classes very quickly; let's harvest the benefit from that. Typing a decorator in a shell is much less comfortable than using inheritance. Same thing for IDEs: all current ones have snippets with auto-switch to the class parents on tab. All in all, if you are doing exploratory programming, and thus disposable code, which data classes are fantastic for, inheritance will keep you in the flow.
2 - It will help sell data classes. I train a lot of people in Python each year. I never have to explain classes to people with any kind of programming background. I _always_ have to explain decorators. People are not used to them, and even kind of fear them for quite some time. Inheritance, however, is familiar, and will not only push people to use data classes more, but will also let them make fewer mistakes: they know the danger of parent ordering, but not that of decorator ordering.

Thanks for the PEP! :) I like the naming. ;) Though, I would like to add to Michel's argument in favor of a base class. On 11.09.2017 08:38, Michel Desmoulin wrote:
3) - the order of base classes can be arranged appropriately. In our day-to-day work, we use mixins and cooperative multiple inheritance a lot. So, having dataclasses as a base class or a mixin would be great! :) Combined with 1) and 2), I am much more in favor of having dataclasses as a base class/mixin than as a decorator. What are the benefits of the decorator? Maybe both are possible? Cheers, Sven PS: @Michel good observation 1). Typing decorators in a shell is annoying.

I make this suggestion in trepidation, given that Guido called a halt on the Great Naming Debate, but it seems that a short, neutral name with data connotations previously not a part of many popular subsystems is required. I therefore propose "row", which is sufficiently neutral to avoid most current opposition and yet a common field-oriented mechanism for accessing units of retrieved data by name. regards Steve Steve Holden On Sat, Sep 16, 2017 at 3:44 PM, Sven R. Kunze <srkunze@mail.de> wrote:

(Apologies for reviving a dead horse, but may not be around at the blessed time.) As potential names of this concept, I liked record and row, but agreed they were a bit too specific and not quite exact. In my recent (unrelated) reading however, I came across another term and think it might fit better, called an "entity." It has some nice properties: - Traditional dictionary definition, meaning "thing" - Same specificity as the current base-class name: object - Corresponds to a class or instance (depending on context) in data terminology From: http://ewebarchitecture.com/web-databases/database-entities An entity is a thing or object of importance about which data must be captured. Information about an entity is captured in the form of attributes and/or relationships. All things aren't entities—only those about which information should be captured. If something is a candidate for being an entity and it has no attributes or relationships, it isn't an entity. Thoughts? Another candidate is "container" but is not very descriptive. -Mike On 2017-09-16 11:14, Steve Holden wrote:

On 12 October 2017 at 06:33, Mike Miller <python-dev@mgmiller.net> wrote:
By contrast, if we give them their own name (as with suggestions like record, row, entity), that makes them start to sound more like enums: an alternative base class with different runtime behaviour from a regular class. Cheers, Nick. P.S. I'll grant that this reasoning doesn't entirely mesh with the naming of "Abstract Base Class", but that phrase at least explicitly has the word "base" in it, suggesting that inheritance is involved in the way it works. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2017-10-11 19:56, Nick Coghlan wrote:
IMO, the problem with the dataclass name isn't the data part, but the "class" part. No other class has "class" in its name(?), not even object. The Department of Redundancy Department will love it. If it must be a compound name, it should rather be dataobject, no?
This PEP also adds many methods for use at runtime, though perhaps the behavior is more subtle.
There was some discussion over inheritance vs. decoration, not sure if it was settled. (Just noticed that the abc module got away with a class name of "ABC," perhaps dataclass would be more palatable as "DC", though entity sounds a bit nicer.) Cheers, -Mike

On 12 October 2017 at 14:49, Mike Miller <python-dev@mgmiller.net> wrote:
No, because dataclass is the name of a class decorator ("This class is a data class"), not the name of a type. It's akin to "static method", "class method", and "instance method" for function definitions (although the last one isn't a typical decorator, since it's the default behaviour for functions placed inside a class). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Oct 11, 2017 at 10:33 PM, Mike Miller <python-dev@mgmiller.net> wrote:
I'm not familiar with ER modelling but I would advise against using the term "entity", as it has, in domain-driven design (DDD), a very specific meaning: "An object that is not defined by its attributes, but rather by a thread of continuity and its identity." (from https://en.wikipedia.org/wiki/Domain-driven_design#Building_blocks) See also the more general Wikipedia definition "An entity is something that exists as itself, as a subject or as an object, actually or potentially, concretely or abstractly, physically or not." ( https://en.wikipedia.org/wiki/Entity). In the context of DDD, entities are usually opposed to value objects: "An object that contains attributes but has no conceptual identity. They should be treated as immutable." ( https://en.wikipedia.org/wiki/Domain-driven_design#Building_blocks) Attrs, and by extension the dataclass proposal (I guess), provide some support for both:
- By providing support for quickly constructing immutable objects from a bag of attributes, and providing equality based on those attributes, it helps implement Value Objects (not sure much more is needed actually).
- By supporting equality based on some "primary key", it will also help with maintaining the concept of "equality" in entities.
It would be great if the dataclass proposal could help implement DDD technical concepts in Python, but its terminology should not conflict with the DDD terminology, if we want to avoid confusion. Cheers, S.
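The two DDD notions above can be sketched with the dataclasses module as later shipped in the standard library, using `frozen=True` for a value object and `field(compare=False)` to restrict equality to a key. The classes `Money` and `Customer` and the attribute name `uid` are illustrative only.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Money:
    """Value object: immutable, equal when all attributes are equal."""
    amount: int
    currency: str


@dataclass
class Customer:
    """Entity: equality depends only on the identifier."""
    uid: str
    name: str = field(compare=False)
    email: str = field(compare=False)


same_value = Money(5, "EUR") == Money(5, "EUR")

frozen = False
try:
    Money(5, "EUR").amount = 6
except FrozenInstanceError:
    frozen = True

before = Customer("u1", "Ann", "ann@example.org")
after = Customer("u1", "Ann Smith", "ann.smith@example.org")
other = Customer("u2", "Ann", "ann@example.org")
```

Here `before == after` holds because only `uid` takes part in the comparison, while `before == other` does not.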

On Thu, Oct 12, 2017 at 10:20 AM, Mike Miller <python-dev@mgmiller.net> wrote:
Yes, for the lifetime of the object in the Python VM. But if you are dealing with objects that are persisted using some kind of ORM, ODM, OODB, then it won't work. It's quite common (but not always the best solution) to use some kind of UUID to represent the identity of each entity. Also, there can be circumstances where two objects can exist at the same time in the VM which represent the same object, in which case one should ensure that a == b iff a.uid == b.uid (in the case 'uid' is the attribute used to carry the unique identifier).
I don't believe either module particularly supports or restricts immutability?
http://www.attrs.org/en/stable/examples.html#immutability https://www.python.org/dev/peps/pep-0557/#frozen-instances S.

On Thu, Oct 12, 2017 at 9:20 AM, Mike Miller <python-dev@mgmiller.net> wrote:
distinction similar to the one between classes (entities) and instances (value objects). The reason I liked "row" as a name is because it resembles "vector" and hence is loosely associated with the concept of a tuple as well as being familiar to database users. In fact the answer to a relational query was, I believe, originally formally defined as a set of tuples. Sometimes one can simply be too hifalutin' [ http://www.dictionary.com/browse/hifalutin], and language that attempts to be precise obscures meaning to the less specialised reader. See also the more general Wikipedia definition "An entity is something that

On 12 October 2017 at 11:20, Steve Holden <steve@holdenweb.com> wrote:
But rows and tuples are usually immutable, at least in database terms. These data classes are not immutable (by default). If you want tuple-like behaviour, you can continue to use tuples. I see dataclasses as something closer to C `struct`. Most likely someone already considered `struct` as a name; if not, please consider it. Else stick with dataclass, it's a good name IMHO.

I am still firmly convinced that @dataclass is the right name for the decorator (and `dataclasses` for the module). -- --Guido van Rossum (python.org/~guido)

On Oct 12, 2017, at 10:46, Guido van Rossum <guido@python.org> wrote:
I am still firmly convinced that @dataclass is the right name for the decorator (and `dataclasses` for the module).
Darn, and I was going to suggest they be called EricTheHalfABees, with enums being renamed to EricTheHalfNotBees. -Barry

On Oct 12, 2017, at 7:46 AM, Guido van Rossum <guido@python.org> wrote:
I am still firmly convinced that @dataclass is the right name for the decorator (and `dataclasses` for the module).
+1 from me. The singular/plural pair has the same nice feel as "from fractions import Fraction", "from itertools import product" and "from collections import namedtuple". Raymond

On Thu, Oct 12, 2017 at 3:20 AM, Steve Holden <steve@holdenweb.com> wrote:
Is the intent that these things preserve order? In which case, row is OK (though I still don't see what's wrong with record). I still don't love it though -- it gives the expectation of a row in a data table (or CSV file, or ...), which will be a common use case, but really, it doesn't conceptually have anything to do with tabular data. In fact, one might want to store a bunch of these in, say, a 2D (or 3D) array; then row would be pretty weird.... I don't much like entity either -- it is either way too generic -- everything is an entity! Even less specific than "object". Or too specific (and incorrect) in the lexicon of particular domains. -CHB

Oops, I forgot the link. It should show up shortly at https://www.python.org/dev/peps/pep-0557/. Eric. On 9/8/17 7:57 AM, Eric V. Smith wrote:

Interesting. I note that this under "Specification": """ field's may optionally specify a default value, using normal Python syntax: @dataclass class C: int a # 'a' has no default value int b = 0 # assign a default value for 'b' """ ...does not look like "normal Python syntax". On Fri, Sep 8, 2017 at 11:44 AM Eric V. Smith <eric@trueblade.com> wrote:
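For reference, the snippet quoted from the draft would be written as follows with PEP 526 variable annotations, where the name comes first and the type follows the colon. This is a sketch run against the dataclasses module as later shipped in the standard library.

```python
from dataclasses import dataclass


@dataclass
class C:
    a: int       # 'a' has no default value
    b: int = 0   # assign a default value for 'b'


first = C(1)
second = C(1, 2)
```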

On 9/8/2017 11:01 AM, Eric V. Smith wrote:
Oops, I forgot the link. It should show up shortly at https://www.python.org/dev/peps/pep-0557/.
And now I've pushed a version that works with Python 3.6 to PyPI at https://pypi.python.org/pypi/dataclasses It implements the PEP as it currently stands. I'll be making some tweaks in the coming weeks. Feedback is welcomed. The repo is at https://github.com/ericvsmith/dataclasses Eric.

+1 Overall, this looks very well thought out. Nice work! Once you get agreement on the functionality, name bike-shedding will likely be next. In a way, all classes are data classes so that name doesn't tell me much. Instead, it would be nice to have something suggestive of what it actually does which is automatically adding boilerplate methods to a general purpose class. Perhaps, @boilerplate or @autoinit or some such. Raymond

On 11 September 2017 at 12:27, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Once you get agreement on the functionality, name bike-shedding will likely be next. In a way, all classes are data classes so that name doesn't tell me much. Instead, it would be nice to have something suggestive of what it actually does which is automatically adding boilerplate methods to a general purpose class. Perhaps, @boilerplate or @autoinit or some such.
"data class" is essentially short for "declarative data class" or "data-centric class": as a class author the decorator allows you to focus on declaring the data fields, and *not* on procedurally defining how those fields are initialised (and compared, and displayed, and hashed, ...) the way you do with a traditional imperative class definition. When I changed the name of contextlib.ignored to the more cryptic contextlib.suppress, I made the mistake of letting the folks that knew how the context manager worked dictate the name, rather than allowing it to keep the name that described what it was for. I think the same will apply here: we'll get a better name if we focus on describing the problem the capability solves in the simplest possible terms than we will if we choose something that more accurately describes how it is implemented. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 9/10/17 10:27 PM, Raymond Hettinger wrote:
Thank you.
Once you get agreement on the functionality, name bike-shedding will likely be next. In a way, all classes are data classes so that name doesn't tell me much. Instead, it would be nice to have something suggestive of what it actually does which is automatically adding boilerplate methods to a general purpose class. Perhaps, @boilerplate or @autoinit or some such.
There was some discussion on naming at https://github.com/ericvsmith/dataclasses/issues/12. In that issue, Guido said use Data Classes (the concept), dataclasses (the module) and dataclass (the decorator). I think if someone came up with an awesomely better name, we could all be convinced to use it. But so far, nothing's better. Eric.

On 2017-09-11 05:26, Eric V. Smith wrote:
On 9/10/17 10:27 PM, Raymond Hettinger wrote:
I've typically used these types of objects as records. When in an irreverent mood I've called them bags. The short name is helpful as they get used all over the place. I'll add Nick's "declarative" as it describes the problem well from another angle: - record - bag - declarative Anyone like these? I find them more intuitive than the existing name. Also, considering their uses, it might make sense to put them in the collections module. -Mike

- record +1 This really does match well with the record concept in databases, and most people are familiar with that. Though it will. E a touch confusing until (if ever) most of the database and cab traders, etc start using them. It also matches pretty well with numpy "record arrays": https://docs.scipy.org/doc/numpy-1.13.0/user/basics.rec.html - bag I also like this -- not many folks will have a preconceived notion of what it is. - declarative Yeach-- that's an adjective (at least in its most common use) -- and a programming term that means something else. Also, considering their uses, it might make sense to put them in the collections module. Yup. -CHB

On Sep 12, 2017, at 9:01 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
This really does match well with the record concept in databases, and most people are familiar with that. Though it will. E a touch confusing until (if ever) most of the database and cab traders, etc start using them.
I REALLY need to stop quickly posting from my phone... ... Though it will be a touch confusing until (if ever) most of the database and csv readers etc. start using them. -CHB

On 13 September 2017 at 02:01, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
This really does match well with the record concept in databases, and most people are familiar with that.
No, most people aren't familiar with that - they only become familiar with it *after* they've learned to program and learned what a database is.
Though it will. E a touch confusing until (if ever) most of the database and cab traders, etc start using them.
Aside from the potential confusion with other technical uses of "record", a further problem with "record" is that it's ambiguous as to whether it's referring to the noun (wreck-ord) or the verb (ree-cord). Even if folks correctly interpret it as a noun, there's still plenty of opportunities for folks to guess incorrectly about what it means based on the other conventional English uses of the word (e.g. a "personal record" will consist of multiple "records" in the database sense). So in this case, the vagueness of "data class" is considered a feature - since it doesn't inherently mean *anything*, folks are more likely to realise that they need to look up "Python data class", and if I search for that in a private window, the first Google hit is https://stackoverflow.com/questions/3357581/using-python-class-as-a-data-con... and the second is Eric's PEP.
Also, considering their uses, it might make sense to put them in the collections module.
Data classes are things you're likely to put *in* a collection, rather than really being collections themselves (they're only collections in the same sense that all Python classes are collections of attributes, and that's not the way the collections module uses the term). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Sep 12, 2017 at 7:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think "data classes" is a fine moniker for this concept. It's ironic that some people dislike "data classes" because these are regular classes, not just for data, while others are proposing alternative names that emphasize the data container aspect. So "data classes" splits the difference, by referring to both data and classes. Let's bikeshed about something else. -- --Guido van Rossum (python.org/~guido)

On 2017-09-12 21:05, Guido van Rossum wrote:
True that these data-classes will be a superset of a traditional record. But, we already have objects and inheritance for those use cases. The data-class is meant to be used primarily like a record, so why not name it that way? Almost everything is extensible in Python; that shouldn't prevent focused names, should it?
Let's bikeshed about something else.
An elegant name can make the difference between another obscure module thrown in the stdlib to be never seen again and one that gets used every day. Which is more intuitive?
    from collections import record
    from dataclass import dataclass
Would the language be as nice if "object" was named an "instanceclass?" Or perhaps the "requests" module could have been named "httpcall." Much of the reluctance to use the attrs module is about its weird naming. Due to the fact that this is a simple, potentially ubiquitous enhancement an elegant name is important. "For humans," or something, haha. -Mike

On Thu, Sep 14, 2017 at 10:24:52AM -0700, Mike Miller wrote:
I'd expect something like a C struct or an ML record.
from dataclass import dataclass
This is more intuitive, since the PEP example also has attached methods like total_cost(). I don't think this is really common for records. Stefan Krah

On 2017-09-14 10:45, Stefan Krah wrote:
I'd expect something like a C struct or an ML record.
Struct is taken, and your second example is record.
Every class can be extended, does that mean they can't be given appropriate names? (Not to mention dataclass is hardly intuitive for something that can have methods added.) -Mike

On Thu, Sep 14, 2017 at 11:06:15AM -0700, Mike Miller wrote:
*If* the name were collections.record, I'd expect collections.record to be something like a C struct or an ML record. I'm NOT proposing "record".
A class is not a record. This brief conversation already convinced me that "record" is a bad name for the proposed construct. Stefan Krah

Le 14/09/2017 à 19:24, Mike Miller a écrit :
Because given how convenient it is, it will most probably become the default way to write classes in Python. Not just for records. Everybody ends up wishing for a less verbose way to write day-to-day classes after a while.

On 2017-09-15 05:08, Michel Desmoulin wrote:
Because given how convenient it is, it will most probably become the default way to write classes in Python. Not just for records.
Yes, would have been great if this was how the original object worked and the current barebones object was a base(object) or something like that. Too late however. Another option was "bag" which is more generic and brief, and might seem to fit better, but the discussion went towards record. -Mike

On 2017-09-12 19:09, Nick Coghlan wrote:
Pretty sure he was talking about programmers, and they are introduced to the concept early. Structs, objects with fields, random access files, databases, etc. Lay-folks are familiar with "keeping records" as you mention, but they are not the primary customer it seems. Record is the most common name for this ubiquitous concept.
whether it's referring to the noun (wreck-ord) or the verb (ree-cord).
This can be grasped from context quickly, and due to mentioned ubiquity, not likely to be a problem in the real world. "Am I going to ree-cord this class?"
Yes, a collection of attributes, not significantly different than the namedtuple (that began this thread) or the various dictionaries implemented there already. The criteria don't appear to be very strict, should they be? (Also, it could be put into a submodule and imported into collections to maintain modularity. Where it lands though isn't so important, just that collections is relatively likely to be imported already on medium sized projects, and I think the definition fits, collections == "bags of stuff".) Cheers, -Mike

On Sep 14, 2017, at 09:56, Mike Miller <python-dev@mgmiller.net> wrote:
Record is the most common name for this ubiquitous concept.
Mind if we call them Eric Classes to keep it clear? Because if its name is not Eric Classes, it will cause a little confusion. g’day-bruce-ly y’rs, -Barry

On 15 September 2017 at 02:56, Mike Miller <python-dev@mgmiller.net> wrote:
Python is an incredibly common first programming language, so we need to keep folks with *zero* knowledge of programming jargon firmly in mind when designing new features. That isn't always the most important consideration, but it's always *a* consideration. And, as Stefan notes in his reply, we also need to keep *misleading* inferences in mind when we consider repurposing existing jargon for a new use case - what seems like an obviously intuitive connection based on our own individual experiences with a term may turn out to be extremely counterintuitive for someone with a different experience of the same term. In such cases, it can make sense to look for new *semantically neutral* terminology as the official glossary entry and API naming scheme, and rely on documentation to indicate that this is a realisation of a feature that goes by other names in other contexts. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Eric, A few quick comments: Why do you even have a hash= argument on individual fields? For the whole class, I can imagine you might want to explicitly mark a whole class as unhashable, but it seems like the only thing you can do with the field-level hash= argument is to create a class where the __hash__ and __eq__ take different fields into account, and why would you ever want that? Though honestly I can see a reasonable argument for removing the class-level hash= option too. And even if you keep it you might want to error on some truly nonsensical options like defining __hash__ without __eq__. (Also watch out that Python's usual rule about defining __eq__ blocking the inheritance of __hash__ does not kick in if __eq__ is added after the class is created.) I've sometimes wished that attrs let me control whether it generated equality methods (eq/ne/hash) separately from ordering methods (lt/gt/...). Maybe the cmp= argument should take an enum with options none/equality-only/full? The "why not attrs" section kind of reads like "because it's too popular and useful"? -n On Sep 8, 2017 08:44, "Eric V. Smith" <eric@trueblade.com> wrote: Oops, I forgot the link. It should show up shortly at https://www.python.org/dev/peps/pep-0557/. Eric. On 9/8/17 7:57 AM, Eric V. Smith wrote:
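The parenthetical about `__eq__` and `__hash__` can be checked directly. The sketch below uses two made-up classes (`DefinedInBody` and `AddedLater` are illustrative names, not from the PEP) to compare writing `__eq__` in the class body with attaching it afterwards, which is roughly what a class decorator does.

```python
class DefinedInBody:
    """__eq__ is written in the class body, so Python sets __hash__ to None."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, DefinedInBody) and self.x == other.x


class AddedLater:
    """No __eq__ in the body, so object.__hash__ is inherited."""

    def __init__(self, x):
        self.x = x


def _eq(self, other):
    return isinstance(other, AddedLater) and self.x == other.x


# Attach __eq__ to the already-created class, as a decorator would.
AddedLater.__eq__ = _eq

# The class-body version became unhashable; the patched one did not.
body_hash_blocked = DefinedInBody.__hash__ is None
later_hash_inherited = AddedLater.__hash__ is object.__hash__
```

So a decorator that adds `__eq__` has to deal with `__hash__` explicitly if it wants the usual behaviour.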

On 9/10/17 11:08 PM, Nathaniel Smith wrote:
The use case is that you have a cache, or something similar, that doesn't affect the object identity.
Yeah, I've thought about this, too. But I don't have any use case in mind, and if it hasn't come up with attrs, then I'm reluctant to break new ground here.
The "why not attrs" section kind of reads like "because it's too popular and useful"?
I'll add some words to that section, probably focused on typing compatibility. My general feeling is that attrs has some great design decisions, but goes a little too far (e.g., conversions, validations). As with most things we add, I'm trying to be as minimalist as possible, while still being widely useful and allowing 3rd party extensions and future features. Eric.

Hi Eric, I have one question not addressed yet. The implementation is based on "__annotations__" where the type is specified. But "__annotations__" is not always filled. An interpreter version with special optimization could remove all __annotations__ for performance reasons. (Discussed in other threads) In this case the dataclass does not work or will there be a fallback? I know it is a little bit hypothetical because an interpreter with this optimization is not there yet. I am only looking into the future a bit. Asking this because type annotations are stated as completely optional for Python. And this use case will break this assumption. Personally I am a heavy user of attrs and happy to have a dataclass in the std lib. Regards Wolfgang

On 9/11/17 9:43 AM, tds333@mailbox.org wrote:
Yes, if there are no __annotations__, then Data Classes would break. typing.NamedTuple has the same issue. We discussed it a little bit last week, but I don't think we came to any conclusions. Since @dataclass ignores the value of the annotation (except for typing.ClassVar), it would continue to work if the type was present, but maybe mapped to None or similar. Eric.
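To make the dependence on `__annotations__` concrete, here is a small sketch. It assumes the `dataclasses` module as it later shipped in the standard library (Python 3.7+), which may differ in detail from the draft being discussed; the class `C` is only illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class C:
    a: int                 # annotated, so it becomes a field
    b: int = 0             # annotated with a default, also a field
    c = "not a field"      # no annotation: stays a plain class attribute


# Fields are discovered from the class annotations, in definition order.
field_names = [f.name for f in fields(C)]
annotated_names = list(C.__annotations__)
instance = C(1)
```

If `__annotations__` were stripped, there would be nothing left to distinguish `a` and `b` from `c`.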

On Mon, Sep 11, 2017 at 6:51 AM, Eric V. Smith <eric@trueblade.com> wrote:
Let's not worry about a future where there's no __annotations__. Type annotations will gradually become more mainstream. You won't have to use them, but some newer features of the language will be inaccessible if you don't. This has already started with the pattern based on inheriting from typing.NamedTuple and using field annotations. Dataclasses are simply another example. (That said, I strongly oppose *runtime type checking* based on annotations. It's by and large a mistaken idea. But this has nothing to do with that.) -- --Guido van Rossum (python.org/~guido)

On Mon, Sep 11, 2017 at 5:32 AM, Eric V. Smith <eric@trueblade.com> wrote:
But wouldn't this just be field(cmp=False), no need to fiddle with hash=?
https://github.com/python-attrs/attrs/issues/170
If the question is "given that we're going to add something to the stdlib, why shouldn't that thing be attrs?" then I guess it's sufficient to say "because the attrs developers didn't want it". But I think the PEP should also address the question "why are we adding something to the stdlib, instead of just recommending people install attrs". -n

On 9/11/2017 12:34 PM, Nathaniel Smith wrote:
Ah, true. You're right, I can't see any good use for setting hash on a field that isn't already controlled by cmp. I think field level hash can go.
At the class level, I think it makes more sense. But I'll write up some motivating examples.
I'll respond to other emails about this, probably tomorrow. Eric.

On Sep 10, 2017, at 20:08, Nathaniel Smith <njs@pobox.com> wrote:
I've sometimes wished that attrs let me control whether it generated equality methods (eq/ne/hash) separately from ordering methods (lt/gt/...). Maybe the cmp= argument should take an enum with options none/equality-only/full?
I have had use cases where I needed equality comparisons but not ordered comparisons, so I’m in favor of the option to split them. (atm, I can’t bring up a specific case, but it’s not uncommon.) Given that you only want to support the three states that Nathaniel describes, I think an enum makes the most sense, and it certainly would read well. I.e. there’s no sense in supporting the ordered comparisons and not equality, so that’s not a state that needs to be represented. I’d make one other suggestion here: please let’s not call the keyword `cmp`. That’s reminiscent of Python 2’s `cmp` built-in, which of course doesn’t exist in Python 3. Using `cmp` is just an unnecessarily obfuscating abbreviation. I’d suggest just `compare` with an enum like so:
    class Compare(enum.Enum):
        none = 1
        unordered = 2
        ordered = 3
One thing I can’t avoid is DRY:
    from dataclasses import Compare, dataclass
    @dataclass(compare=Compare.unordered)
    class Foo:
        # …
Maybe exposing the enum items in the module namespace?
    ——dataclasses/__init__.py-----
    from enum import Enum
    class Compare(Enum):
        none = 1
        unordered = 2
        ordered = 3
    none = Compare.none
    unordered = Compare.unordered
    ordered = Compare.ordered
    ——dataclasses/__init__.py-----
    from dataclasses import dataclass, unordered
    @dataclass(compare=unordered)
    class Foo:
        # …
Cheers, -Barry

Oddly I don't like the enum (flag names get too long that way), but I do agree with everything else Barry said (it should be a trivalue flag and please don't name it cmp). On Mon, Sep 11, 2017 at 3:16 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
-- --Guido van Rossum (python.org/~guido)

On 9/11/2017 6:28 PM, Guido van Rossum wrote:
So if we don't do enums, I think the choices are ints, strs, or maybe True/False/None. Do you have a preference here? If int or str, I assume we'd want module-level constants. I like the name compare=, and 3 values makes sense: None, Equality, Ordered. Eric.

On Sep 11, 2017, at 18:36, Eric V. Smith <eric@trueblade.com> wrote:
+1 for the name, the 3 values, and making them module constants. After that, I don’t think it really matters what their implementation is. User code will look the same either way. One minor nice effect of using an enum is that the dataclass function can use `is` instead of `==` to compare keyword argument values. -Barry

Or we could just have two arguments, eq=<bool> and order=<bool>, and some rule so that you only need to specify one or the other but not both. (E.g. order=True implies eq=True.) That seems better than needing new constants just for this flag. On Mon, Sep 11, 2017 at 6:49 PM, Barry Warsaw <barry@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

On Sep 11, 2017, at 19:16, Guido van Rossum <guido@python.org> wrote:
Or we could just have two arguments, eq=<bool> and order=<bool>, and some rule so that you only need to specify one or the other but not both. (E.g. order=True implies eq=True.) That seems better than needing new constants just for this flag.
You’d have to disallow the combination `order=True, eq=False` then, right? Or would you ignore eq for any value of order=True? Seems like a clumsier API than a single tri-value parameter. Do the module constants bother you that much? -Barry

On Mon, Sep 11, 2017 at 8:21 PM, Barry Warsaw <barry@python.org> wrote:
Yes they do. You may have to import them, or you have to prefix them with the module name -- whereas keyword args and True/False require neither. We could disallow order=True, eq=False. Or we could have the default being to generate __eq__, __ne__ and __hash__, and a flag to prevent these (since equality by object identity is probably less popular than equality by elementwise comparison). Perhaps:
    order: bool = False
    eq: bool = True
and disallowing order=True, eq=False. -- --Guido van Rossum (python.org/~guido)
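For reference, the two-boolean shape sketched here matches what the standard library's `dataclasses` module ended up exposing in Python 3.7. The example below assumes that shipped API; `Version` and `Handle` are made-up classes used only for illustration.

```python
from dataclasses import dataclass


@dataclass(order=True)          # eq defaults to True, so __eq__ is generated too
class Version:
    major: int
    minor: int


@dataclass(eq=False)            # falls back to identity-based equality
class Handle:
    fd: int


# The combination order=True, eq=False is rejected when the class is decorated.
try:
    @dataclass(order=True, eq=False)
    class Broken:
        x: int
    rejected = False
except ValueError:
    rejected = True

older, newer = Version(1, 2), Version(1, 3)
h1, h2 = Handle(3), Handle(3)
```

No constants need importing: the call site only uses keyword arguments and `True`/`False`.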

On 09/11/2017 03:28 PM, Guido van Rossum wrote:
Oddly I don't like the enum (flag names get too long that way), but I do agree with everything else Barry said (it should be a trivalue flag and please don't name it cmp).
Hmmm, named constants are one of the motivating factors for having an Enum type. It's easy to keep the name a reasonable length, however: export them into the module-level namespace. re is an excellent example; the existing flags were moved into a FlagEnum, and then (for backwards compatibility) aliased back to the module level:
    class RegexFlag(enum.IntFlag):
        ASCII = sre_compile.SRE_FLAG_ASCII # assume ascii "locale"
        IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
        LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
        UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode "locale"
        MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
        DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
        VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
        A = ASCII
        I = IGNORECASE
        L = LOCALE
        U = UNICODE
        M = MULTILINE
        S = DOTALL
        X = VERBOSE
        # sre extensions (experimental, don't rely on these)
        TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
        T = TEMPLATE
        DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation
    globals().update(RegexFlag.__members__)
So we can still do re.I instead of re.RegexFlag.I. Likewise, if we had:
    class Compare(enum.Enum):
        NONE = 'each instance is an island'
        EQUAL = 'instances can be equal to each other'
        ORDERED = 'instances can be ordered and/or equal'
    globals().update(Compare.__members__)
then we can still use, for example, EQUAL, but get the more informative repr and str when we need to. -- ~Ethan~
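A runnable version of the `Compare` idea, combined with the earlier observation that enum members can be compared with `is`. The `generated_methods` helper is hypothetical and exists only to show how a decorator might branch on the flag; nothing like it was proposed in the thread.

```python
import enum


class Compare(enum.Enum):
    NONE = 'each instance is an island'
    EQUAL = 'instances can be equal to each other'
    ORDERED = 'instances can be ordered and/or equal'


# Alias the members at module level so callers can write EQUAL
# instead of Compare.EQUAL.
globals().update(Compare.__members__)


def generated_methods(compare):
    """Hypothetical helper: which methods a decorator would generate."""
    if compare is Compare.NONE:       # members are singletons, so `is` works
        return []
    if compare is Compare.EQUAL:
        return ['__eq__', '__ne__']
    return ['__eq__', '__ne__', '__lt__', '__le__', '__gt__', '__ge__']


equal_methods = generated_methods(Compare.EQUAL)
alias_is_member = globals()['EQUAL'] is Compare.EQUAL
```

The module-level alias and the qualified name refer to the same object, so either spelling can be passed in.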

On 9/10/2017 11:08 PM, Nathaniel Smith wrote:
See the discussion at https://github.com/ericvsmith/dataclasses/issues/44 for why we're keeping the field-level hash function. Quick version, from Guido: "There's a legitimate reason for having a field in the eq but not in the hash. After all hash is always followed by an eq check and it is totally legit for the eq check to say two objects are not equal even though their hashes are equal." This would be in the case where a field should be used for equality testing, but computing its hash is expensive. Eric.
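A sketch of the "in eq but not in hash" case, assuming the `field(hash=...)` parameter as it shipped in the standard library's `dataclasses` module. `Document` and its fields are invented for the example, with `body` standing in for something expensive to hash.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)          # eq=True plus frozen=True generates __hash__
class Document:
    doc_id: int
    body: str = field(hash=False)   # compared by __eq__, skipped by __hash__


a = Document(1, "first draft")
b = Document(1, "second draft")

# Same hash bucket, but the follow-up __eq__ check still tells them apart.
same_hash = hash(a) == hash(b)
considered_equal = a == b
```

This is legal because equal hashes never imply equal objects; only the reverse is required.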

On 8 September 2017 at 15:57, Eric V. Smith <eric@trueblade.com> wrote:
Looks good! One minor point - apparently in your happy place, C and Python have the same syntax :-)
"""
field's may optionally specify a default value, using normal Python syntax:
    @dataclass
    class C:
        int a # 'a' has no default value
        int b = 0 # assign a default value for 'b'
"""

On 8 September 2017 at 07:57, Eric V. Smith <eric@trueblade.com> wrote:
Very nice!
My one technical question about the PEP relates to the use of an exact type check in the comparison methods, rather than "isinstance(other, self.__class__)". I think I agree with that decision, but it isn't immediately obvious that the class identity is considered part of the instance value for a data class, so if you do:
    @dataclass
    class BaseItem:
        value: Any
    class DerivedItem(BaseItem):
        pass
Then instances of DerivedItem *won't* be considered equivalent to instances of BaseItem, and they also won't be orderable relative to each other, even though "DerivedItem" doesn't actually add any new data fields. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
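The consequence of the exact type check can be observed with the `dataclasses` module as it later shipped (an assumption relative to the draft under review). `order=True` is passed explicitly here because the released version does not generate ordering methods by default.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(order=True)
class BaseItem:
    value: Any


class DerivedItem(BaseItem):     # adds no new fields
    pass


# __eq__ returns NotImplemented for mismatched classes, so Python falls
# back to identity comparison and the result is False.
cross_class_equal = BaseItem(1) == DerivedItem(1)

# The ordering methods also return NotImplemented, which surfaces as TypeError.
try:
    BaseItem(1) < DerivedItem(2)
    orderable = True
except TypeError:
    orderable = False
```

Instances of the same class still compare as expected; only mixed comparisons are affected.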

On Fri, Sep 08, 2017 at 10:37:12AM -0700, Nick Coghlan wrote:
I haven't read the whole PEP in close detail, but that method stood out for me too. Only, unlike Nick, I don't think I agree with the decision. I'm also not convinced that we should be adding ordered comparisons (__lt__ __gt__ etc) by default, if these DataClasses are considered more like structs/records than tuples. The closest existing equivalent to a struct in the std lib (apart from namedtuple) is, I think, SimpleNamespace, and they are unorderable. -- Steve

On 09/08/2017 11:38 AM, Steven D'Aprano wrote:
I'll split the difference. ;) I agree with D'Aprano that an isinstance check should be used, but I disagree with him about the rich comparison methods -- please include them. I think unordered should only be the default when order is impossible, extremely difficult, or nonsensical. -- ~Ethan~

On 2017-09-08 07:57, Eric V. Smith wrote:
I've written a PEP for…
Apologies for the following list of dumb questions and bikesheds:
- 'Classes can be thought of as "mutable namedtuples with defaults".'
  - A C/C++ (struct)ure sounds like a simpler description that many more would understand.
- dataclass name:
  - class, redundant
  - data, good but very common
  - struct, used?
  - Record? (best I could come up with)
- Source needs blanks between functions, hard to read.
- Are types required? Maybe an example or two with Any?
- Intro discounts inheritance and metaclasses as "potentially interfering", but unclear why that would be the case. Inheritance is easy to override, metaclasses not sure?
- Perhaps mention ORMs/metaclass approach, as prior art: https://docs.djangoproject.com/en/dev/topics/db/models/
- Perhaps mention Kivy Properties, as prior art: https://kivy.org/docs/api-kivy.properties.html
- For mutable default values:
    @dataclass
    class C:
        x: list # = field(default_factory=list)
  Could it detect list as a mutable class "type" and set it as a factory automatically? The PEP/bug #3 mentions using copy, but that's not exactly what I'm asking above.

On 9/8/17 3:20 PM, Mike Miller wrote:
Yes, other people have pointed out that this might not be the best "elevator pitch" example. I'm thinking about it.
There was a bunch of discussions on this. We're delaying the name bikeshedding for later (and maybe never).
- Source needs blanks between functions, hard to read.
It's supposed to be hard to read! You're just supposed to think "am I glad I don't have to read or write that". But I'll look at it.
- Are types required?
Annotations are required, the typing module is not.
Maybe an example or two with Any?
I'd rather leave it like it is: typing is referenced only once, for ClassVar.
I don't really want to get in to the history of why people don't like inheritance, single and multi. Or how metaclass magic can make life difficult. I just want to point out that Data Classes don't interfere at all.
Those are all good. Thanks.
The problem is: how do you know what's a mutable type? There's no general way to know. The behavior in the PEP is just meant to stop the worst of it. I guess we could have an option that says: call the type to create a new, empty instance.
    @dataclass
    class C:
        x: list = field(default_type_is_factory=True)
Thanks for the critical reading and your comments. I'm going to push a new version early next week, when I get back from traveling. Eric.
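For comparison, this is how mutable defaults behave in the `dataclasses` module that eventually shipped; the `default_type_is_factory` option floated above is not part of it. The sketch shows the explicit `default_factory` spelling and the guard that rejects a bare list default. `Shared` is an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class C:
    x: list = field(default_factory=list)   # a fresh list per instance


first, second = C(), C()
first.x.append(1)                            # must not leak into `second`

# A bare mutable default is refused at class-definition time.
try:
    @dataclass
    class Shared:
        x: list = []
    rejected = False
except ValueError:
    rejected = True
```

The check only recognises a few obviously mutable defaults, which fits the "stop the worst of it" description.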

I think it would be useful to write 1-2 sentences about the problem with inheritance -- in that case you pretty much have to use a metaclass, and the use of a metaclass makes life harder for people who want to use their own metaclass (since metaclasses don't combine without some manual intervention). On Fri, Sep 8, 2017 at 3:40 PM, Eric V. Smith <eric@trueblade.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 9 September 2017 at 01:00, Guido van Rossum <guido@python.org> wrote:
I think it would be useful to write 1-2 sentences about the problem with inheritance -- in that case you pretty much have to use a metaclass,
It is not the case now. I think __init_subclass__ has almost the same possibilities as a decorator, it just updates an already created class and can add some methods to it. This is a more subtle question, these two for example would be equivalent:
    from dataclass import Data, Frozen
    class Point(Frozen, Data):
        x: int
        y: int
and
    from dataclass import dataclass
    @dataclass(frozen=True)
    class Point:
        x: int
        y: int
But the problem with inheritance based pattern is that it cannot support automatic addition of __slots__. Also I think a decorator will be easier to maintain. But on the other hand I think inheritance based scheme is a bit more readable. -- Ivan
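A rough sketch of what the inheritance-based variant could look like with `__init_subclass__`. The `Data` base class here is hypothetical and deliberately tiny: it handles positional arguments only, ignores defaults, and looks only at the subclass's own annotations.

```python
class Data:
    """Hypothetical base class that builds __init__/__repr__ for subclasses."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The class object already exists at this point; we can only patch it.
        names = list(getattr(cls, "__annotations__", {}))

        def __init__(self, *args):
            if len(args) != len(names):
                raise TypeError(
                    f"expected {len(names)} arguments, got {len(args)}")
            for name, value in zip(names, args):
                setattr(self, name, value)

        def __repr__(self):
            body = ", ".join(f"{n}={getattr(self, n)!r}" for n in names)
            return f"{type(self).__name__}({body})"

        cls.__init__ = __init__
        cls.__repr__ = __repr__


class Point(Data):
    x: int
    y: int


p = Point(1, 2)
```

Because the hook runs after the class has been created, the instance layout is already fixed, which is why `__slots__` cannot be added this way.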

Hi list, first, a big thanks to the authors of PEP 557! Great idea! For me, the dataclasses were a typical example for inheritance, to be more precise, for metaclasses. I was astonished to see them implemented using decorators, and I was not the only one, citing Guido:
Python is at a weird point here. At about every new release of Python, a new idea shows up that could be easily solved using metaclasses, yet every time we hesitate to use them, because of said necessary manual intervention for metaclass combination. So I think we have two options now: We could deprecate metaclasses, going down routes like PEP 487's __init_subclass__. Unfortunately, for data classes __init_subclass__ runs too late in the class creation process to influence the __slots__ mechanism. A __new_subclass__, that acts earlier, could do the job, but to me that simply sounds like reinventing the wheel of metaclasses. The other option would be to simply make metaclasses work properly. We would just have to define a way to automatically combine metaclasses. Guido once mentioned (here: https://mail.python.org/pipermail/python-dev/2017-June/148501.html) that he left out automatic synthesis of combined metaclasses on purpose, but given that this seems to be a major problem, I think it is about time to rethink this decision. So I propose to add such an automatic synthesis. My idea is that a metaclass author can define the __or__ and __ror__ methods for automatic metaclass synthesis. Then if a class C inherits from two classes A and B with metaclasses MetaA and MetaB, the metaclass would be MetaA | MetaB. Greetings Martin
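The "manual intervention" in question looks like this today. The sketch reuses the names `MetaA`, `MetaB`, `A` and `B` from the message and adds two illustrative ones (`MetaAB`, `D`); it shows current behaviour, not the proposed `__or__` synthesis.

```python
class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


# Python will not invent a common metaclass on its own.
try:
    class C(A, B):
        pass
    conflict = False
except TypeError:
    conflict = True


# The manual fix: write the combined metaclass by hand and name it explicitly.
class MetaAB(MetaA, MetaB):
    pass


class D(A, B, metaclass=MetaAB):
    pass
```

Every user who mixes two metaclass-based libraries has to write something like `MetaAB` themselves.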

You're right that if it were easier to combine metaclasses we would not shy away from them so easily. Perhaps you and others interested in this topic can try to prototype an implementation and see how it would work in practice (with some realistic existing metaclasses)? Then the next step would be to write a PEP. But in this case I really recommend trying to implement it first (in pure Python) to see if it can actually work. On Thu, Oct 12, 2017 at 11:21 AM, Martin Teichmann <lkb.teichmann@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

I think we've drifted into a new topic, but...
I was astonished to see them implemented using decorators, and I was not the only one.
I was thinking about this last spring, when I tried to cram all sorts of Python metaprogramming into one 3hr class... Trying to come up with an example for metaclasses, I couldn't come up with anything that couldn't be done more clearly (to me) with a class decorator. I also found some commentary on the web (sorry, no links :-( ) indicating that metaclasses were added before class decorators, and that they really don't have a compelling use case any more. Now it seems that not only do they not have a compelling use case, in some (many) instances, there are compelling reasons to NOT use them, and rather use decorators. So why not deprecate them? Or at least discourage their use?
The other option would be to simply make metaclasses work properly. We would just have to define a way to automatically combine metaclasses.
"just"? Anyway, let's say that is doable -- would you then be able to do something with metaclasses that you could not do with decorators? Or do it in a cleaner, easier to write or understand way? There-should-be-one--and-preferably-only-one--obvious-way-to-do-it-ly yours, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 10/12/2017 03:44 PM, Chris Barker wrote:
I think we've drifted into a new topic, but...
The Enum data type requires metaclasses. Any time you want to modify the behavior of a class (not its instances, the class itself) you need a metaclass. Agreed that it's pretty rare, but we need them. -- ~Ethan~
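A small illustration of "behavior of the class itself". The `Registry` metaclass and `Color` class are made up for this sketch; they imitate, very loosely, how an Enum class can be iterated and measured with `len()`.

```python
class Registry(type):
    """Metaclass giving the *class object* container behaviour."""

    def __iter__(cls):
        return iter(cls._members)

    def __len__(cls):
        return len(cls._members)


class Color(metaclass=Registry):
    _members = ("RED", "GREEN", "BLUE")


# These operate on the class, not on an instance of it.
names = list(Color)
count = len(Color)
```

Special methods such as `__iter__` and `__len__` are looked up on the type of the object, so for `len(Color)` to work they have to live on the metaclass; a class decorator that only adds attributes to `Color` would not have this effect.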

Thanks for the info. On 2017-09-08 15:40, Eric V. Smith wrote:
Guess I really meant "object" or "type" above, not "typing.Any." For a decade or two, my use of structs/records in Python has been dynamically-typed and that hasn't been an issue. As the problem this PEP is solving is orthogonal to typing improvements, it feels a bit like typing is being coupled and pushed with it, whether wanted or not. Not that I dislike typing mind you (appreciated on projects with large teams), it's just as mentioned not related to the problem of class definition verbosity or lacking functionality. Cheers, -Mike

On 10 September 2017 at 23:05, Mike Miller <python-ideas@mgmiller.net> wrote:
[...] As the problem this PEP is solving is orthogonal to typing improvements
This is not the case, static support for dataclasses is an important point of motivation. It is hard to support static typing for many third party packages like attrs, since they use a lot of "magic". -- Ivan

On 2017-09-10 14:23, Ivan Levkivskyi wrote:
This is not the case, static support for dataclasses is an important point of motivation.
I've needed this functionality a decade before types became cool again. ;-)
As mentioned, nothing against static typing, would simply like an example without, to show that it is not required. -Mike

Using type annotations buys two things. One is the concise syntax: it's how the decorator finds the fields. In the simple case, it's what lets you define a data class without using the call to field(), which is analogous to attrs's attr.ib(). The other thing it buys you is compatibility with type checkers. As Ivan said, the design was very careful to be compatible with type checkers. So other than requiring some type as an annotation, there's no dependency added for typing in the general sense. You can use a type of object if you want, or just lie to it and say None (though I'm not recommending that!). It's completely ignored at runtime (except for the ClassVar case, as the PEP states). I'll think about adding some more language to the PEP about it. Eric.
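A sketch of "annotated but untyped" usage, again assuming the `dataclasses` module as released in Python 3.7. `Config` and its fields are invented; `retries: None` is the "lie" mentioned above and is shown only to demonstrate that the annotation is not checked.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class Config:
    name: object                     # any annotation marks a field
    retries: None = 3                # not checked against the default
    instances: ClassVar[int] = 0     # ClassVar is the one special case: not a field


cfg = Config(name=42)                # no runtime type checking of arguments
odd = Config("x", "many")            # still accepted
field_names = [f.name for f in fields(Config)]
```

The only import from `typing` is `ClassVar`, and it is needed only when a class-level attribute should be excluded from the fields.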

Hi, first post here. My two cents: Here's a list of "prior arts" that I have collected over the years, besides attrs, that address similar needs (and often, much more): - https://github.com/bluedynamics/plumber - https://github.com/ionelmc/python-fields - https://github.com/frasertweedale/elk - https://github.com/kuujo/yuppy Regarding the name, 'dataclass', I agree that it can be a bit misleading (my first idea of a "dataclass" would be a class with only data and no behaviour, e.g. a 'struct', a 'record', a DTO, 'anemic' class, etc.). Scala has 'case classes' with some similarities ( https://docs.scala-lang.org/tour/case-classes.html). Regards, S. On Fri, Sep 8, 2017 at 4:57 PM, Eric V. Smith <eric@trueblade.com> wrote:
-- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, Free&OSS Group / Systematic Cluster - http://www.gt-logiciel-libre.org/ Co-Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyData Paris - http://pydata.fr/ --- “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.” — R. Buckminster Fuller

Hi, it is not clear whether anything is done to total_cost: def total_cost(self) -> float: Does this become a property automatically, or is it still a method call? To that end, some examples of *using* a data class, not just defining one, would be helpful. If it remains a normal method, why put it in this example at all? Makes little sense... Otherwise I really like this idea, thanks! On 8 September 2017 at 15:57, Eric V. Smith <eric@trueblade.com> wrote:
-- Gustavo J. A. M. Carneiro Gambit Research "The universe is always one step beyond logic." -- Frank Herbert

On 9/9/2017 11:41 AM, Gustavo Carneiro wrote:
Nothing is done with total_cost, it's still a method. It's meant to show that you can use methods in a Data Class. Maybe I should add a method that has a parameter, or at least explain why that method is present in the example. I'm not sure how I'd write an example showing you can do everything you can with an undecorated class. I think some text explanation would be better. Eric.
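A short usage example for the class from the PEP overview, run against the `dataclasses` module as it later shipped. The item values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


item = InventoryItem("widget", 3.0, 10)
cost = item.total_cost()             # an ordinary method call, not a property
text = repr(item)                    # generated __repr__

item.quantity_on_hand = 4            # instances are mutable by default
new_cost = item.total_cost()
```

The decorator leaves `total_cost` untouched; it only adds the generated dunder methods around it.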

The reaction is overwhelmingly positive everywhere: hacker news, reddit, twitter. People have been expecting something like that for a long time. 3 questions: - is providing validation/conversion hooks completely out of the question or still open for debate? I know it's to keep the implementation simple, but having a few callbacks run in the __init__ in a for loop is not that much complexity. You don't have to provide validators, but having a validators parameter on field() would be a huge time saver. Just a list of callables called when the value is first set, potentially raising an exception that you don't even need to process in any way. It returns the value converted, and voilà. We all do that every day manually. - I read Guido talking about some base class as an alternative to the generator version, but don't see it in the PEP. Is it still considered? - any chance it becomes a built-in later? When classes were improved in Python 2, the object built-in was added. Imagine if we had had to import it every time... Or maybe just plug it into object, like @object.dataclass.

On 9/10/2017 10:00 AM, Michel Desmoulin wrote:
The reaction is overwhelmingly positive everywhere: hacker news, reddit, twitter.
Do you have a pointer to the Hacker News discussion? I missed it.
People have been expecting something like that for a long time.
Me, too!
I don't particularly want to add validation specifically. I want to make it possible to add validation yourself, or via a library. What I think I'll do is add a metadata parameter to fields(), defaulting to None. Then you could write a post-init hook that does whatever single- and multi-field validations you want (or whatever else you want to do). Although this plays poorly with "frozen" classes: it's always something! I'll think about it. To make this most useful, I need to get the post-init hook to take an optional parameter so you can get data to it. I don't have a good way to do this, yet. Suggestions welcomed. Although if the post-init hook takes a param that you can pass in at object creation time, I guess there's really no need for a per-field metadata parameter: you could use the field name as a key to look up whatever you wanted to know about the field.
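As a rough illustration of validation done in a post-init hook driven by per-field metadata, here is a sketch written against the `dataclasses` module as it later shipped in Python 3.7, where the hook is called `__post_init__` and the metadata parameter lives on `field()`. The design was still open when this message was written, so treat it as one possible outcome rather than what was being proposed. The `"validators"` metadata key and the `positive` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field, fields


def positive(value):
    # Hypothetical validator: raises on bad input, returns nothing otherwise.
    if value <= 0:
        raise ValueError(f"expected a positive number, got {value!r}")


@dataclass
class Item:
    name: str
    # "validators" is an arbitrary key; dataclasses itself ignores metadata.
    unit_price: float = field(metadata={"validators": [positive]})

    def __post_init__(self):
        # Called at the end of the generated __init__.
        for f in fields(self):
            for check in f.metadata.get("validators", ()):
                check(getattr(self, f.name))


ok = Item("widget", 2.5)

error = None
try:
    Item("widget", -1.0)
except ValueError as exc:
    error = str(exc)
```

Because the hook only reads attributes, this particular check would also work on a frozen class; converting a value in place is where frozen classes become awkward.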
- I read Guido talking about some base class as alternative to the generator version, but don't see it in the PEP. Is it still considered ?
I'm going to put some words in explaining why I don't want to use base classes (I don't think it buys you anything). Do you have a reason for preferring base classes?
Because of the choice of using module-level functions so as to not introduce conflicts in the object's namespace, it would be difficult to make this a builtin. Although now that I think about it, maybe what are currently module-level functions should instead be methods on the "dataclass" decorator itself:

    @dataclass
    class C:
        i: int = dataclass.field(default=1, init=False)
        j: str

    c = C('hello')
    dataclass.asdict(c)
    {'i': 1, 'j': 'hello'}

Then, "dataclass" would be the only name the module exports, making it easier to someday be a builtin. I'm not sure it's important enough for this to be a builtin, but this would make it easier. Thoughts? I'm usually not a fan of having attributes on a function: it's why itertools.chain.from_iterable() is hard to find. Eric.

On Sun, Sep 10, 2017 at 9:36 AM, Eric V. Smith <eric@trueblade.com> wrote:
The temptation to make everything a builtin should be resisted.
Let's not do that. It would be better to design the module so that people can write `from dataclasses import *` and they will only get things that are clearly part of dataclasses (I guess dataclass, field, asdict, and a few more like that). That way people who really want this to look like a builtin can just violate PEP 8. -- --Guido van Rossum (python.org/~guido)
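A star-import brings in exactly the names a module lists in `__all__`. The small check below inspects the `dataclasses` module as it eventually shipped in the standard library and assumes nothing beyond the three names mentioned above.

```python
import dataclasses

# A star-import would bind exactly the names listed in __all__.
exported = set(dataclasses.__all__)

# The three names mentioned in this message.
core_names = {"dataclass", "field", "asdict"}
all_present = core_names <= exported
```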

Le 10/09/2017 à 18:36, Eric V. Smith a écrit :
Err... I may have been over enthusiastic and created the hacker news thread in my mind.
It doesn't really allow you to do anything you couldn't do as easily in __init__. Alternatively, you could have an "on_set" hook for field(), that just takes the field value and returns the field value. By default, it's an identity function and is always called (minus implementation optimizations):

    from functools import reduce
    self.foo = reduce((lambda data, next: next(*data)), on_set_hooks, (field, val))

And people can do whatever they want: default values, factories, transformers, converters/casters, validation, logging...
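The hook-chain idea can be sketched without touching the dataclass machinery at all. `field()` has no `on_set` parameter, so the version below is a hand-written class, and `apply_hooks`, `non_negative` and the `on_set_hooks` mapping are all hypothetical names. Each hook takes the value and returns the (possibly converted) value, as described above.

```python
from functools import reduce


def apply_hooks(value, hooks):
    # Feed the value through each hook in order. With no hooks this is the
    # identity function, because reduce returns the initial value unchanged.
    return reduce(lambda current, hook: hook(current), hooks, value)


def non_negative(value):
    # Validation hook: reject bad values, otherwise pass the value through.
    if value < 0:
        raise ValueError("must not be negative")
    return value


class Item:
    # Hypothetical per-field hook lists, keyed by attribute name.
    on_set_hooks = {"quantity": [int, non_negative]}

    def __init__(self, quantity):
        self.quantity = quantity

    def __setattr__(self, name, value):
        hooks = self.on_set_hooks.get(name, ())
        super().__setattr__(name, apply_hooks(value, hooks))


item = Item("3")  # int() converts the string, non_negative accepts it
```

Running the hooks from `__setattr__` means they fire on every assignment, not only in `__init__`, which is one of several places such a chain could be attached.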
Not preferring, but having it as an alternative. Mainly for 2 reasons: 1 - data classes allow one to type in classes very quickly, so let's harvest the benefit from that. Typing a decorator in a shell is much less comfortable than using inheritance. Same thing with IDEs: all current ones have a class snippet that jumps to the parent classes on tab. All in all, if you are doing exploratory programming, and thus writing disposable code, which data classes are fantastic for, inheritance will keep you in the flow. 2 - it will help sell data classes. I train a lot of people in Python each year. I never have to explain classes to people with any kind of programming background. I _always_ have to explain decorators. People are not used to them, and even kind of fear them for quite some time. Inheritance, however, is familiar, and will not only push people to use data classes more, but will also let them make fewer mistakes: they know the danger of parent ordering, but not that of decorator ordering.

Thanks for the PEP! :) I like the naming. ;) Though, I would like to add to Michel's argument in favor of a base class. On 11.09.2017 08:38, Michel Desmoulin wrote:
3) - the order of base classes can be arranged appropriately. In our day-to-day work, we use mixins and cooperative multiple inheritance a lot. So, having dataclasses as a base class or a mixin would be great! :) Combined with 1) and 2), I am much more in favor of having dataclasses as a base class/mixin than as a decorator. What are the benefits of the decorator? Maybe both are possible? Cheers, Sven PS: @Michel good observation 1). Typing decorators in a shell is annoying.
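For what it's worth, a base-class spelling can be layered on top of the decorator in user code. The sketch below relies on `__init_subclass__` (PEP 487, Python 3.6+); `DataClassBase` is a hypothetical name and is not something the PEP provides.

```python
from dataclasses import dataclass


class DataClassBase:
    # Hypothetical base class; not part of the PEP.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Apply the decorator to every subclass as it is created.
        # dataclass() adds the generated methods to cls in place.
        dataclass(cls)


class Point(DataClassBase):
    x: int
    y: int = 0


class Point3D(Point):
    # Fields from Point are inherited; z is appended after them.
    z: int = 0


p = Point(1)
q = Point3D(1, 2, 3)
```

This shows that the decorator form is enough to build the inheritance form on top of, whereas the reverse would be harder.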

I make this suggestion in trepidation, given that Guido called a halt on the Great Naming Debate, but it seems that a short, neutral name with data connotations previously not a part of many popular subsystems is required. I therefore propose "row", which is sufficiently neutral to avoid most current opposition and yet a common field-oriented mechanism for accessing units of retrieved data by name. regards Steve Steve Holden On Sat, Sep 16, 2017 at 3:44 PM, Sven R. Kunze <srkunze@mail.de> wrote:

(Apologies for reviving a dead horse, but may not be around at the blessed time.) As potential names of this concept, I liked record and row, but agreed they were a bit too specific and not quite exact. In my recent (unrelated) reading however, I came across another term and think it might fit better, called an "entity." It has some nice properties: - Traditional dictionary definition, meaning "thing" - Same specificity as the current base-class name: object - Corresponds to a class or instance (depending on context) in data terminology From: http://ewebarchitecture.com/web-databases/database-entities An entity is a thing or object of importance about which data must be captured. Information about an entity is captured in the form of attributes and/or relationships. All things aren't entities—only those about which information should be captured. If something is a candidate for being an entity and it has no attributes or relationships, it isn't an entity. Thoughts? Another candidate is "container" but is not very descriptive. -Mike On 2017-09-16 11:14, Steve Holden wrote:

On 12 October 2017 at 06:33, Mike Miller <python-dev@mgmiller.net> wrote:
By contrast, if we give them their own name (as with suggestions like record, row, entity), that makes them start to sound more like enums: an alternative base class with different runtime behaviour from a regular class. Cheers, Nick. P.S. I'll grant that this reasoning doesn't entirely mesh with the naming of "Abstract Base Class", but that phrase at least explicitly has the word "base" in it, suggesting that inheritance is involved in the way it works. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2017-10-11 19:56, Nick Coghlan wrote:
IMO, the problem with the dataclass name isn't the data part, but the "class" part. No other class has "class" in its name(?), not even object. The Department of Redundancy Department will love it. If it must be a compound name, it should rather be dataobject, no?
This pep also adds many methods for use at runtime, though perhaps the behavior is more subtle.
There was some discussion over inheritance vs. decoration, not sure if it was settled. (Just noticed that the abc module got away with a class name of "ABC," perhaps dataclass would be more palatable as "DC", though entity sounds a bit nicer.) Cheers, -Mike

On 12 October 2017 at 14:49, Mike Miller <python-dev@mgmiller.net> wrote:
No, because dataclass is the name of a class decorator ("This class is a data class"), not the name of a type. It's akin to "static method", "class method", and "instance method" for function definitions (although the last one isn't a typical decorator, since it's the default behaviour for functions placed inside a class). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
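The distinction can be checked directly. The sketch below uses the `dataclasses` module as later shipped; `Point` and `Color` are made-up example classes.

```python
import enum
from dataclasses import dataclass, is_dataclass


@dataclass
class Point:
    x: int
    y: int


class Color(enum.Enum):
    RED = 1


# The decorator hands back an ordinary class: plain metaclass, no extra base.
plain_metaclass = type(Point) is type
plain_bases = Point.__mro__ == (Point, object)

# An enum, by contrast, is built by its own metaclass.
enum_metaclass = type(Color) is not type

# There is no "dataclass" type to isinstance() against; use is_dataclass().
detected = is_dataclass(Point)
```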

On Wed, Oct 11, 2017 at 10:33 PM, Mike Miller <python-dev@mgmiller.net> wrote:
I'm not familiar with ER modelling but I would advise against using the term "entity", as it has, in domain-driven design (DDD), a very specific meaning: "An object that is not defined by its attributes, but rather by a thread of continuity and its identity." (from https://en.wikipedia.org/wiki/Domain-driven_design#Building_blocks) See also the more general Wikipedia definition "An entity is something that exists as itself, as a subject or as an object, actually or potentially, concretely or abstractly, physically or not." ( https://en.wikipedia.org/wiki/Entity). In the context of DDD, entities are usually opposed to value objects: "An object that contains attributes but has no conceptual identity. They should be treated as immutable.". ( https://en.wikipedia.org/wiki/Domain-driven_design#Building_blocks) Attrs, and by extension the dataclass proposal (I guess), provide some support for both: - By providing support for quickly constructing immutable objects from a bag of attributes, and providing equality based on those attributes, it helps implement Value Objects (not sure much more is needed actually) - By supporting equality based on some "primary key", it will also help with maintaining the concept of "equality" in entities. It would be great if the dataclass proposal could help implement DDD technical concepts in Python, but its terminology should not conflict with the DDD terminology, if we want to avoid confusion. Cheers, S.

On Thu, Oct 12, 2017 at 10:20 AM, Mike Miller <python-dev@mgmiller.net> wrote:
Yes, for the lifetime of the object in the Python VM. But if you are dealing with objects that are persisted using some kind of ORM, ODM, OODB, then it won't work. It's quite common (but not always the best solution) to use some kind of UUID to represent the identity of each entity. Also, there can be circumstances where two objects can exist at the same time in the VM which represent the same object, in which case one should ensure that a == b iff a.uid == b.uid (in the case 'uid' is the attribute used to carry the unique identifier).
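One way to get "a == b iff a.uid == b.uid" is to exclude every other field from comparison. This is a sketch using `field(compare=False)` from the `dataclasses` module as later shipped; the `Customer` class and its fields are invented for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Customer:
    # Only uid takes part in the generated __eq__.
    uid: uuid.UUID
    name: str = field(compare=False)
    email: str = field(compare=False)


key = uuid.uuid4()

# Two in-memory objects standing for the same persisted entity.
a = Customer(key, "Ada", "ada@example.org")
b = Customer(key, "Ada L.", "ada@example.com")

# Same attribute values as a, but a different identity.
c = Customer(uuid.uuid4(), "Ada", "ada@example.org")
```

Here `a == b` holds although the two objects are distinct and differ in `name` and `email`, while `a == c` does not.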
I don't believe either module particularly supports or restricts immutability?
http://www.attrs.org/en/stable/examples.html#immutability https://www.python.org/dev/peps/pep-0557/#frozen-instances S.
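For the value-object side, the frozen-instances feature linked above looks roughly like this in the `dataclasses` module as later shipped. `Money` is a made-up example class.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Money:
    amount: int
    currency: str


price = Money(5, "EUR")

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    price.amount = 6
except FrozenInstanceError:
    rejected = True
else:
    rejected = False

# With eq (the default) and frozen both true, a field-based __hash__ is
# generated, so equal values can be used interchangeably as dict keys.
label = {price: "five euros"}[Money(5, "EUR")]
```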

On Thu, Oct 12, 2017 at 9:20 AM, Mike Miller <python-dev@mgmiller.net> wrote:
distinction similar to the one between classes (entities) and instances (value objects). The reason I liked "row" as a name is because it resembles "vector" and hence is loosely associated with the concept of a tuple as well as being familiar to database users. In fact the answer to a relational query was, I believe, originally formally defined as a set of tuples. Sometimes one can simply be too hifalutin' [ http://www.dictionary.com/browse/hifalutin], and language that attempts to be precise obscures meaning to the less specialised reader.

On 12 October 2017 at 11:20, Steve Holden <steve@holdenweb.com> wrote:
But rows and tuples are usually immutable, at least in database terms. These data classes are not immutable (by default). If you want tuple-like behaviour, you can continue to use tuples. I see dataclasses as something closer to C `struct`. Most likely someone already considered `struct` as name; if not, please consider it. Else stick with dataclass, it's a good name IMHO.
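The difference from tuple-like rows is easy to demonstrate. This sketch compares `collections.namedtuple` with a data class from the `dataclasses` module as later shipped; `RowTuple` and `Record` are made-up names.

```python
from collections import namedtuple
from dataclasses import dataclass

RowTuple = namedtuple("RowTuple", ["name", "qty"])


@dataclass
class Record:
    name: str
    qty: int = 0


t = RowTuple("widget", 1)
r = Record("widget", 1)

# A namedtuple is a tuple: it equals a plain tuple and can be unpacked.
tuple_like = t == ("widget", 1)
name, qty = t

# A data class is not a tuple, and its fields can be reassigned.
not_tuple_like = r != ("widget", 1)
r.qty = 5

# Assigning to a namedtuple field fails.
tuple_is_immutable = False
try:
    t.qty = 5
except AttributeError:
    tuple_is_immutable = True
```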

I am still firmly convinced that @dataclass is the right name for the decorator (and `dataclasses` for the module). -- --Guido van Rossum (python.org/~guido)

On Oct 12, 2017, at 10:46, Guido van Rossum <guido@python.org> wrote:
I am still firmly convinced that @dataclass is the right name for the decorator (and `dataclasses` for the module).
Darn, and I was going to suggest they be called EricTheHalfABees, with enums being renamed to EricTheHalfNotBees. -Barry

On Oct 12, 2017, at 7:46 AM, Guido van Rossum <guido@python.org> wrote:
I am still firmly convinced that @dataclass is the right name for the decorator (and `dataclasses` for the module).
+1 from me. The singular/plural pair has the same nice feel as "from fractions import Fraction", "from itertools import product" and "from collections import namedtuple". Raymond

On Thu, Oct 12, 2017 at 3:20 AM, Steve Holden <steve@holdenweb.com> wrote:
Is the intent that these things preserve order? In which case, row is OK (though I still don't see what's wrong with record). I still don't love it though -- it gives the expectation of a row in a data table (or CSV file, or ...), which will be a common use case, but really, it doesn't conceptually have anything to do with tabular data. In fact, one might want to store a bunch of these in, say, a 2D (or 3D) array; then row would be pretty weird.... I don't much like entity either -- it is either way too generic -- everything is an entity! even less specific than "object". Or too specific (and incorrect) in the lexicon of particular domains. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
participants (25)
- Barry Warsaw
- Chris Barker
- Chris Barker - NOAA Federal
- Eric V. Smith
- Ethan Furman
- Glenn Linderman
- Guido van Rossum
- Guido van Rossum
- Gustavo Carneiro
- Ivan Levkivskyi
- Jonathan Goble
- Martin Teichmann
- Michel Desmoulin
- Mike Miller
- Mike Miller
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Raymond Hettinger
- Stefan Krah
- Steve Holden
- Steven D'Aprano
- Stéfane Fermigier
- Sven R. Kunze
- tds333@mailbox.org