Good proposal! I have a few questions.
On Mon, Mar 15, 2021 at 2:22 PM Eric V. Smith email@example.com wrote:
[I'm sort of loose with the terms field, parameter, and argument here. Forgive me: I think it's still understandable. Also I'm not specifying types here, I'm using Any everywhere. Use your imagination and substitute real types if it helps you.]
Here's version 2 of my proposal:
There have been many requests to add keyword-only fields to dataclasses. These fields would result in __init__ parameters that are keyword-only.
In a previous proposal, I suggested also including positional arguments for dataclasses. That proposal is at
https://firstname.lastname@example.org/message/I3RKK4... . After some discussion, I think it's clear that positional arguments aren't going to work well with dataclasses. The deal breaker for me is that the generated repr would either not work with eval(), or it would contain fields without names (since they're positional). There are additional concerns mentioned in that thread. Accordingly, I'm going to drop positional arguments from this proposal.
Basically, I want to add a flag to each field, stating whether the field results in a normal parameter or a keyword-only parameter to __init__. Then when I'm generating __init__, I'll examine those flags and put the normal arguments first, followed by the keyword-only ones.
The trick becomes: how do you specify what type of parameter each field represents?
What attrs does
First, here's what attrs does. There's a parameter to their attr.ib() function (the moral equivalent of dataclasses.field()) named kw_only, which if set, marks the field as being keyword-only. From https://www.attrs.org/en/stable/examples.html#keyword-only-attributes :
>>> @attr.s
... class A:
...     a = attr.ib(kw_only=True)
>>> A()
Traceback (most recent call last):
  ...
TypeError: A() missing 1 required keyword-only argument: 'a'
There's also a parameter to attr.s (the equivalent of dataclasses.dataclass), also named kw_only, which if true marks every field as being keyword-only:
>>> @attr.s(kw_only=True)
... class A:
...     a = attr.ib()
...     b = attr.ib()
>>> A(1, 2)
Traceback (most recent call last):
  ...
TypeError: __init__() takes 1 positional argument but 3 were given
I propose to adopt both of these methods (dataclass(kw_only=True) and field(kw_only=True)) in dataclasses. The above examples would become:

>>> @dataclasses.dataclass
... class A:
...     a: Any = field(kw_only=True)

>>> @dataclasses.dataclass(kw_only=True)
... class A:
...     a: Any
...     b: Any
But, I'd also like to make this a little easier to use, especially in the case where you're defining a dataclass that has some normal fields and some keyword-only fields. Using the attrs approach, you'd need to declare the keyword-only fields using the "=field(kw_only=True)" syntax, which I think is needlessly verbose, especially when you have many keyword-only fields.
The problem is that if you have 1 normal parameter and 10 keyword-only ones, you'd be forced to say:
@dataclasses.dataclass
class LotsOfFields:
    a: Any
    b: Any = field(kw_only=True, default=0)
    c: Any = field(kw_only=True, default='foo')
    d: Any = field(kw_only=True)
    e: Any = field(kw_only=True, default=0.0)
    f: Any = field(kw_only=True)
    g: Any = field(kw_only=True, default=())
    h: Any = field(kw_only=True, default='bar')
    i: Any = field(kw_only=True, default=3+4j)
    j: Any = field(kw_only=True, default=10)
    k: Any = field(kw_only=True)
That's way too verbose for me.
Ideally, I'd like something like this example:
@dataclasses.dataclass
class A:
    a: Any
    # pragma: KW_ONLY
    b: Any
And then b would become a keyword-only field, while a is a normal field. But we need some way of telling dataclasses.dataclass what's going on, since obviously pragmas are out.
I propose the following. I'll add a singleton to the dataclasses module: KW_ONLY. When scanning the __annotations__ that define the fields, a field with this type would be ignored, except that it sets the kw_only flag on every field declared after it. So you'd get:
@dataclasses.dataclass
class B:
    a: Any
    _: dataclasses.KW_ONLY
    b: Any
This would generate:
def __init__(self, a, *, b):
This example is equivalent to:
@dataclasses.dataclass
class B:
    a: Any
    b: Any = field(kw_only=True)
The name of the KW_ONLY field doesn't matter, since it's discarded. I think _ is a fine name, and '_: dataclasses.KW_ONLY' would be the pythonic way of saying "the following fields are keyword-only".
My example above would become:
@dataclasses.dataclass
class LotsOfFields:
    a: Any
    _: dataclasses.KW_ONLY
    b: Any = 0
    c: Any = 'foo'
    d: Any
    e: Any = 0.0
    f: Any
    g: Any = ()
    h: Any = 'bar'
    i: Any = 3+4j
    j: Any = 10
    k: Any
Which I think is a lot clearer.
The generated __init__ would look like:
def __init__(self, a, *, b=0, c='foo', d, e=0.0, f, g=(), h='bar', i=3+4j, j=10, k):
The idea is that all normal argument fields would appear first in the class definition, then all keyword argument fields. This is the same requirement as in a function definition. There would be no switching back and forth between the two types of fields: once you use KW_ONLY, all subsequent fields are keyword-only. A field of type KW_ONLY can appear only once in a particular dataclass (but see the discussion below about inheritance).
Re-ordering args in __init__
If, using field(kw_only=True), you specify keyword-only fields before non-keyword-only fields, all of the keyword-only fields will be moved to the end of the __init__ argument list. Within the list of non-keyword-only arguments, all arguments will keep the same relative order as in the class definition. Ditto for within keyword-only arguments.
@dataclasses.dataclass
class C:
    a: Any
    b: Any = field(kw_only=True)
    c: Any
    d: Any = field(kw_only=True)
Then the generated __init__ will look like:
def __init__(self, a, c, *, b, d):
__init__ is the only place where this rearranging will take place. Everywhere else, and importantly in __repr__ and any dunder comparison methods, the order will be the same as it is now: in field declaration order.
Can you be specific and show what the repr() would be? E.g. if I create C(1, 2, b=3, d=4), the repr() would be C(a=1, b=3, c=2, d=4), right?
This is the same behavior that attrs uses.
Nevertheless I made several typos trying to make the examples in my sentence above correct. Perhaps we could instead disallow mixing kw-only and regular args? Do you know why attrs does it this way?
There are a few additional quirks involving inheritance, but the behavior would follow naturally from how dataclasses already handles fields via inheritance and the __init__ argument re-ordering discussed above. Basically, all fields in a derived class are computed like they are today. Then any __init__ argument re-ordering will take place, as discussed above.
@dataclasses.dataclass(kw_only=True)
class D:
    a: Any

@dataclasses.dataclass
class E(D):
    b: Any

@dataclasses.dataclass(kw_only=True)
class F(E):
    c: Any
This will result in the __init__ signature of:
def __init__(self, b, *, a, c):
However, the repr() will still produce the fields in order a, b, c. Comparisons will also use the same order.
This can be simulated by flattening the inheritance tree and adding explicit field(kw_only=True) to all fields of classes using kw_only=True in the class decorator as well as all fields affected by _: KW_ONLY, right? So the above would behave like this:
@dataclasses.dataclass
class F:
    a: Any = field(kw_only=True)
    b: Any
    c: Any = field(kw_only=True)
which IIUC indeed gives the same __init__ signature and repr().
Remember, the only point of all of these hoops is to add a flag to each field saying what type of __init__ argument it becomes: normal or keyword-only. Any of the 3 methods discussed above (kw_only flag to @dataclass(), kw_only flag to field(), or the KW_ONLY marker) all have the same result: setting the kw_only flag on one or more fields.
The value of that flag, on a per-field basis, is used to re-order __init__ arguments, and is used in generating the __init__ signature. It's not used anywhere else.
I expect the two most common use cases to be the kw_only flag to @dataclass() and the KW_ONLY marker. I would expect the usage of the kw_only flag on field() to be rare, but since it's the underlying mechanism and it's needed for more complex field layouts, it is included in this proposal.
So, what do you think? Is this a horrible idea? Should it be a PEP, or just a 'simple' feature addition to dataclasses? I'm worried that if I have to do a full-blown PEP I won't get to this for 3.10.
I don't think it is very controversial, do you? Then again maybe you should ask a SC member if they would object.
mypy and other type checkers would need to be taught about all of this.
Yeah, that's true. But the type checkers have bigger fish to fry (e.g. pattern matching).