dataclasses: position-only and keyword-only fields - Python-ideas

March 13, 2021

[I'm sort of loose with the terms field, parameter, and argument here. 
Forgive me: I think it's still understandable. Also I'm not specifying 
types here, I'm using Any everywhere. Use your imagination and 
substitute real types if it helps you.]

There have been many requests to add keyword-only fields to dataclasses. 
These fields would result in __init__ parameters that are keyword-only. 
As long as I'm doing this, I'd like to add positional-only fields as well.

Basically, I want to add a flag, to each field, stating whether the 
field results in a normal parameter, a positional-only parameter, or a 
keyword-only parameter to __init__. Then when I'm generating __init__, 
I'll examine those flags and put the positional-only ones first, 
followed by the normal ones, followed by the keyword-only ones.
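
A minimal sketch of that ordering step follows. It assumes each field is represented as a hypothetical (name, flag) pair in declaration order and maps the three flags onto the standard library's inspect.Parameter kinds; build_init_signature is an illustrative name, not a dataclasses API. inspect.Signature is used because it refuses an out-of-order parameter list, which doubles as a check on the ordering.

```python
import inspect

# Hypothetical flag values mapped onto the real inspect.Parameter kinds.
KINDS = {
    "pos_only": inspect.Parameter.POSITIONAL_ONLY,
    "normal": inspect.Parameter.POSITIONAL_OR_KEYWORD,
    "kw_only": inspect.Parameter.KEYWORD_ONLY,
}


def build_init_signature(fields):
    """Build an __init__ signature from (name, flag) pairs in declaration order."""
    # sorted() is stable: positional-only first, then normal, then keyword-only,
    # with declaration order preserved inside each group.
    ordered = sorted(fields, key=lambda f: KINDS[f[1]])
    # self must also be positional-only if any field is, or the order is invalid.
    self_kind = (
        inspect.Parameter.POSITIONAL_ONLY
        if any(flag == "pos_only" for _, flag in fields)
        else inspect.Parameter.POSITIONAL_OR_KEYWORD
    )
    params = [inspect.Parameter("self", self_kind)]
    params += [inspect.Parameter(name, KINDS[flag]) for name, flag in ordered]
    # inspect.Signature raises ValueError if the kinds are out of order.
    return inspect.Signature(params)


print(build_init_signature([("a", "normal"), ("b", "kw_only"), ("c", "pos_only")]))
# prints (self, c, /, a, *, b)
```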

The trick becomes: how do you specify what type of parameter each field 
represents?

First, here's what attrs does. There's a parameter to their attr.ib() 
function (the moral equivalent of dataclasses.field()) named kw_only, 
which, if set, marks the field as being keyword-only. From 
https://www.attrs.org/en/stable/examples.html#keyword-only-attributes :

>>> import attr
>>> @attr.s
... class A:
...     a = attr.ib(kw_only=True)
>>> A()
Traceback (most recent call last):
   ...
TypeError: A() missing 1 required keyword-only argument: 'a'
>>> A(a=1)
A(a=1)

There's also a parameter to attr.s, also named kw_only, which, if true, 
marks every field as being keyword-only:

>>> @attr.s(kw_only=True)
... class A:
...     a = attr.ib()
...     b = attr.ib()
>>> A(1, 2)
Traceback (most recent call last):
   ...
TypeError: __init__() takes 1 positional argument but 3 were given
>>> A(a=1, b=2)
A(a=1, b=2)

In dataclasses, these examples would become:

>>> import dataclasses
>>> from dataclasses import field
>>> from typing import Any
>>> @dataclasses.dataclass
... class A:
...     a: Any = field(kw_only=True)

>>> @dataclasses.dataclass(kw_only=True)
... class A:
...     a: Any
...     b: Any

Aside from the name 'kw_only', which we can bikeshed about, I think 
these features are good, and I'd like to implement them as shown here.
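
For readers less familiar with keyword-only parameters, here is a hand-written class showing roughly what the proposed @dataclasses.dataclass(kw_only=True) would be expected to generate for fields a and b. It illustrates the intended behavior only; it is not code taken from dataclasses, and only __init__ and __repr__ are written out.

```python
from typing import Any


class A:
    # Hand-written stand-in for what the proposed
    # @dataclasses.dataclass(kw_only=True) would generate for fields a and b.
    def __init__(self, *, a: Any, b: Any) -> None:
        self.a = a
        self.b = b

    def __repr__(self) -> str:
        return f"A(a={self.a!r}, b={self.b!r})"


print(A(a=1, b=2))  # prints A(a=1, b=2)
try:
    A(1, 2)  # both parameters are keyword-only, so this fails
except TypeError as exc:
    print(exc)
```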

But, I'd like to do two other things: make it easier to use, and support 
positional-only fields.

Since the second one is easier, let's tackle it first. I'd do the same 
thing as kw_only, but name it something like pos_only. Again, we can 
argue about the name. Like kw_only, you can either specify individual 
fields as positional-only, or declare that every field is 
positional-only. It would be an error to specify both kw_only and pos_only.
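
The mutual-exclusion rule could be enforced as soon as the per-field flags are created. A minimal sketch, assuming a hypothetical FieldFlags record that stands in for the kw_only/pos_only arguments of the proposed field(); the choice of TypeError is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldFlags:
    """Hypothetical per-field flags, mirroring the proposed field(kw_only=..., pos_only=...)."""

    kw_only: bool = False
    pos_only: bool = False

    def __post_init__(self) -> None:
        # The proposal makes it an error to ask for both at once.
        if self.kw_only and self.pos_only:
            raise TypeError("a field cannot be both kw_only and pos_only")

    @property
    def kind(self) -> str:
        """Which kind of __init__ parameter the field becomes."""
        if self.kw_only:
            return "kw_only"
        if self.pos_only:
            return "pos_only"
        return "normal"


print(FieldFlags(pos_only=True).kind)  # prints pos_only
```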

As far as making it simpler: I dislike needing to use 
field(kw_only=True), although it would certainly work. The problem is 
that if you have one normal parameter and nine keyword-only ones, you'd be 
forced to say:

@dataclasses.dataclass
class A:
    a: Any
    b: Any = field(kw_only=True, default=0)
    c: Any = field(kw_only=True, default='foo')
    e: Any = field(kw_only=True, default=0.0)
    f: Any = field(kw_only=True)
    g: Any = field(kw_only=True, default=())
    h: Any = field(kw_only=True, default='bar')
    i: Any = field(kw_only=True, default=3+4j)
    j: Any = field(kw_only=True, default=10)
    k: Any = field(kw_only=True)

That's way too verbose for me.

Ideally, I'd like something like this example:

@dataclasses.dataclass
class A:
    a: Any
    # pragma: KW_ONLY
    b: Any
    # pragma: POS_ONLY
    c: Any

And then b would become a keyword-only field and c would be 
positional-only. But we need some way of telling dataclasses.dataclass 
what's going on, since obviously pragmas are out.

I propose the following. I'll add two (or three; keep reading) singletons to 
the dataclasses module: KW_ONLY and POS_ONLY. When scanning the 
__annotations__ that define fields, fields with these types would be 
ignored, except for assigning the kw_only/pos_only/normal flag to fields 
declared after these singletons are used. So you'd get:

@dataclasses.dataclass
class A:
    a: Any
    _: dataclasses.KW_ONLY
    b: Any
    __: dataclasses.POS_ONLY
    c: Any

This would generate:
def __init__(self, c, /, a, *, b):
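
The scanning rule can be sketched as follows. Everything here is hypothetical: _Marker and the three module-level markers stand in for the proposed dataclasses singletons, and classify_fields/format_init are illustrative helpers, not the dataclasses implementation. The sketch relies on a class's __annotations__ preserving declaration order, and it ignores string annotations (for example under from __future__ import annotations), which a real implementation would have to recognize.

```python
from typing import Any


class _Marker:
    """Hypothetical stand-in for the proposed dataclasses singletons."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return self.name


KW_ONLY = _Marker("KW_ONLY")
POS_ONLY = _Marker("POS_ONLY")
NORMAL_ARG = _Marker("NORMAL_ARG")


def classify_fields(cls) -> dict:
    """Map each real field name to its flag, in declaration order."""
    kinds = {}
    current = "normal"  # fields before any marker are normal
    for name, annotation in cls.__annotations__.items():
        if annotation is KW_ONLY:
            current = "kw_only"
        elif annotation is POS_ONLY:
            current = "pos_only"
        elif annotation is NORMAL_ARG:
            current = "normal"
        else:
            kinds[name] = current  # marker pseudo-fields are discarded
    return kinds


def format_init(kinds: dict) -> str:
    """Render the __init__ header: positional-only, then normal, then keyword-only."""
    pos = [n for n, k in kinds.items() if k == "pos_only"]
    normal = [n for n, k in kinds.items() if k == "normal"]
    kw = [n for n, k in kinds.items() if k == "kw_only"]
    params = ["self", *pos] + (["/"] if pos else []) + normal + (["*", *kw] if kw else [])
    return f"def __init__({', '.join(params)}):"


class A:
    a: Any
    _: KW_ONLY
    b: Any
    __: POS_ONLY
    c: Any


print(format_init(classify_fields(A)))  # prints def __init__(self, c, /, a, *, b):
```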

The names of the KW_ONLY and POS_ONLY fields don't matter, since they're 
discarded. But as you see above, they still need to be unique. I think _ 
is a fine name, and since KW_ONLY will be used much more than POS_ONLY, 
'_: dataclasses.KW_ONLY' would be the pythonic way of saying "the 
following fields are keyword-only".

I do think I'll add a third singleton to specify that subsequent fields 
are "normal" fields, neither keyword-only nor positional-only. I don't 
know that we have a name for such a thing; let's call it NORMAL_ARG here 
and bikeshed it later. Then you could say:

@dataclasses.dataclass
class A:
    a: Any
    _: dataclasses.KW_ONLY
    b: Any
    __: dataclasses.POS_ONLY
    c: Any
    ___: dataclasses.NORMAL_ARG
    d: Any

Then a and d are "normal" fields, while b is keyword-only and c is 
positional-only. This would generate:
def __init__(self, c, /, a, d, *, b):
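
To make the ordering concrete, here is a hand-written stand-in for the NORMAL_ARG example above: the fields are declared a, b, c, d, which is the order repr and comparisons would use, while __init__ takes them as (c, /, a, d, *, b). The class illustrates the proposed behavior without using dataclasses; only __init__, __repr__ and __eq__ are shown.

```python
from typing import Any


class A:
    def __init__(self, c: Any, /, a: Any, d: Any, *, b: Any) -> None:
        # __init__ parameter order: positional-only, normal, keyword-only.
        self.a, self.b, self.c, self.d = a, b, c, d

    def _fields(self) -> tuple:
        # Declaration order, used for repr and comparisons.
        return (self.a, self.b, self.c, self.d)

    def __repr__(self) -> str:
        return f"A(a={self.a!r}, b={self.b!r}, c={self.c!r}, d={self.d!r})"

    def __eq__(self, other: object) -> bool:
        if other.__class__ is not self.__class__:
            return NotImplemented
        return self._fields() == other._fields()


x = A(3, 1, 4, b=2)
print(x)  # prints A(a=1, b=2, c=3, d=4)
```

Passing c by keyword, as in A(a=1, b=2, c=3, d=4), raises TypeError because c is positional-only.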

I normally wouldn't propose adding NORMAL_ARG, but since the order of 
fields matters (for repr, comparisons, etc.), I figure it might be 
desirable for that order to differ from the order of the __init__ 
parameters. I could be talked out of NORMAL_ARG, in which case normal 
fields would just always have to be declared first (although you could 
play games with inheritance to change that).

My "complex" example above would become:

@dataclasses.dataclass
class A:
    a: Any
    _: dataclasses.KW_ONLY
    b: Any = 0
    c: Any = 'foo'
    e: Any = 0.0
    f: Any
    g: Any = ()
    h: Any = 'bar'
    i: Any = 3+4j
    j: Any = 10
    k: Any

Which I think is a lot better.
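
Part of why this works: keyword-only parameters may appear without a default after ones that have a default, whereas ordinary positional parameters (and today's dataclass fields) may not. A hand-written __init__ with the same parameters as the example compiles, but the same list without the * does not. The positional_version_compiles helper is illustrative only.

```python
from typing import Any


class A:
    # Hand-written equivalent of the KW_ONLY example: f and k have no default,
    # yet they may follow defaulted parameters because they are keyword-only.
    def __init__(self, a: Any, *, b: Any = 0, c: Any = 'foo', e: Any = 0.0,
                 f: Any, g: Any = (), h: Any = 'bar', i: Any = 3+4j,
                 j: Any = 10, k: Any) -> None:
        self.a, self.b, self.c, self.e, self.f = a, b, c, e, f
        self.g, self.h, self.i, self.j, self.k = g, h, i, j, k


def positional_version_compiles() -> bool:
    """Try the same parameter list without '*'; ordinary parameters reject it."""
    src = ("def __init__(self, a, b=0, c='foo', e=0.0, f, g=(), "
           "h='bar', i=3+4j, j=10, k): pass")
    try:
        compile(src, "<example>", "exec")
    except SyntaxError:
        return False
    return True


obj = A(1, f=2, k=3)                  # only the fields without defaults are required
print(obj.b, obj.h)                   # prints 0 bar
print(positional_version_compiles())  # prints False
```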

There are a few additional quirks involving inheritance, but the 
behavior would follow naturally from how dataclasses already does 
inheritance. I can address that later when I work on the docs for this.

Remember, the only point of all of these hoops is to add a flag to each 
field saying what type of __init__ argument it becomes: positional-only, 
normal, or keyword-only.

So, what do you think? Is this a horrible idea? Should it be a PEP, or 
just a 'simple' feature addition to dataclasses? I'm worried that if I 
have to do a full-blown PEP I won't get to this for 3.10.

I should mention another idea that showed up on python-ideas 
(https://mail.python.org/archives/list/python-ideas@python.org/message/WBL4X4...). 
It would allow you to specify the flag via code like:

@dataclasses.dataclass
class Parent:
    with dataclasses.positional():
        a: int
    c: bool = False
    with dataclasses.keyword():
        e: list

I'm not crazy about it, and it looks like it would require stack 
inspection to get it to work, but I mention it here for completeness.

One last thought: mypy and other type checkers would need to be taught 
about all of this.

--
Eric
