dataclasses: position-only and keyword-only fields

[I'm sort of loose with the terms field, parameter, and argument here. Forgive me: I think it's still understandable. Also I'm not specifying types here, I'm using Any everywhere. Use your imagination and substitute real types if it helps you.]

There have been many requests to add keyword-only fields to dataclasses. These fields would result in __init__ parameters that are keyword-only. As long as I'm doing this, I'd like to add positional-only fields as well.

Basically, I want to add a flag, to each field, stating whether the field results in a normal parameter, a positional-only parameter, or a keyword-only parameter to __init__. Then when I'm generating __init__, I'll examine those flags and put the positional-only ones first, followed by the normal ones, followed by the keyword-only ones.

The trick becomes: how do you specify what type of parameter each field represents?

First, here's what attrs does. There's a parameter to their attr.ib() function (the moral equivalent of dataclasses.field()) named kw_only, which if set, marks the field as being keyword-only. From https://www.attrs.org/en/stable/examples.html#keyword-only-attributes :
There's also a parameter to attr.s, also named kw_only, which if true marks every field as being keyword-only:
In dataclasses, these examples become:
Aside from the name 'kw_only', which we can bikeshed about, I think these features are good, and I'd like to implement them as shown here. But, I'd like to do two other things: make it easier to use, and support positional-only fields.

Since the second one is easier, let's tackle it first. I'd do the same thing as kw_only, but name it something like pos_only. Again, we can argue about the name. Like kw_only, you can either specify individual fields as positional-only, or declare that every field is positional-only. It would be an error to specify both kw_only and pos_only.

As far as making it simpler: I dislike needing to use field(kw_only=True), although it would certainly work. The problem is that if you have 1 normal parameter, and 10 keyword-only ones, you'd be forced to say:

    @dataclasses.dataclass
    class A:
        a: Any
        b: Any = field(kw_only=True, default=0)
        c: Any = field(kw_only=True, default='foo')
        e: Any = field(kw_only=True, default=0.0)
        f: Any = field(kw_only=True)
        g: Any = field(kw_only=True, default=())
        h: Any = field(kw_only=True, default='bar')
        i: Any = field(kw_only=True, default=3+4j)
        j: Any = field(kw_only=True, default=10)
        k: Any = field(kw_only=True)

That's way too verbose for me. Ideally, I'd like something like this example:

    @dataclasses.dataclass
    class A:
        a: Any
        # pragma: KW_ONLY
        b: Any
        # pragma: POS_ONLY
        c: Any

And then b would become a keyword-only field and c would be positional-only. But we need some way of telling dataclasses.dataclass what's going on, since obviously pragmas are out.

I propose the following. I'll add 2 (or 3, keep reading) singletons to the dataclasses module: KW_ONLY and POS_ONLY. When scanning the __annotations__ that define fields, fields with these types would be ignored, except for assigning the kw_only/pos_only/normal flag to fields declared after these singletons are used.
So you'd get:

    @dataclasses.dataclass
    class A:
        a: Any
        _: dataclasses.KW_ONLY
        b: Any
        __: dataclasses.POS_ONLY
        c: Any

This would generate:

    def __init__(self, c, /, a, *, b):

The names of the KW_ONLY and POS_ONLY fields don't matter, since they're discarded. But as you see above, they still need to be unique. I think _ is a fine name, and since KW_ONLY will be used much more than POS_ONLY, '_: dataclasses.KW_ONLY' would be the pythonic way of saying "the following fields are keyword-only".

I do think I'll add a third singleton to specify that subsequent fields are "normal" fields, neither keyword-only nor positional-only. I don't know that we have a name for such a thing, let's call it NORMAL_ARG here and bikeshed it later. Then you could say:

    @dataclasses.dataclass
    class A:
        a: Any
        _: dataclasses.KW_ONLY
        b: Any
        __: dataclasses.POS_ONLY
        c: Any
        ___: dataclasses.NORMAL_ARG
        d: Any

Then a and d are "normal" fields, while b is keyword-only and c is positional-only. This would generate:

    def __init__(self, c, /, a, d, *, b):

I normally wouldn't propose adding NORMAL_ARG, but since the order of fields matters (for repr, comparisons, etc.) I figure it might be desirable to have their order there be different from the order they're declared. I could be talked out of NORMAL_ARG and they'd just always have to go first (although you could play games with inheritance to change that).

My "complex" example above would become:

    @dataclasses.dataclass
    class A:
        a: Any
        _: dataclasses.KW_ONLY
        b: Any = 0
        c: Any = 'foo'
        e: Any = 0.0
        f: Any
        g: Any = ()
        h: Any = 'bar'
        i: Any = 3+4j
        j: Any = 10
        k: Any

Which I think is a lot better.

There are a few additional quirks involving inheritance, but the behavior would follow naturally from how dataclasses already does inheritance. I can address that later when I work on the docs for this.

Remember, the only point of all of these hoops is to add a flag to each field saying what type of __init__ argument it becomes: positional-only, normal, or keyword-only.
So, what do you think? Is this a horrible idea? Should it be a PEP, or just a 'simple' feature addition to dataclasses? I'm worried that if I have to do a full blown PEP I won't get to this for 3.10.

I should mention another idea that showed up on python-ideas, at https://mail.python.org/archives/list/python-ideas@python.org/message/WBL4X4... . It would allow you to specify the flag via code like:

    @dataclasses.dataclass
    class Parent:
        with dataclasses.positional():
            a: int
        c: bool = False
        with dataclasses.keyword():
            e: list

I'm not crazy about it, and it looks like it would require stack inspection to get it to work, but I mention it here for completeness.

One last thought: mypy and other type checkers would need to be taught about all of this.

-- Eric

On Fri, Mar 12, 2021, 11:55 PM Eric V. Smith <eric@trueblade.com> wrote:
I think stack inspection could be avoided if we did something like:

```
@dataclasses.dataclass
class Parent:
    class pos(dataclasses.PositionalOnly):
        a: int
    c: bool = False
    class kw(dataclasses.KeywordOnly):
        e: list
```

Like your proposal, the names for the two inner classes can be anything, but they must be unique. The metaclass would check if a field in the new class's namespace was a subclass of PositionalOnly or KeywordOnly, and if so recurse into its annotations to collect more fields.

This still seems hacky, but it seems to read reasonably nicely, and behaves obviously in the presence of subclassing.

Sorry, Matt: I meant to credit you by name, but forgot to during my final edit.

I still don't like the verbosity and the look of 'with' statements or classes. Or the fact that some of the fields are indented relative to others.

And I should also mention that Ethan Furman suggested using '*' and '/' as the "type", in https://mail.python.org/archives/list/python-ideas@python.org/message/BIAVX4... , although the interaction with typing (specifically get_type_hints) might be an issue:

    class Hmm:
        #
        this: int
        that: float
        #
        pos: '/'
        #
        these: str
        those: str
        #
        key: '*'
        #
        some: list

Anyway, Matt's and Ethan's proposals are on the other thread. I'd like to keep this thread focused on my proposal of dataclasses.KW_ONLY and .POS_ONLY.

Not that saying "I'd like to focus this thread" has ever worked in the history of python-ideas.

Eric

On 3/13/2021 2:40 AM, Matt Wozniski wrote:
-- Eric V. Smith

Thanks Eric. That's a really good summary of the ideas thrown around up to this point in the other thread.

I'm not overly fussed about the precise syntax we use, though I do have a preference for using an extra level of scope to mark fields as pos/kw only, either with context managers or with nested classes like Matt Wozniski suggested (because I think this makes it more explicit and less surprising to someone seeing it for the first time). But that's not a hill I'd even remotely consider dying on.

As long as we get:

1) positional-only and keyword-only arguments for dataclasses
2) in an inheritance-friendly way that allows subclasses to also specify their own respective normal/positional/kw fields

I'll be happy.

As I understand it your proposal would allow for the following, right?

    import dataclasses

    @dataclasses.dataclass
    class Parent:
        pos = dataclasses.POS_ARG
        a: int
        normal = dataclasses.NORMAL_ARG
        c: bool = False
        kw = dataclasses.KW_ARG
        e: list

    @dataclasses.dataclass
    class Child(Parent):
        pos = dataclasses.POS_ARG
        b: float = 3.14
        normal = dataclasses.NORMAL_ARG
        d: dict = dataclasses.field(default_factory=dict)
        kw = dataclasses.KW_ARG
        f: set = dataclasses.field(default_factory=set)

Producing an __init__ like:

    def __init__(self, a: int, b: float = 3.14, /,
                 c: bool = False, d: dict = None, *,
                 e: list, f: set = None):

That is to say, you can specify these special values *once per class* in the inheritance hierarchy (not just once overall) and the fields in each category get added on at the end of that category (normal/pos/kw) compared to the parent's __init__ signature.

If so, I'm happy with this.
PS: I do think there will be some confusion around unexpected behaviour when someone who has seen this many times:

    @dataclasses.dataclass
    class Example:
        _ = dataclasses.KW_ARG
        a: str

tries this and is confused as to why it doesn't work as expected:

    @dataclasses.dataclass
    class Example:
        _ = dataclasses.POS_ARG
        a: str
        _ = dataclasses.KW_ARG
        b: int

Basically, the requirement that the names these special values are assigned to must be unique is an unfortunate side-effect of the fact that Python doesn't bind free-floating objects to locals in some way, the way it does for type hints into __annotations__, which would allow the much cleaner:

    @dataclasses.dataclass
    class Example:
        dataclasses.POS_ARG
        a: str
        dataclasses.KW_ARG
        b: int

Come to think of it, this would make certain declarative constructs like those found in SQLAlchemy much cleaner as well, but that's a proposal for a different thread :p

On Sat, Mar 13, 2021 at 9:50 AM Eric V. Smith <eric@trueblade.com> wrote:

On 3/13/2021 6:06 AM, Matt del Valle wrote: <stuff deleted>
Yes, although:

- It needs to be "pos: dataclasses.POS_ARG" (colon not equals) in order for it to make it into __annotations__, which is what dataclass() uses for processing.

- I don't think we'd want to restrict KW_ARG, POS_ARG, NORMAL_ARG to only once per class. After all, if you're using "field(kw_only=True)" and "field(pos_only=True)" you could switch back and forth any number of times. I don't think this would be common, but I don't see a reason to prohibit it. Special rules, and all.

When processing your Parent class, dataclass() produces a dict of fields [*]:

    {'a': Field(type=int, arg_type=POS_ARG, <other stuff>),
     'c': Field(type=bool, arg_type=NORMAL_ARG, default=False, <other stuff>),
     'e': Field(type=list, arg_type=KW_ARG, <other stuff>),
    }

And for Child:

    {'b': Field(type=float, arg_type=POS_ARG, default=3.14, <other stuff>),
     'd': Field(type=dict, arg_type=NORMAL_ARG, default_factory=dict, <other stuff>),
     'f': Field(type=set, arg_type=KW_ARG, default_factory=set, <other stuff>),
    }

Note that your "normal", "pos", and "kw" members don't make it into the dict of fields in either of these classes. As I said in the first email, the only thing these markers are used for is to set the arg_type of each field. They're not retained anywhere (unless I don't delete them from __annotations__, which I haven't given any thought to).
Then it merges those (using whatever rules it has if it finds name collisions, which doesn't apply here) to come up with:

    {'a': Field(type=int, arg_type=POS_ARG, <other stuff>),
     'c': Field(type=bool, arg_type=NORMAL_ARG, default=False, <other stuff>),
     'e': Field(type=list, arg_type=KW_ARG, <other stuff>),
     'b': Field(type=float, arg_type=POS_ARG, default=3.14, <other stuff>),
     'd': Field(type=dict, arg_type=NORMAL_ARG, default_factory=dict, <other stuff>),
     'f': Field(type=set, arg_type=KW_ARG, default_factory=set, <other stuff>),
    }

Then it puts the POS_ARGs first, then NORMAL_ARGs, then KW_ARGs, to come up with:

    def __init__(self, a:int, b:float=3.14, /, c:bool=False, d:dict=None, *, e:list, f:set=None):

(Except it does something more complicated with defaults, and especially default_factory's.)

I expect that:

- the 80% use case will just be "@dataclasses.dataclass(kw_only=True)"
- if using a marker, "_: dataclasses.KW_ARG" will be the only one used, and anything else would be rare
- having all three markers will be very rare
- almost never would a marker be specified more than once
If so, I'm happy with this.
I'm glad. Thank you for your comments here, they made me write down what I'd been considering for inheritance.

Eric

[*]: I'm not committing to using arg_type in a Field object, it's just illustrative for this example. I will probably have two flags, kw_only and pos_only, so that it matches the field() function.
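The merge-then-reorder step described in this message can be sketched in a few lines. This is only an illustration, not the dataclasses implementation: FieldInfo, init_signature and the string constants standing in for POS_ARG, NORMAL_ARG and KW_ARG are all hypothetical names, annotations are left out, and default_factory handling is ignored (d and f simply default to None, as in the signature quoted above).

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-ins for the proposed arg_type values.
POS_ARG, NORMAL_ARG, KW_ARG = "pos", "normal", "kw"

_NO_DEFAULT = object()  # sentinel meaning "this field has no default"


@dataclass
class FieldInfo:
    name: str
    arg_type: str
    default: Any = _NO_DEFAULT


def init_signature(fields):
    """Group fields by arg_type, keeping declaration order within each group."""
    groups = {POS_ARG: [], NORMAL_ARG: [], KW_ARG: []}
    for f in fields:
        groups[f.arg_type].append(f)

    def render(f):
        return f.name if f.default is _NO_DEFAULT else f"{f.name}={f.default!r}"

    parts = ["self"]
    parts += [render(f) for f in groups[POS_ARG]]
    if groups[POS_ARG]:
        parts.append("/")  # everything before this is positional-only
    parts += [render(f) for f in groups[NORMAL_ARG]]
    if groups[KW_ARG]:
        parts.append("*")  # everything after this is keyword-only
        parts += [render(f) for f in groups[KW_ARG]]
    return f"def __init__({', '.join(parts)}):"


# Parent's fields followed by Child's, in merged declaration order.
merged = [
    FieldInfo("a", POS_ARG),
    FieldInfo("c", NORMAL_ARG, False),
    FieldInfo("e", KW_ARG),
    FieldInfo("b", POS_ARG, 3.14),
    FieldInfo("d", NORMAL_ARG, None),
    FieldInfo("f", KW_ARG, None),
]
print(init_signature(merged))
```

The point of the sketch is that the markers only set a per-field flag; the grouping itself is a stable partition of the merged field list, so subclass fields land at the end of their own category.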
-- Eric V. Smith

Fwiw I read Eric's entire proposal (I like it) but totally missed the presence of single, double, triple underscores. Which caused me to be very confused reading Matt del Valle's reply, until I went back and noticed them, and the lightbulb went on.

Based on that experience, and also Matt's comment about how people might automatically try to add a second signature directive using the same variable name, I would suggest that maybe it would be preferred, when giving examples in documentation etc, to not use underscores like this as the placeholders.... It is easy to miss that the variable names are required to be different.

Different comment: in the other thread I liked the idea of mimicking the syntactical way of writing a function signature (although this might cause other problems):

    @dataclass
    class C:
        # positional only arguments required at top
        a: Any
        Pos : '/'
        # normal only after this line, can't go back
        b: Any
        Kwd: '*'
        # kwd only after this line, can't go back
        c: Any

But as Eric pointed out, there could be a lot of value in being able to go back and forth. I now think his idea is better.

BIKE SHED: If switching back and forth does win out, I think we should NOT try to use the characters '/' and '*' to specify the signature directives because they would lead the reader to believe they work the same as in a function signature. Aside from the issue of going back and forth, in Eric's proposal the positional directive comes *before* the positional arguments, rather than after like in a function signature. Since this is so different, please don't try to use '/'.

On 3/13/2021 9:33 AM, Ricky Teachey wrote:
Hmm. I just noticed that you can specify a class variable multiple times, without an error. Subsequent ones overwrite the prior ones in __annotations__. That's not good for my proposal, since if you use "_: dataclasses.KW_ONLY" followed by "_: dataclasses.POS_ONLY", the second one overwrites the first and you lose where the second one was:
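The example that followed here is missing from this copy of the message. The behaviour is easy to reproduce, though. In the sketch below KW_ONLY and POS_ONLY are plain placeholder classes defined locally (the proposed dataclasses.POS_ONLY does not exist), which is enough to show what happens to __annotations__:

```python
# Placeholder marker types, defined here only for the demonstration.
class KW_ONLY: ...
class POS_ONLY: ...


class A:
    a: int
    _: KW_ONLY
    b: int
    _: POS_ONLY  # re-annotates the same name; no error is raised
    c: int


names = list(A.__annotations__)
print(names)
print(A.__annotations__['_'])
```

The name _ appears once, at the position of its first annotation, with the type from the last one. So a scan over __annotations__ sees a single POS_ONLY marker sitting between a and b, and the KW_ONLY marker is gone.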
For some reason I thought this would raise an error.

This might be a showstopper for this proposal. I'm aware you could do something with metaclasses, but one core dataclasses principle is to not use metaclasses so that the user is free to use any metaclass for their own purposes, without conflict. And I think changing at this point in the game, just for this feature, won't fly.

I'll give it some more thought, but I'm not optimistic. I could still add kw_only arguments to @dataclass() and field(), but I think the best part of the proposal was saying "the rest of the fields are keyword-only".

Or, maybe we could just document this? As I said, I don't think specifying multiple KW_ONLY or POS_ONLY (or any combination) would be common. But it's an unfortunate trap waiting for the unsuspecting user.

Eric
-- Eric V. Smith

Another option is something like this (building on Ricky Teachey's suggestion):

    from dataclasses import ArgumentMarker, dataclass

    @dataclass
    class C:
        a: Any  # positional-only
        __pos__: ArgumentMarker
        b: Any  # normal
        __kw_only__: ArgumentMarker
        c: Any  # keyword-only

The dataclass machinery would look at the double-underscored names to figure out when to change argument kind. The annotation ("ArgumentMarker") doesn't matter, but we need to put *something* there, so why not a new marker object that clarifies the purpose of the line?

On Sat, Mar 13, 2021 at 7:17, Eric V. Smith (<eric@trueblade.com>) wrote:

On 3/13/2021 10:22 AM, Jelle Zijlstra wrote:
Yeah, I thought of that. I guess as long as we use a dunder name and require ArgumentMarker we'd be protected from having collisions for user-named fields.

Also, this design would prevent you from switching back and forth between argument types, but I can live with that. Or we could say any dunder field name starting with "__pos" or "__kw_only" would be recognized (if the type is also ArgumentMarker). But that could be added later if there's a clamor for it, which I highly doubt.

Although due to the way dataclasses work, you can play all sorts of games with inheritance to get the same effect. It's just not very user friendly.

Eric
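To make the dunder-marker variant concrete, here is a minimal sketch of how such a scan might work. Nothing in it exists in dataclasses: ArgumentMarker is a placeholder class and classify is a made-up helper, and the sketch assumes the markers, when present, appear in the order __pos__ then __kw_only__.

```python
from typing import Any


class ArgumentMarker:
    """Placeholder for the marker type suggested above (not a real dataclasses name)."""


def classify(cls):
    """Map each field name to its argument kind by scanning __annotations__ in order."""
    annotations = cls.__annotations__
    # Fields declared before __pos__ are positional-only; without that marker,
    # fields start out as normal.
    kind = "positional-only" if "__pos__" in annotations else "normal"
    kinds = {}
    for name, annotation in annotations.items():
        if annotation is ArgumentMarker and name == "__pos__":
            kind = "normal"
        elif annotation is ArgumentMarker and name == "__kw_only__":
            kind = "keyword-only"
        else:
            kinds[name] = kind
    return kinds


class C:
    a: Any  # positional-only
    __pos__: ArgumentMarker
    b: Any  # normal
    __kw_only__: ArgumentMarker
    c: Any  # keyword-only


print(classify(C))
```

Because each marker has a fixed name, it can appear at most once per class, which is what rules out switching back and forth. Names that both start and end with two underscores are not subject to private-name mangling inside a class body, so the markers are stored in __annotations__ under exactly these names.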
-- Eric V. Smith

On Fri, Mar 12, 2021, 11:55 PM Eric V. Smith <eric@trueblade.com> wrote:
Have there also been requests for positional-only fields?
The more I digest this idea, the more supporting positional-only fields sounds like a bad idea to me. The motivation for adding positional-only arguments to the language was a) that some built-in functions take only positional arguments, and there was no consistent way to document that and no way to match their interface with pure Python functions, b) that some parameters have no semantic meaning and making their names part of the public API forces library authors to maintain backwards compatibility on totally arbitrary names, and c) that functions like `dict.update` that take arbitrary keyword arguments must have positional-only parameters in order to not artificially reduce the set of keyword arguments that may be passed (e.g., `some_dict.update(self=5)`).

None of these cases seem to apply to dataclasses. There are no existing dataclasses that take positional-only arguments that we need consistency with. Dataclasses' constructors don't take arbitrary keyword arguments in excess of their declared fields. And most crucially, the field names become part of the public API of the class. Dataclass fields can never be renamed without a risk of breaking existing users. Taking your example from the other thread:

```
@dataclasses.dataclass
class Comparator:
    a: Any
    b: Any
    _: dataclasses.KEYWORD_ONLY
    key: Optional[Callable[whatever]] = None
```

The names `a` and `b` seem arbitrary, but they're not used only in the constructor, they're also available as attributes of the instances of Comparator, and dictionary keys in the `asdict()` return. Even if they were positional-only arguments to the constructor, that would forbid calling

    comp = Comparator(a=1, b=2, key=operator.lt)

but it would still be possible to call

    comp = Comparator(1, 2, key=operator.lt)
    print(comp.a, comp.b)

Preventing them from being passed by name to the constructor seems to be adding an inconsistency, not removing one.
Perhaps it makes sense to be able to make init-only variables be positional-only, since they don't become part of the class's public API, but in that case it seems it could just be a flag passed to `InitVar`. Outside of init-only variables, positional-only arguments seem like a misfeature to me. ~Matt
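A small runnable illustration of the public-API point, using only features that dataclasses already has. The marker line is left out and the Callable signature is filled in with a guess, since the original example leaves it as "whatever":

```python
import dataclasses
import operator
from typing import Any, Callable, Optional


@dataclasses.dataclass
class Comparator:
    a: Any
    b: Any
    # The callable's signature here is an assumption made for the example.
    key: Optional[Callable[[Any, Any], bool]] = None


comp = Comparator(1, 2, key=operator.lt)

# The "arbitrary" names are visible as attributes...
print(comp.a, comp.b)
# ...and as dictionary keys in the asdict() result.
print(list(dataclasses.asdict(comp)))
```

Whatever the constructor accepts, a and b are still reachable by name on every instance, so renaming them would break users either way.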

+1 to Matt's points here. I get the desire for symmetry with / and * in params, but I'm not convinced it's useful enough to warrant the complexity of the approaches being proposed. I think a @dataclass(..., kwonly=True) would solve > 90% of the issues with dataclass usability today.

On Sat, 2021-03-13 at 06:41 -0500, Matt Wozniski wrote:

I don’t like the idea of going back and forth for positional and keyword arguments. The positional arguments have to stay at the top, above some marker variable (be it anything, e.g., __pos: Any). The mixed ones stay in between the two markers. And the keyword arguments come after yet another predefined variable with a leading dunder (e.g., __kw: Any). Please, if you are going to do this for dataclasses, it has to match the function signature. This will be much easier for us to use, read, and teach. Going back and forth will cause a lot of confusion, as the order of the arguments in the init method will not follow the same order as the arguments defined in the dataclass. Thanks to whoever mentioned this in this email chain.

Abdulla

Sent from my iPhone

On 3/13/2021 2:51 PM, Abdulla Al Kathiri wrote:
The thing is, even without being able to switch back and forth within a single dataclass, you could achieve the same thing with inheritance:

    @dataclass(kw_only=True)
    class Base:
        a: int
        b: int

    @dataclass
    class Derived(Base):
        c: int
        d: int

    @dataclass(kw_only=True)
    class MoreDerived(Derived):
        e: int
        f: int

Here, a, b, e, and f are keyword-only, and c and d are normal. Likewise, you could do the same thing with:

    @dataclass
    class A:
        a: int = field(kw_only=True)
        b: int = field(kw_only=True)
        c: int
        d: int
        e: int = field(kw_only=True)
        f: int = field(kw_only=True)

In both cases, you'd get re-ordered fields in __init__, and nowhere else:

    def __init__(self, c, d, *, a, b, e, f):

repr, comparisons, etc. would still treat them in today's order: a, b, c, d, e, f.

Other than putting in logic to call that an error, which I wouldn't want to do, it would be allowable. Why not allow the shortcut version if that's what people want to do? Again, I'm not saying this needs to be a day one feature using "__kw_only__: ArgumentMarker" (or however it's ultimately spelled). I just don't want to rule it out in case we come up with some reason it's important.

I just checked and attrs allows that last case:

    @attr.s
    class A:
        a = attr.ib(kw_only=True)
        b = attr.ib(kw_only=True)
        c = attr.ib()
        d = attr.ib()
        e = attr.ib(kw_only=True)
        f = attr.ib(kw_only=True)

Which generates help like:

    class A(builtins.object)
     |  A(c, d, *, a, b, e, f) -> None

The main reason to allow the switching back and forth is to support subclassing dataclasses that already have normal and keyword-only fields. If you didn't allow this, you'd have to say that the MoreDerived class above would be an error because the __init__ looks like the parameters have been rearranged. And the same logic would apply to positional argument fields.

I just don't see the need to prohibit it in general. Any tutorial would probably show the fields in the order you describe above: positional, normal, keyword-only.

Eric
-- Eric V. Smith

I forgot to add: I'm uncertain about your suggestion of a required order of fields, based on argument type. I'll have to think about it some more. I'm working on a sample implementation, and I'm going to wait and see how it works before putting much more thought into it. Eric On 3/13/2021 3:14 PM, Eric V. Smith wrote:
-- Eric V. Smith

Oops, sent a reply too soon. On Sat, Mar 13, 2021 at 3:14 PM Eric V. Smith <eric@trueblade.com> wrote:
The thing is, even without being able to switch back and forth within a single dataclass, you could achieve the same thing with inheritance:
...
...
And the same logic would apply to positional argument fields
This seems like another disadvantage of allowing positional-only arguments. If positional-only fields show up just like keyword fields in an arbitrary position in the repr, the repr will cease to be a representation of a call to the dataclass's constructor suitable for passing to `eval`, as it is today when init-only parameters are not in use. ~Matt

Speaking as someone who's not into dataclasses: This whole thread seems to be about spelling the initializer's function signature as a class body. Have you considered going in the opposite direction, i.e. writing something like

    @dataclass
    class A:
        @attributes_from_signature
        def __init__(self, ham, spam=None):
            pass

or

    @attributes_from_signature
    def A(ham, spam=None):
        pass

to autogenerate a class A with attributes ham and spam?

On 2021-03-12 19:57, Eric V. Smith wrote:
Without getting too much into the details of your proposal, my main reaction to all this is that it's either trying to shoehorn too much into the typing annotations, or being too timid about what to put in the typing annotations. If we want to use typing annotations to document types then we should use them for that, and not try to sneakily also use them to define actual behavior. If, on the other hand, we *are* okay with using type annotations for defining behavior, then there isn't any need to resort to odd tricks like `_: dataclasses.KW_ONLY`. You can just put the behavior constraint directly in the annotation:

    @dataclass
    class Foo:
        a: (Any, dataclasses.KW_ONLY)
        # or
        b: {'type': Any, 'kw_only': True}
        # or (where "Any" and kw_only are some new objects provided by a library that handles this usage)
        c: Any + kw_only

. . . and then say that it's the job of dataclass and of typecheckers to separate the different sorts of information.

I don't see value in trying to force additional information into type annotations while simultaneously trying to obey various conventions established by typechecking libraries that expect types to be specified in certain ways. I was never a big fan of the type annotations and I think this kind of thing illustrates the difficulty of having "advisory-only" annotations that aren't supposed to affect runtime behavior but then gradually begin to do so via things like dataclasses. If type annotations are advisory only they should remain 100% advisory only and for use only by static type checkers with zero effect on the code's runtime behavior. If we're going to use type checking to influence runtime behavior then we should acknowledge that people can put ANYTHING in the type annotations and thus ANY library can make totally arbitrary use of any type annotation to perform arbitrary actions at runtime.
So basically, if we want to indicate stuff like keyword-only-ness using type annotations, that's fine, but we should acknowledge that at that point we are no longer annotating types. We are using type annotation syntax to implement arbitrary runtime behavior, and we should accept that doing that may break typecheckers or make life painful for their maintainers. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
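For what it's worth, the standard library already has one sanctioned way to attach non-type information to an annotation: typing.Annotated (Python 3.9+). Nobody in this thread proposes it, and dataclasses does not interpret it this way, so the sketch below is purely an illustration of "behaviour in the annotation"; KwOnly and keyword_only_names are made-up names.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_type_hints


class KwOnly:
    """Made-up marker; dataclasses attaches no meaning to it."""


@dataclass
class Foo:
    a: Annotated[Any, KwOnly]  # the type is Any, the extra metadata is KwOnly
    b: Any


def keyword_only_names(cls):
    """Return the fields whose annotation carries the KwOnly marker."""
    hints = get_type_hints(cls, include_extras=True)
    return [
        name for name, hint in hints.items()
        if KwOnly in getattr(hint, "__metadata__", ())
    ]


print(keyword_only_names(Foo))
```

A library could read the metadata at runtime while a type checker sees only Any for a, which is roughly the separation of concerns argued for above.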

On Fri, Mar 12, 2021, 11:55 PM Eric V. Smith <eric@trueblade.com> wrote:
I think stack inspection could be avoided if we did something like: ``` @dataclasses.dataclass class Parent: class pos(dataclasses.PositionalOnly): a: int c: bool = False class kw(dataclasses.KeywordOnly): e: list ``` Like your proposal, the names for the two inner classes can be anything, but they must be unique. The metaclass would check if a field in the new class's namespace was a subclass of PositionalOnly or KeywordOnly, and if so recurse into its annotations to collect more fields. This still seems hacky, but it seems to read reasonably nicely, and behaves obviously in the presence of subclassing.

Sorry, Matt: I meant to credit you by name, but forgot to during my final edit. I still don't like the verbosity and the look of 'with' statements or classes. Or the fact that some of the fields are indented relative to others. And should also mention that Ethan Furman suggested using '*' and '/' as the "type", in https://mail.python.org/archives/list/python-ideas@python.org/message/BIAVX4... , although the interaction with typing (specifically get_type_hints) might be an issue: class Hmm: # this: int that: float # pos: '/' # these: str those: str # key: '*' # some: list Anyway, Matt's and Ethan's proposal are on the other thread. I'd like to keep this thread focused on my proposal of dataclasses.KW_ONLY and .POS_ONLY. Not that saying "I'd like focus this thread" has ever worked in the history of python-ideas. Eric On 3/13/2021 2:40 AM, Matt Wozniski wrote:
-- Eric V. Smith

Thanks Eric. That's a really good summary of the ideas thrown around up to this point in the other thread. I'm not overly fussed about the precise syntax we use, though I do have a preference for using an extra level of scope to mark fields as pos/kw only, either with context managers or with nested classes like Matt Woznisky suggested (because I think this makes it more explicit and and less surprising to someone seeing it for the first time). But that's not a hill I'd even remotely consider dying on. As long as we get: 1) positional-only and keyword-only arguments for dataclasses 2) in an inheritance-friendly way that allows subclasses to also specify their own respective normal/positional/kw fields I'll be happy. As I understand it your proposal would allow for the following, right? import dataclasses @dataclasses.dataclass class Parent: pos = dataclasses.POS_ARG a: int normal = dataclasses.NORMAL_ARG c: bool = False kw = dataclasses.KW_ARG e: list @dataclasses.dataclass class Child(Parent): pos = dataclasses.POS_ARG b: float = 3.14 normal = dataclasses.NORMAL_ARG d: dict = dataclasses.field(default_factory=dict) kw = dataclasses.KW_ARG f: set = dataclasses.field(default_factory=set) Producing an __init__ like: def __init__(self, a: int, b: float = 3.14, /, c: bool = False, d: dict = None, *, e: list, f: set = None): That is to say, you can specify these special values *once per class* in the inheritance hierarchy (not just once overall) and the fields in each category get added on at the end of that category (normal/pos/wk) compared to the parent's __init__ signature. If so, I'm happy with this. 
PS: I do think there will be some confusion around unexpected behaviour when someone who has seen this many times: @dataclasses.dataclass class Example: _ = dataclasses.KW_ARG a: str tries this and is confused as to why it doesn't work as expected: @dataclasses.dataclass class Example: _ = dataclasses.POS_ARG a: str _ = dataclasses.KW_ARG b: int Basically, the requirement that the names these special values are assigned to must be unique is an unfortunate side-effect of the fact that Python doesn't bind free-floating objects to locals in some way, the way it does with for type hints into __annotations__ which would allow the much cleaner: @dataclasses.dataclass class Example: dataclasses.POS_ARG a: str dataclasses.KW_ARG b: int Come to think of it, this would make certain declarative constructs like those found in SQLAlchemy much cleaner as well, but that's a proposal for a different thread :p On Sat, Mar 13, 2021 at 9:50 AM Eric V. Smith <eric@trueblade.com> wrote:

On 3/13/2021 6:06 AM, Matt del Valle wrote: <stuff deleted>
Yes, although: - It needs to be "pos: dataclasses.POS_ARG" (colon not equals) in order for it to make it into __attributes__, which is what dataclass() uses for processing. - I don't think we'd want to restrict KW_ARG, POS_ARG, NORMAL_ARG to only once per class. After all, if you're using "field(kw_only=True)" and "field=(pos_only=True)" you could switch back and forth any number of times. I don't think this would be common, but I don't see a reason to prohibit it. Special rules, and all. When processing your Parent class, dataclass() produces a dict of fields [*]: {'a': Field(type=int, arg_type=POS_ARG, <other stuff>), 'c': Field(type=bool, arg_type=NORMAL_ARG, default=False, <other stuff>), 'e': Field(type=list, arg_type=KW_ARG, <other stuff>), } And for Child: {'b': Field(type=float, arg_type=POS_ARG, default=3.14, <other stuff>), 'd': Field(type=dict, arg_type=NORMAL_ARG, default_factory=dict, <other stuff>), 'f': Field(type=set, arg_type=KW_ARG, default_factory=set, <other stuff>), } Note that your "normal", "pos", and "kw" members don't make it into the dict of fields in either of these classes. As I said in the first email, the only thing these markers are used for is to set the arg_type of each field. They're not retained anywhere (unless I don't delete them from __attributes__, which I haven't given any thought to). 
Then it merges those (using whatever rules it has if it finds name collisions, which doesn't apply here) to come up with:

    {'a': Field(type=int, arg_type=POS_ARG, <other stuff>),
     'c': Field(type=bool, arg_type=NORMAL_ARG, default=False, <other stuff>),
     'e': Field(type=list, arg_type=KW_ARG, <other stuff>),
     'b': Field(type=float, arg_type=POS_ARG, default=3.14, <other stuff>),
     'd': Field(type=dict, arg_type=NORMAL_ARG, default_factory=dict, <other stuff>),
     'f': Field(type=set, arg_type=KW_ARG, default_factory=set, <other stuff>),
    }

Then it puts the POS_ARGs first, then NORMAL_ARGs, then KW_ARGs, to come up with:

    def __init__(self, a:int, b:float=3.14, /, c:bool=False, d:dict=None, *, e:list, f:set=None):

(Except it does something more complicated with defaults, and especially default_factory's.)

I expect that:

- the 80% use case will just be "@dataclasses.dataclass(kw_only=True)"
- if using a marker, "_: dataclasses.KW_ARG" will be the only one used, and anything else would be rare
- having all three markers will be very rare
- almost never would a marker be specified more than once
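The ordering step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the dataclasses implementation: FieldSketch, build_signature and the three string constants are hypothetical names, and the sketch renders the signature as text instead of generating a real __init__, so it skips the extra handling of defaults and default_factory.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-ins for the proposed markers; not part of dataclasses.
POS_ARG, NORMAL_ARG, KW_ARG = "pos", "normal", "kw"
_MISSING = object()  # sentinel meaning "no default given"


@dataclass
class FieldSketch:
    """Illustrative record; the real Field object carries much more."""
    name: str
    type_name: str
    arg_type: str
    default: Any = _MISSING


def render(field):
    """Render one parameter as name:type or name:type=default."""
    text = f"{field.name}:{field.type_name}"
    if field.default is not _MISSING:
        text += f"={field.default!r}"
    return text


def build_signature(fields):
    """Group fields by kind, keeping declaration order within each group."""
    pos = [f for f in fields if f.arg_type == POS_ARG]
    normal = [f for f in fields if f.arg_type == NORMAL_ARG]
    kw = [f for f in fields if f.arg_type == KW_ARG]
    parts = ["self"] + [render(f) for f in pos]
    if pos:
        parts.append("/")  # closes the positional-only group
    parts += [render(f) for f in normal]
    if kw:
        parts.append("*")  # opens the keyword-only group
        parts += [render(f) for f in kw]
    return "def __init__(" + ", ".join(parts) + "):"


# The merged Parent + Child fields from the example, in merge order.
merged = [
    FieldSketch("a", "int", POS_ARG),
    FieldSketch("c", "bool", NORMAL_ARG, False),
    FieldSketch("e", "list", KW_ARG),
    FieldSketch("b", "float", POS_ARG, 3.14),
    FieldSketch("d", "dict", NORMAL_ARG, None),
    FieldSketch("f", "set", KW_ARG, None),
]
signature = build_signature(merged)
print(signature)
```

The sketch does not check that fields without defaults precede fields with defaults inside the positional-only and normal groups; a real implementation would have to.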
If so, I'm happy with this.
I'm glad. Thank you for your comments here, they made me write down what I'd been considering for inheritance.

Eric

[*]: I'm not committing to using arg_type in a Field object, it's just illustrative for this example. I will probably have two flags, kw_only and pos_only, so that it matches the field() function.
-- Eric V. Smith

Fwiw I read Eric's entire proposal (I like it) but totally missed the presence of single, double, triple underscores. Which caused me to be very confused reading Matt del Valle's reply, until I went back and noticed them, and the lightbulb went on. Based on that experience, and also Matt's comment about how people might automatically try to add a second signature directive using the same variable name, I would suggest that maybe it would be preferred, when giving examples in documentation etc, to not use underscores like this as the placeholders.... It is easy to miss that the variable names are required to be different.

Different comment: in the other thread I liked the idea of mimicking the syntactical way of writing a function signature (although this might cause other problems):

    @dataclass
    class C:
        # positional only arguments required at top
        a: Any
        Pos: '/'
        # normal only after this line, can't go back
        b: Any
        Kwd: '*'
        # kwd only after this line, can't go back
        c: Any

But as Eric pointed out, there could be a lot of value in being able to go back and forth. I now think his idea is better.

BIKE SHED: If switching back and forth does win out, I think we should NOT try to use the characters '/' and '*' to specify the signature directives because they would lead the reader to believe they work the same as in a function signature. Aside from the issue of going back and forth, in Eric's proposal the positional directive comes *before* the positional arguments, rather than after like in a function signature. Since this is so different, please don't try to use '/'.

On 3/13/2021 9:33 AM, Ricky Teachey wrote:
Hmm. I just noticed that you can specify a class variable annotation multiple times, without an error. Subsequent ones overwrite the prior ones in __annotations__. That's not good for my proposal, since if you use "_: dataclasses.KW_ONLY" followed by "_: dataclasses.POS_ONLY", the second one overwrites the first and you lose where the second one was.
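A small experiment shows the overwrite. The two marker classes here are hypothetical stand-ins defined locally for the demonstration, not objects from the dataclasses module.

```python
# Hypothetical stand-ins for the proposed markers.
class KW_ONLY:
    pass


class POS_ONLY:
    pass


class C:
    a: int
    _: KW_ONLY    # first use of the name "_"
    b: int
    _: POS_ONLY   # same name again: replaces the value, keeps the old position
    c: int


names = list(C.__annotations__)
marker = C.__annotations__["_"]
print(names, marker)
```

Only one "_" entry survives. It keeps the position of the first marker but holds the value of the second, so anything scanning __annotations__ cannot tell that a marker once sat between b and c.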
For some reason I thought this would raise an error. This might be a showstopper for this proposal. I'm aware you could do something with metaclasses, but one core dataclasses principle is to not use metaclasses so that the user is free to use any metaclass for their own purposes, without conflict. And I think changing that at this point in the game, just for this feature, won't fly. I'll give it some more thought, but I'm not optimistic. I could still add kw_only arguments to @dataclass() and field(), but I think the best part of the proposal was saying "the rest of the fields are keyword-only". Or, maybe we could just document this? As I said, I don't think specifying multiple KW_ONLY or POS_ONLY (or any combination) would be common. But it's an unfortunate trap waiting for the unsuspecting user.

Eric
-- Eric V. Smith

Another option is something like this (building on Ricky Teachey's suggestion):

    from dataclasses import ArgumentMarker, dataclass

    @dataclass
    class C:
        a: Any  # positional-only
        __pos__: ArgumentMarker
        b: Any  # normal
        __kw_only__: ArgumentMarker
        c: Any  # keyword-only

The dataclass machinery would look at the double-underscored names to figure out when to change argument kind. The annotation ("ArgumentMarker") doesn't matter, but we need to put *something* there, so why not a new marker object that clarifies the purpose of the line?

On Sat, Mar 13, 2021 at 7:17, Eric V. Smith (<eric@trueblade.com>) wrote:

On 3/13/2021 10:22 AM, Jelle Zijlstra wrote:
Yeah, I thought of that. I guess as long as we use a dunder name and require ArgumentMarker we'd be protected from having collisions with user-named fields. Also, this design would prevent you from switching back and forth between argument types, but I can live with that. Or we could say any dunder field name starting with "__pos" or "__kw_only" would be recognized (if the type is also ArgumentMarker). But that could be added later if there's a clamor for it, which I highly doubt. Although due to the way dataclasses work, you can play all sorts of games with inheritance to get the same effect. It's just not very user friendly.

Eric
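For what it's worth, the scanning this design needs is small. The following is a rough sketch, assuming the names from Jelle's example: ArgumentMarker is defined locally because no such object exists in dataclasses, and classify_fields is a hypothetical helper, not a proposed API.

```python
from typing import Any


class ArgumentMarker:
    """Hypothetical marker type; it is not part of the dataclasses module."""


def classify_fields(cls):
    """Map each field name to the argument kind implied by the markers."""
    annotations = cls.__annotations__
    # Fields start out positional-only only if a __pos__ marker is present.
    if annotations.get("__pos__") is ArgumentMarker:
        kind = "positional-only"
    else:
        kind = "normal"
    kinds = {}
    for name, annotation in annotations.items():
        if annotation is ArgumentMarker:
            if name == "__pos__":
                kind = "normal"        # fields after __pos__ are normal
            elif name == "__kw_only__":
                kind = "keyword-only"  # fields after __kw_only__ are keyword-only
            continue                   # markers never become fields
        kinds[name] = kind
    return kinds


class C:
    a: Any  # positional-only
    __pos__: ArgumentMarker
    b: Any  # normal
    __kw_only__: ArgumentMarker
    c: Any  # keyword-only


kinds = classify_fields(C)
print(kinds)
```

Because each marker name can appear at most once, there is no way to switch back and forth within one class, which matches the limitation noted above.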
-- Eric V. Smith

On Fri, Mar 12, 2021, 11:55 PM Eric V. Smith <eric@trueblade.com> wrote:
Have there also been requests for positional-only fields?
The more I digest this idea, the more supporting positional-only fields sounds like a bad idea to me. The motivation for adding positional-only arguments to the language was a) that some built-in functions take only positional arguments, and there was no consistent way to document that and no way to match their interface with pure Python functions, b) that some parameters have no semantic meaning and making their names part of the public API forces library authors to maintain backwards compatibility on totally arbitrary names, and c) that functions like `dict.update` that take arbitrary keyword arguments must have positional-only parameters in order to not artificially reduce the set of keyword arguments that may be passed (e.g., `some_dict.update(self=5)`).

None of these cases seem to apply to dataclasses. There are no existing dataclasses that take positional-only arguments that we need consistency with. Dataclasses' constructors don't take arbitrary keyword arguments in excess of their declared fields. And most crucially, the field names become part of the public API of the class. Dataclass fields can never be renamed without a risk of breaking existing users. Taking your example from the other thread:

```
@dataclasses.dataclass
class Comparator:
    a: Any
    b: Any
    _: dataclasses.KEYWORD_ONLY
    key: Optional[Callable[whatever]] = None
```

The names `a` and `b` seem arbitrary, but they're not used only in the constructor, they're also available as attributes of the instances of Comparator, and dictionary keys in the `asdict()` return. Even if they were positional-only arguments to the constructor, that would forbid calling

    comp = Comparator(a=1, b=2, key=operator.lt)

but it would still be possible to call

    comp = Comparator(1, 2, key=operator.lt)
    print(comp.a, comp.b)

Preventing them from being passed by name to the constructor seems to be adding an inconsistency, not removing one.
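To make the "field names are public API" point concrete, here is a short sketch using only functions that exist in the dataclasses module today. The marker line from the example is left out, since its spelling is only a proposal, so `key` is an ordinary field here.

```python
import dataclasses
from typing import Any, Callable, Optional


@dataclasses.dataclass
class Comparator:
    a: Any
    b: Any
    key: Optional[Callable[..., Any]] = None


comp = Comparator(1, 2)

# The names show up in every one of these, whatever __init__ accepts:
field_names = [f.name for f in dataclasses.fields(comp)]
as_dict = dataclasses.asdict(comp)
attrs = (comp.a, comp.b)
changed = dataclasses.replace(comp, a=10)  # takes field names as keywords

print(field_names, as_dict, attrs, changed.a)
```

Renaming `a` or `b` would break callers of all four, so hiding the names from the constructor alone would not make them private.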
Perhaps it makes sense to be able to make init-only variables be positional-only, since they don't become part of the class's public API, but in that case it seems it could just be a flag passed to `InitVar`. Outside of init-only variables, positional-only arguments seem like a misfeature to me. ~Matt

+1 to Matt's points here. I get the desire for symmetry with / and * in params, but I'm not convinced it's useful enough to warrant the complexity of the approaches being proposed. I think a @dataclass(..., kwonly=True) would solve > 90% of the issues with dataclass usability today.

On Sat, 2021-03-13 at 06:41 -0500, Matt Wozniski wrote:

I don’t like the idea of going back and forth for positional and keyword arguments. The positional arguments have to stay at the top, above some marker variable (be it anything, e.g., __pos: Any). The mixed ones stay in between the two markers. And the keyword arguments come after yet another predefined variable with a leading dunder (e.g., __kw: Any). Please, if you are going to do this for dataclasses, it has to match the function signature. This will be much easier for us to use, read, and teach. Going back and forth will cause a lot of confusion, as the order of the arguments in the init method will not follow the same order as the arguments defined in the dataclass. Thanks to whoever mentioned this in this email chain.

Abdulla

Sent from my iPhone

On 3/13/2021 2:51 PM, Abdulla Al Kathiri wrote:
The thing is, even without being able to switch back and forth within a single dataclass, you could achieve the same thing with inheritance:

    @dataclass(kw_only=True)
    class Base:
        a: int
        b: int

    @dataclass
    class Derived(Base):
        c: int
        d: int

    @dataclass(kw_only=True)
    class MoreDerived(Derived):
        e: int
        f: int

Here, a, b, e, and f are keyword-only, and c and d are normal. Likewise, you could do the same thing with:

    @dataclass
    class A:
        a: int = field(kw_only=True)
        b: int = field(kw_only=True)
        c: int
        d: int
        e: int = field(kw_only=True)
        f: int = field(kw_only=True)

In both cases, you'd get re-ordered fields in __init__, and nowhere else:

    def __init__(self, c, d, *, a, b, e, f):

repr, comparisons, etc. would still treat them in today's order: a, b, c, d, e, f. Other than putting in logic to call that an error, which I wouldn't want to do, it would be allowable. Why not allow the shortcut version if that's what people want to do? Again, I'm not saying this needs to be a day one feature using "__kw_only__: ArgumentMarker" (or however it's ultimately spelled). I just don't want to rule it out in case we come up with some reason it's important.

I just checked and attrs allows that last case:

    @attr.s
    class A:
        a = attr.ib(kw_only=True)
        b = attr.ib(kw_only=True)
        c = attr.ib()
        d = attr.ib()
        e = attr.ib(kw_only=True)
        f = attr.ib(kw_only=True)

Which generates help like:

    class A(builtins.object)
     |  A(c, d, *, a, b, e, f) -> None

The main reason to allow the switching back and forth is to support subclassing dataclasses that already have normal and keyword-only fields. If you didn't allow this, you'd have to say that the MoreDerived class above would be an error because the __init__ looks like the parameters have been rearranged. And the same logic would apply to positional argument fields. I just don't see the need to prohibit it in general. Any tutorial would probably show the fields in the order you describe above: positional, normal, keyword-only.

Eric
-- Eric V. Smith

I forgot to add: I'm uncertain about your suggestion of a required order of fields, based on argument type. I'll have to think about it some more. I'm working on a sample implementation, and I'm going to wait and see how it works before putting much more thought into it. Eric On 3/13/2021 3:14 PM, Eric V. Smith wrote:
-- Eric V. Smith

Oops, sent a reply too soon. On Sat, Mar 13, 2021 at 3:14 PM Eric V. Smith <eric@trueblade.com> wrote:
The thing is, even without being able to switch back and forth within a single dataclass, you could achieve the same thing with inheritance:
...
...
And the same logic would apply to positional argument fields
This seems like another disadvantage of allowing positional-only arguments. If positional-only fields show up just like keyword fields in an arbitrary position in the repr, the repr will cease to be a representation of a call to the dataclass's constructor suitable for passing to `eval`, as it is today when init-only parameters are not in use. ~Matt

Speaking as someone who's not into dataclasses: This whole thread seems to be about spelling the initializer's function signature as a class body. Have you considered going in the opposite direction, i.e. writing something like

    @dataclass
    class A:
        @attributes_from_signature
        def __init__(self, ham, spam=None):
            pass

or

    @attributes_from_signature
    def A(ham, spam=None):
        pass

to autogenerate a class A with attributes ham and spam?
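The second spelling can be approximated today with inspect.signature and dataclasses.make_dataclass, both of which exist in the standard library. This is a minimal sketch, not a proposal: attributes_from_signature is the hypothetical name from the message above, unannotated parameters are assumed to mean Any, and parameter kinds (positional-only, keyword-only, *args, **kwargs) are ignored.

```python
import dataclasses
import inspect
from typing import Any


def attributes_from_signature(func):
    """Build a dataclass whose fields mirror func's parameters (sketch only)."""
    specs = []
    for param in inspect.signature(func).parameters.values():
        # Assumption: a missing annotation is treated as Any.
        annotation = Any if param.annotation is param.empty else param.annotation
        if param.default is param.empty:
            specs.append((param.name, annotation))
        else:
            specs.append(
                (param.name, annotation, dataclasses.field(default=param.default))
            )
    # The function body is discarded; only its name and signature are used.
    return dataclasses.make_dataclass(func.__name__, specs)


@attributes_from_signature
def A(ham, spam=None):
    pass


instance = A(1)
field_names = [f.name for f in dataclasses.fields(A)]
print(instance, field_names)
```

After decoration, A is a class rather than a function, so A(1) creates an instance with ham=1 and spam=None.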

On 2021-03-12 19:57, Eric V. Smith wrote:
Without getting too much into the details of your proposal, my main reaction to all this is that it's either trying to shoehorn too much into the typing annotations, or being too timid about what to put in the typing annotations. If we want to use typing annotations to document types then we should use them for that, and not try to sneakily also use them to define actual behavior. If, on the other hand, we *are* okay with using type annotations for defining behavior, then there isn't any need to resort to odd tricks like `_: dataclasses.KW_ONLY`. You can just put the behavior constraint directly in the annotation:

    @dataclass
    class Foo:
        a: (Any, dataclasses.KW_ONLY)
        # or
        b: {'type': Any, 'kw_only': True}
        # or (where "Any" and kw_only are some new objects provided by a library that handles this usage)
        c: Any + kw_only

. . . and then say that it's the job of dataclass and of typecheckers to separate the different sorts of information. I don't see value in trying to force additional information into type annotations while simultaneously trying to obey various conventions established by typechecking libraries that expect types to be specified in certain ways.

I was never a big fan of the type annotations and I think this kind of thing illustrates the difficulty of having "advisory-only" annotations that aren't supposed to affect runtime behavior but then gradually begin to do so via things like dataclasses. If type annotations are advisory only they should remain 100% advisory only and for use only by static type checkers with zero effect on the code's runtime behavior. If we're going to use type annotations to influence runtime behavior then we should acknowledge that people can put ANYTHING in the type annotations and thus ANY library can make totally arbitrary use of any type annotation to perform arbitrary actions at runtime.
So basically, if we want to indicate stuff like keyword-only-ness using type annotations, that's fine, but we should acknowledge that at that point we are no longer annotating types. We are using type annotation syntax to implement arbitrary runtime behavior, and we should accept that doing that may break typecheckers or make life painful for their maintainers. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
participants (9): Abdulla Al Kathiri, Brendan Barnwell, Eric V. Smith, Jelle Zijlstra, Matt del Valle, Matt Wozniski, Paul Bryan, Peter Otten, Ricky Teachey