dataclass field argument to allow converting value on init

The idea is to have a `default_factory`-like argument (either in the `field` function, or in a new function entirely) that takes a function. That function is called with the value passed to `__init__`, and its return value is used as the value of the respective field. For example:

```py
@dataclass
class Foo:
    x: str = field(init_fn=chr)

f = Foo(65)
f.x # "A"
```

The `chr` function is called with the value `65`, and `x` is set to its return value of `"A"`.

I understand that both `__init__` and `__post_init__` can be used for this purpose, but sometimes it isn't ideal to override them. If you overrode `__init__` and were using `__post_init__`, you would need to call it manually. In my case, `__post_init__` is implemented on a base class which all other classes inherit, so overriding it would require re-implementing its logic (and that's ignoring the fact that you also need to type the field with `InitVar` to even have it passed to `__post_init__` in the first place).

I've created a proof of concept, shown below:

```py
from dataclasses import dataclass, field

def initfn(fn, default=None):
    class Inner:
        def __set_name__(_, owner_cls, owner_name):
            old_setattr = getattr(owner_cls, "__setattr__")

            def __setattr__(self, attr_name, value):
                if attr_name == owner_name:
                    # Bypass `__setattr__`
                    self.__dict__[attr_name] = fac(value)
                else:
                    old_setattr(self, attr_name, value)

            setattr(owner_cls, "__setattr__", __setattr__)

    def fac(value):
        # An `Inner` instance means no argument was passed to `__init__`
        if isinstance(value, Inner):
            return default
        return fn(value)

    return field(default=Inner())
```

It makes use of the fact that when `default` is provided to `field`, the value is checked for a `__set_name__` method, which is called with the class and field name as arguments. Overriding `__setattr__` is just used to catch when a value is being assigned to a field; if that field's name matches the name given to `__set_name__`, it calls the function on the value and sets the field to the result instead.

It can be used like so:

```py
@dataclass
class Foo:
    x: str = initfn(fn=chr, default="Z")

f = Foo(65)
f2 = Foo()
f.x # "A"
f2.x # "Z"
```

It adds a little overhead, especially with having to override `__setattr__`; however, I believe it would have very little overhead if implemented directly in the dataclass library. Even when one of the init functions can be overridden, I still think it would be a nice quality-of-life feature, as calling a single function feels too simple to justify overriding them, if that makes sense.

Thanks.
Dexter
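A similar effect can be sketched without patching `__setattr__`, by using a data descriptor as the field's default. This is only an illustration and not part of the proposal: `Converted`, `_NO_DEFAULT` and `_omitted` are hypothetical names, and the sketch relies on the documented dataclass behavior of calling the descriptor's `__get__` on the class to discover the default.

```python
from dataclasses import dataclass

_NO_DEFAULT = object()  # hypothetical marker: the field has no default


class Converted:
    """Hypothetical data descriptor that runs `fn` on every assigned value."""

    def __init__(self, fn, default=_NO_DEFAULT):
        self.fn = fn
        self.default = default
        # Unique marker standing in for "argument omitted" in __init__.
        self._omitted = object()

    def __set_name__(self, owner, name):
        self._storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class access: dataclass uses the result as the field default,
            # and treats AttributeError as "no default".
            if self.default is _NO_DEFAULT:
                raise AttributeError(self._storage)
            return self._omitted
        return getattr(instance, self._storage)

    def __set__(self, instance, value):
        if value is self._omitted:
            converted = self.default  # the default is stored unconverted
        else:
            converted = self.fn(value)
        setattr(instance, self._storage, converted)


@dataclass
class Foo:
    x: str = Converted(chr, default="Z")


f = Foo(65)
f2 = Foo()
```

Here `f.x` is `"A"` and `f2.x` is `"Z"`. Unlike the proof of concept, the conversion also runs on later assignments such as `f.x = 66`, which may or may not be desirable.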

What type hint will be exposed for the `__init__` parameter? Clearly, it's not `str` in your example, since you're passing it an `int` value. Presumably, to overcome this, you'd need yet another `field` parameter to provide the type hint for the `__init__` param?
On Wed, 2022-06-22 at 20:43 +0000, Dexter Hill wrote:

Interesting point, it's not something I thought of. One solution, as mentioned by Simão and what I had in mind, is to pull the type from the first parameter of the function. We know that the function will always have a minimum of one parameter, and that the value is always passed as the first argument. One downside is that it isn't very transparent to the user: they might not realize that the type is taken from the first parameter of the function, and wonder where it is coming from. In that case, the other solution would be something like `InitVar`, but taking two types (the return type and the init type); something like `var: InitFn[int, str]` for the `chr` example.
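As a rough sketch of how the type could be pulled from the first parameter, `inspect.signature` can be used. The helper name `init_param_type` and the fallback to `Any` are assumptions made for this example. Note that builtins such as `chr` carry no annotations, so nothing useful could be inferred for the original example.

```python
import inspect
from typing import Any


def init_param_type(fn):
    """Return the annotation of fn's first parameter, or Any if unknown."""
    try:
        params = list(inspect.signature(fn).parameters.values())
    except (TypeError, ValueError):
        return Any  # some callables expose no signature at all
    if not params or params[0].annotation is inspect.Parameter.empty:
        return Any
    return params[0].annotation


def code_to_char(code: int) -> str:
    return chr(code)


annotated = init_param_type(code_to_char)
builtin = init_param_type(chr)
```

With an annotated converter the result is `int`, while for `chr` it falls back to `Any`, which is one reason an explicit way to state the init type might still be needed.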

Dexter Hill wrote:
What if, instead, the `init` parameter could accept either a boolean (as it does now) or a type? When given a type, that would mean to create the property and accept the argument, but pass the argument to `__post_init__` rather than using it to initialize the property directly. The type passed to `init` would become the type hint for the argument.

I don't mind that solution, although my concern is whether it would be confusing for `init` to have two different purposes depending on the argument. And if `__post_init__` was overridden, which I would say it commonly is, the user would have to do the conversion manually, as well as remember to add an extra argument for the conversion function (assuming I'm understanding what you're saying). If no type was provided to `init` but a conversion function was, it would be a case of getting the type from the function signature, right?

Dexter Hill wrote:
The reason I am saying to use the 'init' argument is that it seems to me to be a variation on what that argument already does. It controls whether the argument is passed to the generated `__init__` method. Passing a type as the value for 'init' would now behave like sort of a cross between `init=False` and `InitVar`. The field would still be created (unlike `InitVar`) but would not be automatically assigned the value passed as its corresponding argument, leaving that responsibility to `__post_init__`. Like with `InitVar`, the argument would be passed to `__post_init__` since it was not processed by `__init__`. The type annotation would continue to specify the type of the field, and the type passed to the 'init' argument would specify the type of its constructor argument.
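For comparison, this is roughly what the two existing features being combined look like today: an `InitVar` pseudo-field that is passed to `__post_init__`, plus a real field declared with `init=False`. The parameter name `code` is chosen for this example only. Note that the constructor argument cannot share the field's name here, which is part of what the suggestion would change.

```python
from dataclasses import InitVar, dataclass, field


@dataclass
class Foo:
    code: InitVar[int]          # accepted by __init__, not stored as a field
    x: str = field(init=False)  # stored, but not an __init__ parameter

    def __post_init__(self, code: int) -> None:
        # The InitVar value arrives here and is converted manually.
        self.x = chr(code)


f = Foo(65)
```

After construction `f.x` is `"A"`, and `code` is not kept on the instance.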

Do you mind providing a little example of what you mean? I'm not sure I 100% understand what your use of `__post_init__` is. In my mind, it would be something like:

```py
@dataclass
class Foo:
    x: str = field(init=int, converter=chr)

# which converts to
class Foo:
    def __init__(self, x: int):
        self.x = chr(x)
```

without any use of `__post_init__`. If it were to be something like:

```py
class Foo:
    def __init__(self, x: int):
        self.__post_init__(x)

    def __post_init__(self, x: int):
        self.x = chr(x)
```

which I think is what you are suggesting (please correct me if I'm wrong), then I feel that may be confusing if you were to override `__post_init__`, which is often much easier than overriding `__init__`. For example, in a situation like:

```py
@dataclass
class Foo:
    x: str = field(init=int, converter=chr)
    y: InitVar[str]
```

if the user were to override `__post_init__`, would they know that they need to include `x` as the first argument? It's not typed with `InitVar`, so it might not be clear that it's passed to `__post_init__`.

Dexter Hill wrote:
```py
@dataclass
class Foo:
    x: str = field(init=int)

    def __post_init__(self, x: int):
        self.x = chr(x)

# converts to
class Foo:
    def __init__(self, x: int):
        self.__post_init__(x)

    def __post_init__(self, x: int):
        self.x = chr(x)
```
Writing that out is helpful because now I see that the argument type can possibly be taken from the `__post_init__` signature, meaning there is no need to use the type as the value for the `init` argument to `field`. In that case, instead of `init=int`, it could maybe be something like `post_init=True`.

Ah right, I see what you mean. In my example I avoided the use of `__init__`, and specifically `__post_init__`, because in my actual project (and it's probably a fairly uncommon use case) `__post_init__` is defined on a base class and inherited by all other classes, and I wanted to avoid overriding `__post_init__` (and super-ing). The idea was to have the conversion generated by the dataclass within `__init__`, so that no function would need to be defined (similar to how converters work in attrs). With your suggestion, what do you think about having something similar to `InitVar`, so it's more in line with how `__post_init__` currently works? For example, like one of my other suggestions, having a type called `InitFn` which takes two types: the type for `__init__` and the type of the actual field.
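The situation described above can be illustrated with a small sketch using only existing dataclass features. The class names and the `registered` field are invented for the example. The subclass has to override `__post_init__` and remember to call the base implementation just to perform a one-line conversion.

```python
from dataclasses import InitVar, dataclass, field


@dataclass
class Base:
    registered: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        # Shared logic that every subclass relies on.
        self.registered = True


@dataclass
class Child(Base):
    code: InitVar[int]
    x: str = field(init=False)

    def __post_init__(self, code: int) -> None:
        # Forgetting this call would silently skip the base-class logic.
        super().__post_init__()
        self.x = chr(code)


c = Child(65)
```

Here `c.x` is `"A"` and `c.registered` is `True`, but only because the override explicitly chains to `Base.__post_init__`.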

Dexter Hill wrote:
Now I see why you wanted to avoid using `__post_init__`. I had been thinking to try to use `__post_init__` instead of adding more ways to initialize, but your reasoning makes a lot of sense. Would we want something more general that could deal with cases where the input does not have a 1-to-1 mapping to the field, differing only, perhaps, in type hint? What if we want 1 argument to initialize 2 properties or vice versa, etc.? In any case, having a new `InitFn` is worth digging into. I don't think it needs to have 2 arguments for type, since the type annotation already covers 1 of those cases. I think it makes the most sense for the type annotation to apply to the property, and for the type of the argument to be provided either through an optional argument to `InitFn`, or maybe derived from the signature of the function that `InitFn` refers to.

Steve Jorgensen wrote:
Would we want something more general that could deal with cases where the input does not have a 1-to-1 mapping to the field, differing only, perhaps, in type hint? What if we want 1 argument to initialize 2 properties or vice versa, etc.?
That's definitely an improvement that could be made, although I think it would require a large number of changes. I don't know if you had a syntax in mind for it, or an easy way to represent it, but from what I understand you would probably need a whole new function like `field` that handles just that functionality; otherwise it would add a lot of arguments to `field`.
Steve Jorgensen wrote:
In any case, having a new `InitFn` is worth digging into. I don't think it needs to have 2 arguments for type, since the type annotation already covers 1 of those cases. I think it makes the most sense for the type annotation to apply to the property, and for the type of the argument to be provided either through an optional argument to `InitFn`, or maybe derived from the signature of the function that `InitFn` refers to.
So the use case would be either this:

```py
@dataclass
class Foo:
    x: InitFn[str] = field(converter=chr)
```

where the field `x` has the type `str`, and the type for the `x` parameter in `__init__` would be derived from `chr`, or optionally:

```py
@dataclass
class Foo:
    x: InitFn[str, int] = field(converter=chr)
```

where you can provide a second type argument that specifies the type of the parameter for `__init__`?
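`InitFn` does not exist, so as a loose illustration of how a single annotation could carry both types, the sketch below uses `typing.Annotated` (Python 3.9+) as a stand-in and treats the first metadata item as the `__init__` type. That convention, and the helper name `split_types`, are assumptions for this example only.

```python
from typing import Annotated, get_args, get_type_hints


class Foo:
    # Stand-in for the hypothetical `InitFn[str, int]`:
    # field type `str`, init parameter type `int`.
    x: Annotated[str, int]


def split_types(cls, name):
    """Return (field_type, init_type) for an Annotated attribute."""
    hint = get_type_hints(cls, include_extras=True)[name]
    field_type, init_type, *_ = get_args(hint)
    return field_type, init_type


types_for_x = split_types(Foo, "x")
```

A real `InitFn` would presumably need support from the dataclass machinery and from type checkers, which this sketch does not attempt.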

Dexter Hill wrote:
How about this variation? Use `init_using` instead of `converter` as the name of the argument to `field`, allow either a callable or a method name to be supplied, and expect the custom init function to behave like `__post_init__`, in that it assigns to properties rather than returning a converted value. That will allow it to initialize more than 1 property. Next, we can say that if the same callable object or the same method name is passed to `init_using` for several fields, then it is called only once. Finally, we say that the class' init argument(s) and their type hints are taken from the `init_using` target.

```
import mimetypes
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class DocumentFile:
    filename: str = field(init_using='_init_name_and_ctype')
    content_type: str = field(init_using='_init_name_and_ctype')
    description: str | None = field(default=None)

    # In this case, the function takes a `filename` argument which is the same
    # as one of the property names that it initializes, but it could take an
    # argument with a completely different name, and the class init would have
    # that as its argument instead.
    def _init_name_and_ctype(self, filename: str | Path = '/tmp/example.txt') -> None:
        self.filename = str(filename)
        # guess_type returns a (type, encoding) tuple
        self.content_type = mimetypes.guess_type(filename)[0]

# Roughly translates to
class DocumentFile:
    filename: str
    content_type: str
    description: str | None

    def __init__(self, filename: str | Path = '/tmp/example.txt',
                 description: str | None = None):
        self.description = description
        self._init_name_and_ctype(filename)

    def _init_name_and_ctype(self, filename: str | Path = '/tmp/example.txt') -> None:
        self.filename = str(filename)
        self.content_type = mimetypes.guess_type(filename)[0]
```
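For reference, something close to this can be written with dataclasses as they exist today, at the cost of routing everything through `__post_init__`. This is a sketch under assumptions: the `InitVar` is named `path` because it cannot share a name with the `filename` field, and `content_type` is typed as optional because `mimetypes.guess_type` may return `None` for the type.

```python
import mimetypes
from dataclasses import InitVar, dataclass, field
from pathlib import Path
from typing import Optional, Union


@dataclass
class DocumentFile:
    # Constructor-only argument; must be named differently from the field.
    path: InitVar[Union[str, Path]] = "/tmp/example.txt"
    filename: str = field(init=False)
    content_type: Optional[str] = field(init=False)
    description: Optional[str] = None

    def __post_init__(self, path: Union[str, Path]) -> None:
        # One argument initializes two fields.
        self.filename = str(path)
        self.content_type = mimetypes.guess_type(self.filename)[0]


doc = DocumentFile("report.txt", description="quarterly report")
```

This covers the one-argument-to-two-fields case, but it still occupies `__post_init__`, which is exactly what the `init_using` idea is meant to avoid.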

That's clever, I'm a fan of that syntax. Quick question on it though - If you provide a `default` or `default_factory` to `field`, as well as `init_using`, how would that be handled? I'm thinking, `default_factory` would be made mutually exclusive, so you couldn't use it but `default` would just replace the default value in the `__init__`.

This is what attrs' converter functionality is, right? https://www.attrs.org/en/stable/init.html#converters

Didn't think about `attrs` but yes, the converter functionality at a glance looks exactly like I'd imagined it would function in dataclasses.

participants (5)
- Dexter Hill
- Paul Bryan
- Simão Afonso
- Steve Jorgensen
- Thomas Kehrenberg