Allow dataclasses' auto-generated __init__ to optionally accept **kwargs and pass them to __post_init__

Sorry for the double post, if the first one went through... I hit Enter too soon by accident :(

TL;DR: Add a `strict` keyword option to the dataclass decorator which, by default (True), would keep the current behavior, but otherwise (False) would generate an __init__ that accepts arbitrary **kwargs and passes them to __post_init__, if one is defined.

Use case: I'm developing a client for a public API (that I don't control). I'd like this client to be well typed so my users don't have to constantly refer to the documentation for type information (or just to know which attributes exist in an object). So I turned to dataclasses (because they're fast and lead to super clean/clear code).

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Language:
        iso_639_1: Optional[str]
        name: Optional[str]

Then my endpoint can look like this:

    import requests

    def get_language() -> Language:
        result = requests.get(...)
        return Language(**result.json())

That's fine, but it poses a problem if the API, the one I have no control over, decides overnight to add a field to the Language model, say 'english_name'. The API version number doesn't change, because to them that's not a breaking change (I would agree). Yet every user of my client will see "TypeError: __init__() got an unexpected keyword argument 'english_name'" once this change goes live, until I get a chance to update the client code.

Other clients return plain dicts or dict wrappers with __getattr__ functionality (but without annotations, so what's the point?). Those wouldn't break.

I've looked around for solutions, and what I found (https://stackoverflow.com/questions/55099243/python3-dataclass-with-kwargsas...) ranged from "you'll have to redefine the __init__, so really you don't want dataclasses" to "define a 'from_kwargs' classmethod that will sort the input dict into two dicts, one for the __init__ and one of extra kwargs that you can do whatever you want with". Since I'm pretty sure I _do_ want dataclasses, that leaves me with the second solution: the from_kwargs classmethod.
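For reference, the from_kwargs approach described above might look roughly like the sketch below. This is an assumption about its shape, not the exact code from the linked answer; the `extra_info` field name is hypothetical. It splits the incoming keywords by checking them against `dataclasses.fields()`, which is the extra call, loop and dict construction the next paragraph objects to.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class Language:
    iso_639_1: Optional[str]
    name: Optional[str]
    # Hypothetical catch-all for keys the model does not declare (yet).
    extra_info: dict = field(default_factory=dict, init=False)

    @classmethod
    def from_kwargs(cls, **kwargs: Any) -> 'Language':
        # Names accepted by the generated __init__ (fields with init=True).
        # Note: InitVar pseudo-fields are not returned by fields(), so they are ignored here.
        init_names = {f.name for f in fields(cls) if f.init}
        known = {k: v for k, v in kwargs.items() if k in init_names}
        extra = {k: v for k, v in kwargs.items() if k not in init_names}
        obj = cls(**known)
        obj.extra_info = extra
        return obj


payload = {'iso_639_1': 'de', 'name': 'Deutsch', 'english_name': 'German'}
lang = Language.from_kwargs(**payload)  # works; Language(**payload) would raise TypeError
print(lang)
```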
I like the idea, but I'm not a fan of the execution. First, it means my dataclasses don't work like regular ones, since they need this special factory. Second, it does something that's pretty trivial to do with **kwargs: keyword unpacking can already sort parameters for us, instead of requiring at least two additional function calls (from_kwargs() and dataclasses.fields()), a loop over the dataclass fields and the construction of yet another dict (all of which has a performance cost).

My proposal would be to add a `strict=True` default option to the dataclass decorator. The default wouldn't change anything about the current behavior. But if I declare:

    @dataclass(strict=False)
    class Language:
        iso_639_1: Optional[str]
        name: Optional[str]

then the auto-generated __init__ would look like this:

    def __init__(self, iso_639_1, name, **kwargs):
        ...
        self.__post_init__(..., **kwargs)  # if the dataclass has a __post_init__

This would allow us to achieve the from_kwargs solution in a much less verbose way, I think:

    @dataclass(strict=False)
    class Language:
        iso_639_1: Optional[str]
        name: Optional[str]
        extra_info: dict = field(init=False)

        def __post_init__(self, **kwargs):
            if kwargs:
                logger.info(
                    f'The API returned more keys than expected for model {self.__class__.__name__}: {kwargs.keys()}. '
                    'Please ensure that you have installed the latest version of the client or post an issue @ ...'
                )
            self.extra_info = kwargs

I'm not married to the name `strict` for the option, but I think the feature is interesting, if only to make dataclasses *optionally* more flexible. You don't always have control over the attributes of the data you handle, especially when it comes from external APIs. Having dataclasses that don't break when the attributes evolve can be a great safeguard. Outside of my (somewhat specific, I'll admit) use case, it would also allow dataclasses to be used for types that are inherently flexible.
Imagine:

    @dataclass(strict=False)
    class SomeTranslatableEntity:
        name: Optional[str]
        name_translations: dict[str, str] = field(init=False)

        def __post_init__(self, **kwargs):
            self.name_translations = {
                k: kwargs.pop(k)
                for k in list(kwargs)
                if k.startswith('name_')  # e.g. 'name_en', 'name_fr'
            }

Thanks for reading :)
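Since `strict` is only a proposal and does not exist in the dataclasses module, the intended behavior can be emulated today by hand, as a rough sketch: pass `init=False` so @dataclass does not generate __init__, then write the __init__ the proposal describes. The cost is that every field has to be repeated in the hand-written signature, which is exactly the duplication the proposal tries to avoid.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Optional

logger = logging.getLogger(__name__)


# init=False: @dataclass does not generate __init__, so the hand-written one below is used.
@dataclass(init=False)
class Language:
    iso_639_1: Optional[str]
    name: Optional[str]
    extra_info: dict = field(default_factory=dict)

    # Roughly what the proposed @dataclass(strict=False) would generate (assumption).
    def __init__(self, iso_639_1: Optional[str], name: Optional[str], **kwargs: Any) -> None:
        self.iso_639_1 = iso_639_1
        self.name = name
        # Forward every undeclared keyword to __post_init__.
        self.__post_init__(**kwargs)

    def __post_init__(self, **kwargs: Any) -> None:
        if kwargs:
            logger.info('The API returned more keys than expected for model %s: %s',
                        type(self).__name__, list(kwargs))
        self.extra_info = kwargs


lang = Language(iso_639_1='de', name='Deutsch', english_name='German')
print(lang.extra_info)  # {'english_name': 'German'}
```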

Oops, there's an indent error for the `extra_info: dict = field(init=False)` line, and that last example should be:

    def __post_init__(self, **kwargs):
        self.name_translations = {
            k: kwargs.pop(k)
            for k in list(kwargs)  # iterate over a copy, since pop() mutates kwargs
            if k.startswith('name_')  # e.g. 'name_en', 'name_fr'
        }

You could write your own decorator to add this functionality for you. Something like:

-----------------------------------
from dataclasses import dataclass

def add_kw(cls):
    original_init = cls.__init__

    def new_init(self, x, y, **kwargs):
        print('init called with additional args', kwargs)
        original_init(self, x, y)

    cls.__init__ = new_init
    return cls

@add_kw
@dataclass
class C:
    x: int
    y: int

c = C(1, 2, a='a', b='b')
print(c)
-----------------------------------

To be general purpose, you'd need to dynamically generate new_init and the call to original_init, but it's all doable. And you'd need to decide what to do with the kwargs. I'm not sure you'd want to use __post_init__, maybe you'd want to define your own method to call, if it exists.

Eric

On 9/20/2021 10:28 AM, thomas.d.mckay@gmail.com wrote:
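A general-purpose variant of Eric's add_kw decorator could be sketched as follows. Instead of generating code, it reads the parameter names of the generated __init__ with `inspect.signature`, forwards only those, and hands the leftovers to a method the class may define. The decorator name `accept_extra_kwargs` and the hook name `handle_extra_kwargs` are hypothetical, chosen for illustration.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Optional


def accept_extra_kwargs(cls):
    """Wrap a dataclass's generated __init__ so unknown keywords are collected, not rejected."""
    original_init = cls.__init__
    # Parameter names of the generated __init__, minus the leading self parameter.
    accepted = set(list(inspect.signature(original_init).parameters)[1:])

    def new_init(self, *args: Any, **kwargs: Any) -> None:
        known = {k: v for k, v in kwargs.items() if k in accepted}
        extra = {k: v for k, v in kwargs.items() if k not in accepted}
        original_init(self, *args, **known)
        # Hypothetical hook: called with the leftover keywords if the class defines it.
        hook = getattr(self, 'handle_extra_kwargs', None)
        if hook is not None:
            hook(extra)

    cls.__init__ = new_init
    return cls


@accept_extra_kwargs
@dataclass
class Language:
    iso_639_1: Optional[str]
    name: Optional[str]
    extra_info: dict = field(default_factory=dict, init=False)

    def handle_extra_kwargs(self, extra: dict) -> None:
        self.extra_info = extra


lang = Language(iso_639_1='de', name='Deutsch', english_name='German')
print(lang)
# Language(iso_639_1='de', name='Deutsch', extra_info={'english_name': 'German'})
```

Only unknown keyword arguments are collected this way; extra positional arguments still raise TypeError from the original __init__.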

Thanks for the reply, Eric (and for dataclasses). Right now, my solution is pretty much that, except I do it by monkey-patching dataclasses._init_fn which, I know, isn't the greatest solution I could find. Your double decorator solution is cleaner. I'll try that instead. I still end up copying some meta code from the dataclasses module to generate the new_init signature, which I would have preferred to avoid, hence the feature proposal, but I understand that this may have limited interest and/or is maybe outside the scope of dataclasses.

In 3.10 you can specify keyword-only arguments to dataclasses. See https://docs.python.org/3.10/library/dataclasses.html, search for kw_only. This doesn't address Thomas's issue in any way, but I thought I'd mention it.

Eric

On 9/20/2021 3:36 PM, Paul Bryan wrote:

My solution, if anyone comes by here looking for one:

    import dataclasses
    from dataclasses import _create_fn  # private helper, not a public API

    def patched_create_fn(name: str, args: list[str], body: list[str], **kwargs):
        if name == '__init__':
            args.append('**kwargs')
            if body == ['pass']:
                body = []
            body.append(f'{args[0]}.__extra_kwargs__ = kwargs')
        return _create_fn(name, args, body, **kwargs)

    dataclasses._create_fn = patched_create_fn  # apply the patch
    # ... declare dataclasses here ...
    dataclasses._create_fn = _create_fn  # restore the original

Then you can either apply the patch at the top of a module where you declare dataclasses and restore the original function at the bottom, or you can encapsulate the patching logic in a context manager and create your dataclasses inside the scope of the context manager.

I tried the decorator route, but it is not easy or clean. I either ended up importing private stuff from the dataclasses module or duplicating entire code blocks just to ensure I didn't break stuff (e.g. if a field has a default_factory). I don't think the dataclasses module is meant to be easily extended, so this was never going to result in clean code, but this does the job with minimal hassle and minimal importing of private stuff from the dataclasses module.

I do wish the dataclasses module could evolve a bit to either implement more options like the one proposed initially or, even better, give an easier/clearer path for users to customize the generation of dataclasses.

participants (3)
- Eric V. Smith
- Paul Bryan
- thomas.d.mckay@gmail.com