Add a `dig` method to dictionaries supporting the retrieval of nested keys
I did some searching to see if this has already been proposed before, but didn't find any evidence that it has. If so, let me know and I'll go away :)
One of the tasks that I encounter frequently enough is retrieving a nested key from a nested collection of dictionaries (a dictionary of dictionaries that can be any number of layers deep). There are multiple ways of tackling this, normally done with `reduce()`, iterative looping or chained `get()`s and then handling `None`, `KeyError` or `AttributeError` appropriately. I'd like to avoid this extra code and logic and have a built-in method to facilitate this common data access pattern.
I'd like to propose that we add a `dig()` method to dictionaries analogous to Ruby's `dig()` method for Ruby Hashes (reference: https://ruby-doc.org/core-2.6.0.preview2/Hash.html#method-i-dig). Initially, I'd suggest that we only support nested dictionaries, but we could also support lists and other collection types as Ruby does if we really want to. Similar to the existing `get()` method on dictionaries, I'd propose that the method return `None` if any of the keys in the chain is not found, avoiding `KeyError`.
Thoughts?
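For context, the workarounds described above look roughly like the following. This is only a sketch: the `config` data and the helper name `dig_reduce` are made up for illustration, and nothing here is an existing `dict` method.

```python
from functools import reduce

# Hypothetical nested data, used only for illustration.
config = {"databases": {"mysql": {"host": "localhost", "port": 3306}}}

# 1. Chained get() calls: every level needs an empty-dict fallback,
#    otherwise a missing level yields None and the next .get() raises
#    AttributeError.
host = config.get("databases", {}).get("mysql", {}).get("host")


# 2. reduce() over the key path, catching the errors that a missing key
#    (KeyError) or a non-subscriptable intermediate value (TypeError) raises.
def dig_reduce(d, *keys):
    try:
        return reduce(lambda acc, key: acc[key], keys, d)
    except (KeyError, TypeError):
        return None


port = dig_reduce(config, "databases", "mysql", "port")
missing = dig_reduce(config, "databases", "postgres", "host")
```

Both variants return `None` for a broken path, which is the behaviour the proposal asks `dig()` to provide out of the box.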
Have you tried writing your own helper function `dig(d, *args)`? It should only take a few lines, and you can easily put that in a library of "utility" functions you carry around with you from project to project. Not everything needs to be built in!
On Thu, Aug 29, 2019 at 1:53 PM None via Python-ideas <python-ideas@python.org> wrote:
I did some searching to see if this has already been proposed before, but didn't find any evidence that it has. If so, let me know and I'll go away :)
One of the tasks that I encounter frequently enough is retrieving a nested key from a nested collection of dictionaries (a dictionary of dictionaries that can be any number of layers deep). There are multiple ways of tackling this, normally done with `reduce()`, iterative looping or chained `get()`s and then handling `None`, `KeyError` or `AttributeError` appropriately. I'd like to avoid this extra code and logic and have a built-in method to facilitate this common data access pattern.
I'd like to propose that we add a `dig()` method to dictionaries analogous to Ruby's `dig()` method for Ruby Hashes (reference: https://ruby-doc.org/core-2.6.0.preview2/Hash.html#method-i-dig). Initially, I'd suggest that we only support nested dictionaries, but we could also support lists and other collection types as Ruby does if we really want to. Similar to the existing `get()` method on dictionaries, I'd propose that the method return `None` if any of the keys in the chain is not found, avoiding `KeyError`.
Thoughts?
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QFL76S...
Code of Conduct: http://python.org/psf/codeofconduct/
--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his* **(why is my pronoun here?)** <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
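A helper along the lines suggested in this reply could be as short as the sketch below. Only the signature `dig(d, *args)` comes from the message; the body, and the choice to return `None` for a missing key or a non-dict intermediate value, are assumptions.

```python
def dig(d, *args):
    """Follow the keys in args through nested dicts; return None if the path breaks."""
    current = d
    for key in args:
        # Assumption: anything that is not a dict, or lacks the key, means "not found".
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current
```

Called as `dig(data, "a", "b")`, it returns `data["a"]["b"]` when both levels exist and `None` otherwise.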
Have you considered using `collections.ChainMap` for this kind of thing? The root/first dictionary keys could each refer to the nested dictionaries, and you could access all the nested keys like this:

    my_chainmap.parents.keys()
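One possible reading of this suggestion, sketched with made-up data: build a `ChainMap` whose first map is the root dictionary and whose remaining maps are the nested dictionaries. How `my_chainmap` is constructed is an assumption, since the message does not say. Note that this gives a flat view over one level of nesting rather than a path-based lookup.

```python
from collections import ChainMap

# Hypothetical data: each root key refers to a nested dictionary.
root = {
    "mysql": {"host": "db1", "port": 3306},
    "redis": {"url": "redis://cache"},
}

# Assumed construction: the root first, then each nested dictionary.
my_chainmap = ChainMap(root, *root.values())

# .parents is a new ChainMap without the first map, so only the
# nested dictionaries remain.
nested_keys = set(my_chainmap.parents.keys())

# Lookups search the maps in order, so a nested key resolves directly.
host = my_chainmap["host"]
```

If two nested dictionaries share a key, the lookup returns the value from whichever map comes first, so the path information is lost.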
This sounds to me like functionality of a specific type (more specific than `dict`). `dict` can have any key-value pairs; it is not necessarily a nested structure of dicts. Thus having a method for such a specific use case with the general dict type doesn't feel right.
I think it's better if such functionality is provided by whatever infrastructure creates those specific data structures (the nested dicts). For example there is this project: [pyhocon](https://github.com/chimpler/pyhocon/), a HOCON parser for Python which supports exactly that syntax:

    conf['databases.mysql.host']  # conf is a nested dict of depth 3

Also writing a custom function isn't too much work:

    def dig(d, *keys, default=None):
        # Look up the first key, then recurse on the rest of the path.
        obj = d.get(keys[0], default)
        if len(keys) > 1:
            # Only descend into dicts; anything else means "not found".
            return dig(obj, *keys[1:], default=default) if isinstance(obj, dict) else default
        return obj
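To see how the function above behaves, here is a short usage sketch. The function body is copied from the message; the `conf` data is hypothetical.

```python
def dig(d, *keys, default=None):
    obj = d.get(keys[0], default)
    if len(keys) > 1:
        return dig(obj, *keys[1:], default=default) if isinstance(obj, dict) else default
    return obj


# Hypothetical nested configuration.
conf = {"databases": {"mysql": {"host": "localhost"}}}

found = dig(conf, "databases", "mysql", "host")
# A missing intermediate key falls back to the default.
absent = dig(conf, "databases", "postgres", "host", default="n/a")
# Digging past a non-dict value (here a string) also yields the default.
too_deep = dig(conf, "databases", "mysql", "host", "name")
```

One edge case worth knowing: calling `dig(conf)` with no keys raises `IndexError`, because `keys[0]` does not exist.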
Thanks everyone for the feedback and suggestions. I agree that there are many ways one could easily implement this (chaining, reduce, looping, etc.). I could continue to maintain a utility function and copy that around to all of the code bases where I need this functionality, which is what I do today. I think we could do better, though.
It is cumbersome to update and copy around a common function across organizations and code bases. Then I have to educate the rest of the dev team and tell them that there is a function that facilitates nested key retrievals.
Another alternative would be to create a common package for this and update all of my projects to use that. Given that it would be 5-15 lines of code, that feels like it would be too small to be a robust dependency and would smell like some of the minuscule dependencies in the JavaScript ecosystem (example: https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/).
I really believe that a nested key retrieval mechanism should be a first-class offering of the standard library. It is extremely common in the Python ecosystem to find developers working with data sets comprised of nested data structures. Serializing and deserializing JSON is one of the most common functions developers do today, too. As this is a common task being performed by hundreds of thousands of developers, wouldn't it be better if we had one canonical way to do it (in the spirit of PEP-20 and having one obvious way to do things)?
On Tue, 3 Sep 2019 at 13:38, None via Python-ideas <python-ideas@python.org> wrote:
I really believe that a nested key retrieval mechanism should be a first-class offering of the standard library. It is extremely common in the Python ecosystem to find developers working with data sets comprised of nested data structures. Serializing and deserializing JSON is one of the most common functions developers do today, too. As this is a common task being performed by hundreds of thousands of developers, wouldn't it be better if we had one canonical way to do it (in the spirit of PEP-20 and having one obvious way to do things)?
There's a PyPI package, glom (https://pypi.org/project/glom/) that appears to do what you are after, as well as a lot more. Maybe that is something you should look into.
There's obviously a judgement call involved, but "not every 3-line function needs to be a builtin" (or by extension a stdlib function) is a common principle used here. In this case, any individual case is probably a simple 3-line function, but trying to write something flexible enough to cater for all (or even just most) of the endless variations of questions like "should it be an error if a nested item isn't a dictionary, or should that be treated as item not found?" is a significant undertaking, and it's likely difficult to come up with a usable design.
So custom implementations, tailored to the needs of individual projects, seem like a good way to go (to me, at least). If someone (like the author of the glom package) finds a more general approach that solves a broad class of problems, making it a package on PyPI is a good next step. And if it turns out that the PyPI package becomes a clear example of "best of breed", and has stabilised to the point where the slow pace of change in the stdlib is acceptable, then it might be worth making into a stdlib function.
Yes, it's a common enough requirement, but only in a broad sense (the details seem to differ each time).
Paul
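The design question raised above can be made concrete with a small sketch. The `strict` flag is a hypothetical parameter, not something proposed in the thread; it only illustrates that two reasonable projects could want opposite behaviour from the same helper.

```python
def dig(d, *keys, strict=False):
    """Walk nested dicts along keys.

    strict=False: a non-dict intermediate value counts as "not found".
    strict=True:  a non-dict intermediate value raises TypeError.
    A missing key returns None in both modes.
    """
    current = d
    for key in keys:
        if not isinstance(current, dict):
            if strict:
                raise TypeError(
                    f"cannot look up {key!r} in {type(current).__name__}"
                )
            return None
        if key not in current:
            return None
        current = current[key]
    return current


data = {"a": {"b": 1}}
lenient = dig(data, "a", "b", "c")  # 1 is not a dict, treated as not found
```

With `strict=True` the same call raises `TypeError` instead, and that is only one of the variations a general-purpose version would have to decide on.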
On 3 Sep 2019, at 14:56, Paul Moore <p.f.moore@gmail.com> wrote:
There's a PyPI package, glom (https://pypi.org/project/glom/) that appears to do what you are after, as well as a lot more. Maybe that is something you should look into.
There are many more similar ones too. There's one in tri.declarative: https://trideclarative.readthedocs.io/en/latest/#get-set-attribute-given-a-path-string
There's one in pyrsistent: https://github.com/tobgu/pyrsistent#transformations (the docs mostly talk about the transformation part because it's the thing you probably want in pyrsistent).
I agree fully with your point that use cases differ so the functions will differ. In glom the paths are "a.b.c", in tri.declarative they are "a__b__c" and in pyrsistent they are ['a', 'b', 'c']. All three make sense for their respective use cases, and they can't be unified.
/ Anders
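The three path spellings mentioned above can be compared in a short sketch. It uses only the standard library and does not call glom, tri.declarative or pyrsistent; the helper names `to_keys` and `get_path` are hypothetical.

```python
def to_keys(path, sep="."):
    """Split a string path on sep; pass a list or tuple of keys through."""
    if isinstance(path, str):
        return path.split(sep)
    return list(path)


def get_path(data, path, sep="."):
    """Index into nested dicts along path; raises KeyError if a key is missing."""
    current = data
    for key in to_keys(path, sep):
        current = current[key]
    return current


data = {"a": {"b": {"c": 42}}}

dotted = get_path(data, "a.b.c")                 # "a.b.c" spelling
dunder = get_path(data, "a__b__c", sep="__")     # "a__b__c" spelling
listed = get_path(data, ["a", "b", "c"])         # list spelling

# A key that itself contains the separator can only be addressed
# with the list spelling.
tricky = {"a.b": {"c": 1}}
via_list = get_path(tricky, ["a.b", "c"])
```

The string spellings are shorter to write but cannot express keys that contain the separator or keys that are not strings, which is one reason a single syntax does not suit every use case.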
It seems most of the folks on this thread have similar feelings on this, so I will drop this idea. We'll probably standardize on using glom for now.
ConfigObj used to have a similar method, I believe, for fetching a value from a nested section by supplying the full path (as an iterable, I believe).
Michael
On Tue, 3 Sep 2019 at 15:36, James Livermont via Python-ideas <python-ideas@python.org> wrote:
It seems most of the folks on this thread have similar feelings on this, so I will drop this idea. We'll probably standardize on using glom for now.
-- Michael Foord Python Consultant, Contractor and Trainer https://agileabstractions.com/
I recommend you take a look at the "toolz" library, which provides assorted APIs to help in data structure manipulation: https://toolz.readthedocs.io/en/latest/index.html
Especially this function: https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.get_in
Regards
Antoine.
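For readers without toolz installed, a lookup in the style of `get_in` can be approximated with the standard library. The sketch below is a stand-in, not the toolz implementation: the argument order (keys first, then the collection, then an optional default) follows the toolz documentation as far as I can tell and should be checked against the page linked above. The `order` data is made up.

```python
from functools import reduce
import operator


def get_in(keys, coll, default=None):
    """Return coll[k0][k1]... for the given keys, or default if any step fails."""
    try:
        return reduce(operator.getitem, keys, coll)
    except (KeyError, IndexError, TypeError):
        return default


# Hypothetical data mixing dicts and a list.
order = {"items": [{"sku": "A1"}, {"sku": "B2"}]}

second_sku = get_in(["items", 1, "sku"], order)
out_of_range = get_in(["items", 5, "sku"], order)
wrong_type = get_in(["items", "x"], order, default="?")
```

Because it relies on plain indexing, this style handles lists as well as dictionaries, which covers the "other collection types" mentioned in the original proposal.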
participants (9)
- Anders Hovmöller
- Antoine Pitrou
- Dominik Vilsmeier
- Guido van Rossum
- j_livermont@yahoo.com
- James Livermont
- Michael Foord
- Paul Moore
- Ricky Teachey