namedtuple for dict.items()/collections.abc.Mapping.items()
I have a lot of code that looks like this:

    def filter(self, it, defs):
        for x in it:
            for y in _match_helper(self.key, defs, x[0]):
                yield (y, x[1])

    def filter(self, it):
        for el in it:
            try:
                if self.compiled.search(el[0]):
                    yield el
                elif not self.skippable:
                    raise ValidationError
            except TypeError:
                if not self.skippable:
                    raise ValidationError

    def filter(self, it, ty):
        for el in it:
            # this may TypeError if ty is not a type nor a tuple of types,
            # but that's actually the programmer's error
            if isinstance(el[1], ty):
                yield el
            elif not self.skippable:
                # and this one is for actual validation
                raise ValidationError

It'd be quite nice if dict.items() returned a namedtuple so all these x[0], x[1], el[0], el[1], etc. would instead be x.key, x.value, el.key, el.value, etc. It would be more readable and more maintainable.
On Nov 30, 2019, at 13:19, Soni L. <fakedme+py@gmail.com> wrote:
I have a lot of code that looks like this:
    def filter(self, it, defs):
        for x in it:
            for y in _match_helper(self.key, defs, x[0]):
                yield (y, x[1])
Try destructuring it:

    for key, value in it:

And now you can use key instead of x[0] and value instead of x[1]. The loop target in a for loop is like a target in an assignment statement, and can do most of the same things. (This doesn’t work for all bindings—you can’t destructure an argument in a parameter list, or write from spam import * as first, *rest—but as a first approximation it works anywhere it wouldn’t be confusing, which is usually all you need to remember.)
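For instance, the first filter from the original post becomes the following (same _match_helper and self.key as in the original, so this is just a sketch of the destructured shape, not new behavior):

    def filter(self, it, defs):
        # destructure each (key, value) pair in the loop target
        for key, value in it:
            for y in _match_helper(self.key, defs, key):
                yield (y, value)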
It'd be quite nice if dict.items() returned a namedtuple so all these x[0], x[1], el[0], el[1], etc would instead be x.key, x.value, el.key, el.value, etc.
I think dict would want to implement it as a C-level structseq (like stat results) instead of a Python-level namedtuple for convenience and performance, but that’s not a big issue.

Anyway, I think this would be a good idea. But I’m not sure it’s feasible.

The problem is that items() is part of a protocol that many types implement, including many third-party types (sortedcontainers.SortedDict, pyrsistent.PMap, pyobjc.NSDictionary, java.collections.Map, whatever the type is for attribute mappings in beautifulsoup, etc., not to mention project-internal types). A whole lot of code is written to work with “any mapping”, and all of that code will have to keep using [0] and [1] instead of .key and .value until every mapping type it might be handed has been updated. (That includes the constructors of most of those types, which can handle any object with an items() method just like dict can.)

You could get about 75% of the way there pretty easily. This structseq/namedtuple/whatever type could be given a name, and collections.abc.ItemsView could iterate values of that type, so everyone who inherits the items method from the Mapping mixin or builds their own items view that inherits the ItemsView mixin gets the new behavior. Together with types that inherit items from dict, or delegate it to dict without transforming the results, that covers a lot of mapping types. But it still doesn’t cover all of them.

And I don’t think even a __future__ statement or deprecation schedule would help here. You could document that, starting in 3.12, an item must be a 2-item sequence with key and value attributes that are equal to the first and second element, instead of just being a sequence with two values. But that’s not something the ABC can test for you, so I don’t think it would have the desired result of pushing the whole ecosystem to change in 4.5 years; people would have to keep using [0] and [1] for many years to come if they wanted to work with all mapping types.
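Purely as a sketch of the ItemsView route described above—done today with a subclass rather than a change to the ABC—something like this works (DictItem and NamedItemsView are made-up names, and the view leans on MappingView's internal _mapping attribute):

    from collections import namedtuple
    from collections.abc import ItemsView, Mapping

    DictItem = namedtuple('DictItem', ['key', 'value'])

    class NamedItemsView(ItemsView):
        # yield named items instead of plain (key, value) tuples
        def __iter__(self):
            for key in self._mapping:
                yield DictItem(key, self._mapping[key])

    class MyMapping(Mapping):
        # minimal mapping that hands out the named items view
        def __init__(self, data):
            self._data = dict(data)

        def __getitem__(self, key):
            return self._data[key]

        def __iter__(self):
            return iter(self._data)

        def __len__(self):
            return len(self._data)

        def items(self):
            return NamedItemsView(self)

    for item in MyMapping({'a': 1, 'b': 2}).items():
        print(item.key, item.value)   # still unpacks as a 2-tuple too

A real change would presumably live in collections.abc itself, which is exactly why the third-party coverage problem described above remains.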
On Sat, Nov 30, 2019 at 06:16:49PM -0300, Soni L. wrote:
It'd be quite nice if dict.items() returned a namedtuple so all these x[0], x[1], el[0], el[1], etc would instead be x.key, x.value, el.key, el.value, etc. It would be more readable and more maintainable.
If you are doing

    for item in somedict.items():
        process(item[0])
        process(item[1])

you could do this instead:

    for key, value in somedict.items():
        process(key)
        process(value)

Does that help?

--
Steven
On Sat, 30 Nov 2019 at 22:24, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Nov 30, 2019 at 06:16:49PM -0300, Soni L. wrote:
It'd be quite nice if dict.items() returned a namedtuple so all these x[0], x[1], el[0], el[1], etc would instead be x.key, x.value, el.key, el.value, etc. It would be more readable and more maintainable.
If you are doing
    for item in somedict.items():
        process(item[0])
        process(item[1])
you could do this instead:
    for key, value in somedict.items():
        process(key)
        process(value)
You can also make your own function to get the items as namedtuples. That can work now with any class that defines items the current way.

    from collections import namedtuple

    Item = namedtuple('Item', ['key', 'value'])

    def nameditems(d):
        return (Item(*t) for t in d.items())

    d = {'a': 1, 'b': 2}

    for item in nameditems(d):
        print(item.key, item.value)

Comparing that with Steve's example above though I don't see the advantage of namedtuples here.

--
Oscar
On Nov 30, 2019, at 16:36, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Comparing that with Steve's example above though I don't see the advantage of namedtuples here.
Presumably the main advantage is for cases where you can’t destructure the tuple in-place:

    sorted(d.items(), key=lambda it: it.value)

There’s no nice way to write that today. Maybe this makes it clear?

    sorted(d.items(), key=(ValueGetter := operator.itemgetter(1)))

But normally you don’t bother; you just live with using [1] and assuming your reader will know that [1] on a mapping item is the value. Which isn’t terrible, because it almost always is obvious you’ve got a mapping item, and almost every reader does know what [1] means there. But it’s not as nice as using .value would be.

As a secondary advantage, if you’ve been using some other language and accidentally write `for value, key in d.items()` it will appear correct but then do the wrong thing inside the loop. (And if I’m trying to fix your code, I might not even notice that you got it backward until after a couple hours banging my head on the debugger.) With a namedtuple, there’s no way to mix up the names. I don’t think this comes up nearly as often with dict items as with, say, stat struct values, so it’s not a huge issue, but it’s not completely negligible.
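A quick illustration of that swap—the loop runs fine, it just quietly binds the names backwards:

    d = {'a': 1, 'b': 2}
    # names are backwards, but nothing complains:
    for value, key in d.items():
        print(key, value)   # prints "1 a" then "2 b" -- key and value swapped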
On 11/30/2019 8:51 PM, Andrew Barnert via Python-ideas wrote:
Presumably the main advantage is for cases where you can’t destructure the tuple in-place:
sorted(d.items(), key=lambda it: it.value)
There’s no nice way to write that today. Maybe this makes it clear?
sorted(d.items(), key=(ValueGetter := operator.itemgetter(1)))
But normally you don’t bother; you just live with using [1] and assuming your reader will know that [1] on a mapping item is the value. Which isn’t terrible, because it almost always is obvious you’ve got a mapping item, and almost every reader does know what [1] means there. But it’s not as nice as using .value would be.
How I miss python 2's parameter unpacking:
    sorted({1:300, 2:4}.items(), key=lambda (key, value): value)
    [(2, 4), (1, 300)]
Eric
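For reference, tuple parameter unpacking was removed in Python 3 by PEP 3113; the closest spellings today index into the pair inside the lambda or use operator.itemgetter, as mentioned earlier in the thread:

    from operator import itemgetter

    d = {1: 300, 2: 4}
    # index into the (key, value) pair inside the lambda
    print(sorted(d.items(), key=lambda kv: kv[1]))   # [(2, 4), (1, 300)]
    # or use itemgetter for the same sort
    print(sorted(d.items(), key=itemgetter(1)))      # [(2, 4), (1, 300)]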
What about keys that contain invalid characters for attribute names?

    items = {'1': 1, 'two-3': 4,}

    x = object()
    x.__dict__.update(items)  # dangerous

    x = AttrDict(**items)
    x.1      # error
    x.two-3  # error

On Saturday, November 30, 2019, Eric V. Smith <eric@trueblade.com> wrote:
How I miss python 2's parameter unpacking:
    sorted({1:300, 2:4}.items(), key=lambda (key, value): value)
    [(2, 4), (1, 300)]
Eric
On Nov 30, 2019, at 20:21, Wes Turner <wes.turner@gmail.com> wrote:
What about keys that contain invalid characters for attribute names?
What about them?
    items = {'1': 1, 'two-3': 4,}

    x = object()
    x.__dict__.update(items)  # dangerous

    x = AttrDict(**items)
    x.1      # error
    x.two-3  # error
The message you quoted was about how in Python 2 (but not 3) you could destructure parameters:

    sorted({1:300, 2:4}.items(), key=lambda (key, value): value)

The wider discussion is about how if items() were a view of namedtuples instead of just sequences you could do something even better:

    sorted({1:300, 2:4}.items(), key=lambda it: it.value)

What does either of those have to do with using a dict whose keys are not identifiers as an attribute dictionary for an object? Neither restoring Python 2’s parameter destructuring nor making items namedtuples would in any way affect any of the code you wrote.
30.11.19 23:16, Soni L. wrote:
It'd be quite nice if dict.items() returned a namedtuple so all these x[0], x[1], el[0], el[1], etc would instead be x.key, x.value, el.key, el.value, etc. It would be more readable and more maintainable.
It was discussed before. The problem is that creating, using and destroying a tuple is much faster than a named tuple. Many parts of Python have special optimizations for tuples, and this is critical for Python as a whole. So making dict.items() return named tuples would harm every Python program. Also, unpacking a tuple into local variables key and value is faster and more convenient than using attribute access. Tuples are one of the best things in Python. Let's enjoy using them and not make life harder.
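A rough way to see the creation-cost gap described here, using only the stdlib (numbers are machine-dependent, and Item is just an example namedtuple, not anything dict actually uses):

    import timeit
    from collections import namedtuple

    Item = namedtuple('Item', ['key', 'value'])

    # build the same pair a million times as a plain tuple vs. a namedtuple
    print(timeit.timeit("tuple(x)", setup="x = ('a', 1)", number=10**6))
    print(timeit.timeit("Item(*x)", setup="x = ('a', 1)",
                        globals=globals(), number=10**6))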
My mistake, I skimmed and assumed that the question was asking for this:

    from collections import namedtuple

    data = {'1': 1, 'two-2': 2}
    x = namedtuple('x', data.keys())
    # ValueError: Type names and field names must be valid identifiers: '1'

But now I understand that the request was for this:

    from collections import namedtuple

    Item = namedtuple('Item', ['key', 'value'])

    def items_tuple(self):
        for key, value in self.items():
            yield Item(key, value)

So that this test would pass:

    def test_items():
        data = {'1': 1, 'two-2': 2}
        for item in data.items():
            assert hasattr(item, 'key')
            assert hasattr(item, 'value')
            key, value = item
            assert item.key == key
            assert item.value == value

FWIW, here are rough timings:

    data = dict.fromkeys(range(10000))

    def timeit(code):
        print(f">>> {code}")
        get_ipython().run_line_magic('timeit', code)

    timeit('for item in data.items(): item[0], item[1]')
    timeit('for key, value in data.items(): key, value')
    timeit('for item in items_tuple(data): item.key, item.value')
    for item in data.items(): item[0], item[1]
    874 µs ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    for key, value in data.items(): key, value
    524 µs ± 4.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    for item in items_tuple(data): item.key, item.value
    5.82 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
On Sun, Dec 1, 2019 at 1:24 AM Andrew Barnert <abarnert@yahoo.com> wrote:
What does either of those have to do with using a dict whose keys are not identifiers as an attribute dictionary for an object? Neither restoring Python 2’s parameter destructuring nor making items namedtuples would in any way affect any of the code you wrote.
    for item in data.items(): item[0], item[1]
    874 µs ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    for key, value in data.items(): key, value
    524 µs ± 4.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    for item in items_tuple(data): item.key, item.value
    5.82 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Thanks for sharing the results; in particular, the amount of difference between "for item in data.items(): item[0], item[1]" and "for key, value in data.items(): key, value" is a bit surprising to me. I'd have assumed they'd be a bit closer in performance. I expected the named tuple to be significantly slower than the other two, but not quite by that much. Good to know.

I'm -1 on the proposal overall. It's not a bad idea, but in practice it would likely be too much of a detriment to performance and backwards compatibility to "dict.items()". I wouldn't be opposed to considering a different method though, such as "dict.named_items()" or something similar that allowed usage of "item.key" and "item.value".
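A sketch of roughly how such an opt-in method could behave, written today as a subclass (named_items is the hypothetical name from this message, not an existing dict API):

    from collections import namedtuple

    Item = namedtuple('Item', ['key', 'value'])

    class NamedItemsDict(dict):
        # leave items() alone; add an opt-in named variant alongside it
        def named_items(self):
            for key, value in self.items():
                yield Item(key, value)

    d = NamedItemsDict(a=1, b=2)
    print(sorted(d.named_items(), key=lambda it: it.value))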
On 2019-12-01 10:11 a.m., Kyle Stanley wrote:
I'm -1 on the proposal overall. It's not a bad idea, but in practice it would likely be too much of a detriment to performance and backwards compatibility to "dict.items()". I wouldn't be opposed to considering a different method though, such as "dict.named_items()" or something similar that allowed usage of "item.key" and "item.value".
I see no reason why named items couldn't be optimized on the C side, especially for the common case of destructuring. I'd like to see a run for "for key, value in items_tuple(data): key, value". I wonder how much of the cost is the generator, how much the namedtuple creation itself, and how much the attribute access.
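One way to separate those costs with the stdlib timeit module might look like this (items_tuple mirrors the helper posted earlier; gen_only is an extra variant added here purely to isolate the generator overhead; timings will vary by machine):

    import timeit
    from collections import namedtuple

    Item = namedtuple('Item', ['key', 'value'])
    data = dict.fromkeys(range(10000))

    def items_tuple(d):
        for key, value in d.items():
            yield Item(key, value)

    def gen_only(d):
        # same generator shape, but plain tuples: isolates the generator cost
        for key, value in d.items():
            yield (key, value)

    tests = [
        ('items(), indexing', 'for item in data.items(): item[0], item[1]'),
        ('items(), unpacking', 'for key, value in data.items(): key, value'),
        ('generator, plain tuples', 'for key, value in gen_only(data): key, value'),
        ('generator, namedtuples, unpacking', 'for key, value in items_tuple(data): key, value'),
        ('generator, namedtuples, attributes', 'for item in items_tuple(data): item.key, item.value'),
    ]
    for label, stmt in tests:
        t = timeit.timeit(stmt, globals=globals(), number=100)
        print(f'{label}: {t / 100 * 1000:.2f} ms per loop')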
Optimizations to namedtuple would likely be welcomed. __slots__ is the optimization for objects that don't need dicts. Ordered by performance: tuple, namedtuple, object, dataclass.

A more raw struct would be faster:
https://docs.python.org/3/c-api/tuple.html#struct-sequence-objects

cProfile/profile module:
https://docs.python.org/3/library/profile.html

The %prun magic command runs a statement through the profiler:
https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-prun

%run -p modulename[.py] runs a script through the profiler:
https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-run
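Outside IPython, a similar profile can be taken with cProfile directly (a sketch assuming data and items_tuple as defined earlier in the thread, run as a script so the names are visible in __main__):

    import cProfile
    from collections import namedtuple

    Item = namedtuple('Item', ['key', 'value'])
    data = dict.fromkeys(range(10000))

    def items_tuple(d):
        for key, value in d.items():
            yield Item(key, value)

    # profile the namedtuple-based iteration; sort by cumulative time
    cProfile.run('for item in items_tuple(data): item.key, item.value',
                 sort='cumulative')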
participants (8)

- Andrew Barnert
- Eric V. Smith
- Kyle Stanley
- Oscar Benjamin
- Serhiy Storchaka
- Soni L.
- Steven D'Aprano
- Wes Turner