[Python-Dev] PEP 469: Restoring the iterkeys/values/items() methods
Terry Reedy
tjreedy at udel.edu
Sun Apr 20 06:03:58 CEST 2014
On 4/19/2014 10:52 AM, Guido van Rossum wrote:
> Does everyone involved know that "for x in d.iterkeys()" is equivalent
> to "for x in d"
Looking at uses I found by searching code.ohloh.net, the answer is
either 'no, people sometimes add a redundant .iterkeys()' or 'people
are writing non-dict mapping classes for which it is not redundant'
(perhaps because their custom class iterates by items rather than by
keys by default). I could not always tell which from the quoted snippets.
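To illustrate the second case, here is a minimal sketch of such a
mapping class (hypothetical; not one of the actual hits) whose default
iteration yields items, so that .iterkeys() is not redundant:

class ItemMapping:
    # Hypothetical mapping-like class that iterates by (key, value)
    # pairs by default, unlike dict.
    def __init__(self, data):
        self._data = dict(data)
    def __iter__(self):
        return iter(self._data.items())
    def iterkeys(self):
        return iter(self._data.keys())

m = ItemMapping({'a': 1, 'b': 2})
list(m)             # pairs, e.g. [('a', 1), ('b', 2)]
list(m.iterkeys())  # keys, e.g. ['a', 'b']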
> and works the same in Python 2 and 3? Similarly,
> "list(d)" is a simple, fast way to spell the Python 2 semantics of
> "d.keys()" that works in both versions (but I doubt it is much needed
> -- usually the actual code follows up with sorting, so you should use
> sorted(d)).
>
> This doesn't solve itervalues() and iteritems() but I expect those are
> less common,
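For reference, the 2-and-3-compatible spellings from the quoted text
above, collected in one sketch:

d = {'b': 2, 'a': 1}

for x in d:           # iterates keys; same as d.iterkeys() on 2, d.keys() on 3
    pass

keys = list(d)        # Python 2 semantics of d.keys(): a new list of keys
ordered = sorted(d)   # ['a', 'b'] -- usually what the list was wanted for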
ohloh gives about 77,000 Python hits for iteritems and 16,000 for
itervalues. A large fraction of the itervalues hits are definitions
rather than uses, often from a compat.py like this one (is this from six?):
import sys

if sys.version_info[0] >= 3:
    text_type = str
    string_types = str,
    iteritems = lambda o: o.items()
    itervalues = lambda o: o.values()
    izip = zip
else:
    text_type = unicode
    string_types = basestring,
    iteritems = lambda o: o.iteritems()
    itervalues = lambda o: o.itervalues()
    from itertools import izip
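Code that imports these shims can then run unchanged on both versions;
for example (a sketch, assuming the snippet above lives in a module
named compat):

from compat import iteritems, itervalues   # 'compat' is the hypothetical
                                           # module holding the snippet above

def invert(d):
    return dict((v, k) for k, v in iteritems(d))

def total(d):
    return sum(itervalues(d))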
That snippet is three hits for iteritems and three for itervalues, and
none for the unneeded iterkeys. My guess is that there are 5,000
itervalues uses and 70,000 iteritems uses.
There are 1,500,000 Python hits for 'for', some unknown fraction of
which are 'for key in somedict' or 'for key in somedict.keys()'. There
are 13,000 for iterkeys. As noted above, this count is *not* inflated by
three hits for each copy of compat.py. I think that 10%, or 150,000, of
the 'for' hits being iterations by key might be a reasonable guess.
There are other definition sets that include iterkeys, or that define
functions that wrap all three bound methods for a particular dict:

iterkeys = lambda: d.iterkeys()  # py2
iterkeys = lambda: d.keys()      # py3
> and "for x, y in d.iteritems(): <blah>" is rewritten nicely as
>
> for x in d:
>     y = d[x]
>     <blah>
>
> If there is a measurable slowdown in the latter I would be totally okay
> with some kind of one-element cache for the most recent lookup.
About three weeks ago, Raymond opened http://bugs.python.org/issue21101
with this claim: "It is reasonably common to make two successive
dictionary accesses with the same key." I proposed specialized
caching as an alternative to adding new C API functions.
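For concreteness, a pure-Python sketch of the sort of one-entry cache
Guido describes (illustrative only; the tracker discussion is about
doing this at the C level, and this is not the proposal there):

class CachingDict(dict):
    # Illustrative only: remember the most recent successful lookup so
    # that an immediately repeated access with the same key skips the
    # second search.
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._last = None                 # (key, value) or None

    def __getitem__(self, key):
        last = self._last
        if last is not None and last[0] == key:
            return last[1]
        value = dict.__getitem__(self, key)
        self._last = (key, value)
        return value

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self._last = None                 # any mutation invalidates the cache
                                          # (__delitem__ etc. would need the same)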
Using the iteritems compatibility function, there is only one extra
function call for the entire loop. If the body of the loop takes at
least as long as that one call, the extra time is a non-issue once the
dict has more than, say, 20 items.
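One way to check this on a given machine (a sketch; the exact numbers
will of course vary):

import timeit

d = dict.fromkeys(range(20))
iteritems = lambda o: o.items()   # the py3 branch of the shim above

# run this as a script so that __main__ holds d and iteritems
setup = "from __main__ import d, iteritems"
print(timeit.timeit("for k, v in d.items(): pass", setup=setup, number=100000))
print(timeit.timeit("for k, v in iteritems(d): pass", setup=setup, number=100000))
# The only difference is the single extra call to iteritems per loop.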
--
Terry Jan Reedy