Re: [Python-ideas] dict.fromkeys() better as dict().setkeys() ? (and other suggestions)

May 29, 2007

      Ron Adam <rrr@ronadam.com> wrote:
...
Josiah Carlson wrote:
...
Ron Adam <rrr@ronadam.com> wrote:
...
The dictionary fromkeys method seems out of place as well as misnamed.  IMHO
It is perfectly named (IMNSHO ;), create a dictionary from the keys
provided; dict.fromkeys() .
That's ok, I won't hold it against you.  ;-)
What about its being out of place?  Is this case like 'sorted' vs 'sort' 
for lists?
Sorted returns a list because it is the only mutable ordered sequence in
Python, hence the only object that makes sense to return from a sorted
function.
...
...
...
There are enough correct uses of it in the wild to keep the behavior, but 
it can be done in a better way.
I wasn't terribly convinced by your later arguments, so I'm -1.
Yes, I'm not the most influential writer.
I'm not sure I can convince you it's better if you already think it's not. 
That has more to do with your personal preference.  So let's look at how 
much it's actually needed in the current (and correct) form.
[snip]
Is 12 cases out of about 315,000 python files a big enough need to keep the 
current behavior?   315,000 is the number returned from google code for all 
python files, 'lang:python'. (I'm sure there are some duplicates)
Is this more convincing?   ;-)
Not to me, as I use dict.fromkeys(), and going from a simple expression
to an assignment then mutate is unnecessary cognitive load.  It would
have been more convincing had you offered...

    dict((i, v) for i in keys)

But then again, basically every one of your additions is a one line
expression.  I would also consider the above myself, if it weren't for
the fact that I'm supporting a Python 2.3 codebase.  Please see my
discussion below of *removing* functionality.
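
A minimal sketch of the two spellings being compared, assuming Python 3 (the thread itself targets 2.x). The names `keys` and `v` below are illustrative placeholders, not taken from anyone's code.

```python
# Illustrative data; `keys` and `v` are hypothetical names.
keys = ["a", "b", "c"]
v = 0

# Built-in classmethod: every key maps to the same value object.
d1 = dict.fromkeys(keys, v)

# Generator-expression spelling of the same thing.
d2 = dict((k, v) for k in keys)

print(d1 == d2)
```

Generator expressions arrived in Python 2.4, which is why the second spelling is not an option on a 2.3 codebase.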
...
...
...
I think this reads better and can be used in a wider variety of situations.
It could be useful for setting an existing dictionary to a default state.
    # reset status of items.
    status.set_keys(status.keys(), v=0)
This can be done today:
Of course all of the examples I gave can be done today.  But they nearly 
all require iterating in python in some form.
Premature optimization...  Note that you don't know where you are
getting your data, so the overhead of looping and setting data may be
inconsequential to the overall running of the update.  Since you
basically use the "but it iterates in Python rather than C" for the rest
of your arguments, I'm going to stick with my belief that you are
prematurely optimizing.  Until you can show significant use-cases in the
wild, and show that the slowdown of these functions in Python compared
to C is sufficient to render the addition of the functions in your own
personal library useless, I'm going to stick with my -1.
...
...
    status.update((i, 0) for i in status.keys())
    #or
    status.update(dict.fromkeys(status, 0))
The first example requires iterating over the keys.  The second example 
works if you want to initialize all the keys.  In which case, there is no 
reason to use the update method.  dict.fromkeys(status, 0) is enough.
I was pointing out how you would duplicate exactly the functionality you
were proposing for dict.set_keys().  It is very difficult for me to
offer you alternate implementations for your own use, or as reasons why
I don't believe they should be added, if you move the target ;).
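
A small sketch of that in-place reset, assuming Python 3; the `status` contents and the `alias` name are made up for illustration. It shows that the update() form mutates the existing dictionary rather than creating a new one.

```python
# Hypothetical status table; the keys and values are illustrative only.
status = {"job1": 3, "job2": 7, "job3": 1}
alias = status  # second reference, to show the reset happens in place

# Reset every existing key to 0 without rebinding the name.
status.update(dict.fromkeys(status, 0))
all_reset = dict(status)  # snapshot after the full reset

# Reset only a chosen subset of keys.
status["job2"] = 5
status["job3"] = 9
status.update((k, 0) for k in ["job2"])

print(alias is status, all_reset, status)
```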
...
...
...
Or more likely, resetting a partial sub set of the keys to some initial state.
The reason I started looking at this is I wanted to split a dictionary into 
smaller dictionaries and my first thought was that fromkeys would do that. 
But of course it doesn't.
Changing the behavior of dict.fromkeys() is not going to happen. We can
remove it, we can add a new method, but changing will lead to not so
subtle breakage as people who were used to the old behavior try to use
the updated method.
Note that this isn't a matter of "it's ok to break in 3.0", because
dict.fromkeys() is not seen as being a design mistake by any of the
'heavy hitters' in python-dev or python-3000 that I have heard (note
that I am certainly not a 'heavy hitter').
Then let's find a different name.
Usually we find substantial use-cases for which this new functionality
would be useful, _then_ we argue about names (usually for months ;). The
only exception to this is in 3rd party modules posted in the cheeseshop,
but then we don't usually hash out the details of it here, as it is a
3rd party module.
...
...
...
What I wanted was to be able to specify the keys and get the values from 
the existing dictionary into the new dictionary without using a for loop to 
iterate over the keys.
    d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
    d_odds = d.from_keys([1, 3, 5])      # new dict of items 1, 3, 5
    d_evens = d.from_keys([2, 4])        # new dict of items 2, 4
There currently isn't a way to split a dictionary without iterating its 
contents even if you know the keys you need beforehand.
Um...
    def from_keys(d, iterator):
        return dict((i, d[i]) for i in iterator)
(iterating)
Yep as I said just above this.
"""There currently isn't a way to split a dictionary without iterating 
its contents ..."""
You aren't splitting the dictionary.  You are fetching certain values
from the dictionary based on the contents of a provided iterator.  The
*only* thing you gain from the iterator vs. built-in method is a bit of
speed.  But if speed is your only argument, for a group of functions
that I don't remember anyone having ever asked for before, then you
better check your rationale.
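
For concreteness, a runnable sketch of that helper applied to the odds/evens example, assuming Python 3. The function is a plain module-level helper here, not a dict method, and it raises KeyError if a requested key is missing.

```python
def from_keys(d, keys):
    """Return a new dict holding only the given keys of d.

    Raises KeyError if any requested key is absent from d.
    """
    return dict((k, d[k]) for k in keys)


d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
d_odds = from_keys(d, [1, 3, 5])   # new dict of items 1, 3, 5
d_evens = from_keys(d, [2, 4])     # new dict of items 2, 4

print(d_odds, d_evens)
```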

In the standard library there exists the deque type in collections.  Why
does Python have a deque?  Because it was discovered over 10+ years of
Python use that pretty much everyone needs a queue, with a large portion
of those needing a double ended queue (put the just fetched item back at
the front).  Because there were so many users, and because it was used
in *many* performance critical applications, it was implemented in C by
Raymond Hettinger and became the first member of the collections module.
A similar thing happened with default dictionaries and it being faked
many times by many different people, implemented and tossed into the
collections module again.
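
As a reminder of what those two types buy you, a short sketch assuming Python 3; the sample data is invented for illustration.

```python
from collections import defaultdict, deque

# Double-ended queue: fetch an item, then put it back at the front.
queue = deque([1, 2, 3])
item = queue.popleft()
queue.appendleft(item)

# Default dictionary: missing keys start from int(), i.e. 0,
# so no explicit "if key not in d" check is needed.
counts = defaultdict(int)
for word in ["spam", "eggs", "spam"]:
    counts[word] += 1

print(list(queue), dict(counts))
```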

As for iteration over a sequence to generate a new sequence, you need to
do this regardless of whether it is in C or Python. The *only* difference
between the C and Python versions of this is a difference in speed, but
again, use-cases before naming and optimization.  I like to see things
"in the wild".
...
Lists have __getslice__, __setslice__, and __delslice__.  It could be 
argued that those can be handled just as well with iterators and loops.  
Of course we see them as seq[s:s+x], on both lists and strings.  So 
why not have an equivalent for dictionaries?  We can't slice them, but we 
do have key lists to use in the same way.
Your function examples are a bit like adding set manipulation
functionality through functional programming-like functions. Take your
merge operations as an example.  With sets, it's spelled s1 | s2.  It is
a bit round-about, but your from_keys functionality is a bit like s1 -
(s1 - s2), or really set(s2) because sets have no associated values. 
Anyways.
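
A sketch of that set analogy, assuming Python 3 (where dict key views support set operators); the sample values are illustrative. Note that s1 - (s1 - s2) is just the intersection s1 & s2, and it equals set(s2) only when s2 is a subset of s1.

```python
s1 = {1, 2, 3, 4, 5}
s2 = {1, 3, 5, 7}

merged = s1 | s2               # the "merge" operation
selected = s1 - (s1 - s2)      # round-about spelling of s1 & s2

# The dictionary counterpart: keep the items whose keys are in s2.
d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
picked = dict((k, d[k]) for k in d.keys() & s2)

print(sorted(merged), sorted(selected), sorted(picked.items()))
```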
...
...
...
A del_keys method could replace the clear method.  del_keys would be more 
useful as it could operate on a partial set of keys.
d.delkeys(d.keys())    # The current clear method behavior.
I can't remember ever needing something like this that wasn't handled by
d.clear() .
All or nothing.  d = dict() works just as well.
Not when you want to mutate a dictionary.
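
The difference shows up as soon as a second reference to the dictionary exists. A minimal sketch, assuming Python 3, with made-up variable names:

```python
d = {"a": 1}
other = d          # second reference to the same dict object
d.clear()          # mutates in place, so `other` is emptied too
emptied_in_place = (other == {})

d2 = {"a": 1}
other2 = d2
d2 = dict()        # rebinds the name only; `other2` keeps the old contents
kept_old_contents = (other2 == {"a": 1})

print(emptied_in_place, kept_old_contents)
```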
...
And I'd prefer to define the function in this case for readability reasons.
...
...
    def splitdict(d, keys):
        """ Split dictionary d using keys. """
        keys_rest = set(d.keys()) - set(keys)
        return d.from_keys(keys), d.from_keys(keys_rest)
I can't think of a simple one-liner for this one that wouldn't duplicate
work.
:-)
This is one of the main motivators.
I've never needed to do this.  And I've never seen source that needed to
do this either.  So whether this is a main motivator for you doesn't
sway me.
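
For anyone who does need it, here is a sketch of splitdict that runs today without the proposed from_keys method, assuming Python 3. It is one possible spelling, not the proposal's implementation; like the original it raises KeyError when a requested key is missing.

```python
def splitdict(d, keys):
    """Split d into (items named by keys, all remaining items)."""
    keys = set(keys)
    picked = dict((k, d[k]) for k in keys)                       # KeyError if absent
    rest = dict((k, v) for k, v in d.items() if k not in keys)
    return picked, rest


d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
odds, evens = splitdict(d, [1, 3, 5])

print(odds, evens)
```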

[snip your pointing out that iteration happens in Python and not C]
...
...
...
I think the set_keys, from_keys, and del_keys methods could add both 
performance and clarity benefits to python.
Performance, sometimes, for some use-cases.  Clarity?  Maybe.  Your
split* functions are a bit confusing to me, and I've never really needed
any of the functions that you list.
I think sometimes our need is determined by what is available for use.  So 
if it's not available, our minds filter it out from the solutions we 
consider.  That way, we don't need the things we don't have or can't get.
My mind's "need filter" seems to be broken in that respect. I often need 
things I don't have.  But sometimes that works out to be good.  ;-)
Yeah, I don't buy your 'need filter' reasoning.  Typically people resist
doing things that are difficult or inconvenient to do.  Take decoration
for example.  Before decorator syntax, decoration was a pain in the butt. 
Yeah, you wrote the same number of lines of code, but there was such a
disconnect from the signature of the function/method (and class in 2.6)
that it was just too inconvenient to write, maintain, and understand.
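
To illustrate the disconnect, a sketch assuming Python 3; `traced`, `add`, and `mul` are hypothetical names. The first function is wrapped the old way, after its body; the second uses decorator syntax, so the wrapping sits next to the signature.

```python
def traced(func):
    """Hypothetical decorator that counts calls to func."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


# Before decorator syntax: the wrapping happens after the whole body,
# far away from the signature it modifies.
def add(a, b):
    return a + b
add = traced(add)


# With decorator syntax: the wrapping is visible right at the signature.
@traced
def mul(a, b):
    return a * b


print(add(1, 2), mul(2, 3), add.calls, mul.calls)
```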

In the case of dictionaries, all but two or three of the things you
would like to offer are available via a very simple dict(generator
expression). If people aren't thinking of ways to use generator
expressions to make their lives easier (this is the case in multiple
threads daily in comp.lang.python), is that Python's fault, or is it the
developer's?

I like to think of Python's syntax and semantics as just rich enough for
people to write what they want and to understand it quickly, but not so
rich that you need to spend time thinking what something means (the Perl
argument). Adding functionality to existing objects needs to do a few
things, not the least of which is solving a problem that happens in the
wild, but also that it doesn't overly burden those who implement similar
functionality. Remember, dictionaries are *the* canonical mapping
interface, and anyone who implements a complete mapping interface
necessarily would need to implement the 3 methods you propose. For what?
To clean up the interface?

I'm sorry, but to add 3 methods, even with the assumption that two
previous methods were going to be removed, in order to "clean up" the
interface doesn't convince me.  Please find me real-world use-cases
where your new methods would improve readability.

...

Also, I develop software for fun and profit.  Since basically everyone
else here probably does some selection of the same, I'm sure that they
will tell you pretty much the same thing: if we restricted our needs to
what we already have, software wouldn't get written, or would only be
proposed by marketing.
...
...
...
So to summarize...
    1.  Replace existing fromkeys method with a set_keys method.
    2.  Add a partial copy items from_keys method.
    3.  Replace the clear method with a del_keys method.
Not all X line functions should be builtins.
Of course I knew someone would point this out.
I'm usually the one to invoke it.  Maybe I have less tolerance to
arguably trivial additions to Python than others.
...
I'm not requesting the 
above example functions be builtins.  Only the changes to the dict methods 
be considered.    They would allow those above functions to work in a more 
efficient way and I'd be happy to add those functions to my own library.
With these methods in most cases the functions wouldn't even be needed. 
You would just use the methods in combinations with each other directly and 
the result would still be readable without a lot of 'code' overhead.
My single expression replacements were to show that the functions aren't
needed now, as most are *easily* implemented in Python 2.5 in a
straightforward manner.
...
Also consider this from a larger view.  List has __getslice__, 
__setslice__, and __delslice__.  Set has numerous methods that operate on 
more than one element.
Lists are ordered sequences, dictionaries are not.  Sets are not
mappings, they are sets (which is why they have set operations). 
Dictionaries are a mapping from keys to values, used as both an
arbitrary data store as well as data and method member lookups on
objects. The most common use-cases of dictionaries *don't* call for any
of the additional functionality that you have offered.  If they did,
then it would have already been added.
...
Dictionaries are supposed to be highly efficient, but they only have limited 
methods that can operate on more than one item at a time,  so you end up 
iterating over the keys to do nearly everything.
Iteration is a fundamental building block in Python.  That's why for
loops, iterators, generators, generator expressions, list comprehensions,
etc., all use iteration over an iterator to do their work.  Building
more functionality into dictionaries won't make them easier to use, it
will merely add more methods that you think will help.  Is there anyone
else who likes this idea?  Please speak up.
...
So as an alternative, leave fromkeys and clear alone and add...
    getkeys(keys)  ->  dict
    setkeys(keys, v=None)
    delkeys(keys)
Where these offer the equivalent of list slice functionality to dictionaries.
getkeys/setkeys/delkeys seem to me like they should be named
getitems/setitems/delitems, because they are getting/setting/deleting the
entire key->value association, not merely the keys.
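
A sketch of what the three operations could look like as ordinary helper functions under those names, assuming Python 3. These are hypothetical module-level functions, not existing dict methods, and the bodies are one possible reading of the proposal.

```python
def getitems(d, keys):
    """Return a new dict with the given keys of d (KeyError if one is missing)."""
    return dict((k, d[k]) for k in keys)


def setitems(d, keys, v=None):
    """Set every given key of d to the same value v, in place."""
    d.update(dict.fromkeys(keys, v))


def delitems(d, keys):
    """Delete the given keys from d, in place (KeyError if one is missing)."""
    for k in list(keys):   # snapshot, so delitems(d, d.keys()) is safe
        del d[k]


d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
odds = getitems(d, [1, 3, 5])
setitems(d, [2, 4], v="x")
delitems(d, [1, 3])

print(odds, d)
```

Each body is a single expression or a two-line loop, which is the point being made about one-line replacements.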
...
...
If you find that you are
doing the above more often than you think you should, create a module
with all of the related functionality that automatically patches the
builtins on import and place it in the Python cheeseshop.  If people
find that the functionality helps them, then we should consider it for
inclusion.  As it stands, most of the methods you offer have a very
simple one-line version that is already very efficient.
Iterators and for loops are fairly efficient for small dictionaries, but 
iterating can still be considerably slower than the equivalent C code if 
they are large dictionaries.
Let's find out.

    >>> d = dict.fromkeys(xrange(10000000))
    >>> import time
    >>> if 1:
    ...     t = time.time()
    ...     e = dict(d)
    ...     print time.time()-t
    ...
    1.21899986267
    >>> del e
    >>> if 1:
    ...     t = time.time()
    ...     e = dict(d.iteritems())
    ...     print time.time()-t
    ...
    2.75
    >>> del e
    >>> if 1:
    ...     t = time.time()
    ...     e = dict((i,j) for i,j in d.iteritems())
    ...     print time.time()-t
    ...
    6.95399999619
    >>> del e
    >>> if 1:
    ...     t = time.time()
    ...     e = dict((i, d[i]) for i in d)
    ...     print time.time()-t
    ...
    7.54699993134
    >>>

Those all seem to be pretty reasonable timings to me.  At best, the C
version (dict(d), 1.22 seconds) is about 6.2 times faster than the
Python version (the last form, 7.55 seconds).
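
To repeat the measurement on a current interpreter, here is a sketch using timeit, assuming Python 3 (so xrange becomes range and iteritems becomes items). The dictionary size `n` is deliberately much smaller than the 10,000,000 keys used above so that it runs quickly; absolute numbers and ratios will differ by machine and Python version.

```python
import timeit

n = 100_000  # illustrative size, far smaller than the session above
d = dict.fromkeys(range(n))

# The same four ways of copying a dict, from pure C to pure Python iteration.
candidates = {
    "dict(d)": lambda: dict(d),
    "dict(d.items())": lambda: dict(d.items()),
    "dict((i, j) for i, j in d.items())": lambda: dict((i, j) for i, j in d.items()),
    "dict((i, d[i]) for i in d)": lambda: dict((i, d[i]) for i in d),
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats, 5 calls each, to reduce noise.
    timings[label] = min(timeit.repeat(func, number=5, repeat=3))
    print(f"{label:40s} {timings[label]:.4f}s")
```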
...
...
...
So this replaces two methods and adds one more.  Overall I think the 
usefulness of these would be very good.
I don't find the current dictionary API to be lacking in any way other
than "what do I really need to override to get functionality X", but
that is a documentation issue more than anything.
...
...
I also think it will work very well with the python 3000 keys method 
returning an iterator.  (And still be two fewer methods than we currently 
have.)
I'm sorry, but I can't really see how your changes would add to Python's
flexibility without cluttering up interfaces and confusing current users.
I think it cleans up the API more than it clutters it up.  It converts two 
limited use methods to be more general, and adds one more that works with 
the already existing update method nicely.
But you propose a further half dozen functions.  If you aren't proposing
them for inclusion, why bother including them in your proposal,
especially when they have very simple replacements that are, arguably,
easier to understand than the function bodies you provided.
...
In both cases of the two existing methods, fromkeys and clear, your 
arguments, that there already exist easy one-line functions to do this, 
would be enough of a reason to not have them in the first place.  So do you 
feel they should be removed?
We don't remove functionality in Python unless there is a good reason. 
Typically that reason is because the functionality is broken, the old
functionality is not considered "Pythonic", or generally because a group 
of people believe there is a better way. Guido is more or less happy
with dictionaries as-is (except for the keys(), values(), and items()
methods, which are changing), and no one in python-dev has complained
about dictionary functionality that I can remember. As such, even if you
think that your changes would clean up dictionary methods, it is
unlikely to happen precisely because *others* aren't mentioning,
"dictionaries need to be cleaned up".
...
I plan on doing a search of places where these things can make a difference 
in making the code more readable and/or faster.
I don't care about faster.  Show me code that is easier to understand.

I will mention that all of your functionality smells very much like a
functional programming approach to Python.  This makes a difference
because some functional programming tools (reduce, map, filter, ...) are
slated for removal in Python 3.0, so adding functional programming tools
(when we are removing others), is unlikely to gain much traction.

 - Josiah