
Ron Adam <rrr@ronadam.com> wrote:
Josiah Carlson wrote:
Ron Adam <rrr@ronadam.com> wrote:
The dictionary fromkeys method seems out of place as well as misnamed. IMHO
It is perfectly named (IMNSHO ;), create a dictionary from the keys provided; dict.fromkeys() .
That's ok, I won't hold it against you. ;-)
What about its being out of place? Is this case like 'sorted' vs 'sort' for lists?
Sorted returns a list because it is the only mutable ordered sequence in Python, hence the only object that makes sense to return from a sorted function.
There are enough correct uses of it in the wild to keep the behavior, but it can be done in a better way.
I wasn't terribly convinced by your later arguments, so I'm -1.
Yes, I'm not the most influential writer.
I'm not sure I can convince you it's better if you already think it's not. That has more to do with your personal preference. So let's look at how much it's actually needed in the current (and correct) form. [snip] Is 12 cases out of about 315,000 python files a big enough need to keep the current behavior? 315,000 is the number returned from google code for all python files, 'lang:python'. (I'm sure there are some duplicates)
Is this more convincing? ;-)
Not to me, as I use dict.fromkeys(), and going from a simple expression to an assignment then mutate is unnecessary cognitive load. It would have been more convincing had you offered... dict((i, v) for i in keys) But then again, basically every one of your additions is a one line expression. I would also consider the above myself, if it weren't for the fact that I'm supporting a Python 2.3 codebase. Please see my discussion below of *removing* functionality.
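For comparison, the spellings being argued over can be put side by side. This is only a sketch: the names `keys` and `v` and their values are hypothetical, and the code is written for a current Python 3 interpreter rather than the 2.3 codebase mentioned above.

```python
keys = ["a", "b", "c"]   # hypothetical keys, for illustration only
v = 0                    # hypothetical default value

# Built-in classmethod: every key maps to the same value object.
d1 = dict.fromkeys(keys, v)

# Generator-expression spelling (needs generator expressions, so 2.4+).
d2 = dict((k, v) for k in keys)

# Assign-then-mutate spelling: a statement per step instead of one expression.
d3 = {}
for k in keys:
    d3[k] = v
```

All three build equal dictionaries; the difference is whether the result is available as a single expression.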
I think this reads better and can be used in a wider variety of situations.
It could be useful for setting an existing dictionary to a default state.
# reset status of items.
status.set_keys(status.keys(), v=0)
This can be done today:
Of course all of the examples I gave can be done today. But they nearly all require iterating in python in some form.
Premature optimization... Note that you don't know where you are getting your data, so the overhead of looping and setting data may be inconsequential to the overall running of the update. Since you basically use the "but it iterates in Python rather than C" for the rest of your arguments, I'm going to stick with my belief that you are prematurely optimizing. Until you can show significant use-cases in the wild, and show that the slowdown of these functions in Python compared to C is sufficient to render the addition of the functions in your own personal library useless, I'm going to stick with my -1.
status.update((i, 0) for i in status.keys())
#or
status.update(dict.fromkeys(status, 0))
The first example requires iterating over the keys. The second example works if you want to initialize all the keys. In which case, there is no reason to use the update method. dict.fromkeys(status, 0) is enough.
I was pointing out how you would duplicate exactly the functionality you were proposing for dict.set_keys(). It is very difficult for me to offer you alternate implementations for your own use, or as reasons why I don't believe they should be added, if you move the target ;).
Or more likely, resetting a partial sub set of the keys to some initial state.
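The partial reset can be spelled with the existing methods as well. A minimal sketch, assuming a hypothetical `status` dictionary and a hypothetical `to_reset` list of keys (neither comes from the thread):

```python
status = {"a": 3, "b": 7, "c": 1, "d": 9}   # hypothetical data
to_reset = ["a", "c"]                        # hypothetical subset of keys

# Reset only the chosen keys, in place; the other items are untouched.
status.update(dict.fromkeys(to_reset, 0))
```

One caveat of this spelling: a key in `to_reset` that is not already in `status` is added rather than rejected.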
The reason I started looking at this is I wanted to split a dictionary into smaller dictionaries and my first thought was that fromkeys would do that. But of course it doesn't.
Changing the behavior of dict.fromkeys() is not going to happen. We can remove it, we can add a new method, but changing will lead to not so subtle breakage as people who were used to the old behavior try to use the updated method.
Note that this isn't a matter of "it's ok to break in 3.0", because dict.fromkeys() is not seen as being a design mistake by any of the 'heavy hitters' in python-dev or python-3000 that I have heard (note that I am certainly not a 'heavy hitter').
Then let's find a different name.
Usually we find substantial use-cases for which this new functionality would be useful, _then_ we argue about names (usually for months ;). The only exception to this is in 3rd party modules posted in the cheeseshop, but then we don't usually hash out the details of it here, as it is a 3rd party module.
What I wanted was to be able to specify the keys and get the values from the existing dictionary into the new dictionary without using a for loop to iterate over the keys.
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
d_odds = d.from_keys([1, 3, 5])    # new dict of items 1, 3, 5
d_evens = d.from_keys([2, 4])      # new dict of items 2, 4
There currently isn't a way to split a dictionary without iterating its contents even if you know the keys you need beforehand.
Um...
def from_keys(d, iterator):
    return dict((i, d[i]) for i in iterator)
(iterating)
Yep as I said just above this.
"""There currently isn't a way to split a dictionary without iterating it's contents ..."""
You aren't splitting the dictionary. You are fetching certain values from the dictionary based on the contents of a provided iterator. The *only* thing you gain from the iterator vs. built-in method is a bit of speed. But if speed is your only argument, for a group of functions that I don't remember anyone having ever asked for before, then you had better check your rationale.
In the standard library there exists the deque type in collections. Why does Python have a deque? Because it was discovered over 10+ years of Python use that pretty much everyone needs a queue, with a large portion of those needing a double-ended queue (put the just fetched item back at the front). Because there were so many users, and because it was used in *many* performance-critical applications, it was implemented in C by Raymond Hettinger and became the first member of the collections module. A similar thing happened with default dictionaries: after being faked many times by many different people, they were implemented and tossed into the collections module as well.
As for iteration over a sequence to generate a new sequence, you need to do this regardless of whether it is in C or Python. The *only* difference between the C and Python versions of this is a difference in speed, but again, use-cases before naming and optimization. I like to see things "in the wild".
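The helper function from earlier in this exchange can be exercised against the odds/evens example. A sketch only: `from_keys` here is a plain function, not a dict method, and the sample data mirrors the example above.

```python
def from_keys(d, keys):
    """Return a new dict holding d's items for the given keys."""
    return dict((k, d[k]) for k in keys)

d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
d_odds = from_keys(d, [1, 3, 5])    # items 1, 3, 5
d_evens = from_keys(d, [2, 4])      # items 2, 4
```

A key that is missing from `d` raises `KeyError`, because the lookup is a plain `d[k]`.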
Lists have __getslice__, __setslice__, and __delslice__. It could be argued that those can be handled just as well with iterators and loops. Of course we see them as seq[s:s+x], on both lists and strings. So why not have an equivalent for dictionaries? We can't slice them, but we do have key lists to use in the same way.
Your function examples are a bit like adding set manipulation functionality through functional programming-like functions. Take your merge operations as an example. With sets, it's spelled s1 | s2. It is a bit round-about, but your from_keys functionality is a bit like s1 - (s1 - s2), or really set(s2) because sets have no associated values. Anyways.
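The set identity mentioned here can be checked directly. A sketch with hypothetical sample data; the last line relies on dictionary key views supporting set operations, which is a Python 3 feature and was not available when this thread was written.

```python
s1 = {1, 2, 3, 4, 5}
s2 = {1, 3, 5, 7}

roundabout = s1 - (s1 - s2)   # the round-about spelling
direct = s1 & s2              # plain intersection gives the same set

# Dictionary analogue: keep the items whose keys appear in both.
d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
picked = {k: d[k] for k in d.keys() & s2}
```

Unlike the `d[k]`-based helper, the intersection silently ignores keys such as 7 that are not in `d`.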
A del_keys method could replace the clear method. del_keys would be more useful as it could operate on a partial set of keys.
d.del_keys(d.keys()) # The current clear method behavior.
I can't remember ever needing something like this that wasn't handled by d.clear() .
All or nothing. d = dict() works just as well.
Not when you want to mutate a dictionary.
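The difference between rebinding a name and mutating the object shows up as soon as a second reference exists. A minimal sketch with hypothetical names:

```python
registry = {"a": 1, "b": 2}
alias = registry            # second reference to the same dict object

registry = dict()           # rebinds the name only
items_after_rebind = dict(alias)   # alias still holds the old items

registry = alias
registry.clear()            # empties the shared object in place
items_after_clear = dict(alias)    # alias now sees an empty dict
```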
And I'd prefer to define the function in this case for readability reasons.
def splitdict(d, keys):
    """ Split dictionary d using keys. """
    keys_rest = set(d.keys()) - set(keys)
    return d.from_keys(keys), d.from_keys(keys_rest)
I can't think of a simple one-liner for this one that wouldn't duplicate work.
:-)
This is one of the main motivators.
I've never needed to do this. And I've never seen source that needed to do this either. So whether this is a main motivator for you doesn't sway me. [snip your pointing out that iteration happens in Python and not C]
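For reference, the split can be written today as a single pass over the items, without the proposed from_keys method. This is a sketch under one assumption that differs from the version above: keys that are not in `d` are ignored instead of raising `KeyError`.

```python
def splitdict(d, keys):
    """Split d into (items whose key is in keys, the remaining items)."""
    wanted = set(keys)
    picked, rest = {}, {}
    for k, val in d.items():
        # Route each item to exactly one of the two result dicts.
        (picked if k in wanted else rest)[k] = val
    return picked, rest

d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
odds, evens = splitdict(d, [1, 3, 5])
```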
I think the set_keys, from_keys, and del_keys methods could add both performance and clarity benefits to python.
Performance, sometimes, for some use-cases. Clarity? Maybe. Your split* functions are a bit confusing to me, and I've never really needed any of the functions that you list.
I think sometimes our need is determined by what is available for use. So if it's not available, our minds filter it out from the solutions we consider. That way, we don't need the things we don't have or can't get.
My mind's "need filter" seems to be broken in that respect. I often need things I don't have. But sometimes that works out to be good. ;-)
Yeah, I don't buy your 'need filter' reasoning. Typically people resist doing things that are difficult or inconvenient to do. Take decoration for example. Before decorator syntax, decoration was a pain in the butt. Yeah, you wrote the same number of lines of code, but there was such a disconnect from the signature of the function/method (and class in 2.6) that it was just too inconvenient to write, maintain, and understand.
In the case of dictionaries, all but two or three of the things you would like to offer are available via a very simple dict(generator expression). If people aren't thinking of ways to use generator expressions to make their lives easier (this is the case in multiple threads daily in comp.lang.python), is that Python's fault, or is it the developer's?
I like to think of Python's syntax and semantics as just rich enough for people to write what they want and to understand it quickly, but not so rich that you need to spend time thinking what something means (the Perl argument). Adding functionality to existing objects needs to do a few things, not the least of which is solving a problem that happens in the wild, but also that it doesn't overly burden those who implement similar functionality. Remember, dictionaries are *the* canonical mapping interface, and anyone who implements a complete mapping interface necessarily would need to implement the 3 methods you propose. For what? To clean up the interface? I'm sorry, but to add 3 methods, even with the assumption that two previous methods were going to be removed, in order to "clean up" the interface doesn't convince me. Please find me real-world use-cases where your new methods would improve readability.
... Also, I develop software for fun and profit.
Since basically everyone else here probably does some selection of the same, I'm sure that they will tell you pretty much the same thing: if we restricted our needs to what we already have, software wouldn't get written, or would only be proposed by marketing.
So to summarize...
1. Replace existing fromkeys method with a set_keys method.
2. Add a partial copy items from_keys method.
3. Replace the clear method with a del_keys method.
Not all X line functions should be builtins.
Of course I knew someone would point this out.
I'm usually the one to invoke it. Maybe I have less tolerance to arguably trivial additions to Python than others.
I'm not requesting the above example functions be builtins. Only the changes to the dict methods be considered. They would allow those above functions to work in a more efficient way and I'd be happy to add those functions to my own library.
With these methods in most cases the functions wouldn't even be needed. You would just use the methods in combinations with each other directly and the result would still be readable without a lot of 'code' overhead.
My single expression replacements were to show that the functions aren't needed now, as most are *easily* implemented in Python 2.5 in a straightforward manner.
Also consider this from a larger view. List has __getslice__, __setslice__, and __delslice__. Set has numerous methods that operate on more than one element.
Lists are ordered sequences, dictionaries are not. Sets are not mappings, they are sets (which is why they have set operations). Dictionaries are a mapping from keys to values, used as both an arbitrary data store as well as data and method member lookups on objects. The most common use-cases of dictionaries *don't* call for any of the additional functionality that you have offered. If they did, then it would have already been added.
Dictionaries are supposed to be highly efficient, but they only have limited methods that can operate on more than one item at a time, so you end up iterating over the keys to do nearly everything.
Iteration is a fundamental building block in Python. That's why for loops, iterators, generators, generator expressions, list comprehensions, etc., all use iteration over an iterator to do their work. Building more functionality into dictionaries won't make them easier to use, it will merely add more methods that you think will help. Is there anyone else who likes this idea? Please speak up.
So as an alternative, leave fromkeys and clear alone and add...
getkeys(keys) -> dict
setkeys(keys, v=None)
delkeys(keys)
Where these offer the equivalent of list slice functionality to dictionaries.
getkeys/setkeys/delkeys seem to me like they should be named getitems/setitems/delitems, because they are getting/setting/deleting the entire key->value association, not merely the keys.
If you find that you are doing the above more often than you think you should, create a module with all of the related functionality that automatically patches the builtins on import and place it in the Python cheeseshop. If people find that the functionality helps them, then we should consider it for inclusion. As it stands, most of the methods you offer have a very simple one-line version that is already very efficient.
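A third-party module along these lines might look like the sketch below. Two assumptions are worth stating plainly: current CPython does not allow new attributes to be set on the built-in dict type, so this uses a subclass rather than patching the builtin, and the class name `BulkDict` is hypothetical. The method names follow the getitems/setitems/delitems suggestion.

```python
class BulkDict(dict):
    """Hypothetical dict subclass with multi-key helpers (illustration only)."""

    def getitems(self, keys):
        # New mapping with just the requested items; KeyError if one is missing.
        return BulkDict((k, self[k]) for k in keys)

    def setitems(self, keys, v=None):
        # Set every listed key to the same value, in place.
        self.update(dict.fromkeys(keys, v))

    def delitems(self, keys):
        # Copy the keys first so that passing self.keys() is safe.
        for k in list(keys):
            del self[k]

d = BulkDict({1: "a", 2: "b", 3: "c", 4: "d"})
odds = d.getitems([1, 3])
d.setitems([1, 2], 0)
d.delitems([3, 4])
```

Each method body is a one-liner or a short loop over existing dict operations, which is the point being made about how little the new methods add.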
Iterators and for loops are fairly efficient for small dictionaries, but iterating can still be considerably slower than the equivalent C code if they are large dictionaries.
Let's find out.
>>> d = dict.fromkeys(xrange(10000000))
>>> import time
>>> if 1:
...     t = time.time()
...     e = dict(d)
...     print time.time()-t
...
1.21899986267
>>> del e
>>> if 1:
...     t = time.time()
...     e = dict(d.iteritems())
...     print time.time()-t
...
2.75
>>> del e
>>> if 1:
...     t = time.time()
...     e = dict((i,j) for i,j in d.iteritems())
...     print time.time()-t
...
6.95399999619
>>> del e
>>> if 1:
...     t = time.time()
...     e = dict((i, d[i]) for i in d)
...     print time.time()-t
...
7.54699993134
>>>
Those all seem to be pretty reasonable timings to me. In the best case you are talking about 6.2 times faster to use the C rather than Python version.
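The same comparison can be rerun on a current interpreter with the timeit module. This is a sketch, not a reproduction of the numbers above: it assumes Python 3 (so `range` and `items()` replace `xrange` and `iteritems()`), uses a much smaller dictionary so that it finishes quickly, and the labels are hypothetical. No timings are quoted because they depend on the machine and interpreter version.

```python
import timeit

d = dict.fromkeys(range(100000))   # smaller than the 10,000,000 keys used above

# The four ways of copying a dict that were timed in the session.
candidates = {
    "dict(d)": lambda: dict(d),
    "dict(d.items())": lambda: dict(d.items()),
    "genexp over items": lambda: dict((k, v) for k, v in d.items()),
    "genexp with lookup": lambda: dict((k, d[k]) for k in d),
}

timings = {}
for label, fn in candidates.items():
    # Best of 3 repeats, each repeat building the copy 5 times.
    timings[label] = min(timeit.repeat(fn, number=5, repeat=3))
    print("%-20s %.4f s" % (label, timings[label]))
```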
So this replaces two methods and adds one more. Overall I think the usefulness of these would be very good.
I don't find the current dictionary API to be lacking in any way other than "what do I really need to override to get functionality X", but that is a documentation issue more than anything.
I also think it will work very well with the python 3000 keys method returning an iterator. (And still be two fewer methods than we currently have.)
I'm sorry, but I can't really see how your changes would add to Python's flexibility without cluttering up interfaces and confusing current users.
I think it cleans up the API more than it clutters it up. It converts two limited-use methods to be more general, and adds one more that works with the already existing update method nicely.
But you propose a further half dozen functions. If you aren't proposing them for inclusion, why bother including them in your proposal, especially when they have very simple replacements that are, arguably, easier to understand than the function bodies you provided.
In both cases of the two existing methods, fromkeys and clear, your arguments, that there already exist easy one-line functions to do this, would be enough of a reason to not have them in the first place. So do you feel they should be removed?
We don't remove functionality in Python unless there is a good reason. Typically that reason is because the functionality is broken, the old functionality is not considered "Pythonic", or generally because a group of people believe there is a better way. Guido is more or less happy with dictionaries as-is (except for the keys(), values(), and items() methods, which are changing), and no one in python-dev has complained about dictionary functionality that I can remember. As such, even if you think that your changes would clean up dictionary methods, it is unlikely to happen precisely because *others* aren't mentioning, "dictionaries need to be cleaned up".
I plan on doing a search of places where these things can make a difference in making the code more readable and/or faster.
I don't care about faster. Show me code that is easier to understand. I will mention that all of your functionality smells very much like a functional programming approach to Python. This makes a difference because some functional programming tools (reduce, map, filter, ...) are slated for removal in Python 3.0, so adding functional programming tools (when we are removing others), is unlikely to gain much traction.
- Josiah