
Josiah Carlson wrote:
Ron Adam <rrr@ronadam.com> wrote:
The dictionary fromkeys method seems out of place as well as misnamed. IMHO
It is perfectly named (IMNSHO ;), create a dictionary from the keys provided; dict.fromkeys() .
That's ok, I won't hold it against you. ;-) What about its being out of place? Is this case like 'sorted' vs 'sort' for lists? I'm ok with leaving its name as is if that's a real problem. Another name for the mutate-with-keys method can be found. That may reduce possible confusion as well.
There are enough correct uses of it in the wild to keep the behavior, but it can be done in a better way.
I wasn't terribly convinced by your later arguments, so I'm -1.
Yes, I'm not the most influential writer. I'm not sure I can convince you it's better if you already think it's not; that has more to do with your personal preference.

So let's look at how much it's actually needed in its current (and correct) form. (These are rough estimates; I can try to come up with more accurate statistics if that is desired.)

Doing a search on google code turns up 300 hits for "lang:python \.fromkeys\(". Looking at a sample of those, it looks like about 80% use it as a set() constructor to remove duplicates (either for compatibility with Python 2.3, or because the code was written for Python 2.3 and earlier). Is there a way to narrow this down to Python 2.4 and later? (anyone?)

With a bit more sampling, it looks like about 8 out of 10 of the remaining 20% can be easily converted to the following form without any trouble.

    d = dict()
    d.set_keys(keys, v=value)

That would leave about 12 cases (YMMV) that need the inline functionality. For those a simple function can do it.

    def dict_from_keys(keys, v=None):
        d = dict()
        d.set_keys(keys, v)
        return d

Is 12 cases out of about 315,000 python files a big enough need to keep the current behavior? 315,000 is the number returned from google code for all python files, 'lang:python'. (I'm sure there are some duplicates.)

Is this more convincing? ;-)

(If anyone can come up with better numbers, that would be cool.)
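For readers following along, here is a minimal sketch of the three usage patterns being counted above. The free function set_keys is only a hypothetical stand-in for the proposed d.set_keys(keys, v=value) method, which does not exist on dict; the sample data is made up.

```python
# What dict.fromkeys does today: a new dict with every key mapped to one value.
defaults = dict.fromkeys(["a", "b", "c"], 0)

# The pre-set() idiom: fromkeys used only to drop duplicates.
unique = list(dict.fromkeys([3, 1, 3, 2, 1]))


def set_keys(d, keys, v=None):
    """Hypothetical stand-in for the proposed d.set_keys(keys, v) method."""
    for key in keys:
        d[key] = v


# The two-step form most call sites could be converted to.
d = dict()
set_keys(d, ["a", "b", "c"], v=0)
```

Under these assumptions the two-step form ends up with the same contents as the fromkeys call; the difference is that set_keys also works on a dictionary that already holds other items.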
I think this reads better and can be used in a wider variety of situations.
It could be useful for setting an existing dictionary to a default state.
# reset status of items.
status.set_keys(status.keys(), v=0)
This can be done today:
Of course all of the examples I gave can be done today. But they nearly all require iterating in python in some form.
status.update((i, 0) for i in status.keys())
#or
status.update(dict.fromkeys(status, 0))
The first example requires iterating over the keys. The second example works if you want to initialize all the keys. In which case, there is no reason to use the update method. dict.fromkeys(status, 0) is enough.
Or more likely, resetting a partial subset of the keys to some initial state.
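A minimal sketch of the partial reset with today's methods, assuming a made-up status dictionary and a made-up list of keys to reset. In CPython the looping here happens inside fromkeys and update rather than in a Python-level for loop, at the cost of building a temporary dictionary.

```python
status = {"a": 3, "b": 7, "c": 1, "d": 9}   # illustrative data only
stale = ["b", "d"]                          # the subset to reset

# Build a temporary {key: 0} dict for the subset and merge it in.
status.update(dict.fromkeys(stale, 0))
```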
The reason I started looking at this is I wanted to split a dictionary into smaller dictionaries and my first thought was that fromkeys would do that. But of course it doesn't.
Changing the behavior of dict.fromkeys() is not going to happen. We can remove it, we can add a new method, but changing it will lead to not-so-subtle breakage as people who were used to the old behavior try to use the updated method.
Note that this isn't a matter of "it's ok to break in 3.0", because dict.fromkeys() is not seen as being a design mistake by any of the 'heavy hitters' in python-dev or python-3000 that I have heard (note that I am certainly not a 'heavy hitter').
Then let's find a different name.
What I wanted was to be able to specify the keys and get the values from the existing dictionary into the new dictionary without using a for loop to iterate over the keys.
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
d_odds = d.from_keys([1, 3, 5])   # new dict of items 1, 3, 5
d_evens = d.from_keys([2, 4])     # new dict of items 2, 4
There currently isn't a way to split a dictionary without iterating its contents even if you know the keys you need beforehand.
Um...
def from_keys(d, iterator):
    return dict((i, d[i]) for i in iterator)
(iterating) Yep, as I said just above this:

"""There currently isn't a way to split a dictionary without iterating its contents ..."""

Lists have __getslice__, __setslice__, and __delslice__. It could be argued that those can be handled just as well with iterators and loops. Of course we see them as seq[s:s+x], on both lists and strings.

So why not have an equivalent for dictionaries? We can't slice them, but we do have key lists to use in the same way.
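To make the slice analogy concrete, a small sketch: the three list slice operations side by side with a key-list lookup on a dictionary. The function from_keys below is a free-function stand-in for the proposed d.from_keys(keys) method, written with the iteration the proposal wants to avoid; it is an illustration, not an existing dict method.

```python
seq = list("abcdef")
middle = seq[1:3]        # get a slice
seq[1:3] = ["X", "Y"]    # set a slice
del seq[1:3]             # delete a slice


def from_keys(d, keys):
    """Stand-in for the proposed method: new dict holding only the given keys."""
    return {k: d[k] for k in keys}


d = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
d_odds = from_keys(d, [1, 3, 5])
d_evens = from_keys(d, [2, 4])
```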
A del_keys method could replace the clear method. del_keys would be more useful as it could operate on a partial set of keys.
d.del_keys(d.keys()) # The current clear method behavior.
I can't remember ever needing something like this that wasn't handled by d.clear() .
All or nothing. d = dict() works just as well. BTW, google code gives 500 hits for "\.clear\(". But it is very unclear how many of those are false positives due to other objects having a clear method. It's probably a significant percentage in this case.
Some potentially *very common* uses:
# This first one works now, but I included it for completeness. ;-)
def mergedicts(d1, d2):
    """ Combine two dictionaries. """
    dd = dict(d1)
    dd.update(d2)   # update() returns None, so return dd itself
    return dd
dict((i, d2.get(i, d1.get(i))) for i in itertools.chain(d1,d2))
(iterating) And I'd prefer to define the function in this case for readability reasons.
def splitdict(d, keys):
    """ Split dictionary d using keys. """
    keys_rest = set(d.keys()) - set(keys)
    return d.from_keys(keys), d.from_keys(keys_rest)
I can't think of a simple one-liner for this one that wouldn't duplicate work.
:-) This is one of the main motivators.
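For comparison, a sketch of splitdict that runs on today's dict with a single pass over the items. It assumes that keys not present in d should simply be ignored, which may differ from what the proposed from_keys method would do with a missing key.

```python
def splitdict(d, keys):
    """Split d into (items whose key is in keys, all remaining items)."""
    wanted = set(keys)
    picked, rest = {}, {}
    for key, value in d.items():
        # Route each item to one of the two result dicts.
        target = picked if key in wanted else rest
        target[key] = value
    return picked, rest


picked, rest = splitdict({1: "a", 2: "b", 3: "c", 4: "d"}, [1, 3])
```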
def split_from_dict(d, keys):
    """ Removes and returns a subdict of d with keys. """
    dd = d.from_keys(keys)
    d.del_keys(keys)
    return dd
dict((i, d.pop(i, None)) for i in keys)
(iterating)
def copy_items(d1, d2, keys):
    """ Copy items from dictionary d1 to d2. """
    d2.update(d1.from_keys(keys))   # I really like this!
d2.update((i, d1[i]) for i in keys)
(iterating)
def move_items(d1, d2, keys):
    """ Move items from dictionary d1 to d2. """
    d2.update(d1.from_keys(keys))
    d1.del_keys(keys)
d2.update((i, d1.pop(i, None)) for i in keys)
(iterating)
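One detail worth noting about the pop-based one-liners: with d1.pop(i, None), a key that is missing from d1 still lands in d2 with the value None. A minimal sketch of move_items on today's dict, assuming missing keys should be skipped instead:

```python
def move_items(d1, d2, keys):
    """Move the items named by keys from d1 to d2, skipping absent keys."""
    for key in keys:
        if key in d1:
            d2[key] = d1.pop(key)


src = {1: "a", 2: "b", 3: "c"}
dst = {9: "z"}
move_items(src, dst, [1, 3, 7])   # 7 is not in src and is ignored
```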
I think the set_keys, from_keys, and del_keys methods could add both performance and clarity benefits to python.
Performance, sometimes, for some use-cases. Clarity? Maybe. Your split* functions are a bit confusing to me, and I've never really needed any of the functions that you list.
I think sometimes our need is determined by what is available for use. So if it's not available, our minds filter it out from the solutions we consider. That way, we don't need the things we don't have or can't get. My mind's "need filter" seems to be broken in that respect. I often need things I don't have. But sometimes that works out to be good. ;-)
So to summarize...
1. Replace existing fromkeys method with a set_keys method.
2. Add a partial copy items from_keys method.
3. Replace the clear method with a del_keys method.
Not all X line functions should be builtins.
Of course I knew someone would point this out. I'm not requesting that the above example functions be builtins, only that the changes to the dict methods be considered. They would allow those functions to work in a more efficient way, and I'd be happy to add those functions to my own library.

With these methods, in most cases the functions wouldn't even be needed. You would just use the methods in combination with each other directly, and the result would still be readable without a lot of 'code' overhead.

Also consider this from a larger view. List has __getslice__, __setslice__, and __delslice__. Set has numerous methods that operate on more than one element. Dictionaries are supposed to be highly efficient, but they have only a few methods that can operate on more than one item at a time, so you end up iterating over the keys to do nearly everything.

So as an alternative, leave fromkeys and clear alone and add...

    getkeys(keys) -> dict
    setkeys(keys, v=None)
    delkeys(keys)

where these offer the equivalent of list slice functionality to dictionaries.

If you find that you are doing the above more often than you think you should, create a module with all of the related functionality that automatically patches the builtins on import and place it in the Python cheeseshop. If people find that the functionality helps them, then we should consider it for inclusion. As it stands, most of the methods you offer have a very simple one-line version that is already very efficient.
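One way to experiment with the three methods without touching the built-in type is a dict subclass. This is only a sketch: the class name KeyedDict is invented for illustration, the method names follow the getkeys/setkeys/delkeys list above, and the choice to raise KeyError on a missing key is an assumption.

```python
class KeyedDict(dict):
    """Hypothetical dict subclass with multi-key get/set/delete methods."""

    def getkeys(self, keys):
        # New dict holding only the requested items; KeyError if one is missing.
        return KeyedDict((k, self[k]) for k in keys)

    def setkeys(self, keys, v=None):
        # Point every listed key at the same value, adding keys as needed.
        self.update(dict.fromkeys(keys, v))

    def delkeys(self, keys):
        # Copy the keys first in case the caller passed a view of this dict.
        for k in list(keys):
            del self[k]


d = KeyedDict({1: "a", 2: "b", 3: "c", 4: "d", 5: "e"})
odds = d.getkeys([1, 3, 5])
d.setkeys([2, 4], v=0)
d.delkeys([1, 3])
```

Being pure Python, such a subclass shows the interface but not the speed the proposal hopes for from a C implementation.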
Iterators and for loops are fairly efficient for small dictionaries, but iterating can still be considerably slower than the equivalent C code if the dictionaries are large.
So this replaces two methods and adds one more. Overall I think the usefulness of these would be very good.
I don't find the current dictionary API to be lacking in any way other than "what do I really need to override to get functionality X", but that is a documentation issue more than anything.
I also think it will work very well with the python 3000 keys method returning an iterator. (And still be two fewer methods than we currently have.)
I'm sorry, but I can't really see how your changes would add to Python's flexibility without cluttering up interfaces and confusing current users.
I think it cleans up the API more than it clutters it up. It converts two limited-use methods to be more general, and adds one more that works nicely with the already existing update method.

For both of the existing methods, fromkeys and clear, your argument that there already exist easy one-line functions to do this would be enough of a reason to not have them in the first place. So do you feel they should be removed?

I plan on doing a search of places where these things can make a difference in making the code more readable and/or faster.

Cheers,
Ron