[Python-ideas] Dictionary group key/value modification methods.

Ron Adam rrr at ronadam.com
Tue May 29 22:21:05 CEST 2007


Ok,  I'm going to try to summarize this a bit so we don't go around in 
circles on details that are adjacent to the issue I'm trying to address.


+ Adding "copyitems", "seteach", and "delitems" methods that do partial 
group operations on dictionaries in C, rather than iterating in Python to 
do the same thing, can possibly give as much as a 500 percent performance 
increase.

- It needs to be shown that these situations occur often enough to result 
in a meaningful benefit.

(It doesn't replace the need to iterate dictionaries as there are many 
cases where that's exactly what you need.)

+ The methods add some improvements to readability over the iterator form.

- There is not a significant reduction in lines of code, so again it needs 
to be shown that this would be useful often enough to be a significant benefit.


Providing there are enough use cases to demonstrate a significant benefit, 
we will then need to address the following issues.

    + What to call them.
    + The details of the implementation.


Most of the arguments against fit into the following categories...

    - Changes the status quo
    - It's premature optimization
    - Adds additional complexity to dictionaries
    - Personal preference

These are subjective but still important issues.  Once it is demonstrated 
that there are sufficient use cases for these features, we will need to 
address whether each of these is relevant and to what degree.


Some examples:

     # Combine two dictionaries.  (works already)
     dd = dict(d1)
     dd.update(d2)

     # Split dictionary d using a key list.
     keys_rest = set(d.keys()) - set(keys)
     d1, d2 = d.getitems(keys), d.getitems(keys_rest)

     # Remove a subdict of d with keys.
     dd = d.getitems(keys)
     d.delitems(keys)

     # Copy items from dictionary d1 to d2.
     #
     # The getitems method returns a dictionary so it will
     # work directly with the update method.
     #
     d2.update(d1.getitems(keys))

     # Move items from dictionary d1 to d2.
     d2.update(d1.getitems(keys))
     d1.delitems(keys)

     # Setting items to a specified value with a list of keys.
     d.seteach(keys, None)
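
For reference, here is a rough pure-Python sketch of what the three proposed 
methods would do, written as plain functions.  The function forms and the 
exact semantics are assumptions for illustration only (for instance, that a 
missing key raises KeyError); none of this is an existing dict API.

```python
# Hypothetical stand-ins for the proposed dict methods.
# Names and semantics are assumptions, not an existing API.

def getitems(d, keys):
    """Return a new dict holding d's items for the given keys."""
    return dict((k, d[k]) for k in keys)

def delitems(d, keys):
    """Delete each of the given keys from d, in place."""
    for k in keys:
        del d[k]

def seteach(d, keys, value):
    """Set every given key in d to the same value, in place."""
    for k in keys:
        d[k] = value

d = {'a': 1, 'b': 2, 'c': 3}
keys = ['a', 'b']

sub = getitems(d, keys)        # copy out a sub-dictionary
delitems(d, keys)              # remove those items from d
seteach(d, ['x', 'y'], None)   # add or overwrite keys with one value
```

The proposal is that the loops inside these functions would run in C instead 
of in Python bytecode.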


Use cases:

    ### TODO


>> Josiah Carlson wrote:
>>> Ron Adam <rrr at ronadam.com> wrote:

>> Is 12 cases out of about 315,000 python files a big enough need to keep the 
>> current behavior?   315,000 is the number returned from google code for all 
>> python files, 'lang:python'. (I'm sure there are some duplicates)
>>
>> Is this more convincing?   ;-)
> 
> Not to me, as I use dict.fromkeys(), and going from a simple expression
> to an assignment then mutate is unnecessary cognitive load.  It would
> have been more convincing had you offered...
> 
>     dict((i, v) for i in keys)

Well, there you go. :-)
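
For comparison, the spellings being discussed here all build the same 
dictionary.  This is only a small sketch; the names keys and v are 
placeholders, and the last form is the assign-then-mutate style that a 
seteach-like method would require.

```python
keys = ['a', 'b', 'c']
v = 0

# Single expression, existing classmethod.
by_fromkeys = dict.fromkeys(keys, v)

# Single expression, generator form (needs Python 2.4 or later).
by_genexp = dict((k, v) for k in keys)

# Assignment followed by mutation.
by_loop = {}
for k in keys:
    by_loop[k] = v
```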


> But then again, basically every one of your additions is a one line
> expression.  I would also consider the above myself, if it weren't for
> the fact that I'm supporting a Python 2.3 codebase.  Please see my
> discussion below of *removing* functionality.

This is probably something that is better suited for Python 3000.  But it's 
possible it could be backported to 2.6.  It would have no effect on Python 
2.5 and earlier, and probably minimal effect on 2.x with regard to 2.3 
compatibility.

I don't see .fromkeys() being removed in 2.x.


> Until you can show significant use-cases in the
> wild, and show that the slowdown of these functions in Python compared
> to C is sufficient to render the addition of the functions in your own
> personal library useless, I'm going to stick with my -1.

Your own tests show a maximum speedup of 620%.  My testing shows it is 300% 
to 500% over a range of sizes.  I would still call that sufficient.
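
A measurement of this kind could be set up roughly as follows.  This is a 
sketch and not the test either of us ran: since seteach does not exist, 
update() combined with fromkeys() is used here as an assumed stand-in for a 
method that loops in C, and the sizes and repeat counts are arbitrary.

```python
import timeit

keys = list(range(1000))

def set_by_loop(d, keys, value=None):
    # Plain Python iteration: one interpreted store per key.
    for k in keys:
        d[k] = value

def set_by_update(d, keys, value=None):
    # Stand-in for a C-level seteach: fromkeys and update both loop in C.
    d.update(dict.fromkeys(keys, value))

# Check that both approaches produce the same result before timing them.
d1, d2 = {}, {}
set_by_loop(d1, keys)
set_by_update(d2, keys)

t_loop = timeit.timeit(lambda: set_by_loop({}, keys), number=200)
t_update = timeit.timeit(lambda: set_by_update({}, keys), number=200)
print("loop: %.4fs  update+fromkeys: %.4fs" % (t_loop, t_update))
```

The ratio will vary with dictionary size, key type, and Python version, so 
it should be measured over a range of sizes.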

And before you point it out... yes, only if it can be shown to be useful in 
a wide range of situations. I fully intend to find use cases.  If I can't 
find any, then none of this will matter.


> I was pointing out how you would duplicate exactly the functionality you
> were proposing for dict.set_keys().  It is very difficult for me to
> offer you alternate implementations for your own use, or as reasons why
> I don't believe they should be added, if you move the target ;).

But programming is full of moving targets.  ;-)

In any case, look at the overall picture and try not to prematurely shoot 
this down based on implementation details that can be changed as needed.

And I'll attempt to do a use case study from the python library.


> Until you can show significant use-cases

> Usually we find substantial use-cases ....

> ... that I don't remember anyone having ever asked for before

> ... but again, use-cases ...

> I've never needed to do this.

> Please find me real-world use-cases ...

> Show me code that is easier to understand.


Ok, I get it.  :-)


>> Also consider this from a larger view.  List has __getslice__, 
>> __setslice__, and __delslice__.  Set has numerous methods that operate on 
>> more than one element.
> 
> Lists are ordered sequences, dictionaries are not.  Sets are not
> mappings, they are sets (which is why they have set operations). 
> Dictionaries are a mapping from keys to values, used as both an
> arbitrary data store as well as data and method member lookups on
> objects. The most common use-cases of dictionaries *don't* call for any
> of the additional functionality that you have offered.

> If they did, then it would have already been added.

This statement isn't true.  It only shows the resistance to these changes 
is greater than the efforts of those who have tried to introduce those 
changes.  (not without good cause)

To be clear, I in no way want the bar dropped to a lower level as to what 
is added to python or not added.  I accept that sufficient benefit needs to 
be demonstrated, and will try to do that.

Quality is more important than quantity in this case.


>> Dictionaries are supposed to be highly efficient, but they only have limited 
>> methods that can operate on more than one item at a time,  so you end up 
>> iterating over the keys to do nearly everything.
> 
> Iteration is a fundamental building block in Python.  That's why for
> loops, iterators, generators, generator expressions, list comprehensions,
> etc., all use iteration over an iterator to do their work.  Building
> more functionality into dictionaries won't make them easier to use, it
> will merely add more methods that you think will help.  Is there anyone
> else who likes this idea?  Please speak up.

Let's rephrase this to be less subjective...

Does anyone think having an approximately 500% improvement in some 
dictionary operations would be good, if it can be done in a way that is 
easier to read and use, and has enough use cases to be worthwhile?


> getkeys/setkeys/delkeys seem to me like they should be named
> getitems/setitems/delitems, because they are getting/setting/deleting the
> entire key->value association, not merely the keys.

Sounds good, how about...

     getitems, delitems, and seteach ?


The update method corresponds to setitems, where setitems is the inverse 
operation to getitems.  I don't see any reason to change update.

     d1.update(d2.getitems(keys))

So seteach is a better name for a method that sets each key to a value.



Cheers,
    Ron





