[Python-Dev] Proposal: dict.with_values(iterable)
Steve Dower
steve.dower at python.org
Tue Apr 23 01:54:39 EDT 2019
On 22Apr2019 2143, Inada Naoki wrote:
> On Tue, Apr 23, 2019 at 11:30 AM Steve Dower <steve.dower at python.org> wrote:
>>
>> Or possibly just "dict(existing_dict).update(new_items)".
>>
>
> Do you mean .update accepts values tuple?
> I can't think it's
Not sure what you were going to go on to say here, but why not?
If it's a key-sharing dict, then all the keys are strings. We know that
when we go to do the update, so we can intern all the strings (going to
do that anyway) and then it's a quick check if it already exists. If
it's a regular dict, then we calculate hashes as normal. Updating the
value is just a decref, incref and assignment.
If not all these conditions are met, we convert to a regular dict. The
proposed function was going to raise an error in this case, so all we've
done is make it transparent. The biggest downside is now you don't get a
warning that your preferred optimization isn't actually working when you
pass in new_items with different keys from what were in existing_dict.
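If a caller does want that warning back, a caller-side check is one option. This is only a rough sketch: update_existing_only is a hypothetical helper name, not an existing or proposed dict method, and it says nothing about whether the result is still a key-sharing dict internally.

```python
def update_existing_only(target, new_items):
    """Update target in place, refusing any key it does not already have.

    Hypothetical helper for illustration; not part of dict's API.
    """
    new_items = dict(new_items)
    # Dict key views support set operations, so this finds unknown keys.
    unknown = new_items.keys() - target.keys()
    if unknown:
        raise KeyError(f"keys not in target: {sorted(map(repr, unknown))}")
    target.update(new_items)


existing_dict = {"a": 1, "b": 2}
d = dict(existing_dict)
update_existing_only(d, {"a": 10})   # fine: "a" already exists in d
```

Passing {"c": 3} instead would raise KeyError and leave d untouched, since the check runs before the update.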
Note that .update() would probably require a dict or key/value tuples
here - but if you have the keys in a tuple already then zip() is going
to be good enough for setting it (in fact, zip(existing_dict,
new_values) should be fine, and we can internally special-case that
scenario, too). I'd assumed the benefit was in memory usage after
construction, rather than speed-to-construct, since everyone keeps
talking about "key-sharing dictionaries" and not "arrays" ;)
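As a small sketch of the zip() spelling, using only today's dict behaviour (the sample keys and values are made up for illustration, and no key-sharing optimization is implied by this code):

```python
existing_dict = {"name": "Alice", "age": 30, "city": "Paris"}
new_values = ("Bob", 25, "Tokyo")

# Copy first, then update in place. update() returns None, so it
# cannot be chained directly onto the dict() call.
new_dict = dict(existing_dict)

# Iterating a dict yields its keys in insertion order, so zip() pairs
# each existing key with the value at the same position.
new_dict.update(zip(existing_dict, new_values))
```

Here new_dict ends up with the same keys in the same order as existing_dict, and existing_dict itself is not modified.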
(Randomizing side note: is this scenario enough to make a case for a
built-in data frame type?)
>> My primary concern is still to avoid making CPython performance
>> characteristics part of the Python language definition. That only makes
>> it harder for alternate implementations.
>
> Note that this proposal is not only for key sharing dict:
>
> * We can avoid rebuilding hash table again and again.
> * We can avoid checking duplicated keys again and again.
>
> These characteristics are not only for Python, but for all mapping
> implementations using hash table.
I believe all of these are met by making d2=dict(d1) construct a dict d2
that shares keys with d1 by default. Can you show how they are not?
* when d2.update() only touches existing keys, there is no need to
rebuild the table
* a duplicated key overwrites multiple times - what else are you going
to do? This is already easiest, fastest, uses the least memory and is
most consistent with every other form of setting dict items. Why
complicate things by checking them? Let the caller do it.
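A quick illustration of both points with current dict semantics (the values are arbitrary; whether d2 shares keys with d1 internally is an implementation detail this code cannot observe):

```python
d1 = {"x": 1, "y": 2}
d2 = dict(d1)

# "x" appears twice: the later pair simply overwrites the earlier one,
# and no new keys are introduced, so the key set stays the same as d1's.
d2.update([("x", 10), ("x", 20), ("y", 30)])
```

After this, d2 maps "x" to 20 and "y" to 30, while d1 is unchanged.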
Cheers,
Steve