[Python-Dev] RFC: PEP 509: Add a private version to dict

Jim J. Jewett jimjjewett at gmail.com
Fri Apr 15 17:45:32 EDT 2016


On Fri, Apr 15, 2016 at 4:41 PM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> 2016-04-15 19:54 GMT+02:00 Jim J. Jewett <jimjjewett at gmail.com>:

>> (2)  Why *promise* not to update the version_tag when replacing a
>> value with itself?

> It's a useful property. For example, let's say that you have a guard
> on globals()['value']. The guard is created with value=3. A unit test
> replaces the value with 50, but then restores the value to its previous
> value (3). Later, the guard is checked to decide if an optimization
> can be used.

> If the dict version is increased, you need a lookup. If the dict
> version is not increased, the guard is cheap.

I would expect the version to be increased twice, and therefore to
require a lookup.  Are you suggesting that unittest should provide an
example of resetting the version back to the original value when it
cleans up after itself?
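
To make the scenario concrete, here is a rough pure-Python sketch of how I
read it. VersionedDict, Guard and the module-level counter are hypothetical
names for illustration only; the PEP's version tag lives in the C dict
struct and is not visible from Python, and this model only covers
__setitem__.

```python
import itertools

# Hypothetical stand-in for the PEP's global version counter.
_next_version = itertools.count(1)
_MISSING = object()


class VersionedDict(dict):
    """Toy model: bump version_tag on assignment unless the new value is
    the identical object already stored under that key."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_next_version)

    def __setitem__(self, key, value):
        if self.get(key, _MISSING) is not value:
            self.version_tag = next(_next_version)
        super().__setitem__(key, value)


class Guard:
    """Watches one key; counts how often it has to fall back to a lookup."""

    def __init__(self, d, key):
        self.d = d
        self.key = key
        self.value = d[key]
        self.version = d.version_tag
        self.lookups = 0

    def check(self):
        if self.d.version_tag == self.version:
            return True                      # cheap path: nothing changed
        self.lookups += 1                    # slow path: look the key up
        if self.d.get(self.key, _MISSING) is self.value:
            self.version = self.d.version_tag   # re-arm the guard
            return True
        return False


ns = VersionedDict(value=3)
original = ns["value"]
guard = Guard(ns, "value")
first = guard.check()      # version unchanged, no lookup needed
ns["value"] = 50           # the unit test patches the value ...
ns["value"] = original     # ... and puts the same object back
second = guard.check()     # version was bumped twice, so one lookup
```

In this model the patch-and-restore bumps the version twice, so the guard
still passes but only after paying for one lookup; after that it is re-armed
and cheap again.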

> In C, it's very cheap to implement the test "new_value == old_value",
> it just compares two pointers.

Yeah, I understand that it is likely a win in terms of performance,
and a good way to start off (given that you're willing to do the
work).

I just worry that you may end up closing off even better optimizations
later, if you make too many promises about exactly which checks you
will do, and how.

Today, dict only cares about ==, and you (reasonably) think that full
== isn't always worth running ... but when it comes to which tests
*are* worth running, I'm not confident that the answers won't change
over the years.

>> [2A] Do you want to promise that replacing a value with a
>> non-identical object *will* trigger a version_tag update *even*
>> if the objects are equal?

> It's already written in the PEP:

I read that as a description of what the code does, rather than a spec
for what it should do... so it isn't clear whether I could count on
that remaining true.

For example, if I know that my dict values are all 4-digit integers,
can I write:

    d[k] = d[k] + 0

and be assured that the version_tag will bump?  Or is that something
that a future optimizer might optimize out?
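
Here is the distinction I am asking about, as a small sketch. TaggedDict is
a hypothetical pure-Python model (a per-instance counter standing in for
version_tag), not the PEP's implementation. I use lists rather than ints
because whether d[k] + 0 produces a new int object is itself an
implementation detail.

```python
class TaggedDict(dict):
    """Toy model: a per-instance counter stands in for version_tag."""

    version_tag = 0

    def __setitem__(self, key, value):
        # Bump unless the key already maps to this very object
        # (the "compare two pointers" test).
        if key not in self or self[key] is not value:
            self.version_tag += 1
        super().__setitem__(key, value)


d = TaggedDict()
d["k"] = [1234]                  # new key: bump
before = d.version_tag

d["k"] = d["k"]                  # identical object: no bump
unchanged_for_same_object = (d.version_tag == before)

d["k"] = list(d["k"])            # equal, but a distinct object: bump
bumped_for_equal_copy = (d.version_tag == before + 1)
```

The question is whether the second behaviour (equal but not identical always
bumps) is something callers may rely on, or only what the current code
happens to do.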

>> (3)  It is worth being explicit on whether empty dicts can share
>> a version_tag of 0.  If this PEP is about dict content, then that
>> seems fine, and it may well be worth optimizing dict creation.

> This is not part of the PEP yet. I'm not sure that I will modify the
> PEP to use the version 0 for empty dictionaries. Antoine doesn't seem
> to be convinced :-)

True.  But do note that "not hitting the global counter an extra time
for every dict creation" is a more compelling reason than "we could
speed up dict.clear(), sometimes".


>> (4)  Please be explicit about the locking around version++; it
>> is enough to say that the relevant methods already need to hold
>> the GIL (assuming that is true).

> I don't think that it's important to mention it in the PEP. It's more
> an implementation detail. The version can be protected by atomic
> operations.

Now I'm the one arguing from a specific implementation.  :D

My thought was that any sort of locking (including atomic operations)
is slow, but if the GIL is already held, then there is no *extra*
locking cost. (Well, a slightly longer hold on the lock, but...)

>> (5)  I'm not sure I understand the arguments around a per-entry
>> version.

>> On the one hand, you never need a strong reference to the value;
>> if it has been collected, then it has obviously been removed from
>> the dict and should trigger a change even with per-dict.
>
> Let's say that you watch the key1 of a dict. The key2 is modified, it
> increases the version. Later, you test the guard: to check if the key1
> was modified, you need to look up the key and compare the value. You
> need the value to compare it.

And the value for key1 is still there, so you can.

The only reason you would notice that the key2 value had gone away is
if you also care about key2 -- in which case the cached value is out
of date, regardless of what specific value it used to hold.

>> (6)  I'm also not sure why version_tag *doesn't* solve the problem
>> of dicts that fool the iteration guards by mutating without changing
>> size ( https://bugs.python.org/issue19332 ) ... are you just saying
>> that the iterator views aren't allowed to rely on the version-tag
>> remaining stable, because replacing a value (as opposed to a
>> key-value pair) is allowed?

> If the dictionary values are modified during the loop, the dict
> version is increased. But it's allowed to modify values when you
> iterate on *keys*.

Sure.  So?

I see three cases:

(A)  I don't care that the collection changed.  The Python
implementation might, but I don't.  (So no bug even today.)

(B)  I want to process exactly the collection that I started with.  If
some of the values get replaced, then I want to complain, even if
Python doesn't.  version_tag is what I want.

(C)  I want to process exactly the original keys, but go ahead and use
updated values.  The bug still bites, but ... I don't think this case
is any more common than B.
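
For case (B), something like the following is what I have in mind. Again
this is only a sketch: TaggedDict and strict_items are hypothetical names,
and the counter is a pure-Python stand-in for version_tag that only tracks
item assignment.

```python
class TaggedDict(dict):
    """Toy model: a per-instance counter stands in for version_tag."""

    version_tag = 0

    def __setitem__(self, key, value):
        if key not in self or self[key] is not value:
            self.version_tag += 1
        super().__setitem__(key, value)


def strict_items(d):
    """Yield (key, value) pairs; complain if d changes at all mid-loop,
    including a value being replaced without any change in size."""
    expected = d.version_tag
    for key in d:
        if d.version_tag != expected:
            raise RuntimeError("dict changed during iteration")
        yield key, d[key]
    # Catch a change made while the caller handled the last item.
    if d.version_tag != expected:
        raise RuntimeError("dict changed during iteration")


d = TaggedDict(a=1, b=2)
seen = []
try:
    for key, value in strict_items(d):
        seen.append(key)
        d["b"] = 20          # replaces a value; the size stays the same
    detected = False
except RuntimeError:
    detected = True
```

A plain dict iterator would let that loop run to completion, since neither
the size nor the set of keys changes.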

-jJ

