
On Fri, Apr 15, 2016 at 4:41 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
2016-04-15 19:54 GMT+02:00 Jim J. Jewett <jimjjewett@gmail.com>:
(2) Why *promise* not to update the version_tag when replacing a value with itself?
It's a useful property. For example, let's say that you have a guard on globals()['value']. The guard is created with value=3. A unit test replaces the value with 50, but then restores the value to its previous value (3). Later, the guard is checked to decide if an optimization can be used.
If the dict version is increased, you need a lookup. If the dict version is not increased, the guard is cheap.
I would expect the version to be increased twice, and therefore to require a lookup. Are you suggesting that unittest should provide an example of resetting the version back to the original value when it cleans up after itself?
In C, it's very cheap to implement the test "new_value == old_value"; it just compares two pointers.
Yeah, I understand that it is likely a win in terms of performance, and a good way to start off (given that you're willing to do the work). I just worry that you may end up closing off even better optimizations later, if you make too many promises about exactly how you will do which ones. Today, dict only cares about ==, and you (reasonably) think that full == isn't always worth running ... but when it comes to which tests *are* worth running, I'm not confident that the answers won't change over the years.
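To make the unit-test scenario concrete, here is a rough pure-Python model. Everything in it (VersionedDict, Guard, the counter) is a made-up stand-in, since the real tag is a private field at the C level; it only assumes the rule under discussion, namely that storing the identical object does not bump the version.

```python
import itertools

# Stand-in for the process-wide counter; the name is invented for this sketch.
_next_version = itertools.count(1)
_MISSING = object()


class VersionedDict(dict):
    """Hypothetical model of a dict carrying a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version_tag = next(_next_version)

    def __setitem__(self, key, value):
        # Assumed rule: no bump when the new value is the very same object.
        if self.get(key, _MISSING) is not value:
            self.version_tag = next(_next_version)
        super().__setitem__(key, value)


class Guard:
    """Hypothetical guard watching one key of a VersionedDict."""

    def __init__(self, d, key):
        self.d, self.key = d, key
        self.value = d[key]
        self.version = d.version_tag
        self.lookups = 0

    def check(self):
        if self.d.version_tag == self.version:
            return True                       # fast path: nothing changed
        self.lookups += 1                     # slow path: look the key up
        if self.d.get(self.key, _MISSING) is self.value:
            self.version = self.d.version_tag  # re-cache the new version
            return True
        return False


ns = VersionedDict(value=3)
original = ns["value"]
guard = Guard(ns, "value")
start = ns.version_tag

ns["value"] = 50         # the unit test patches the value: one bump
ns["value"] = original   # ... and restores it: a second bump
bumps = ns.version_tag - start

first_check = guard.check()    # version differs, so one lookup is needed
second_check = guard.check()   # version was re-cached, no further lookup
```

In this model the patch-and-restore costs two bumps and exactly one later lookup, after which the guard is cheap again. Only a store of the identical object avoids the bump altogether.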
[2A] Do you want to promise that replacing a value with a non-identical object *will* trigger a version_tag update *even* if the objects are equal?
It's already written in the PEP:
I read that as a description of what the code does, rather than a spec for what it should do... so it isn't clear whether I could count on that remaining true. For example, if I know that my dict values are all 4-digit integers, can I write:

    d[k] = d[k] + 0

and be assured that the version_tag will bump? Or is that something that a future optimizer might optimize out?
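The question boils down to whether the result of the addition is a new object. A tiny sketch with illustrative names only: equality is certain, but I would not rely on the identity test going either way.

```python
d = {"k": 1234}            # a 4-digit integer value
old = d["k"]
new = old + 0

equal = new == old         # always True for ints
same_object = new is old   # implementation detail, not promised either way

d["k"] = new               # would bump only if `new` is a distinct object
```

Under an identity-based rule, whether that last store bumps the version depends entirely on same_object, which is exactly the part I can't count on.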
(3) It is worth being explicit on whether empty dicts can share a version_tag of 0. If this PEP is about dict content, then that seems fine, and it may well be worth optimizing dict creation.
This is not part of the PEP yet. I'm not sure that I will modify the PEP to use the version 0 for empty dictionaries. Antoine doesn't seem to be convinced :-)
True. But do note that "not hitting the global counter an extra time for every dict creation" is a more compelling reason than "we could speed up dict.clear(), sometimes".
(4) Please be explicit about the locking around version++; it is enough to say that the relevant methods already need to hold the GIL (assuming that is true).
I don't think that it's important to mention it in the PEP. It's more an implementation detail. The version can be protected by atomic operations.
Now I'm the one arguing from a specific implementation. :D My thought was that any sort of locking (including atomic operations) is slow, but if the GIL is already held, then there is no *extra* locking cost. (Well, a slightly longer hold on the lock, but...)
(5) I'm not sure I understand the arguments around a per-entry version.
On the one hand, you never need a strong reference to the value; if it has been collected, then it has obviously been removed from the dict and should trigger a change even with per-dict.
Let's say that you watch key1 of a dict. Then key2 is modified, which increases the version. Later, you test the guard: to check whether key1 was modified, you need to look up the key and compare the value. You need the value to compare it.
And the value for key1 is still there, so you can. The only reason you would notice that the key2 value had gone away is if you also care about key2 -- in which case the cached value is out of date, regardless of what specific value it used to hold.
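Here is how I picture the key1/key2 case, as a toy model that tracks both a dict-wide version and one version per key. The class and attribute names are hypothetical, and for simplicity every store bumps, with no identity shortcut.

```python
class EntryVersionedDict(dict):
    """Hypothetical model with a dict-wide version and a per-key version."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0
        self.entry_version = {key: 0 for key in self}

    def __setitem__(self, key, value):
        self.version += 1
        self.entry_version[key] = self.version
        super().__setitem__(key, value)


d = EntryVersionedDict(key1="spam", key2="eggs")

# What a guard on key1 would remember.
seen_dict_version = d.version
seen_key1_version = d.entry_version["key1"]
cached_key1 = d["key1"]

d["key2"] = "ham"   # an unrelated change

per_dict_fast_path = d.version == seen_dict_version                 # False
per_entry_fast_path = d.entry_version["key1"] == seen_key1_version  # True

# Per-dict slow path: look key1 up and compare against the cached object.
key1_still_valid = d["key1"] is cached_key1                         # True
```

With the per-dict version, the change to key2 only costs the key1 guard a lookup; the value for key1 is still in the dict to compare against.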
(6) I'm also not sure why version_tag *doesn't* solve the problem of dicts that fool the iteration guards by mutating without changing size ( https://bugs.python.org/issue19332 ) ... are you just saying that the iterator views aren't allowed to rely on the version-tag remaining stable, because replacing a value (as opposed to a key-value pair) is allowed?
If the dictionary values are modified during the loop, the dict version is increased. But it's allowed to modify values when you iterate on *keys*.
Sure. So? I see three cases:

(A) I don't care that the collection changed. The python implementation might, but I don't. (So no bug even today.)

(B) I want to process exactly the collection that I started with. If some of the values get replaced, then I want to complain, even if python doesn't. version_tag is what I want.

(C) I want to process exactly the original keys, but go ahead and use updated values. The bug still bites, but ... I don't think this case is any more common than B.

-jJ
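A sketch of the difference between (B) and (C). The plain-dict loop is ordinary Python; CountingDict and strict_items are invented here as a stand-in for a version check, and they bump on every store.

```python
class CountingDict(dict):
    """Hypothetical model: every assignment bumps a version counter."""

    version = 0

    def __setitem__(self, key, value):
        self.version += 1
        super().__setitem__(key, value)


def strict_items(d):
    """Case (B): yield items, complaining if d was modified meanwhile."""
    start = d.version
    for key in d:
        yield key, d[key]
        # Runs when the consumer asks for the next item.
        if d.version != start:
            raise RuntimeError("dict values changed during iteration")


# Case (C): a plain dict lets you replace values while iterating over keys.
plain = {"a": 1, "b": 2}
for key in plain:
    plain[key] *= 10

# Case (B): the version check notices the first replacement.
strict = CountingDict(a=1, b=2)
complaint = None
try:
    for key, value in strict_items(strict):
        strict[key] = value * 10
except RuntimeError as exc:
    complaint = str(exc)
```

The plain loop finishes with every value replaced, while the strict one stops after the first store, leaving "b" untouched.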