Re: [Python-ideas] RFC: PEP: Add dict.__version__

Jan. 10, 2016

On Sun, Jan 10, 2016 at 3:24 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jan 09, 2016 at 05:18:40PM -0500, Terry Reedy wrote:
On 1/8/2016 4:27 PM, Victor Stinner wrote:
Add a new read-only ``__version__`` property to ``dict`` and
``collections.UserDict`` types, incremented at each change.
I agree with Neil Girdhar that this looks to me like a CPython-specific
implementation detail that should not be imposed on other
implementations.  For testing, perhaps we could add a dict_version
function in test.support that uses ctypes to access the internals.
Another reason to hide __version__ from the Python level is that its use
seems to me rather tricky and bug-prone.
What makes you say that? Isn't it a simple matter of:
v = mydict.__version__
maybe_modify(mydict)
if v != mydict.__version__:
    print("dict has changed")
which doesn't seem tricky or bug-prone to me.
That doesn't. I would, however, expect that __version__ is a read-only
attribute. I can't imagine any justifiable excuse for changing it; if
you want to increment it, just mutate the dict in some unnecessary
way.
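
The check-before-and-after pattern quoted above can be tried without patching the interpreter. What follows is only a minimal sketch, assuming a hypothetical VersionedDict class built on collections.UserDict with a read-only "version" property; neither name comes from the PEP, which proposes a built-in __version__ on dict itself. This sketch bumps the counter on every store or delete, even when the stored value is unchanged, which may differ from what the PEP finally specifies.

```python
from collections import UserDict


class VersionedDict(UserDict):
    """Hypothetical mapping that counts its own mutations (illustration only)."""

    def __init__(self, *args, **kwargs):
        # Set the counter first: UserDict.__init__ fills the mapping through
        # update(), which calls our __setitem__.
        self._version = 0
        super().__init__(*args, **kwargs)

    @property
    def version(self):
        # No setter is defined, so the attribute is read-only.
        return self._version

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self._version += 1


def maybe_modify(mapping):
    # Stand-in for arbitrary code that may or may not touch the mapping.
    mapping["spam"] = "eggs"


mydict = VersionedDict(a=1)
v = mydict.version
maybe_modify(mydict)
if v != mydict.version:
    print("dict has changed")
```

Because UserDict routes update(), pop() and clear() through __setitem__ and __delitem__, those calls are counted too; a plain dict subclass would not get that for free.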
But as near as I can tell, your proposal cannot detect all relevant
changes unless one is *very* careful.  A dict maps hashable objects to
objects.  Objects represent values.  So a dict represents a mapping of
values to values.  If an object is mutated, the object to object mapping
is not changed, but the semantic value to value mapping *is* changed.
In the following example, __version__ twice gives the 'wrong' answer
from a value perspective.
d = {'f': [int]}
d['f'][0] = float # object mapping unchanged, value mapping changed
d['f'] = [float]  # object mapping changed, value mapping unchanged
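
The two cases in that example can be spelled out with identity and equality checks. This is just an illustrative sketch; the helper names (original_obj, original_val, after_mutation) are made up here, and "object mapping" is approximated by an "is" test on the stored value.

```python
d = {'f': [int]}

original_obj = d['f']        # the list object currently stored under 'f'
original_val = list(d['f'])  # a copy of its value, [int]

# Case 1: mutate the stored list in place.
d['f'][0] = float
same_object = d['f'] is original_obj   # the dict still points at the same list
same_value = d['f'] == original_val    # but the list no longer equals [int]
print(same_object, same_value)

after_mutation = list(d['f'])          # value after case 1, [float]

# Case 2: rebind the key to a new, equal list.
d['f'] = [float]
same_object = d['f'] is original_obj   # a different list object now
same_value = d['f'] == after_mutation  # with the same value as before the rebind
print(same_object, same_value)
```

A version counter on the dict would see only the second assignment, since only that one goes through the dict.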
I don't think that matters for Victor's use-case. Going back to the toy
example above, Victor doesn't need to detect internal modifications to
the len built-in, because as you say it's immutable:
py> len.foo = "spam"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'builtin_function_or_method' object has no attribute 'foo'
He just needs to know if globals()['len'] and/or builtins.len are
different (in any way) from how they were when the function "demo" was
compiled.
There's more to it than that. Yes, a dict maps values to values; but
the keys MUST be immutable (otherwise hashing has problems), and this
optimization doesn't actually care about the immutability of the
value. When you use the name "len" in a Python function, somewhere
along the way, that will resolve to some object. Currently, CPython
knows in advance that it isn't in the function-locals, but checks at
run-time for a global and then a built-in; all FAT Python is doing
differently is snapshotting the object referred to, and then having a
quick check to prove that globals and builtins haven't been mutated.
Consider:

def enumerate_classes():
    return (cls.__name__ for cls in object.__subclasses__())

As long as nobody has *rebound* the name 'object', this will continue
to work - and it'll pick up new subclasses, which means that
something's mutable or non-pure in there. FAT Python should be able to
handle this just as easily as it handles an immutable. The only part
that has to be immutable is the string "len" or "object" that is used
as the key.
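
As a rough illustration of the "snapshot plus quick check" idea, here is a sketch in pure Python. The function name specialize_len_abc and the module_globals parameter are assumptions for this example, not FAT Python's API, and the guard here does real dict lookups where a dict version would allow a single integer comparison. The precomputed result is only valid because len is pure, which is the separate point discussed next.

```python
import builtins


def specialize_len_abc(module_globals):
    """Return a callable that acts like len("abc") looked up in module_globals."""
    snapshot = builtins.len    # object the name resolved to at "compile" time
    folded = snapshot("abc")   # precomputed result, trusted while the guard holds

    def call():
        # Guard: 'len' is not shadowed by a global, and builtins.len
        # still refers to the object we snapshotted.
        if "len" not in module_globals and builtins.len is snapshot:
            return folded
        # Guard failed: fall back to the ordinary global-then-builtin lookup.
        func = module_globals.get("len", builtins.len)
        return func("abc")

    return call


namespace = {}                       # stands in for a module's globals()
fast = specialize_len_abc(namespace)
print(fast())                        # guard holds, precomputed value is used
namespace["len"] = lambda x: 5       # rebinding the name defeats the guard
print(fast())
```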

The significance of len being immutable and pure comes from the other
optimization, which is actually orthogonal to the non-rebound names
optimization, except that CPython already does this where it doesn't
depend on names.

CPython already constant-folds in situations where no names are
involved. That's how we maintain the illusion that there is such a
thing as a "complex literal":
>>> dis.dis(lambda: 1+2j)
  1           0 LOAD_CONST               3 ((1+2j))
              3 RETURN_VALUE
FAT Python proposes to do the same here:
>>> dis.dis(lambda: len("abc"))
  1           0 LOAD_GLOBAL              0 (len)
              3 LOAD_CONST               1 ('abc')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE
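
Since dis output differs between CPython versions, the same contrast can be checked by looking at the code objects directly. A small sketch, assuming CPython (other implementations need not fold constants at all); the function names folded and not_folded are only for illustration.

```python
def folded():
    return 1 + 2j        # no names involved, so CPython can fold this


def not_folded():
    return len("abc")    # 'len' must be looked up when the function runs


# The folded complex value is stored among the function's constants.
print(folded.__code__.co_consts)

# The call to len survives compilation: 'len' is listed as a name to resolve.
print(not_folded.__code__.co_names)
```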
And that's where it might be important to check more than just the
identity of the object. If len were implemented in Python:
>>> def len(x):
...     l = 0
...     for l, _ in enumerate(x, 1): pass
...     return l
...
>>> len("abc")
3
>>> len
<function len at 0x7fc6111769d8>
then it would be possible to keep the same len object but change its behaviour.
>>> len.__code__ = (lambda x: 5).__code__
>>> len
<function len at 0x7fc6111769d8>
>>> len("abc")
5
Does anyone EVER do this? C compilers often have optimization levels
that can potentially alter the program's operation (eg replacing
division with multiplication by the reciprocal); if FAT Python has an
optimization flag that says "Assume no __code__ objects are ever
replaced", most programs would have no problem with it. (Having it
trigger an immediate exception would mean there's no "what the bleep
is going on" moment, and I still doubt it'll ever happen.)
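
For what it's worth, a guard that also catches the __code__ swap is not hard to sketch. This is only an illustration of the idea, assuming hypothetical names py_len and guard_ok; it is not how FAT Python implements its guards.

```python
def py_len(x):
    # Pure-Python stand-in for len, as in the example above.
    n = 0
    for n, _ in enumerate(x, 1):
        pass
    return n


# Snapshot both the function object and its code object.
snapshot_func = py_len
snapshot_code = py_len.__code__


def guard_ok():
    # An identity check on the function alone would miss a __code__ swap,
    # so the code object is compared as well.
    return py_len is snapshot_func and py_len.__code__ is snapshot_code


print(guard_ok())                             # nothing has changed yet
py_len.__code__ = (lambda x: 5).__code__      # same function object, new behaviour
print(py_len is snapshot_func, guard_ok())    # identity still holds, guard does not
```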

I think there are some interesting possibilities here. Whether they
actually result in real improvement I don't know; but if FAT Python is
aiming to be fast at the "start program, do a tiny bit of work, and
then terminate" execution model (where JIT compilation can't help),
then it could potentially make Mercurial a *lot* faster to fiddle
with.

ChrisA


Chris Angelico
