
On Sun, Jan 10, 2016 at 3:24 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jan 09, 2016 at 05:18:40PM -0500, Terry Reedy wrote:
On 1/8/2016 4:27 PM, Victor Stinner wrote:
Add a new read-only ``__version__`` property to ``dict`` and ``collections.UserDict`` types, incremented at each change.
I agree with Neil Girdhar that this looks to me like a CPython-specific implementation detail that should not be imposed on other implementations. For testing, perhaps we could add a dict_version function in test.support that uses ctypes to access the internals.
Another reason to hide __version__ from the Python level is that its use seems to me rather tricky and bug-prone.
What makes you say that? Isn't it a simple matter of:
    v = mydict.__version__
    maybe_modify(mydict)
    if v != mydict.__version__:
        print("dict has changed")

which doesn't seem tricky or bug-prone to me.
That doesn't. I would, however, expect that __version__ is a read-only attribute. I can't imagine any justifiable excuse for changing it; if you want to increment it, just mutate the dict in some unnecessary way.
But as near as I can tell, your proposal cannot detect all relevant changes unless one is *very* careful. A dict maps hashable objects to objects. Objects represent values. So a dict represents a mapping of values to values. If an object is mutated, the object to object mapping is not changed, but the semantic value to value mapping *is* changed. In the following example, __version__ twice gives the 'wrong' answer from a value perspective.
    d = {'f': [int]}
    d['f'][0] = float  # object mapping unchanged, value mapping changed
    d['f'] = [float]   # object mapping changed, value mapping unchanged
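To make the two cases above concrete, here is a minimal sketch using a hypothetical VersionedDict subclass. It is only a stand-in for the proposed attribute: built-in dicts expose no such property, and this toy version counts only item assignment and deletion (not update(), pop(), and so on).

```python
class VersionedDict(dict):
    """Hypothetical stand-in for the proposed read-only dict.__version__."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._version = 0

    @property
    def __version__(self):
        # Read-only: there is no setter, so assignment raises AttributeError.
        return self._version

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._version += 1  # every rebinding counts, even to an equal value

    def __delitem__(self, key):
        super().__delitem__(key)
        self._version += 1


d = VersionedDict({'f': [int]})
v0 = d.__version__

d['f'][0] = float  # mutates the list in place; the dict itself is untouched
unchanged_after_inner_mutation = (d.__version__ == v0)

d['f'] = [float]   # rebinds the key to a new, equal list
changed_after_rebinding = (d.__version__ != v0)
```

Under these assumptions the version stays the same after the in-place list mutation and moves after the rebinding, which is exactly the object-mapping view rather than the value-mapping view.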
I don't think that matters for Victor's use-case. Going back to the toy example above, Victor doesn't need to detect internal modifications to the len built-in, because as you say it's immutable:
    py> len.foo = "spam"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'builtin_function_or_method' object has no attribute 'foo'
He just needs to know if globals()['len'] and/or builtins.len are different (in any way) from how they were when the function "demo" was compiled.
There's more to it than that. Yes, a dict maps values to values; but the keys MUST be immutable (otherwise hashing has problems), and this optimization doesn't actually care about the immutability of the value. When you use the name "len" in a Python function, somewhere along the way, that will resolve to some object. Currently, CPython knows in advance that it isn't in the function-locals, but checks at run-time for a global and then a built-in; all FAT Python is doing differently is snapshotting the object referred to, and then having a quick check to prove that globals and builtins haven't been mutated. Consider:

    def enumerate_classes():
        return (cls.__name__ for cls in object.__subclasses__())

As long as nobody has *rebound* the name 'object', this will continue to work - and it'll pick up new subclasses, which means that something's mutable or non-pure in there. FAT Python should be able to handle this just as easily as it handles an immutable. The only part that has to be immutable is the string "len" or "object" that is used as the key.

The significance of len being immutable and pure comes from the other optimization, which is actually orthogonal to the non-rebound names optimization, except that CPython already does this where it doesn't depend on names. CPython already constant-folds in situations where no names are involved. That's how we maintain the illusion that there is such a thing as a "complex literal":
    >>> dis.dis(lambda: 1+2j)
      1           0 LOAD_CONST               3 ((1+2j))
                  3 RETURN_VALUE
FAT Python proposes to do the same here:
    >>> dis.dis(lambda: len("abc"))
      1           0 LOAD_GLOBAL              0 (len)
                  3 LOAD_CONST               1 ('abc')
                  6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                  9 RETURN_VALUE
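As a rough illustration of "snapshot the object, then check that nothing was rebound", here is a pure-Python sketch. The names lookup and make_demo are made up for this example, and the namespace is passed in explicitly as a plain dict. The guard here redoes the lookups directly; the proposal would presumably compare dict versions instead, which this sketch does not attempt.

```python
import builtins


def lookup(name, namespace):
    """Mimic the run-time lookup order: globals first, then builtins."""
    if name in namespace:
        return namespace[name]
    return getattr(builtins, name)


def make_demo(namespace):
    """Return a version of `len("abc")` with a precomputed fast path."""
    snapshot = builtins.len  # the object 'len' resolved to at "compile" time

    def demo():
        # Guard: 'len' is not shadowed in the namespace and builtins.len
        # is still the very same object we snapshotted.
        if "len" not in namespace and builtins.len is snapshot:
            return 3  # constant-folded result of len("abc")
        # Guard failed: fall back to the ordinary, unoptimized lookup.
        return lookup("len", namespace)("abc")

    return demo


ns = {}
demo = make_demo(ns)
fast_result = demo()          # guard holds, fast path

ns["len"] = lambda x: 5       # rebind the name in "globals"
shadowed_result = demo()      # guard fails, slow path sees the new len

del ns["len"]
restored_result = demo()      # guard holds again
```

Note that the guard only ever asks whether the name still points at the same object, which is why the value behind the name does not need to be immutable for this part.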
And that's where it might be important to check more than just the identity of the object. If len were implemented in Python:
    >>> def len(x):
    ...     l = 0
    ...     for l, _ in enumerate(x, 1): pass
    ...     return l
    ...
    >>> len("abc")
    3
    >>> len
    <function len at 0x7fc6111769d8>
then it would be possible to keep the same len object but change its behaviour.
    >>> len.__code__ = (lambda x: 5).__code__
    >>> len
    <function len at 0x7fc6111769d8>
    >>> len("abc")
    5
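If one did want to catch that case, a guard could remember the code object as well as the function. The following is only a sketch of the idea; CodeGuard and my_len are hypothetical names, not anything FAT Python is known to provide.

```python
def my_len(x):
    """Pure-Python length, as in the example above."""
    n = 0
    for n, _ in enumerate(x, 1):
        pass
    return n


class CodeGuard:
    """Hypothetical guard: same function object AND same code object."""

    def __init__(self, func):
        self.func = func
        # Built-in functions have no __code__, so fall back to None.
        self.code = getattr(func, "__code__", None)

    def check(self, candidate):
        return (candidate is self.func
                and getattr(candidate, "__code__", None) is self.code)


guard = CodeGuard(my_len)
valid_before = guard.check(my_len)

my_len.__code__ = (lambda x: 5).__code__  # same function object, new behaviour
result_after = my_len("abc")
valid_after = guard.check(my_len)
```

An identity check on the function alone would still pass after the swap; the extra comparison of __code__ is what notices it.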
Does anyone EVER do this? C compilers often have optimization levels that can potentially alter the program's operation (e.g. replacing division with multiplication by the reciprocal); if FAT Python has an optimization flag that says "Assume no __code__ objects are ever replaced", most programs would have no problem with it. (Having it trigger an immediate exception would mean there's no "what the bleep is going on" moment, and I still doubt it'll ever happen.)

I think there are some interesting possibilities here. Whether they actually result in real improvement I don't know; but if FAT Python is aiming to be fast at the "start program, do a tiny bit of work, and then terminate" execution model (where JIT compilation can't help), then it could potentially make Mercurial a *lot* faster to fiddle with.

ChrisA