On Sat, Dec 04, 2015 at 7:49 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Versioned dictionary
====================
In the previous milestone of FAT Python, the versioned dictionary was a new type inheriting from the builtin dict type. It added a __version__ read-only property (a global "version" of the dictionary, incremented at each change), a getversion(key) method (the version of a single key), and support for weak references.
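(For reference, here is a minimal pure-Python sketch of the versioned-dict semantics described above. The real implementation is in C; the helper names below are illustrative only.)

    # Minimal pure-Python sketch of the versioned dict described above.
    # Only __setitem__/__delitem__ are shown; a real version would also
    # cover update(), pop(), clear(), etc.
    class VersionedDict(dict):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._version = 0        # global version, bumped on each change
            self._key_versions = {}  # per-key versions

        @property
        def __version__(self):
            return self._version

        def getversion(self, key):
            # Version of a single key; None if the key does not exist.
            return self._key_versions.get(key)

        def __setitem__(self, key, value):
            super().__setitem__(key, value)
            self._version += 1
            self._key_versions[key] = self._version

        def __delitem__(self, key):
            super().__delitem__(key)
            self._version += 1
            self._key_versions.pop(key, None)

A guard can then cache d.__version__ and skip the key lookup entirely as long as the version is unchanged. (Being a subclass rather than the builtin dict, it also supports weak references for free.)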
I was thinking (as an alternative to versioning dicts) about a dictionary which would be able to return name/value pairs, which would also be internally used by the dictionary. This would be way less sensitive to irrelevant changes in the scope dictionary, but cost an extra pointer to each key. Here's how it would work:

    pair = scope.item(name)
    scope[name] = newval
    assert pair.value is newval
    assert pair.key is name
    assert pair is scope.item(name)
    # Alternatively, to only create pair objects when `item` is called,
    # have `==` compare the underlying pair.
    del scope[name]
    assert pair.key is None    # name-dicts can't have `None` keys
    assert pair.value is None  # Alternatively, pair.value is scope.NULL

This dict will allow one to hold references to its entries (with the caller promising not to change them, enforced by exceptions). You won't have to keep looking up keys (unless the name is deleted), and functions are allowed to change. For inlining, you can detect whether the function has been redefined by testing the saved pair.value against the saved function, and go into the slow path if needed (or recompile the inlining). I am not sure whether deleting from the dict and then readding the same key should reuse the pair container.

I think the only potential issue for the Python version is memory use. There aren't going to be THAT many names being deleted, right? So I say that deleted things in the scope dict should not be removed from the inner dict. I predict that this will simplify a lot of other things, especially when deleting and readding the same name: if you save a pair and it becomes invalid, you don't have to do another lookup to make sure that it's REALLY gone. If memory is a real concern, deleted pairs can be weakrefed (and saved in a second dict?) until they are reused. This way, pairs which aren't saved by something outside will be removed.

For implementation, a Python implementation of the idea has probably already been done. Here are some details:

- set: Internally store d._d[k] = k, v.
- get: Reject k if d._d[k].key is None. (Names must be strings.)
- del: Set d._d[k].key = None and .val = d.NULL to invalidate this entry.

For the CPython version, CPython's dict already stores its entries as PyDictKeyEntry (hash, *key, *value), but those entries can move around on resizing. Two possible implementations:

1. Fork dict to store {hash, *kv_pair}.
2. Use an inner dict (like in the Python suggestion) where values are kv_pair.

Write the indirection code in C, because scope dicts must be fast.

For exposing a pair to Python code, here are two possibilities:

1. Make them Python objects in the first place.
2. Keep a second hash table in lockstep with the first (so that you can do a lookup to find the index in the first, and then use that same index with the second). In this table, store pair objects that have been created. (They can be weakrefed, as before. Unless it's impossible to weakref something you're returning?) This will save memory for pairs that aren't ever exposed. If compact dictionaries are implemented, the second hash table will be a second array instead.

To use this kind of scopedict, functions would have to store a list of used names, which is memory overhead. But for what you're doing, some overhead will be necessary anyway.
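Here is a rough pure-Python sketch of the pair idea above (`item()` is from the example; the rest is illustrative, not a real implementation):

    class Pair:
        __slots__ = ('key', 'value')
        def __init__(self, key, value):
            self.key = key
            self.value = value

    class PairDict:
        def __init__(self):
            self._d = {}  # name -> Pair; entries are shared with callers

        def item(self, key):
            return self._d[key]

        def __getitem__(self, key):
            pair = self._d[key]
            if pair.key is None:  # entry was deleted
                raise KeyError(key)
            return pair.value

        def __setitem__(self, key, value):
            pair = self._d.get(key)
            if pair is None:
                self._d[key] = Pair(key, value)
            else:
                # Reuse the container (the email leaves open whether a
                # deleted-then-readded key should reuse it).
                pair.key = key
                pair.value = value

        def __delitem__(self, key):
            pair = self._d[key]
            if pair.key is None:
                raise KeyError(key)
            pair.key = None    # invalidate, but keep the entry around
            pair.value = None

A saved pair stays valid across `scope[name] = newval`, so an inlining guard reduces to `pair.value is saved_func`, with no lookup.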
2015-12-15 12:23 GMT+01:00 Franklin? Lee <leewangzhong+python@gmail.com>:
I was thinking (as an alternative to versioning dicts) about a dictionary which would be able to return name/value pairs, which would also be internally used by the dictionary. This would be way less sensitive to irrelevant changes in the scope dictionary, but cost an extra pointer to each key.
Do you have an estimation of the cost of the "extra pointer"? Impact on memory and CPU. dict is really a very important type for the performance of Python. If you make dict slower, I'm sure that Python overall will be slower.
del scope[name]
assert pair.key is None
It looks tricky to keep the dict and the pair objects consistent, especially in terms of atomicity. You will need to keep a reference to the pair object in the dict entry, which will also make the dict larger (use more memory), right?
You won't have to keep looking up keys (unless the name is deleted), and functions are allowed to change. For inlining, you can detect whether the function has been redefined by testing the saved pair.value against the saved function, and go into the slow path if needed (or recompile the inlining).
For builtin functions, I also need to detect when a key is created in the global namespace. How do you handle this case with pairs?
If memory is a real concern, deleted pairs can be weakrefed (and saved in a second dict?) until they are reused. This way, pairs which aren't saved by something outside will be removed.
Supporting weak references also has a cost on the memory footprint... For FAT Python, not being able to detect quickly when a new key is created is a blocker point.

Victor
More thoughts. (Stealing your style of headers.)

Just store a pointer to value
=============================

Instead of having the inner dict store k_v pairs, in C the values in our hash tables will be:

    struct refcell {
        PyObject *value;  // NULL if deleted
    };

It's not necessary to store the key. I think I only had it so I could mark it None in the Python implementation, to denote a deleted key. But a deleted entry could just have `cell.value is ScopeDict.NULL` (C: cell.value == NULL).

The scope dict will own all values which don't have exposed references, and all exposed references (which own the value they reference).

(Alternatively, store the value directly in the hash table. If something asks for a reference to it, replace the value with a PyObject that refers to it, and flag that entry in the auxiliary hash table.)

This might be what PyCellObject is, which is how I realized that I didn't need the key: https://docs.python.org/3.5/c-api/cell.html

Deleting from scope
===================

When deleting a key, don't remove the key from the inner dict; just set the reference to NULL. Outside code should never get the raw C `refcell`, only a Python object. This makes it possible to clean up unused references when the dict expands or contracts: for each `refcell`, if it has no Pair object or its Pair object is not referenced by anything else, and if its value is NULL (i.e. deleted), don't store it in the new hash table.

Get pairs before their keys are defined
=======================================

When the interpreter compiles a function, it can request references which _don't exist yet_. The scope dict would simply create the entry in its inner dict and fill it in when needed. That means that each name only needs to be looked up when a function is created.

    scope = Scopedict()
    f = scope.ref('f')
    scope['f'] = float
    f.value('NaN')

This would be a memory issue if many functions are created with typo'd names. But:

- You're not making a gigantic number of functions in the first place.
- You'll eventually remove these unused entries when you resize the inner dict. (See previous section.)

I was concerned about which scope would be responsible for creating the entry, but it turns out that if you use a name in a function, every use of that name has to be for the same scope. So the following causes a NameError:

    def f():
        def g(x):
            return abs(x)
        for i in range(30):
            print(g(i))
            if i == 10:
                def abs(x):
                    return "abs" + str(x)
            if i == 20:
                del abs

and you can tell which scope to ask for the reference during function compilation.

Does this simplify closures?
============================

(I haven't yet looked at Python's closure implementation.)

A refcell (C struct) will be exposed as a RefCell (PyObject), which owns it. This means that RefCell is reference-counted, and if something saved a reference to it, it will persist even after its owning dict is deleted. Thus, when a scope dict is deleted, each refcell without a RefCell object is deleted (and its value is DecRef'd), and the other ones just have their RefCell object decrement a reference.

Then closures are free: each inner function has refcounted references to the cells that it uses, and it doesn't need to know whether its parent is alive. (The implementation of closures involves cell objects.)

Overhead
========

If inner functions are being created a lot, that's extra work. But I guess you should expect a lot of overhead if you're doing such a thing.

Readonly refs
=============

It might be desirable to have refs that are allowed to write (e.g. from `global` and `nonlocal`) and refs that aren't. The CellObject would just hold a count for the number of writing refs, separate from the number of refs. This might result in some optimizations for values with no writing refs. For example, it's possible to implement copying of dicts as a shallow copy if it's KNOWN that any modification would result in a call to its set/del functions, which would initiate a deep copy.
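Here is a rough pure-Python sketch of the refcell idea (`ref()` matches the example above; everything else is illustrative):

    _NULL = object()  # stand-in for C NULL

    class RefCell:
        __slots__ = ('value',)
        def __init__(self):
            self.value = _NULL  # _NULL means deleted / not yet defined

    class ScopeDict:
        def __init__(self):
            self._cells = {}  # name -> RefCell; never removed on delete

        def ref(self, key):
            # May be called before the key is defined (see above).
            cell = self._cells.get(key)
            if cell is None:
                cell = self._cells[key] = RefCell()
            return cell

        def __getitem__(self, key):
            cell = self._cells.get(key)
            if cell is None or cell.value is _NULL:
                raise KeyError(key)
            return cell.value

        def __setitem__(self, key, value):
            self.ref(key).value = value  # saved cells see the new value

        def __delitem__(self, key):
            cell = self._cells.get(key)
            if cell is None or cell.value is _NULL:
                raise KeyError(key)
            cell.value = _NULL  # invalidate, keep the cell alive

A compiled function can keep the RefCell for each outside name it uses; a guard is then just `cell.value is expected`, with no dict lookup.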
On Tue, Dec 15, 2015 at 3:29 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
2015-12-15 12:23 GMT+01:00 Franklin? Lee <leewangzhong+python@gmail.com>:
I was thinking (as an alternative to versioning dicts) about a dictionary which would be able to return name/value pairs, which would also be internally used by the dictionary. This would be way less sensitive to irrelevant changes in the scope dictionary, but cost an extra pointer to each key.
Do you have an estimation of the cost of the "extra pointer"? Impact on memory and CPU. dict is really a very important type for the performance of Python. If you make dict slower, I'm sure that Python overall will be slower.
I'm proposing it as a subclass.
It looks tricky to keep the dict and the pair objects consistent, especially in terms of atomicity. You will need to keep a reference to the pair object in the dict entry, which will also make the dict larger (use more memory), right?
Yes, but only by about a third of the underlying dict's tables: you store one extra pointer per entry, while each entry in the underlying table is (hash, key, value), i.e. three pointer-sized fields on a 64-bit build. If Python moves to a compact dict implementation, the ratio stays the same, because the secondary table will be kept in lockstep with the compact array instead of the sparse array.
You won't have to keep looking up keys (unless the name is deleted), and functions are allowed to change. For inlining, you can detect whether the function has been redefined by testing the saved pair.value against the saved function, and go into the slow path if needed (or recompile the inlining).
For builtin functions, I also need to detect when a key is created in the global namespace. How do you handle this case with pairs?
(I realized you don't need to keep a key, so I threw away pairs.)

Instead of detecting key insertion, I allow the creation of references before there's anything to reference. In other words, when a function is created that uses a name which isn't yet defined, an entry (to NULL) is created in the scope's inner dict, and a Python object for that entry. I really think this is a good approach to that problem.

In the case of global versus __builtin__, the entry would be created in the globals() dict, which initially points to __builtin__'s entry. This would require a double dereference, but I think there's no other way to have nested ambiguous scoping like that (where you don't know where you're looking it up until you need it). If there is, the Python object can hold a "number of indirections". This would allow passing through to __builtin__, but still allow saving CellRefs to Python variables. On deletion, it would re-look-up the builtin version.

If you repeatedly create and delete `map` in module scope, it would have to keep looking up the reference to the builtin when you delete. But if you're repeatedly deleting and reusing the same name, you're kind of being a jerk. This can be solved with more overhead.

Alternatively, module scope dicts can be even more special, and hold a pair of references: one to the module scope *value*, one to the __builtin__ *reference*. So for a module scope reference, it will try to return its own value, and if it's NULL, it will ask the builtin reference to try. This means each module has the overhead of its own names plus the overhead of referring to module names, and builtins will have a name for... every single module's names. Eh. I'd rather just punish the jerk.

In fact, don't have globals() save a reference to __builtin__'s entry unless it exists at some point: `__builtins__.__dict__.ref("argargarg", create_if_none=False) => None`.
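A small sketch of that module-scope fallback (building on the illustrative RefCell/ScopeDict sketch from earlier; not a real implementation):

    class ModuleRefCell(RefCell):
        __slots__ = ('builtin_cell',)
        def __init__(self, builtin_cell=None):
            super().__init__()
            self.builtin_cell = builtin_cell  # RefCell in builtins, or None

        def resolve(self):
            # Try the module-level value first, then fall through to the
            # builtin reference (the "double dereference" above).
            if self.value is not _NULL:
                return self.value
            if self.builtin_cell is not None and self.builtin_cell.value is not _NULL:
                return self.builtin_cell.value
            raise KeyError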
2015-12-15 22:10 GMT+01:00 Franklin? Lee <leewangzhong+python@gmail.com>:
(Stealing your style of headers.)
I'm using reStructuredText, it's not really a new style :-)
Overhead
========
If inner functions are being created a lot, that's extra work. But I guess you should expect a lot of overhead if you're doing such a thing.
Sorry, I didn't read your email carefully, but I don't think that it's acceptable to make Python namespaces slower. In FAT mode, we need versioned dictionaries for the module namespace, type namespace, global namespace, etc.
Do you have an estimation of the cost of the "extra pointer"? Impact on memory and CPU. dict is really a very important type for the performance of Python. If you make dict slower, I'm sure that Python overall will be slower.
I'm proposing it as a subclass.
Please read the "Versioned dictionary" section of my email: https://mail.python.org/pipermail/python-dev/2015-December/142397.html

I explained why using a subclass doesn't work in practice.

Victor
I realized yet another thing, which will reduce overhead: the original array can store values directly, and you maintain the refs by repeatedly updating them when moving refs around. RefCells will point to a pointer to the value cell (which already exists in the table).

- `getitem` will be almost the same as a normal dict: it has to check if the value is valid (which it already checked, but it will be invalid a lot more often).
- `setitem` will be the same as a normal dict (since the RefCells will just point to the _address_ of the value pointer in the main table), except that the dict will be bigger, and compaction/expansion has more overhead. No creation of refcells here.
- `delitem` will just null the value pointer, which shouldn't cost much more, if it doesn't cost less.
- Expansion and compaction will cost more, since we have to copy over the RefCell pointers and also check whether they should be copied.
- Deletion of the dict will cost more, due to the additional logic for deciding what to delete, and the RefCells can no longer point into the entry table. They would have to point at the value (requiring logic, or the replacement of a function pointer) or at a new allocated object (requiring an allocation of sizeof(PyObject*) bytes).

On Tue, Dec 15, 2015 at 5:38 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Sorry, I didn't read your email carefully, but I don't think that it's acceptable to make Python namespaces slower. In FAT mode, we need versioned dictionaries for the module namespace, type namespace, global namespace, etc.
It was actually more "it might be a problem" than "it will be a problem". I don't know if the overhead will be high enough to worry about. It might be dominated by whatever savings there would be by not having to look up names more than once. (Unless builtins get mixed with globals? I think that's solvable, though. It's just that the solutions I can think of have different tradeoffs.)

I am confident that the time overhead and the savings will beat the versioning dict. The versioning dict method has to save a reference to the variable value and a reference to the name, and regularly test whether the dict has changed. This method only has to save a reference to a reference to the value (though it might need the name to allow debugging). It doesn't care if the value has changed, needs only an identity test (against NULL?) if it's deleted (and only if it's not replaced afterward), and absolutely doesn't care if the dict had other updates (which might increase the version number).
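Concretely, the whole guard under this scheme is a single identity test on the saved cell (illustrative, reusing the RefCell sketch from earlier; not code from either implementation):

    def guard_cell(cell, expected):
        # _NULL (deleted) also fails the identity test, so deletion is
        # caught by the same check; unrelated dict changes never matter.
        return cell.value is expected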
Do you have an estimation of the cost of the "extra pointer"? Impact on memory and CPU. dict is really a very important type for the performance of Python. If you make dict slower, I'm sure that Python overall will be slower.
I'm proposing it as a subclass.
Please read the "Versioned dictionary" section of my email: https://mail.python.org/pipermail/python-dev/2015-December/142397.html
I explained why using a subclass doesn't work in practice.
I've read it again. By subclass, I mean that it implements the same interface. But at the C level, I want to have it be a fork(?) of the current dict implementation. As for `exec`, I think it might be okay for it to be slower at the early stages of this game.

Here's the lookup function for a string-only dict (used both for setting and getting): https://github.com/python/cpython/blob/master/Objects/dictobject.c#L443

I want to break that up into two parts:

- Figure out the index of the {hash, *key, *val} entry in the array.
- Do whatever to it. (In the original: make *value_addr point to the value pointer.)

I want to do this so that I can use that index to point into ANOTHER array, which will be the metadata for the refcells (whatever it ends up being). This will mean that there's no second lookup. This has to be done at the C level, because the dict object doesn't expose the index of the {hash, *key, *val} entries on lookup.

If you don't want to make it a subclass, then we can propose a new function `get_ref` (or something) for dict's C API (probably a hard sell), which returns RefCell objects, and an additional pointer in `dict` to the RefCells table (so a total of two pointers). When `get_ref` is first called, it will:

- calloc the RefCell table (which will be the same length as the entry table),
- replace all of the dict's functions with ones that know how to deal with the RefCells,
- replace itself with a function that knows how to return these refs,
- call its replacement.

If the dict never gets RefCells, you only pay a few pointers in size, and a few creation/deletion values. This is possible now that the dictionary itself will store values as normal.

There might be more necessary. For example, the replaced functions might need to keep pointers to their originals (so that you can slip additional deep C subclasses in). And it might be nice if the `get_index` function could be internally relied upon by the C-level subclasses, because "keeping a metadata table index-synchronized with the real one" is something I've wanted to do for two different dict subclasses now.
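A toy Python analogue of that lazy swap-in (in C this would swap function pointers inside the dict object; all names here are illustrative, reusing RefCell/_NULL from the earlier sketch):

    class LazyRefDict(dict):
        _refs = None  # no RefCell table until get_ref is first called

        def get_ref(self, key):
            # First call: allocate the ref table and swap in ref-aware
            # methods, then defer to the replacement.
            self._refs = {}
            self.__class__ = RefAwareDict
            return self.get_ref(key)

    class RefAwareDict(LazyRefDict):
        def get_ref(self, key):
            cell = self._refs.get(key)
            if cell is None:
                cell = self._refs[key] = RefCell()
                if key in self:
                    cell.value = dict.__getitem__(self, key)
            return cell

        def __setitem__(self, key, value):
            dict.__setitem__(self, key, value)
            cell = self._refs.get(key)
            if cell is not None:
                cell.value = value  # keep exposed refs in sync

        def __delitem__(self, key):
            dict.__delitem__(self, key)
            cell = self._refs.get(key)
            if cell is not None:
                cell.value = _NULL  # invalidate, keep the cell

Until get_ref is called, the instance behaves exactly like a plain dict.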
On Wednesday, December 16, 2015, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
I am confident that the time overhead and the savings will beat the versioning dict. The versioning dict method has to save a reference to the variable value and a reference to the name, and regularly test whether the dict has changed.
The performance of guards matters less than the performance of regular usage of dict. If we have to make a choice, I prefer "slow" guards but no impact on regular dict methods. It's very important that enabling FAT mode doesn't kill performance. Remember that FAT Python is a static optimizer and so can only optimize some patterns, not all Python code.

In my current implementation, a lookup is only needed when a guard is checked, and only if the dict was modified. The dict version doesn't change if a mutable object was modified in place, for example. I didn't benchmark, but I expect that the lookup is avoided in most cases. You should try FAT Python and implement statistics before going too far in your idea.
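Roughly, such a guard looks like this (an illustrative sketch reusing the versioned-dict sketch from earlier in the thread, not the actual FAT Python guard code):

    class GlobalNameGuard:
        def __init__(self, namespace, name, expected):
            self.namespace = namespace  # a versioned dict, as sketched above
            self.name = name
            self.expected = expected
            self.version = namespace.__version__

        def check(self):
            version = self.namespace.__version__
            if version == self.version:
                return True   # dict unchanged: no lookup at all
            if self.namespace.get(self.name) is self.expected:
                self.version = version  # unrelated change: re-arm the guard
                return True
            return False      # the name really changed: take the slow path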
I've read it again. By subclass, I mean that it implements the same interface. But at the C level, I want to have it be a fork(?) of the current dict implementation. As for `exec`, I think it might be okay for it to be slower at the early stages of this game.
Be careful, dict methods are hardcoded in the C code. If your type is not a subtype, there is a risk of crashes. I fixed issues in Python/ceval.c, but it's not enough. You may also have to fix issues in third-party C extensions which only expect dict for namespaces.

Victor
On Wed, Dec 16, 2015 at 2:01 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
On Wednesday, December 16, 2015, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
I am confident that the time overhead and the savings will beat the versioning dict. The versioning dict method has to save a reference to the variable value and a reference to the name, and regularly test whether the dict has changed.
The performance of guards matters less than the performance of regular usage of dict. If we have to make a choice, I prefer "slow" guards but no impact on regular dict methods. It's very important that enabling FAT mode doesn't kill performance. Remember that FAT Python is a static optimizer and so can only optimize some patterns, not all Python code.
In my current implementation, a lookup is only needed when a guard is checked, and only if the dict was modified. The dict version doesn't change if a mutable object was modified in place, for example. I didn't benchmark, but I expect that the lookup is avoided in most cases. You should try FAT Python and implement statistics before going too far in your idea.
My suggestion should improve *all* function calls which refer to outside names. Each function keeps an indirect, automagically updated reference to the current value of the names it uses, and will never need to look things up again.[*] There wouldn't be a need to save global names as default arguments (`def f(x, list=list):`).

[*]: Not exactly true. Nested scopes can cause an issue. I'm not sure what happens if you redefine the __builtin__ name after these functions are defined. My suggestion would not know that __builtin__ was switched out, since it saves a ref into the original __builtin__.

I'm not sure about how to deal with nested scopes (for example, class inheritance). I think the "chained RefCells" idea works.

Chained RefCell idea: There are three cases where we can have nested scopes:

1. An object's __dict__ nests its class.
2. A class's __dict__ nests its superclasses.
3. globals() nests __builtin__.
4? Does a package nest its modules?
5? Does `from module import *` nest the module's __dict__?

(Nonlocal variables in nested functions, and probably nested classes, are not a case of nested scope, since the scope of each name is determined during compilation of the function/class.)

RefCells of nested scopes will have a pointer to their value (or NULL), and an array/linked list of pointers to refcells from their parent dicts (or to their parent dicts if a refcell wasn't successfully acquired yet). When you request a RefCell from a nested scope, it will return its value if it exists. Otherwise, it requests refcells from each parent (parents won't create refcells unless there's a value) until it gets one. When you ask a RefCell to resolve, it will check its own value, then ask each parent for a value (creating intermediate refcells *if* a value exists). It will not need to do lookups in parents if it got a refcell before (though the refcell might be null).

Problem: If you have class A(B, C), and you resolve a refcell for a name which exists in C but not B, you will look things up through B's dict every single time. It will fail, every single time. We can't skip B, since B is allowed to get such a name later, but I don't want to add refs to names that might never exist. This can be solved either through versioning or by checking whether a dict is read-only (such as for built-in types).

In fact, in the code I wrote at the end of this email, RefCell.resolve() might even look things up in a shared ancestor multiple times. However, this would be incorrect anyway, since it doesn't respect Python's MRO resolution. So we can just fix that. RefCell.resolve would need a `search_parents: bool` parameter.
I've read it again. By subclass, I mean that it implements the same interface. But at the C level, I want to have it be a fork(?) of the current dict implementation. As for `exec`, I think it might be okay for it to be slower at the early stages of this game.
Be careful, dict methods are hardcoded in the C code. If your type is not a subtype, there is a risk of crashes.
Not exactly, and this is important. Many functions are called via pointer. It's like C++'s virtual methods, but more dynamic, since they can be changed per object. See https://github.com/python/cpython/blob/master/Objects/dict-common.h#L17. For example, the lookup method for str-only dicts swaps itself for a general object lookup method if it finds a non-string key. See https://github.com/python/cpython/blob/master/Objects/dictobject.c#L549.

I'm now suggesting that there be additional space in the dict object itself to hold more function pointers (get_ref and get_hash_table_index), which would slightly increase memory cost and creation time. It won't have extra cost for running normal methods. When the dict gets a request for a reference, it will swap in methods that know how to handle metadata, which WILL make (a few) things (a little) slower upon resizing. You only pay for what you ask for (except the extra dict API methods, which will slightly increase the cost and creation time). A few more pointers shouldn't hurt, since PyObjects are already big (see the overhead of dicts here: https://github.com/python/cpython/blob/master/Objects/dictobject.c#L2748).

I'm not sure that get_hash_table_index is necessary. (I misunderstood something when rereading the lookup functions.) It should be possible to calculate the index in the hash table by subtracting the lookup's return value by the base index.

== Some pseudocode ==

Notes:

- A lot of repeated lookups are made in the code below. No repeated lookups (in a single call) are necessary in C.
- I made a few style choices which wouldn't be Pythonic (e.g. explicitly testing for key) to make it easier to see what the C would do.
- I wrote it as a subclass. It doesn't have to be.
- We can ask for getref to become standard. It could be useful for a few purposes. (Namely, implementing scope dicts when writing interpreters for other languages, and for pass-by-reference in those interpreters.)
- `parents` can be a special thing for nested scope dicts (such as those listed above).
- Like I said before, we can plug in the more expensive functions the first time getref is called. A dict can dynamically become a dict_with_refs.
- I'm being sloppy with self.refs. Sorry. Sometimes I write `self.refs[key] is NULL` and sometimes I write `key not in self.refs`. It's the same thing.
- `NULL` might be `dummy` (which is used in the dict implementation).
- `delete` means `Py_XDECREF` or `Py_DECREF`. This is only used when I feel like emphasizing the memory management.
- I remembered that class dicts already use shared keys. I should look into that to see if we can leverage the mechanisms there.
- We can decide instead that RefCells only own their value if they don't belong to a living scope. Meaning, they don't try to delete anything when they're deleted unless their owning scope is dead.
- NestedRefCells can be a subclass. It would save a pointer in some cases. (But it's a PyObject, so it'd not save much.)
- In C, the KeyErrors would instead return NULL.

The code follows.

    class ScopeDict(dict):
        __slots__ = {
            '__inner__': Dict[Any, Nullable[Any]],  # == super().
            'refs': SharedDict[Any, Nullable[RefCells]],
                # Shares keys with __inner__.
            'parents': List[ScopeDict],
        }

    class RefCell:
        __slots__ = {
            'key': Nullable[str],  # Or not nullable?
            'value_ptr': Pointer[Nullable[Any]],
                # Pointer to the pointer to the value object.
            'parents': Nullable[ScopeDict | RefCell],
            'indirect': bool,
                # True:
                #     The owning dict is alive.
                #     value_ptr is reference to pointer to value.
                #     This is the normal case.
                # False:
                #     The owning dict is gone.
                #     value_ptr is counted reference to value.
                #     The cell owns the reference.
                # This bit can be packed into the value_ptr.
                # In fact, this isn't necessary:
                # the .resolve function can be dynamic.
        }

    def ScopeDict.getref(self, key, create_if_none=True):
        """
        Get a ref to a key.
        Raise KeyError if it doesn't exist and if not create_if_none.
        """
        if self.refs[key] is not NULL:  # refcell exists
            return self.refs[key]
        if key in self:  # value exists, refcell doesn't
            # Create refcell to the value pointer.
            return self.create_ref(key)
        # Value does not exist. Search direct parents.
        # Stop at the first parent cell, even if it doesn't
        # have a value. One parent cell is enough to justify
        # the creation of a cell.
        for i, parent in enumerate(self.parents if self.parents else ()):
            try:
                ref = parent.getref(key, create_if_none=False)
                index = i
                break
            except KeyError:
                pass
        else:  # No parent has the key
            if create_if_none:
                # Create ref.
                return self.create_ref(key)
            else:  # Give up
                raise KeyError(key)
        # Found a parent with a refcell.
        # Save a reference to it.
        cell = self.create_ref(key)
        cell.parents[index] = ref
        return cell

    def ScopeDict.create_ref(self, key):
        """
        Create a refcell.
        """
        # Add key to inner dict if it doesn't exist.
        if key not in self.__inner__:
            self.__inner__[key] = NULL
        # Wrap the address of the value pointer in a refcell.
        cell = RefCell(&(self.__inner__.value_pointer(key)))
        self.refs[key] = cell
        if self.parents:
            # Save the parents.
            # This is in case value == NULL
            # and it needs to look it up in the parents.
            cell.parents = self.parents.copy()
        else:
            cell.parents = NULL
            # Not necessary if no parents.
            # (Except for tracebacks?)
        cell.key = key
        return cell

    def RefCell.resolve(self):
        """
        Resolve cell to a value. Will not return NULL.
        Will raise KeyError on failure.
        (In C, it would return NULL instead.)
        """
        if not self.indirect:
            if self.value_ptr is not NULL:
                return self.value_ptr
        elif self.value_ptr.deref() is not NULL:
            return self.value_ptr.deref()
        # No parents to search.
        if self.parents is NULL:
            raise KeyError
        # Search parents for value.
        for i, parent in enumerate(self.parents):
            # We want the parent CELL.
            if not isinstance(parent, RefCell):
                # Try to ask for a ref from the parent.
                assert isinstance(parent, ScopeDict)
                try:
                    parent = parent.getref(self.key, create_if_none=False)
                except KeyError:
                    continue  # Give up on this parent.
                self.parents[i] = parent
            # Try to get the parent cell to resolve.
            try:
                return parent.resolve()
            except KeyError:
                continue
        raise KeyError

Here are some of the wrapper algorithms for the dict methods.

    # No change.
    ScopeDict.__setitem__ = dict.__setitem__

    def ScopeDict.keys(self):
        # Example of iteration.
        # This is already the algorithm, probably,
        # so there's no extra cost.
        for key, value in self.__inner__.items():
            if value is not NULL:
                yield key

    def ScopeDict.__getitem__(self, key):
        result = self.__inner__.get(key, NULL)  # This is an extra check.
        if result is not NULL:
            return result
        if key in self.refs:  # This is an extra check.
            # Only necessary for nested scopes.
            # So a regular dict doesn't even need this.
            # In fact, for class dicts, you don't want it,
            # since it skips the MRO.
            try:
                return self.refs[key].resolve()
            except KeyError:
                pass
        raise KeyError(key)

    def ScopeDict.__delitem__(self, key):
        if self.__inner__.get(key, NULL) is NULL:  # extra check?
            raise KeyError(key)
        delete self.__inner__[key]
        self.__inner__[key] = NULL

    def ScopeDict.__del__(self):
        """
        Delete this dict.
        """
        for key in self.__inner__:
            ref = self.refs[key]
            if ref is NULL:  # no ref (standard dict case)
                delete self.__inner__[key]  # DecRef the value
            else:
                if ref.__refcount > 1:  # ref is exposed
                    # Make it point directly to the value.
                    ref.value_ptr = self.__inner__[key]
                    ref.indirect = False  # it now owns a direct reference
                self.refs[key] = NULL
                delete ref  # DecRef, not dict removal

    def ScopeDict.compact(self):
        """
        Compact the dictionary.
        (Similarly for expanding.)
        """
        new_table = {}
        new_refs = {}
        # Remove unnecessary entries.
        # Let dict.__table be the internal entry table.
        for key in self.__inner__:
            value = self.__inner__[key]
            ref = self.refs[key]
            if ref is not NULL and ref.__refcount == 1:
                # Ref exists but is not exposed.
                # Delete the unused reference.
                ref.value_ptr = NULL
                delete ref
                ref = NULL
            if value is not NULL or ref is not NULL:
                # Add it to the new table using the normal dict
                # compact algorithm. (I don't know it.)
                new_table[key] = value
                new_refs[key] = ref
        # Can add a check here: If there are no live refs,
        # convert to a regular dict.
        self.__inner__ = new_table
        self.refs = new_refs

    def RefCell.__del__(self):
        if self.indirect:  # pointing at the pointer
            delete self.value_ptr.deref()
        else:  # pointing at the value
            delete self.value_ptr
        delete self.key
        delete self.parents