PEP 416: Add a frozendict builtin type
As requested, I create a PEP and a related issue: http://www.python.org/dev/peps/pep-0416/ http://bugs.python.org/issue14162 The PEP 416 is different from my previous propositions: frozendict values can be mutable and dict doesn't inherit from frozendict anymore. But it is still possible to use the PyDict C API on frozendict (which is more an implementation detail). TODO: - write the documentation - decide if new functions should be added to the C API, maybe a new PyFrozenDict_New() function? (but would it take a mapping or a list of tuple?) -- PEP: 416 Title: Add a frozendict builtin type Version: $Revision$ Last-Modified: $Date$ Author: Victor Stinner <victor.stinner@gmail.com> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 29-February-2012 Python-Version: 3.3 Abstract ======== Add a new frozendict builtin type. Rationale ========= A frozendict mapping cannot be changed, but its values can be mutable (not hashable). A frozendict is hashable and so immutable if all values are hashable (immutable). Use cases of frozendict: * hashable frozendict can be used as a key of a mapping or as a member of set * frozendict helps optimization because the mapping is constant * frozendict avoids the need of a lock when the frozendict is shared by multiple threads or processes, especially hashable frozendict Constraints =========== * frozendict has to implement the Mapping abstract base class * frozendict keys and values can be unorderable * a frozendict is hashable if all keys and values are hashable * frozendict hash does not depend on the items creation order Implementation ============== * Add a PyFrozenDictObject structure based on PyDictObject with an extra "Py_hash_t hash;" field * frozendict.__hash__() is implemented using hash(frozenset(self.items())) and caches the result in its private hash attribute * Register frozendict has a collections.abc.Mapping * frozendict can be used with PyDict_GetItem(), but PyDict_SetItem() and PyDict_DelItem() raise a TypeError Recipe: immutable dict ====================== An immutable mapping can be implemented using frozendict:: import itertools class immutabledict(frozendict): def __new__(cls, *args, **kw): # ensure that all values are immutable for key, value in itertools.chain(args, kw.items()): if not isinstance(value, (int, float, complex, str, bytes)): hash(value) # frozendict ensures that all keys are immutable return frozendict.__new__(cls, *args, **kw) def __repr__(self): return 'immutabledict' + frozendict.__repr__(self)[10:] Objections ========== *namedtuple may fit the requiements of a frozendict.* A namedtuple is not a mapping, it does not implement the Mapping abstract base class. *frozendict can be implemented in Python using descriptors" and "frozendict just need to be practically constant.* If frozendict is used to harden Python (security purpose), it must be implemented in C. A type implemented in C is also faster. *The PEP 351 was rejected.* The PEP 351 tries to freeze an object and so may convert a mutable object to an immutable object (using a different type). frozendict doesn't convert anything: hash(frozendict) raises a TypeError if a value is not hashable. Freezing an object is not the purpose of this PEP. Links ===== * PEP 412: Key-Sharing Dictionary (`issue #13903 <http://bugs.python.org/issue13903>`_) * PEP 351: The freeze protocol * `The case for immutable dictionaries; and the central misunderstanding of PEP 351 <http://www.cs.toronto.edu/~tijmen/programming/immutableDictionaries.html>`_ * `Frozen dictionaries (Python recipe 414283) <http://code.activestate.com/recipes/414283-frozen-dictionaries/>`_ by Oren Tirosh Copyright ========= This document has been placed in the public domain.
On Wed, 2012-02-29 at 19:21 +0100, Victor Stinner wrote:
As requested, I create a PEP and a related issue:
[...snip...]
Rationale =========
A frozendict mapping cannot be changed, but its values can be mutable (not hashable). A frozendict is hashable and so immutable if all values are hashable (immutable).
The wording of the above seems very unclear to me. Do you mean "A frozendict has a constant set of keys, and for every key, d[key] has a specific value for the lifetime of the frozendict. However, these values *may* be mutable. The frozendict is hashable iff all of the values are hashable." ? (or somesuch) [...snip...]
* Register frozendict has a collections.abc.Mapping s/has/as/ ?
[...snip...]
If frozendict is used to harden Python (security purpose), it must be implemented in C. A type implemented in C is also faster.
You mention security purposes here, but this isn't mentioned in the Rationale or Use Cases Hope this is helpful Dave
Rationale =========
A frozendict mapping cannot be changed, but its values can be mutable (not hashable). A frozendict is hashable and so immutable if all values are hashable (immutable). The wording of the above seems very unclear to me.
Do you mean "A frozendict has a constant set of keys, and for every key, d[key] has a specific value for the lifetime of the frozendict. However, these values *may* be mutable. The frozendict is hashable iff all of the values are hashable." ? (or somesuch)
[...snip...]
I agree that this sentence needs some clarification. David's formulation is also what I would guess it to mean, but it should be stated more explicitly. Eli
Rationale =========
A frozendict mapping cannot be changed, but its values can be mutable (not hashable). A frozendict is hashable and so immutable if all values are hashable (immutable). The wording of the above seems very unclear to me.
Do you mean "A frozendict has a constant set of keys, and for every key, d[key] has a specific value for the lifetime of the frozendict. However, these values *may* be mutable. The frozendict is hashable iff all of the values are hashable." ? (or somesuch)
New try: "A frozendict is a read-only mapping: a key cannot be added nor removed, and a key is always mapped to the same value. However, frozendict values can be mutable (not hashable). A frozendict is hashable and so immutable if and only if all values are hashable (immutable)."
* Register frozendict has a collections.abc.Mapping s/has/as/ ?
Oops, fixed.
If frozendict is used to harden Python (security purpose), it must be implemented in C. A type implemented in C is also faster.
You mention security purposes here, but this isn't mentioned in the Rationale or Use Cases
I added two use cases: security sandbox and cache.
Hope this is helpful
Yes, thanks. Victor
On 1 March 2012 12:08, Victor Stinner <victor.stinner@gmail.com> wrote:
New try:
"A frozendict is a read-only mapping: a key cannot be added nor removed, and a key is always mapped to the same value. However, frozendict values can be mutable (not hashable). A frozendict is hashable and so immutable if and only if all values are hashable (immutable)."
I'd suggest you don't link immutability and non-hashability so tightly. Misbehaved objects can be mutable but hashable:
class A: ... def __init__(self,a): ... self.a = a ... def __hash__(self): ... return 12 ... a = A(1) hash(a) 12 a.a=19 hash(a) 12
Just avoid using the term "immutable" at all: "A frozendict is a read-only mapping: a key cannot be added nor removed, and a key is always mapped to the same value. However, frozendict values can be mutable. A frozendict is hashable if and only if all values are hashable." I realise this is a weaker statement than you'd like to give (immutability seems to be what people *really* think they want when they talk about frozen objects), but don't promise immutability if that's not what you're providing. More specifically, I'd hate to think that someone for whom security is an issue would see your original description and think they could use a frozendict and get safety, only to find their system breached because of a class like A above. The same could happen to people who want to handle thread safety via immutable objects, who could also end up with errors if misbehaving classes found their way into an application. Paul.
To close the loop, I've rejected the PEP, adding the following rejection notice: """ I'm rejecting this PEP. A number of reasons (not exhaustive): * According to Raymond Hettinger, use of frozendict is low. Those that do use it tend to use it as a hint only, such as declaring global or class-level "constants": they aren't really immutable, since anyone can still assign to the name. * There are existing idioms for avoiding mutable default values. * The potential of optimizing code using frozendict in PyPy is unsure; a lot of other things would have to change first. The same holds for compile-time lookups in general. * Multiple threads can agree by convention not to mutate a shared dict, there's no great need for enforcement. Multiple processes can't share dicts. * Adding a security sandbox written in Python, even with a limited scope, is frowned upon by many, due to the inherent difficulty with ever proving that the sandbox is actually secure. Because of this we won't be adding one to the stdlib any time soon, so this use case falls outside the scope of a PEP. On the other hand, exposing the existing read-only dict proxy as a built-in type sounds good to me. (It would need to be changed to allow calling the constructor.) """ -- --Guido van Rossum (python.org/~guido)
On the other hand, exposing the existing read-only dict proxy as a built-in type sounds good to me. (It would need to be changed to allow calling the constructor.)
I wrote a small patch to implement this request: http://bugs.python.org/issue14386 I also opened the following issue to support other types than dict for __builtins__: http://bugs.python.org/issue14385 This issue is directly related to pysandbox, but it may help other purpose. Victor
Rationale =========
A frozendict mapping cannot be changed, but its values can be mutable (not hashable). A frozendict is hashable and so immutable if all values are hashable (immutable). The wording of the above seems very unclear to me.
Do you mean "A frozendict has a constant set of keys, and for every key, d[key] has a specific value for the lifetime of the frozendict. However, these values *may* be mutable. The frozendict is hashable iff all of the values are hashable." ? (or somesuch)
New try: "A frozendict is a read-only mapping: a key cannot be added nor removed, and a key is always mapped to the same value. However, frozendict values can be mutable (not hashable). A frozendict is hashable and so immutable if and only if all values are hashable (immutable)."
* Register frozendict has a collections.abc.Mapping s/has/as/ ?
Oops, fixed.
If frozendict is used to harden Python (security purpose), it must be implemented in C. A type implemented in C is also faster.
You mention security purposes here, but this isn't mentioned in the Rationale or Use Cases
I added two use cases: security sandbox and cache.
Hope this is helpful
Yes, thanks. Victor
In http://mail.python.org/pipermail/python-dev/2012-February/117113.html Victor Stinner posted:
An immutable mapping can be implemented using frozendict::
class immutabledict(frozendict): def __new__(cls, *args, **kw): # ensure that all values are immutable for key, value in itertools.chain(args, kw.items()): if not isinstance(value, (int, float, complex, str, bytes)): hash(value) # frozendict ensures that all keys are immutable return frozendict.__new__(cls, *args, **kw)
What is the purpose of this? Is it just a hashable frozendict? If it is for security (as some other messages suggest), then I don't think it really helps. class Proxy: def __eq__(self, other): return self.value == other def __hash__(self): return hash(self.value) An instance of Proxy is hashable, and the hash is not object.hash, but it is still mutable. You're welcome to call that buggy, but a secure sandbox will have to deal with much worse. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
An immutable mapping can be implemented using frozendict::
class immutabledict(frozendict): def __new__(cls, *args, **kw): # ensure that all values are immutable for key, value in itertools.chain(args, kw.items()): if not isinstance(value, (int, float, complex, str, bytes)): hash(value) # frozendict ensures that all keys are immutable return frozendict.__new__(cls, *args, **kw)
What is the purpose of this? Is it just a hashable frozendict?
It's an hashable frozendict or a "really frozen dict" or just "an immutable dict". It helps to detect errors earlier when you need a hashable frozendict. It is faster than hash(frozendict) because it avoids to hash known immutable types. If the recipe is confusion, it can be removed. Or it may be added to collections or somewhere else.
If it is for security (as some other messages suggest), then I don't think it really helps.
class Proxy: def __eq__(self, other): return self.value == other def __hash__(self): return hash(self.value)
An instance of Proxy is hashable, and the hash is not object.hash, but it is still mutable. You're welcome to call that buggy, but a secure sandbox will have to deal with much worse.
Your example looks to be incomplete: where does value come from? Is it supposed to be a read-only view of an object? Such Proxy class doesn't help to implement a sandbox because Proxy.value can be modified. I use closures to implement proxies in pysandbox. Dummy example: def createLengthProxy(secret): class Proxy: def __len__(self): return len(secret) return Proxy() Such proxy is not safe because it is possible to retrieve the secret: secret = "abc" value = createLengthProxy(secret).__len__.__closure__[0].cell_contents assert value is secret pysandbox implements other protections to block access to __closure__. Victor
Le 29/02/2012 19:21, Victor Stinner a écrit :
Rationale =========
(...) Use cases of frozendict: (...)
I updated the PEP to list use cases described in the other related mailing list thread. --- Use cases: * frozendict lookup can be done at compile time instead of runtime because the mapping is read-only. frozendict can be used instead of a preprocessor to remove conditional code at compilation, like code specific to a debug build. * hashable frozendict can be used as a key of a mapping or as a member of set. frozendict can be used to implement a cache. * frozendict avoids the need of a lock when the frozendict is shared by multiple threads or processes, especially hashable frozendict. It would also help to prohibe coroutines (generators + greenlets) to modify the global state. * frozendict helps to implement read-only object proxies for security modules. For example, it would be possible to use frozendict type for __builtins__ mapping or type.__dict__. This is possible because frozendict is compatible with the PyDict C API. * frozendict avoids the need of a read-only proxy in some cases. frozendict is faster than a proxy because getting an item in a frozendict is a fast lookup whereas a proxy requires a function call. * use a frozendict as the default value of function argument: avoid the problem of mutable default argument. --- Victor
On Sat, Mar 3, 2012 at 4:11 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Le 29/02/2012 19:21, Victor Stinner a écrit :
Rationale =========
(...) Use cases of frozendict: (...)
I updated the PEP to list use cases described in the other related mailing list thread. --- Use cases:
* frozendict lookup can be done at compile time instead of runtime because the mapping is read-only. frozendict can be used instead of a preprocessor to remove conditional code at compilation, like code specific to a debug build. * hashable frozendict can be used as a key of a mapping or as a member of set. frozendict can be used to implement a cache. * frozendict avoids the need of a lock when the frozendict is shared by multiple threads or processes, especially hashable frozendict. It would also help to prohibe coroutines (generators + greenlets) to modify the global state. * frozendict helps to implement read-only object proxies for security modules. For example, it would be possible to use frozendict type for __builtins__ mapping or type.__dict__. This is possible because frozendict is compatible with the PyDict C API. * frozendict avoids the need of a read-only proxy in some cases. frozendict is faster than a proxy because getting an item in a frozendict is a fast lookup whereas a proxy requires a function call. * use a frozendict as the default value of function argument: avoid the problem of mutable default argument.
Is your implementation (adapted to a standalone type) something you could put up on the cheeseshop? -eric
Is your implementation (adapted to a standalone type) something you could put up on the cheeseshop?
Short answer: no. My implementation (attached to the issue #14162) reuses most of private PyDict functins which are not exported and these functions have to be modified to accept a frozendict as input. One of the advantage of reusing PyDict functions is also to have a frozendict type compatible with the PyDict (public) API: PyDict_GetItem(), PyDict_SetItem(), etc. This property allows to do further changes like accepting a frozendict for __builtins__ or use freezing a type dict (use frozendict for type.__dict__). If you only want to a frozendict type, you can copy/paste PyDict code or implement it complelty differently. Or you can write a read-only proxy. Victor
participants (8)
-
David Malcolm
-
Eli Bendersky
-
Eric Snow
-
Guido van Rossum
-
Jim J. Jewett
-
Paul Moore
-
Victor Stinner
-
Victor Stinner