Add a frozendict builtin type
Rationale
=========

A frozendict type is a common request from users and there are various implementations. There are two main Python implementations:

 * "blacklist": frozendict inherits from dict and overrides methods to raise an exception when trying to modify the frozendict
 * "whitelist": frozendict does not inherit from dict and implements only some dict methods, or implements all dict methods but raises exceptions when trying to modify the frozendict

The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict, key, value)).

The whitelist implementation has an issue: frozendict and dict are not "compatible", dict is not a subclass of frozendict (and frozendict is not a subclass of dict).

I propose to add a new frozendict builtin type and make the dict type inherit from it. frozendict would not have methods to modify its content, and its values must be immutable.

Constraints
===========

 * frozendict values must be immutable, like dict keys
 * frozendict can be used with the C API of the dict object (e.g. PyDict_GetItem) but write methods (e.g. PyDict_SetItem) would fail with a TypeError ("expect dict, got frozendict")
 * frozendict.__hash__() has to be deterministic
 * frozendict does not have the following methods: clear, __delitem__, pop, popitem, setdefault, __setitem__ and update, just as tuple/frozenset have fewer methods than list/set
 * issubclass(dict, frozendict) is True, whereas issubclass(frozendict, dict) is False

Implementation
==============

 * Add a hash field to the PyDictObject structure
 * Make dict inherit from frozendict
 * frozendict values are checked for immutability by calling their __hash__ method, with a fast path for known immutable types (int, float, bytes, str, tuple, frozenset)
 * frozendict.__hash__ computes hash(frozenset(self.items())) and caches the result in its private hash attribute

Attached patch is a work-in-progress implementation.
TODO
====

 * Add a frozendict abstract base class to collections?
 * frozendict may not overallocate dictionary buckets?

--

Examples of frozendict implementations:

http://bob.pythonmac.org/archives/2005/03/04/frozendict/
http://code.activestate.com/recipes/498072-implementing-an-immutable-diction...
http://code.activestate.com/recipes/414283-frozen-dictionaries/
http://corebio.googlecode.com/svn/trunk/apidocs/corebio.utils.frozendict-cla...
http://code.google.com/p/lingospot/source/browse/trunk/frozendict/frozendict...
http://cmssdt.cern.ch/SDT/doxygen/CMSSW_4_4_2/doc/html/d6/d2f/classfrozendic...

See also the recent discussion on python-list:

http://mail.python.org/pipermail/python-list/2012-February/1287658.html

--

See also the PEP 351.

Victor
On 2012-02-27, at 19:53 , Victor Stinner wrote:
Rationale =========
A frozendict type is a common request from users and there are various implementations. There are two main Python implementations:
* "blacklist": frozendict inheriting from dict and overriding methods to raise an exception when trying to modify the frozendict * "whitelist": frozendict not inheriting from dict and only implement some dict methods, or implement all dict methods but raise exceptions when trying to modify the frozendict
The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict, key, value)).
The whitelist implementation has an issue: frozendict and dict are not "compatible", dict is not a subclass of frozendict (and frozendict is not a subclass of dict).
This may be an issue at the C level (I'm not sure), but since this would be a Python 3-only collection, "user" code (in Python) should/would generally be using abstract base classes, so type-checking would not be an issue (as in Python code performing `isinstance(a, dict)` checks naturally failing on `frozendict`).

Plus `frozenset` does not inherit from `set`: it's a whitelist reimplementation and I've never known anybody to care. So there's that precedent. And of course there's no inheritance relationship between lists and tuples.
* frozendict does not have the following methods: clear, __delitem__, pop, popitem, setdefault, __setitem__ and update, just as tuple/frozenset have fewer methods than list/set.
It'd probably be simpler to define that frozendict is a Mapping (where dict is a MutableMapping). And that's clearer.
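The Mapping-based approach suggested here can be sketched in pure Python. This is only an illustration under stated assumptions: the class name `FrozenDict` and its private `_data` attribute are hypothetical and are not taken from the patch discussed in this thread.

```python
from collections.abc import Mapping, MutableMapping


class FrozenDict(Mapping):
    """Hypothetical read-only mapping built on the Mapping ABC."""

    __slots__ = ("_data",)

    def __init__(self, *args, **kwargs):
        # Copy the input so later changes to the source are not visible here.
        self._data = dict(*args, **kwargs)

    # Mapping only requires these three methods; keys(), items(), values(),
    # get(), __contains__ and __eq__ are supplied by the ABC as mixins.
    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        return f"FrozenDict({self._data!r})"


fd = FrozenDict(a=1, b=2)

try:
    fd["c"] = 3            # no __setitem__ is defined, so this fails
    assignment_failed = False
except TypeError:
    assignment_failed = True
```

Because the class never defines the mutating methods, there is nothing to override or blacklist. Note that this is a convention rather than real protection: the underlying `_data` dict is still reachable from Python code.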
* Make dict inherit from frozendict
Isn't that the other way around from the statement above? Not that I'd have an issue with it, it's much cleaner, but there's little gained by doing so since `isinstance(a, dict)` will still fail if `a` is a frozendict.
* Add a frozendict abstract base class to collections?
Why? There's no `dict` ABC, and there are already a Mapping and a MutableMapping ABC which fit the bill no?
This may be an issue at the C level (I'm not sure), but since this would be a Python 3-only collection, "user" code (in Python) should/would generally be using abstract base classes, so type-checking would not be an issue (as in Python code performing `isinstance(a, dict)` checks naturally failing on `frozendict`)
Plus `frozenset` does not inherit from `set`, it's a whitelist reimplementation and I've never known anybody to care. So there's that precedent. And of course there's no inheritance relationship between lists and tuples.
On second thought, I realized that it does not really matter. frozendict and dict can be "unrelated" (no inheritance relation).

Victor
In http://mail.python.org/pipermail/python-dev/2012-February/116955.html Victor Stinner proposed:
The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict, key, value)).
It is also possible to use ctypes and violate even more invariants. For most purposes, this falls under "consenting adults".
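The bypass under discussion can be reproduced with a small sketch. The class `BlacklistFrozenDict` is hypothetical and only stands in for the "blacklist" style of implementation; the sketch calls `dict.__setitem__`, which is the actual name of dict's item-assignment method.

```python
class BlacklistFrozenDict(dict):
    """Hypothetical 'blacklist' frozendict: inherits from dict, blocks writers."""

    def _readonly(self, *args, **kwargs):
        raise TypeError("frozendict is read-only")

    # Every mutating method is replaced by one that raises.
    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly


fd = BlacklistFrozenDict({"a": 1})

try:
    fd["b"] = 2                  # goes through the override and is rejected
    blocked = False
except TypeError:
    blocked = True

# Calling the base-class method explicitly skips the override entirely.
dict.__setitem__(fd, "b", 2)
```

After the last line `fd` contains the key `"b"`, even though every write method on the subclass raises.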
The whitelist implementation has an issue: frozendict and dict are not "compatible", dict is not a subclass of frozendict (and frozendict is not a subclass of dict).
And because of Liskov substitutability, they shouldn't be; they should be sibling children of a basedict that doesn't have the mutating methods, but also doesn't *promise* not to mutate.
* frozendict values must be immutable, as dict keys
Why? That may be useful, but an immutable dict whose values might mutate is also useful; by forcing that choice, it starts to feel too specialized for a builtin.
* Add a hash field to the PyDictObject structure
That is another indication that it should really be a sibling class; most of the uses I have had for immutable dicts still didn't need hashing. It might be a worth adding anyhow, but only to immutable dicts -- not to every instance dict or keywords parameter.
* frozendict.__hash__ computes hash(frozenset(self.items())) and caches the result in its private hash attribute
Why? hash(frozenset(self.keys())) would still meet the hash contract, but it would be approximately twice as fast, and I can think of only one case where it wouldn't work just as well. (That case is wanting to store a dict of alternative configuration dicts (with no defaulting of values), but ALSO wanting to use the configurations themselves (as opposed to their names) as the dict keys.)

-jJ

-- 
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict, key, value)).
It is also possible to use ctypes and violate even more invariants. For most purposes, this falls under "consenting adults".
My primary usage of frozendict would be pysandbox, a security module. Attackers are not consenting adults :-)

Read-only dict would also help optimization, in the CPython peephole or the PyPy JIT.

In pysandbox, I'm trying to replace __builtins__ (and maybe also type.__dict__) by a frozendict. These objects rely on the PyDict API and so expect a type "compatible" with dict. But PyDict_GetItem() and PyDict_SetItem() may use a test like isinstance(obj, (dict, frozendict)), especially if the C structure is "compatible". But pysandbox should not drive the design of frozendict :-)
The whitelist implementation has an issue: frozendict and dict are not "compatible", dict is not a subclass of frozendict (and frozendict is not a subclass of dict).
And because of Liskov substitutability, they shouldn't be; they should be sibling children of a basedict that doesn't have the mutating methods, but also doesn't *promise* not to mutate.
As I wrote, I realized that it doesn't matter if dict doesn't inherit from frozendict.
* frozendict values must be immutable, as dict keys
Why? That may be useful, but an immutable dict whose values might mutate is also useful; by forcing that choice, it starts to feel too specialized for a builtin.
If values are mutable, the frozendict cannot be called "immutable". tuple and frozenset can only contain immutable values. All implementations of frozendict that I found expect frozendict to be hashable.
* frozendict.__hash__ computes hash(frozenset(self.items())) and caches the result in its private hash attribute
Why? hash(frozenset(self.keys())) would still meet the hash contract, but it would be approximately twice as fast, and I can think of only one case where it wouldn't work just as well.
Yes, it would be faster, but the hash is usually the hash of the whole object content. E.g. the hash of a tuple is not the hash of the items with odd indexes, although such a hash function would also meet the "hash contract". All implementations of frozendict that I found use items, and not only values or only keys.

Victor
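The trade-off between the two hash strategies can be made concrete with plain dicts. This is a sketch only: it computes the two candidate hashes by hand rather than through any real frozendict type, and the sample dicts are made up for illustration.

```python
a = {"host": "localhost", "port": 80}
b = {"host": "localhost", "port": 8080}   # same keys, different values

# Strategy 1: hash only the keys.
keys_hash_a = hash(frozenset(a.keys()))
keys_hash_b = hash(frozenset(b.keys()))

# Strategy 2: hash the (key, value) pairs.
items_hash_a = hash(frozenset(a.items()))
items_hash_b = hash(frozenset(b.items()))

# Both strategies satisfy the hash contract (equal objects hash equal),
# but with keys-only hashing these two unequal dicts always collide.
keys_collide = keys_hash_a == keys_hash_b

# An equal copy hashes identically under the items-based strategy.
copy_matches = hash(frozenset(dict(a).items())) == items_hash_a
```

Keys-only hashing therefore degrades lookups exactly in the case mentioned earlier in the thread: many mappings that share keys and differ only in values, used together as dict keys.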
On 02/27/2012 06:34 PM, Victor Stinner wrote:
tuple and frozenset can only contain immutables values.
Tuples can contain mutables::

  $ python
  Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
  [GCC 4.4.3] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> ({},)
  ({},)
  $ python3
  Python 3.2 (r32:88445, Mar 10 2011, 10:08:58)
  [GCC 4.4.3] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> ({},)
  ({},)
Tres.
-- 
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
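A short sketch of what this implies for hashing: a tuple may hold a mutable object, but it then stops being hashable. The variable names are only for illustration.

```python
t = ({},)            # a tuple holding a mutable dict is perfectly legal
t[0]["x"] = 1        # the dict inside can still be changed in place

try:
    hash(t)
    hashable = True
except TypeError:
    # hash() of a tuple hashes each item, and dict is unhashable
    hashable = False

# A tuple whose items are all hashable can be hashed without error.
all_hashable = (1, "a", frozenset([2]))
h = hash(all_hashable)
```

So a tuple is immutable in structure (its slots cannot be rebound) without guaranteeing that the objects it refers to are immutable.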
On Tue, Feb 28, 2012 at 9:34 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict, key, value)).
It is also possible to use ctypes and violate even more invariants. For most purposes, this falls under "consenting adults".
My primary usage of frozendict would be pysandbox, a security module. Attackers are not consenting adults :-)
Read-only dict would also help optimization, in the CPython peephole or the PyPy JIT.
I'm pretty sure the PyPy jit can already pick up and optimise cases where a dict goes "read-only" (i.e. stops being modified). I think you need to elaborate on your use cases further, and explain what *additional* changes would be needed, such as allowing frozendict instances as __dict__ attributes in order to create truly immutable objects in pure Python code. In fact, that may be a better way to pitch the entire PEP. In current Python, you *can't* create a truly immutable object without dropping down to a C extension:
    >>> from decimal import Decimal
    >>> x = Decimal(1)
    >>> x
    Decimal('1')
    >>> hash(x)
    1
    >>> x._exp = 10
    >>> x
    Decimal('1E+10')
    >>> hash(x)
    10000000000
Contrast that with the behaviour of a float instance:
    >>> 1.0.imag = 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: attribute 'imag' of 'float' objects is not writable
Yes, it's arguably covered by the "consenting adults" rule, but really, Decimal instances should be just as immutable as int and float instances. The only reason they aren't is that it's hard enough to set it up in Python code that the Decimal implementation settles for "near enough is good enough" and just uses __slots__ to prevent addition of new attributes, but doesn't introduce the overhead of custom __setattr__ and __delattr__ implementations to actively *prevent* modifications. We don't even need a new container type, we really just need an easy way to tell the __setattr__ and __delattr__ descriptors for "__slots__" that the instance initialisation is complete and further modifications should be disallowed. For example, if Decimal.__new__ could call "self.__lock_slots__()" at the end to set a flag on the instance object, then the slot descriptors could read that new flag and trigger an error:
    >>> x._exp = 10
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: attribute '_exp' of 'Decimal' objects is not writable
To be clear, all of this is currently *possible* if you use custom descriptors (such as a property() implementation where setattr and delattr look for such a flag) or override __setattr__/__delattr__. However, for a micro-optimised type like Decimal, that's a hard choice to be asked to make (and the current implementation came down on the side of speed over enforcing correctness). Given that using __slots__ in the first place is, in and of itself, a micro-optimisation, I suspect Decimal is far from the only "immutable" type implemented in pure Python that finds itself having to make that trade-off. (An extra boolean check in C is a *good* trade-off of speed for correctness. Python level descriptor implementations or attribute access overrides, on the other hand... not so much). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
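The "override __setattr__/__delattr__" route mentioned above can be sketched as follows. This is an assumption-laden illustration: `Money`, `_amount` and `_locked` are hypothetical names, and the `__lock_slots__()` hook proposed in the message does not exist, so the flag is emulated in Python.

```python
class Money:
    """Hypothetical value type standing in for a Decimal-like class."""

    __slots__ = ("_amount", "_locked")

    def __init__(self, amount):
        # Bypass our own __setattr__ while the instance is being built.
        object.__setattr__(self, "_amount", amount)
        # Emulates the proposed "initialisation is complete" flag.
        object.__setattr__(self, "_locked", True)

    def __setattr__(self, name, value):
        # getattr() falls back to False while the slot is still unset.
        if getattr(self, "_locked", False):
            raise AttributeError(
                f"attribute {name!r} of {type(self).__name__!r} "
                "objects is not writable"
            )
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        raise AttributeError(f"attribute {name!r} cannot be deleted")


m = Money(10)

try:
    m._amount = 5
    write_blocked = False
except AttributeError:
    write_blocked = True
```

The cost described in the message is visible here: every attribute write now runs a Python-level method. The guard is also advisory only, since `object.__setattr__(m, "_amount", 5)` would still succeed.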
Nick Coghlan <ncoghlan <at> gmail.com> writes:
I'm pretty sure the PyPy jit can already pick up and optimise cases where a dict goes "read-only" (i.e. stops being modified).
No, it doesn't. We handle cases like a type's dict, or a module's dict, by having them use a different internal implementation (while, of course, still being dicts at the Python level). We do *not* handle the case of trying to figure out whether a Python object is immutable in any way. Alex
I think you need to elaborate on your use cases further, ...
A frozendict can be used as a member of a set or as a key in a dictionary. For example, frozendict is indirectly needed when you want to use an object as a key of a dict and one attribute of this object is a dict. Using a frozendict instead of a dict for this attribute solves this problem. frozendict also helps in threading and multiprocessing. --
... and explain what *additional* changes would be needed, such as allowing frozendict instances as __dict__ attributes in order to create truly immutable objects in pure Python code. In current Python, you *can't* create a truly immutable object without dropping down to a C extension:
Using frozendict for the type dictionary might be a use case, but please don't focus on this example. There is currently a discussion on python-ideas about this specific use case. I first proposed to use frozendict in type.__new__, but then I proposed something completely different: add a flag that can be set to deny any modification of the type. The flag may be set using "__final__ = True" in the class body, for example.

Victor
On Tue, 28 Feb 2012 12:45:54 +0100 Victor Stinner <victor.stinner@haypocalc.com> wrote:
I think you need to elaborate on your use cases further, ...
A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an object as a key of a dict, whereas one attribute of this object is a dict.
It isn't. You just have to define __hash__ correctly.
frozendict helps also in threading and multiprocessing.
How so? Regards Antoine.
Antoine Pitrou wrote:
On Tue, 28 Feb 2012 12:45:54 +0100 Victor Stinner <victor.stinner@haypocalc.com> wrote:
I think you need to elaborate on your use cases further, ... A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an object as a key of a dict, whereas one attribute of this object is a dict.
It isn't. You just have to define __hash__ correctly.
frozendict helps also in threading and multiprocessing.
How so?
Inter process/task communication requires copying. Inter/intra thread communication uses reference semantics. To ensure these are the same, the objects used in communication must be immutable. Cheers, Mark.
On Tue, 28 Feb 2012 12:07:32 +0000 Mark Shannon <mark@hotpy.org> wrote:
Antoine Pitrou wrote:
On Tue, 28 Feb 2012 12:45:54 +0100 Victor Stinner <victor.stinner@haypocalc.com> wrote:
I think you need to elaborate on your use cases further, ... A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an object as a key of a dict, whereas one attribute of this object is a dict.
It isn't. You just have to define __hash__ correctly.
frozendict helps also in threading and multiprocessing.
How so?
Inter process/task communication requires copying. Inter/intra thread communication uses reference semantics. To ensure these are the same, the objects used in communication must be immutable.
You just need them to be practically constant. No need for an immutable type in the first place. Regards Antoine.
On 28 February 2012 12:07, Mark Shannon <mark@hotpy.org> wrote:
frozendict helps also in threading and multiprocessing.
How so?
Inter process/task communication requires copying. Inter/intra thread communication uses reference semantics. To ensure these are the same, the objects used in communication must be immutable.
Does that imply that in a frozendict, the *values* as well as the *keys* must be immutable? Isn't that a pretty strong limitation (and hence, does it not make frozendicts a lot less useful than they might otherwise be)?
A frozendict can be used as a member of a set or as a key in a dictionary.
For example, frozendict is indirectly needed when you want to use an object as a key of a dict, whereas one attribute of this object is a dict.
It isn't. You just have to define __hash__ correctly.
Defining __hash__ on a mutable object can be surprising. Or do you mean that you somehow deny modification of the dict attribute, and convert the dict to an immutable object before hashing it?
frozendict helps also in threading and multiprocessing.
How so?
For example, you don't need a lock to read the frozendict content, because you cannot modify the content. Victor
Hi, I don't know if an implementation of the frozendict actually exists, but if anyone is planning on writing one then can I suggest that they take a look at my new dict implementation: http://bugs.python.org/issue13903 https://bitbucket.org/markshannon/cpython_new_dict/ Making dicts immutable (at the C level) is quite easy with my new implementation. Cheers, Mark.
Victor Stinner wrote:
The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.__setitem__(my_frozendict, key, value)). It is also possible to use ctypes and violate even more invariants. For most purposes, this falls under "consenting adults".
My primary usage of frozendict would be pysandbox, a security module. Attackers are not consenting adults :-)
Read-only dict would also help optimization, in the CPython peephole or the PyPy JIT.
Not w.r.t. PyPy. It wouldn't do any harm though. One use of frozendict that you haven't mentioned so far is communication between concurrent processes/tasks. These need to be able to copy objects without changing reference semantics, which demands immutability. Cheers, Mark.
* frozendict values must be immutable, as dict keys
Why? That may be useful, but an immutable dict whose values might mutate is also useful; by forcing that choice, it starts to feel too specialized for a builtin.
Hum, I realized that calling hash(my_frozendict) on a frozendict instance is enough to check if a frozendict only contains immutable objects. And it is also possible to check manually that values are immutable *before* creating the frozendict.

I also prefer to not check for immutability because it does simplify the code :-)

$ diffstat frozendict-3.patch
 Include/dictobject.h   |    9 +
 Lib/collections/abc.py |    1
 Lib/test/test_dict.py  |   59 +++++++++++
 Objects/dictobject.c   |  256 ++++++++++++++++++++++++++++++++++++++++++-------
 Objects/object.c       |    3
 Python/bltinmodule.c   |    1
 6 files changed, 295 insertions(+), 34 deletions(-)

The patch is quite small to add a new builtin type. That's because most of the code is shared with the builtin dict type. (But the patch doesn't include the documentation; I didn't write it yet.)

Victor
Victor Stinner wrote:
* frozendict values must be immutable, as dict keys Why? That may be useful, but an immutable dict whose values might mutate is also useful; by forcing that choice, it starts to feel too specialized for a builtin.
Hum, I realized that calling hash(my_frozendict) on a frozendict instance is enough to check if a frozendict only contains immutable objects. And it is also possible to check manually that values are immutable *before* creating the frozendict.
I also prefer to not check for immutability because it does simplify the code :-)
$ diffstat frozendict-3.patch
 Include/dictobject.h   |    9 +
 Lib/collections/abc.py |    1
 Lib/test/test_dict.py  |   59 +++++++++++
 Objects/dictobject.c   |  256 ++++++++++++++++++++++++++++++++++++++++++-------
 Objects/object.c       |    3
 Python/bltinmodule.c   |    1
 6 files changed, 295 insertions(+), 34 deletions(-)

The patch is quite small to add a new builtin type. That's because most of the code is shared with the builtin dict type. (But the patch doesn't include the documentation; I didn't write it yet.)
Could you create an issue for this on the tracker, and maybe write a PEP? I don't think sending patches to this mailing list is the way to do this.

Would you mind taking a look at how your code interacts with PEP 412?

Cheers, Mark.
On Mon, Feb 27, 2012 at 19:53, Victor Stinner <victor.stinner@haypocalc.com> wrote:
A frozendict type is a common request from users and there are various implementations. There are two main Python implementations:
Perhaps this should also detail why namedtuple is not a viable alternative. Cheers, Dirkjan
A frozendict type is a common request from users and there are various
implementations. There are two main Python implementations:
Perhaps this should also detail why namedtuple is not a viable alternative.
It doesn't have the same API. Example: frozendict[key] vs namedtuple.attr (namedtuple.key). namedtuple has no .keys() or .items() method. Victor
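The API difference is easy to check. The `Point` type below is a made-up example, not something from the thread.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

by_attr = p.x                      # attribute access works
by_index = p[0]                    # positional indexing works

has_keys = hasattr(p, "keys")      # no mapping-style keys() method
has_items = hasattr(p, "items")    # no mapping-style items() method

try:
    p["x"]                         # lookup by string key is not supported
    key_lookup_failed = False
except TypeError:
    key_lookup_failed = True
```

A namedtuple also needs its field names fixed when the type is created, whereas a mapping accepts arbitrary hashable keys at construction time.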
Updated patch and more justifications.

New patch:

 - dict doesn't inherit from frozendict anymore
 - frozendict is a subclass of collections.abc.Mapping
 - more tests
* frozendict.__hash__ computes hash(frozenset(self.items())) and caches the result in its private hash attribute
hash(frozenset(self.items())) is preferred over hash(tuple(sorted(self.items()))) because keys and values may be unorderable. frozenset() is also faster than sorted(): O(n) vs O(n*log(n)). The frozendict hash does not depend on the order in which the items were inserted:
    >>> a=frozendict.fromkeys('ai')
    >>> a
    frozendict({'a': None, 'i': None})
    >>> b=frozendict.fromkeys('ia')
    >>> b
    frozendict({'i': None, 'a': None})
    >>> hash(a) == hash(b)
    True
    >>> a == b
    True
    >>> tuple(a.items()) == tuple(b.items())
    False
frozendict supports unorderable keys and values:
    >>> hash(frozendict({b'abc': 1, 'abc': 2}))
    935669091
    >>> hash(frozendict({1: b'abc', 2: 'abc'}))
    1319859033
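Since frozendict is not available as a builtin, the same two properties can be checked with plain dicts by computing the proposed hash by hand. This sketch assumes Python 3.7 or later, where dicts preserve insertion order.

```python
a = dict.fromkeys("ai")
b = dict.fromkeys("ia")

# The frozenset-based hash ignores the order in which items were inserted.
same_hash = hash(frozenset(a.items())) == hash(frozenset(b.items()))

# The item sequences themselves differ, because insertion order differs.
same_order = tuple(a.items()) == tuple(b.items())

# Keys of different, mutually unorderable types are fine for frozenset...
mixed = {b"abc": 1, "abc": 2}
mixed_hash = hash(frozenset(mixed.items()))

# ...but sorting the items fails, since bytes and str cannot be compared.
try:
    sorted(mixed.items())
    sortable = True
except TypeError:
    sortable = False
```

This is the practical reason a sort-based hash is not an option: it would make hashability depend on the keys being orderable, which dict itself never requires.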
* Add a frozendict abstract base class to collections?
I realized that Mapping already exists and so the following patch is enough: +Mapping.register(frozendict)
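What ABC registration does, and does not do, can be sketched with a stand-in class. `MyFrozenDict` is hypothetical and implements only a few read methods by hand.

```python
from collections.abc import Mapping, MutableMapping


class MyFrozenDict:
    """Hypothetical stand-in for the proposed builtin type."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Register as a virtual subclass: no inheritance is involved.
Mapping.register(MyFrozenDict)

fd = MyFrozenDict({"a": 1})

is_mapping = isinstance(fd, Mapping)
is_mutable_mapping = isinstance(fd, MutableMapping)
is_dict = isinstance(fd, dict)

# Registration only affects isinstance()/issubclass(); it does not add
# the Mapping mixin methods such as keys() or items().
gained_keys = hasattr(fd, "keys")
```

Code that checks `isinstance(obj, Mapping)` accepts the registered type, while code that checks for `dict` or `MutableMapping` does not, which matches the read-only intent.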
See also the PEP 351.
I read the PEP and the email explaining why it was rejected.

Just to be clear: PEP 351 tries to freeze an object, i.e. to convert a mutable or immutable object to an immutable object. My frozendict proposal, on the other hand, doesn't convert anything: it just raises a TypeError if you use a mutable key or value.

For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.

Victor
Victor Stinner wrote:
See also the PEP 351.
I read the PEP and the email explaining why it was rejected.
Just to be clear: the PEP 351 tries to freeze an object, try to convert a mutable or immutable object to an immutable object. Whereas my frozendict proposition doesn't convert anything: it just raises a TypeError if you use a mutable key or value.
For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.
I fail to see the use case you're trying to address with this kind of frozendict(). The purpose of frozenset() is to be able to use a set as dictionary key (and to some extent allow for optimizations and safe iteration). Your implementation can be used as dictionary key as well, but why would you want to do that in the first place ? If you're thinking about disallowing changes to the dictionary structure, e.g. in order to safely iterate over its keys or items, "freezing" the keys is enough. Requiring the value objects not to change is too much of a restriction to make the type useful in practice, IMHO. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 28 2012)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2012-02-13: Released eGenix pyOpenSSL 0.13 http://egenix.com/go26 2012-02-09: Released mxODBC.Zope.DA 2.0.2 http://egenix.com/go25 2012-02-06: Released eGenix mx Base 3.2.3 http://egenix.com/go24 ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
M.-A. Lemburg wrote:
Victor Stinner wrote:
See also the PEP 351. I read the PEP and the email explaining why it was rejected.
Just to be clear: the PEP 351 tries to freeze an object, try to convert a mutable or immutable object to an immutable object. Whereas my frozendict proposition doesn't convert anything: it just raises a TypeError if you use a mutable key or value.
For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.
I fail to see the use case you're trying to address with this kind of frozendict().
The purpose of frozenset() is to be able to use a set as dictionary key (and to some extent allow for optimizations and safe iteration). Your implementation can be used as dictionary key as well, but why would you want to do that in the first place ?
Because you have a mapping, and want to use a dict for speedy, convenient lookups. Sometimes your mapping involves the key being a string, or an int, or a tuple, or a set, and Python makes it easy to use that in a dict. Sometimes the key is itself a mapping, and Python makes it very difficult.

Just google on "python frozendict" or "python immutabledict" and you will find that this keeps coming up time and time again, e.g.:

http://www.cs.toronto.edu/~tijmen/programming/immutableDictionaries.html
http://code.activestate.com/recipes/498072-implementing-an-immutable-diction...
http://code.activestate.com/recipes/414283-frozen-dictionaries/
http://bob.pythonmac.org/archives/2005/03/04/frozendict/
http://python.6.n6.nabble.com/frozendict-td4377791.html
http://www.velocityreviews.com/forums/t648910-does-python3-offer-a-frozendic...
http://stackoverflow.com/questions/2703599/what-would-be-a-frozen-dict
If you're thinking about disallowing changes to the dictionary structure, e.g. in order to safely iterate over its keys or items, "freezing" the keys is enough.
Requiring the value objects not to change is too much of a restriction to make the type useful in practice, IMHO.
It's no more of a limitation than the limitation that strings can't change.

Frozendicts must freeze the value as well as the key. Consider the toy example, mapping food combinations to calories:

    d = {
        {appetizer => fried fish, main => double burger, drink => cola}: 5000,
        {appetizer => None, main => green salad, drink => tea}: 200,
    }

(syntax is only for illustration purposes)

Clearly the hash has to take the keys and values into account, which means that both the keys and values have to be frozen.

(Values may be mutable objects, but then the frozendict can't be hashed -- just like tuples can't be hashed if any item in them is mutable.)

-- 
Steven
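The toy example can be run today by standing in a frozenset of items for each inner mapping. This is a workaround sketch, not the proposed type: `freeze` is a hypothetical helper, and the result loses the mapping API (no lookup by key on the frozen value).

```python
def freeze(mapping):
    """Hypothetical helper: a hashable snapshot of a mapping's items."""
    return frozenset(mapping.items())


calories = {
    freeze({"appetizer": "fried fish", "main": "double burger", "drink": "cola"}): 5000,
    freeze({"appetizer": None, "main": "green salad", "drink": "tea"}): 200,
}

# Lookup works regardless of the order in which the keys are written.
order = {"drink": "tea", "main": "green salad", "appetizer": None}
result = calories[freeze(order)]

# A mutable value makes the snapshot impossible to build, because the
# (key, value) tuples must themselves be hashable.
try:
    freeze({"toppings": ["cheese", "bacon"]})
    mutable_ok = True
except TypeError:
    mutable_ok = False
```

Here `result` is 200. The failure on the list value shows the point made in the message: once the hash covers the values, those values have to be hashable as well.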
Steven D'Aprano wrote:
M.-A. Lemburg wrote:
Victor Stinner wrote:
See also the PEP 351. I read the PEP and the email explaining why it was rejected.
Just to be clear: the PEP 351 tries to freeze an object, try to convert a mutable or immutable object to an immutable object. Whereas my frozendict proposition doesn't convert anything: it just raises a TypeError if you use a mutable key or value.
For example, frozendict({'list': ['a', 'b', 'c']}) doesn't create frozendict({'list': ('a', 'b', 'c')}) but raises a TypeError.
I fail to see the use case you're trying to address with this kind of frozendict().
The purpose of frozenset() is to be able to use a set as dictionary key (and to some extent allow for optimizations and safe iteration). Your implementation can be used as dictionary key as well, but why would you want to do that in the first place ?
Because you have a mapping, and want to use a dict for speedy, convenient lookups. Sometimes your mapping involves the key being a string, or an int, or a tuple, or a set, and Python makes it easy to use that in a dict. Sometimes the key is itself a mapping, and Python makes it very difficult.
Just google on "python frozendict" or "python immutabledict" and you will find that this keeps coming up time and time again, e.g.:
http://www.cs.toronto.edu/~tijmen/programming/immutableDictionaries.html http://code.activestate.com/recipes/498072-implementing-an-immutable-diction... http://code.activestate.com/recipes/414283-frozen-dictionaries/ http://bob.pythonmac.org/archives/2005/03/04/frozendict/ http://python.6.n6.nabble.com/frozendict-td4377791.html http://www.velocityreviews.com/forums/t648910-does-python3-offer-a-frozendic... http://stackoverflow.com/questions/2703599/what-would-be-a-frozen-dict
Only the first of those links appears to actually discuss reasons for adding a frozendict, but it fails to provide real world use cases and only gives theoretical reasons for why this would be nice to have.
From a practical view, a frozendict would allow thread-safe iteration over a dict and enable more optimizations (e.g. using an optimized lookup function, optimized hash parameters, etc.) to make lookup in static tables more efficient.
OTOH, using a frozendict as key in some other dictionary is, well, not a very realistic use case - programmers should think twice before using such a design :-)
If you're thinking about disallowing changes to the dictionary structure, e.g. in order to safely iterate over its keys or items, "freezing" the keys is enough.
Requiring the value objects not to change is too much of a restriction to make the type useful in practice, IMHO.
It's no more of a limitation than the limitation that strings can't change.
Frozendicts must freeze the value as well as the key. Consider the toy example, mapping food combinations to calories:
d = { {appetizer => fried fish, main => double burger, drink => cola}: 5000, {appetizer => None, main => green salad, drink => tea}: 200, }
(syntax is only for illustration purposes)
Clearly the hash has to take the keys and values into account, which means that both the keys and values have to be frozen.
(Values may be mutable objects, but then the frozendict can't be hashed -- just like tuples can't be hashed if any item in them is mutable.)
Right, but that doesn't mean you have to require that values are hashable. A frozendict could (and probably should) use the same logic as tuples: if the values are hashable, the frozendict is hashable, otherwise not. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 28 2012)
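The tuple-like rule discussed above (hashable only if every value is hashable) can be sketched in pure Python. This is a minimal illustration, not the proposed C implementation: the class name `FrozenDict` is hypothetical, and the hash is computed as `hash(frozenset(items))`, as described in the original proposal.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Hypothetical read-only mapping; hashable only if all its values are."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None  # computed lazily and cached

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Raises TypeError if any value is unhashable, just like a tuple
        # that contains a list.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return "FrozenDict(%r)" % (self._data,)


# The toy example from the thread: meals as keys, calories as values.
burger_meal = FrozenDict(appetizer="fried fish", main="double burger", drink="cola")
salad_meal = FrozenDict(appetizer=None, main="green salad", drink="tea")
calories = {burger_meal: 5000, salad_meal: 200}

# A mapping with a mutable value can still be built, but not hashed.
unhashable = FrozenDict(toppings=["cheese", "bacon"])
try:
    hash(unhashable)
    hash_failed = False
except TypeError:
    hash_failed = True
```

Because the hash is built from an unordered frozenset of items, two instances with the same items in a different insertion order hash and compare equal.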
On Feb 27, 2012, at 10:53 AM, Victor Stinner wrote:
A frozendict type is a common request from users and there are various implementations.
ISTM, this request is never from someone who has a use case. Instead, it almost always comes from "completers", people who see that we have a frozenset type and think the core devs missed the ObviousThingToDo(tm). Frozendicts are trivial to implement, so that is why there are various implementations (i.e. the implementations are more fun to write than they are to use).

The frozenset type covers a niche case that is nice-to-have but *rarely* used. Many experienced Python users simply forget that we have a frozenset type. We don't get bug reports or feature requests about the type. When I do Python consulting work, I never see it in a client's codebase. It does occasionally get discussed in questions on StackOverflow but rarely gets offered as an answer (typically on variants of the "how do you make a set-of-sets" question). If Google's codesearch were still alive, we could add another datapoint showing how infrequently this type is used.

I wrote the C implementation for frozensets and the tests that demonstrate their use in problems involving sets-of-sets, yet I have *needed* a frozenset only once in my career (for an NFA/DFA conversion algorithm).

From this experience, I conclude that adding a frozendict type would be a total waste (except that it would inspire more people to request frozen variants of other containers).

Raymond

P.S. The one advantage I can see for frozensets and frozendicts is that we have an opportunity to optimize them once they are built (optimizing insertion order to minimize collisions, increasing or decreasing density, eliminating dummy entries, etc). That being said, the same could be accomplished for regular sets and dicts by the addition of an optimize() method. I'm not really enamoured of that idea though because it breaks the abstraction and because people don't seem to need it (i.e. it has never been requested).
The frozenset type covers a niche case that is nice-to-have but *rarely* used. Many experienced Python users simply forget that we have a frozenset type. We don't get bug reports or feature requests about the type. When I do Python consulting work, I never see it in a client's codebase. It does occasionally get discussed in questions on StackOverflow but rarely gets offered as an answer (typically on variants of the "how do you make a set-of-sets" question). If Google's codesearch were still alive, we could add another datapoint showing how infrequently this type is used.
<snip>
There are some alternatives to code.google.com, though. For example: http://www.koders.com/default.aspx?s=frozenset&submit=Search&la=Python&li=*

From a cursory look: quite a bit of the found results are from the various Python implementations, and there is some duplication of projects, but it would be unfair to conclude that frozenset is not being used since many of the results do look legitimate.

This is not to argue in favor or against frozendict, just stating that there's still a way to search code online :)

Eli
On 29 February 2012 19:17, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
From this experience, I conclude that adding a frozendict type would be a total waste (except that it would inspire more people to request frozen variants of other containers).
It would (apparently) help Victor to fix issues in his pysandbox project. I don't know if a secure Python sandbox is an important enough concept to warrant core changes to make it possible. However, if Victor was saying that implementing this PEP was all that is needed to implement a secure sandbox, then that would be a very different claim, and likely much more compelling (to some, at least - I have no personal need for a secure sandbox). Victor quotes 6 implementations. I don't see any rationale (either in the email that started this thread, or in the PEP) to explain why these aren't good enough, and in particular why the implementation has to be in the core. There's the hint in the PEP "If frozendict is used to harden Python (security purpose), it must be implemented in C". But why in the core (as opposed to an extension)? And why and how would frozendict help in hardening Python? As it stands, I don't find the PEP compelling. The hardening use case might be significant but Victor needs to spell it out if it's to make a difference. Paul.
On Thu, Mar 1, 2012 at 7:08 AM, Paul Moore <p.f.moore@gmail.com> wrote:
As it stands, I don't find the PEP compelling. The hardening use case might be significant but Victor needs to spell it out if it's to make a difference.
+1 Avoiding-usenet-nod-syndrome'ly, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Mar 1, 2012 at 8:08 AM, Paul Moore <p.f.moore@gmail.com> wrote:
It would (apparently) help Victor to fix issues in his pysandbox project. I don't know if a secure Python sandbox is an important enough concept to warrant core changes to make it possible.
If a secure Python sandbox had been available last year, we would probably be still using Python at work for end-user scripting, instead of having had to switch to Javascript. At least, that would be the case if this sandbox is what I think it is (we embed a scripting language in our C++ main engine, and allow end users to customize and partly drive our code). But features enabling that needn't be core; I wouldn't object to having to get some third-party add-ons to make it all work. Chris Angelico
On Thu, 01 Mar 2012 10:13:01 +1100, Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Mar 1, 2012 at 8:08 AM, Paul Moore <p.f.moore@gmail.com> wrote:
It would (apparently) help Victor to fix issues in his pysandbox project. I don't know if a secure Python sandbox is an important enough concept to warrant core changes to make it possible.
If a secure Python sandbox had been available last year, we would probably be still using Python at work for end-user scripting, instead of having had to switch to Javascript. At least, that would be the case if this sandbox is what I think it is (we embed a scripting language in our C++ main engine, and allow end users to customize and partly drive our code). But features enabling that needn't be core; I wouldn't object to having to get some third-party add-ons to make it all work.
I likewise am aware of a project where the availability of sandboxing might be make-or-break for continuing to use Python. In this case the idea would be sandboxing plugins called from a Python main program. I *think* that Victor's project would enable that, but I haven't looked at it closely. --David
On Feb 29, 2012, at 1:08 PM, Paul Moore wrote:
As it stands, I don't find the PEP compelling. The hardening use case might be significant but Victor needs to spell it out if it's to make a difference.
If his sandboxing project needs it, the type need not be public. It can join dictproxy and structseq in our toolkit of internal types.

Adding frozendict() as a new public type is unnecessary and undesirable -- a proliferation of types makes it harder to decide which tool is the most appropriate for a given problem. The itertools module ran into the issue early. Adding a new itertool tends to make the whole module harder to figure out.

Raymond

P.S. ISTM that lately Python is growing fatter without growing more powerful or expressive. Generators, context managers, and decorators were honking good ideas -- we need more of those rather than minor variations on things we already have.

Plz forgive the typos -- I'm typing with one hand -- the other is holding a squiggling baby :-)
It would (apparently) help Victor to fix issues in his pysandbox project. I don't know if a secure Python sandbox is an important enough concept to warrant core changes to make it possible.
Ok, let's talk about sandboxing and security.

The main idea of pysandbox is to reuse most of CPython but hide "dangerous" functions and run untrusted code in a separate namespace. The problem is to create the sandbox and ensure that it is not possible to escape from it. pysandbox is still a proof-of-concept, even if it works pretty well for short dummy scripts. But pysandbox is not ready for real world programs.

pysandbox uses various "tricks" and "hacks" to create a sandbox. But there is a major issue: the __builtins__ dict (or module) is available and used everywhere (in module globals, in frames, in function globals, etc.), and can be modified. A read-only __builtins__ dict is required to protect the sandbox. If untrusted code can modify __builtins__, it can replace core functions like isinstance(), len(), ... and so modify code outside the sandbox.

To implement a frozendict in Python, pysandbox uses the blacklist approach: a class inheriting from dict that overrides some methods to raise an error. The whitelist approach cannot be used for a type implemented in Python, because the __builtins__ type must inherit from dict: ceval.c expects a type compatible with PyDict_GetItem and PyDict_SetItem.

Problem: if you implement a frozendict type inheriting from dict in Python, it is still possible to call dict methods (e.g. dict.__setitem__()). To fix this issue, pysandbox removes all dict methods that modify the dict: __setitem__, __delitem__, pop, etc. This is a problem because untrusted code can then no longer use these methods on valid dicts created in the sandbox.
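The weakness of the blacklist approach described above can be reproduced in a few lines. This is a sketch only: the class name `BlacklistFrozenDict` is hypothetical and is not the pysandbox implementation.

```python
class BlacklistFrozenDict(dict):
    """Hypothetical 'blacklist' frozendict: a dict subclass whose mutating
    methods are overridden to raise an error."""

    def _readonly(self, *args, **kwargs):
        raise TypeError("this dict is read-only")

    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly


fake_builtins = BlacklistFrozenDict(len=len)

# The overridden method blocks ordinary item assignment...
try:
    fake_builtins["len"] = None
    blocked = False
except TypeError:
    blocked = True

# ...but the base class method is still reachable and bypasses the override.
dict.__setitem__(fake_builtins, "len", None)
bypassed = fake_builtins["len"] is None
```

The bypass works because `dict.__setitem__` operates on the underlying dict storage directly and never consults the subclass override.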
However, if Victor was saying that implementing this PEP was all that is needed to implement a secure sandbox, then that would be a very different claim, and likely much more compelling (to some, at least - I have no personal need for a secure sandbox).
A builtin frozendict type "compatible" with the PyDict C API is very convenient for pysandbox because using this type for core features like builtins requires very few modifications. For example, using frozendict for __builtins__ requires modifying only 3 lines in frameobject.c. I don't see how to solve the pysandbox issue (read-only __builtins__ issue, need to remove dict.__setitem__ & friends) without modifying CPython (so without adding a frozendict type).
As it stands, I don't find the PEP compelling. The hardening use case might be significant but Victor needs to spell it out if it's to make a difference.
I don't know if hardening Python is a compelling argument to add a new builtin type. Victor
On Feb 29, 2012, at 3:52 PM, Victor Stinner wrote:
I don't know if hardening Python is a compelling argument to add a new builtin type.
It isn't. Builtins are for general purpose use. It is not something most people should use; however, if it is a builtin, people will be drawn to frozendicts like moths to a flame. The tuple-as-frozenlist anti-pattern shows what we're up against. Another thought: if pypy is successful at providing sandboxing, the need for sandboxing in CPython is substantially abated. Raymond
Raymond Hettinger wrote:
On Feb 29, 2012, at 3:52 PM, Victor Stinner wrote:
I don't know if hardening Python is a compelling argument to add a new builtin type.
It isn't.
Builtins are for general purpose use. It is not something most people should use; however, if it is a builtin, people will be drawn to frozendicts like moths to a flame. The tuple-as-frozenlist anti-pattern shows what we're up against.
Perhaps I'm a little slow today, but I don't get this. Could you elaborate on tuple-as-frozenlist anti-pattern please? i.e. what it is, why it is an anti-pattern, and examples of it in real life? -- Steven
On Wed, Feb 29, 2012 at 3:52 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
It would (apparently) help Victor to fix issues in his pysandbox project. I don't know if a secure Python sandbox is an important enough concept to warrant core changes to make it possible.
Ok, let's talk about sandboxing and security.
The main idea of pysandbox is to reuse most of CPython but hide "dangerous" functions and run untrusted code in a separated namespace. The problem is to create the sandbox and ensure that it is not possible to escape from this sandbox. pysandbox is still a proof-of-concept, even if it works pretty well for short dummy scripts. But pysandbox is not ready for real world programs.
I hope you have studied (recent) history. Sandboxes in Python traditionally have not been secure. Read the archives for details.
-- --Guido van Rossum (python.org/~guido)
The main idea of pysandbox is to reuse most of CPython but hide "dangerous" functions and run untrusted code in a separated namespace. The problem is to create the sandbox and ensure that it is not possible to escape from this sandbox. pysandbox is still a proof-of-concept, even if it works pretty well for short dummy scripts. But pysandbox is not ready for real world programs.
I hope you have studied (recent) history. Sandboxes in Python traditionally have not been secure. Read the archives for details.
The design of pysandbox makes it difficult to implement. It is mostly based on a blacklist, so any omission would lead to a vulnerability. I read the recent history of sandboxes and looked at other security modules for Python, and I don't understand your reference to "Sandboxes in Python traditionally have not been secure." There is no known vulnerability in pysandbox, did I miss something? (there is only a limitation on the dict API because of the lack of frozendict.)

Are you talking about rexec/Bastion? (which cannot be qualified as "recent" :-))

pysandbox limitations are documented in its README file:

<< pysandbox is a sandbox for the Python namespace, not a sandbox between Python and the operating system. It doesn't protect your system against Python security vulnerabilities: vulnerabilities in modules/functions available in your sandbox (depend on your sandbox configuration). By default, only few functions are exposed to the sandbox namespace which limits the attack surface.

pysandbox is unable to limit the memory of the sandbox process: you have to use your own protection. >>

Hum, I am also not sure that pysandbox "works" with threads :-) I mean that enabling pysandbox impacts all running threads, not only one thread, which can cause issues. It should also be mentioned.

PyPy sandbox has a different design: it uses a process with no privilege, and all syscalls are redirected to another process which applies security checks to each syscall. http://doc.pypy.org/en/latest/sandbox.html

See also the seccomp-nurse project, a generic sandbox using Linux SECCOMP: http://chdir.org/~nico/seccomp-nurse/

See also pysandbox README for a list of other Python security modules.

Victor
On Thu, Mar 1, 2012 at 2:01 AM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
The main idea of pysandbox is to reuse most of CPython but hide "dangerous" functions and run untrusted code in a separated namespace. The problem is to create the sandbox and ensure that it is not possible to escape from this sandbox. pysandbox is still a proof-of-concept, even if it works pretty well for short dummy scripts. But pysandbox is not ready for real world programs.
I hope you have studied (recent) history. Sandboxes in Python traditionally have not been secure. Read the archives for details.
The design of pysandbox makes it difficult to implement. It is mostly based on blacklist, so any omission would lead to a vulnerability. I read the recent history of sandboxes and see other security modules for Python, and I don't understand your reference to "Sandboxes in Python traditionally have not been secure." There is no known vulnerability in pysandbox, did I miss something? (there is only a limitation on the dict API because of the lack of frozendict.)
Are you talking about rexec/Bastion? (which cannot be qualified as "recent" :-))
pysandbox limitations are documented in its README file:
<< pysandbox is a sandbox for the Python namespace, not a sandbox between Python and the operating system. It doesn't protect your system against Python security vulnerabilities: vulnerabilities in modules/functions available in your sandbox (depend on your sandbox configuration). By default, only few functions are exposed to the sandbox namespace which limits the attack surface.
pysandbox is unable to limit the memory of the sandbox process: you have to use your own protection. >>
Hum, I am also not sure that pysandbox "works" with threads :-) I mean that enabling pysandbox impacts all running threads, not only one thread, which can cause issues. It should also be mentioned.
PyPy sandbox has a different design: it uses a process with no privilege, and all syscalls are redirected to another process which applies security checks to each syscall. http://doc.pypy.org/en/latest/sandbox.html
See also the seccomp-nurse project, a generic sandbox using Linux SECCOMP: http://chdir.org/~nico/seccomp-nurse/
See also pysandbox README for a list of other Python security modules.
Hm. I can't tell what the purpose of a sandbox is from what you quote from your own README here (and my cellphone tethering is slow enough that clicking on the links doesn't work right now).

The sandboxes I'm familiar with (e.g. Google App Engine) are intended to allow untrusted third parties to execute (more or less) arbitrary code while strictly controlling which resources they can access. In App Engine's case, an attacker who broke out of the sandbox would have access to the inside of Google's datacenter, which would obviously be bad -- that's why Google has developed its own sandboxing technologies.

I do know that I don't feel comfortable having a sandbox in the Python standard library or even recommending a 3rd party sandboxing solution -- if someone uses the sandbox to protect a critical resource, and a hacker breaks out of the sandbox, the author of the sandbox may be held responsible for more than they bargained for when they made it open source. (Doesn't an open source license limit your responsibility? Who knows. AFAIK this question has not gotten to court yet. I wouldn't want to have to go to court over it.)

I wasn't just referring to rexec/Bastion (though that definitely shaped my thinking about this issue); much more recently someone (Tal, I think was his name?) tried to come up with a sandbox and every time he believed he had a perfect solution, somebody found a loophole. (Hm..., you may have been involved that time yourself. :-)

-- --Guido van Rossum (python.org/~guido)
In App Engine's case, an attacker who broke out of the sandbox would have access to the inside of Google's datacenter, which would obviously be bad -- that's why Google has developed its own sandboxing technologies.
This is not specific to Google: if an attacker breaks a sandbox, he/she has access to everything. Depending on how the sandbox is implemented, you have more or less code to audit. pysandbox disables introspection in Python and creates an empty namespace to reduce the attack surface as much as possible. You have to be very careful when you add a new feature/function, and it is complex.
I do know that I don't feel comfortable having a sandbox in the Python standard library or even recommending a 3rd party sandboxing solution
frozendict would help pysandbox, but also any Python security module, and not only security: there are (many) other use cases too ;-)
I wasn't just referring to rexec/Bastion (though that definitely shaped my thinking about this issue); much more recently someone (Tal, I think was his name?) tried to come up with a sandbox and every time he believed he had a perfect solution, somebody found a loophole. (Hm..., you may have been involved that time yourself. :-)
pysandbox is based on tav's approach, but it is more complete and implements more protections. It is also more functional (you have more available functions and features).

I challenge anyone to try to break pysandbox!

Victor
I challenge anyone to try to break pysandbox!
Can you explain precisely how a frozendict will help pysandbox? Then I'll be able to beat this challenge :-)
See this email: http://mail.python.org/pipermail/python-dev/2012-February/117011.html

Issue #14162 also has two patches: one to make it possible to use frozendict for __builtins__, and another one to create read-only types (which is more of a proof-of-concept). http://bugs.python.org/issue14162

Victor
On Thu, Mar 1, 2012 at 9:44 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
frozendict would help pysandbox, but also any Python security module, and not only security: there are (many) other use cases too ;-)
Well, let's focus on the other use cases, because to me the sandbox use case is too controversial (never mind how confident you are :-). I like thinking through the cache use case a bit more, since this is a common pattern. But I think it would be sufficient there to prevent accidental modification, so it should be sufficient to have a dict subclass that overrides the various mutating methods: __setitem__, __delitem__, pop(), popitem(), clear(), setdefault(), update(). Technically also __init__() -- although calling __init__() on an existing object can hardly be called an accident. As was pointed out this is easy to circumvent, but (together with a reminder in the documentation) should be sufficient to avoid mistakes. I imagine someone who actively wants to mess with the cache can probably also reach into the cache implementation directly. Also don't underestimate the speed of a shallow dict copy. What other use cases are there? (I have to agree with the folks pushing back hard. Even demonstrated repeated requests for a certain feature do not prove a need -- it's quite common for people who are trying to deal with some problem to go down the wrong rabbit hole in their quest for a solution, and ending up thinking they need a certain feature while completely overlooking a much simpler solution.) -- --Guido van Rossum (python.org/~guido)
* Guido van Rossum wrote:
On Thu, Mar 1, 2012 at 9:44 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
frozendict would help pysandbox, but also any Python security module, and not only security: there are (many) other use cases too ;-)
Well, let's focus on the other use cases, because to me the sandbox use case is too controversial (never mind how confident you are :-).
I like thinking through the cache use case a bit more, since this is a common pattern. But I think it would be sufficient there to prevent accidental modification, so it should be sufficient to have a dict subclass that overrides the various mutating methods: __setitem__, __delitem__, pop(), popitem(), clear(), setdefault(), update().
For the caching part, simply making the dictproxy type public would already help a lot.
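As an illustration of the read-only proxy idea for caches, here is a minimal sketch. It assumes a Python version (3.3 or later) in which the standard library exposes a public proxy type as `types.MappingProxyType`; the variable names are hypothetical.

```python
from types import MappingProxyType

# The owner of the cache keeps the real, mutable dict private...
_cache = {"answer": 42}

# ...and hands out a read-only view of it to callers.
cache_view = MappingProxyType(_cache)

# Accidental modification through the view is rejected.
try:
    cache_view["answer"] = 0
    write_rejected = False
except TypeError:
    write_rejected = True

# The view is live: updates made by the owner are visible through it.
_cache["other"] = 1
```

Unlike a frozendict, the proxy is not a snapshot and is not hashable; it only prevents writes through the view, which matches the "prevent accidental modification" goal discussed for caches.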
What other use cases are there?
dicts as keys or as set members. I do run into this from time to time and always get tuple(sorted(items())) or something like that.

nd
On Thu, Mar 1, 2012 at 12:35 PM, André Malo <nd@perlig.de> wrote:
* Guido van Rossum wrote:
On Thu, Mar 1, 2012 at 9:44 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
frozendict would help pysandbox, but also any Python security module, and not only security: there are (many) other use cases too ;-)
Well, let's focus on the other use cases, because to me the sandbox use case is too controversial (never mind how confident you are :-).
I like thinking through the cache use case a bit more, since this is a common pattern. But I think it would be sufficient there to prevent accidental modification, so it should be sufficient to have a dict subclass that overrides the various mutating methods: __setitem__, __delitem__, pop(), popitem(), clear(), setdefault(), update().
For the caching part, simply making the dictproxy type public would already help a lot.
Heh, that's a great idea. Can you file a bug for that?
What other use cases are there?
dicts as keys or as set members. I do run into this from time to time and always get tuple(sorted(items())) or something like that.
I know I've done that once or twice in my life too, but it's a pretty rare use case and as you say the solution is simple enough. An alternative is frozenset(d.items()) -- someone should compare the timing of these for large dicts. -- --Guido van Rossum (python.org/~guido)
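The suggested timing comparison could be sketched as follows. The helper names and the dict size are arbitrary choices for illustration, and no results are claimed here: timings depend on the machine, the Python version, and the key types.

```python
import timeit


def key_sorted(mapping):
    # Requires the items to be orderable (e.g. keys of mutually comparable types).
    return tuple(sorted(mapping.items()))


def key_frozenset(mapping):
    # Only requires keys and values to be hashable; no ordering is needed.
    return frozenset(mapping.items())


large = {i: str(i) for i in range(10000)}

for make_key in (key_sorted, key_frozenset):
    seconds = timeit.timeit(lambda: make_key(large), number=20)
    print("%s: %.4f s for 20 runs" % (make_key.__name__, seconds))
```

Both helpers produce a hashable value that is independent of insertion order, so either can serve as a dictionary key or set member.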
Le 01/03/2012 19:07, Guido van Rossum a écrit :
What other use cases are there?
frozendict could be used to implement "read-only" types: it is not possible to add or remove an attribute or set an attribute value, but attribute values can be mutable objects. Example of an enum with my type_final.patch (attached to issue #14162).
>>> class Color:
...   red=1
...   green=2
...   blue=3
...   __final__=True
...
>>> Color.red
1
>>> Color.red=2
TypeError: 'frozendict' object does not support item assignment
>>> Color.yellow=4
TypeError: 'frozendict' object does not support item assignment
>>> Color.__dict__
frozendict({...})
The implementation avoids the private PyDictProxy for read-only types: type.__dict__ gives direct access to the frozendict (but type.__dict__=newdict is still blocked).

The "__final__=True" API is just a proposal; it could be anything else, maybe a metaclass.

Using a frozendict for type.__dict__ is not the only possible way to implement read-only types. There are also Python implementations using properties. Using a frozendict is faster than using properties because getting an attribute is just a fast dictionary lookup, whereas reading a property requires executing a Python function. The syntax to declare a read-only class is also more classic with the frozendict approach.

Victor
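For comparison, the metaclass alternative mentioned above can be sketched in pure Python. This is not type_final.patch: the name `FinalMeta` is hypothetical, the class dict stays an ordinary mapping proxy, and the protection is advisory rather than enforced in C.

```python
class FinalMeta(type):
    """Hypothetical metaclass: classes that set __final__ = True reject
    attribute assignment and deletion after creation."""

    def __setattr__(cls, name, value):
        if getattr(cls, "__final__", False):
            raise TypeError("%s is read-only" % cls.__name__)
        super().__setattr__(name, value)

    def __delattr__(cls, name):
        if getattr(cls, "__final__", False):
            raise TypeError("%s is read-only" % cls.__name__)
        super().__delattr__(name)


class Color(metaclass=FinalMeta):
    red = 1
    green = 2
    blue = 3
    __final__ = True


# Collect the names of attributes whose assignment was rejected.
rejected = []
for name, value in (("red", 2), ("yellow", 4)):
    try:
        setattr(Color, name, value)
    except TypeError:
        rejected.append(name)
```

Attribute reads remain plain dictionary lookups, as in the frozendict approach, but a determined caller can usually get around the check by calling the base `type.__setattr__` directly, which is the kind of bypass a C-level frozendict is meant to close.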
On Thu, Mar 1, 2012 at 4:39 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Le 01/03/2012 19:07, Guido van Rossum a écrit :
What other use cases are there?
frozendict could be used to implement "read-only" types: it is not possible to add or remove an attribute or set an attribute value, but attribute value can be a mutable object. Example of an enum with my type_final.patch (attached to issue #14162).
>>> class Color:
...   red=1
...   green=2
...   blue=3
...   __final__=True
...
>>> Color.red
1
>>> Color.red=2
TypeError: 'frozendict' object does not support item assignment
>>> Color.yellow=4
TypeError: 'frozendict' object does not support item assignment
>>> Color.__dict__
frozendict({...})
The implementation avoids the private PyDictProxy for read-only types, type.__dict__ gives directly access to the frozendict (but type.__dict__=newdict is still blocked).
The "__final__=True" API is just a proposition, it can be anything else, maybe a metaclass.
Using a frozendict for type.__dict__ is not the only possible solution to implement read-only types. There are also Python implementation using properties. Using a frozendict is faster than using properties because getting an attribute is just a fast dictionary lookup, whereas reading a property requires to execute a Python function. The syntax to declare a read-only class is also more classic using the frozendict approach.
I think you should provide stronger arguments in each case why the data needs to be truly immutable or read-only, rather than just using a convention or an "advisory" API (like __private can be circumvented but clearly indicates intent to the reader). -- --Guido van Rossum (python.org/~guido)
On Thu, 01 Mar 2012 16:50:06 -0800, Guido van Rossum <guido@python.org> wrote:
On Thu, Mar 1, 2012 at 4:39 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
frozendict could be used to implement "read-only" types: it is not possible to add or remove an attribute or set an attribute value, but attribute value can be a mutable object. Example of an enum with my type_final.patch (attached to issue #14162). [...]
I think you should provide stronger arguments in each case why the data needs to be truly immutable or read-only, rather than just using a convention or an "advisory" API (like __private can be circumvented but clearly indicates intent to the reader).
+1. Except in very limited circumstances (such as a security sandbox) I would *much* rather have the code I'm interacting with use advisory means rather than preventing me from being a consenting adult. (Having to name mangle by hand when someone has used a __ method is painful enough, thank you...good thing the need to do that doesn't come up often (mostly only in unit tests)). --David
On Fri, Mar 2, 2012 at 11:50 AM, R. David Murray <rdmurray@bitdance.com> wrote:
+1. Except in very limited circumstances (such as a security sandbox) I would *much* rather have the code I'm interacting with use advisory means rather than preventing me from being a consenting adult. (Having to name mangle by hand when someone has used a __ method is painful enough, thank you...good thing the need to do that doesn't come up often (mostly only in unit tests)).
The main argument I'm aware of in favour of this kind of enforcement is that it means you get exceptions at the point of *error* (trying to modify the "read-only" dict), rather than having a strange action-at-a-distance data mutation bug to track down. However, in that case, it's just fine (and in fact better) if there is a way around the default enforcement via a more verbose spelling. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
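A minimal sketch of that trade-off, using types.MappingProxyType from the standard library (Python 3.3+) as a stand-in; it is not the frozendict proposed in this thread, and the names _settings, SETTINGS and try_to_modify are hypothetical. Callers get an exception at the point of the erroneous write, while the owner keeps a more verbose way around the enforcement.

```python
from types import MappingProxyType

_settings = {"debug": False}            # private, mutable storage kept by the owner
SETTINGS = MappingProxyType(_settings)  # read-only view handed out to callers


def try_to_modify(view):
    """Attempt a write through the view and return the exception, if any."""
    try:
        view["debug"] = True
    except TypeError as exc:
        return exc  # the error surfaces at the point of the mistaken write
    return None


error = try_to_modify(SETTINGS)

# The "more verbose spelling": the owner can still change the underlying dict,
# and the view reflects the change.
_settings["debug"] = True
```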
On 2012-03-01, at 7:50 PM, Guido van Rossum wrote:
I think you should provide stronger arguments in each case why the data needs to be truly immutable or read-only, rather than just using a convention or an "advisory" API (like __private can be circumvented but clearly indicates intent to the reader).
Here's one more argument to support frozendicts. For the last several months I've been thinking about prohibiting coroutines (generators + greenlets in our framework) from modifying the global state. If there is a guarantee that all coroutines of the whole application, modules and framework are 100% safe from that, it's possible to do some interesting stuff. For instance, dynamically balance jobs across all application processes:
    @coroutine
    def on_generate_report(context):
        data = yield fetch_financial_data(context)
        ...
In the above example, 'fetch_financial_data' may be executed in a different process, or even on a different server, if the coroutine scheduler of the current process decides so (based on its load, or a low priority of the coroutine being scheduled). With a built-in frozendict it will be easier to secure modules' or functions' __globals__ that way, allowing us to play with features closer to the ones Erlang and other concurrent languages provide. - Yury
On Thu, Mar 1, 2012 at 6:13 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2012-03-01, at 7:50 PM, Guido van Rossum wrote:
I think you should provide stronger arguments in each case why the data needs to be truly immutable or read-only, rather than just using a convention or an "advisory" API (like __private can be circumvented but clearly indicates intent to the reader).
Here's one more argument to support frozendicts.
For the last several months I've been thinking about prohibiting coroutines (generators + greenlets in our framework) from modifying the global state. If there is a guarantee that all coroutines of the whole application, modules and framework are 100% safe from that, it's possible to do some interesting stuff. For instance, dynamically balance jobs across all application processes:
    @coroutine
    def on_generate_report(context):
        data = yield fetch_financial_data(context)
        ...
In the above example, 'fetch_financial_data' may be executed in the different process, or even on the different server, if the coroutines' scheduler of current process decides so (based on its load, or a low priority of the coroutine being scheduled).
With built-in frozendict it will be easier to secure modules or functions' __globals__ that way, allowing to play with features closer to the ones Erlang and other concurrent languages provide.
That sounds *very* far-fetched. You're pretty much designing a new language variant. It's not an argument for burdening the original language with a data type it doesn't need for itself. You should be able to prototype what you want using an advisory subclass (if you subclass dict and add __slots__=[] to it, it will cost very little overhead) or using a custom extension that implements the flavor of frozendict that works best for you -- given that you're already using greenlets, another extension can't be a big burden. -- --Guido van Rossum (python.org/~guido)
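The advisory subclass suggested above could look roughly like the sketch below. This is only one possible reading of the suggestion; the class name AdvisoryFrozenDict and the helper _readonly are hypothetical. An empty __slots__ avoids a per-instance __dict__, and the mutating methods are replaced by one that raises.

```python
class AdvisoryFrozenDict(dict):
    """dict subclass whose own mutating methods raise (advisory, not enforced)."""

    __slots__ = ()  # no per-instance __dict__, so very little extra overhead

    def _readonly(self, *args, **kwargs):
        raise TypeError(f"{type(self).__name__} is read-only")

    # Route every mutating entry point to the same error.
    __setitem__ = __delitem__ = __ior__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly


d = AdvisoryFrozenDict(a=1)

try:
    d["b"] = 2
except TypeError as exc:
    write_error = exc
else:
    write_error = None

has_instance_dict = hasattr(d, "__dict__")

# Advisory only: a determined caller can still go through the base class.
dict.__setitem__(d, "b", 2)
```

The last line is what makes this "advisory": it signals intent and catches accidents, but it does not stop code that deliberately calls the dict methods directly.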
On 2012-03-01, at 9:31 PM, Guido van Rossum wrote:
That sounds *very* far-fetched. You're pretty much designing a new language variant. It's not an argument for burdening the original
Yeah, that's what we do ;)
You should be able to prototype what you want using an advisory subclass (if you subclass dict and add __slots__=[] to it, it will cost very little overhead) or using a custom extension that implements the flavor of frozendict that works best for you -- given that you're already using greenlets, another extension can't be a big burden.
I understand. The only reason I wrote about it is to give an idea of how frozendicts may be used besides just sandboxing. I'm not strongly advocating for it, though. - Yury
I think you should provide stronger arguments in each case why the data needs to be truly immutable or read-only, rather than just using a convention or an "advisory" API (like __private can be circumvented but clearly indicates intent to the reader).
I only know one use case for a "truly immutable or read-only" object (frozendict, "read-only" type, read-only proxy, etc.): security. I know three modules using a C extension to implement read-only objects: zope.proxy, zope.security and mxProxy. pysandbox uses uglier tricks to implement read-only proxies :-) Such modules are used to secure web applications, for example. A frozendict type doesn't replace these modules but helps to implement security modules. http://www.egenix.com/products/python/mxBase/mxProxy/ http://pypi.python.org/pypi/zope.proxy http://pypi.python.org/pypi/zope.security Victor
On Thu, Mar 1, 2012 at 10:00 AM, Guido van Rossum <guido@python.org> wrote:
I do know that I don't feel comfortable having a sandbox in the Python standard library or even recommending a 3rd party sandboxing solution -- if someone uses the sandbox to protect a critical resource, and a hacker breaks out of the sandbox, the author of the sandbox may be held responsible for more than they bargained for when they made it open source. (Doesn't an open source license limit your responsibility? Who knows. AFAIK this question has not gotten to court yet. I wouldn't want to have to go to court over it.)
Since there's no way (even theoretical way) to completely secure anything (remember the DVD protection wars?), there's no way there should be any liability if reasonable diligence is performed to provide security where expected (which is probably calculable to some %-age of assets protected). It's like putting a lock on the door of your house -- you can't expect to be held liable if someone has a crowbar. Open sourcing code could be said to be a disclaimer on any liability as you're letting people know that you've got nothing you're trying to conceal. It's like a dog who plays dead: by being totally open you're actually more secure.... mark
Mark Janssen writes:
Since there's no way (even theoretical way) to completely secure anything (remember the DVD protection wars?), there's no way there should be any liability if reasonable diligence is performed to provide security where expected (which is probably calculable to some %-age of assets protected).
That's not how the law works, sorry. Look up "consequential damages," "contributory negligence," and "attractive nuisance." I'm not saying that anybody will lose *in* court, but one can surely be taken *to* court. If that happens to you, you've already lost (even if the other side can't win).
Open sourcing code could be said to be a disclaimer on any liability as you're letting people know that you've got nothing you're trying to conceal.
Again, you seem to be revealing your ignorance of the law (not to mention security -- a safe is supposed to be secure even if the burglar has the blueprints). A comprehensive and presumably effective disclaimer is part of the license, but it's not clear that even that works. AFAIK such disclaimers are not well-tested in court. Guido is absolutely right. There is a risk here (not in the frozendict type, of course), but in distributing an allegedly effective sandbox. I doubt Victor as an individual doing research has a problem; the PSF is another matter. BTW, Larry Rosen's book on Open Source Licensing is a good reference. Andrew St. Laurent also has a book out, I like Larry's better but YMMV.
01.03.12 01:52, Victor Stinner написав(ла):
Problem: if you implement a frozendict type inheriting from dict in Python, it is still possible to call dict methods (e.g. dict.__setitem__()). To fix this issue, pysandbox removes all dict methods modifying the dict: __setitem__, __delitem__, pop, etc. This is a problem because untrusted code cannot use these methods on valid dict created in the sandbox.
You can redefine dict.__setitem__.
    oldsetitem = dict.__setitem__
    def newsetitem(self, key, value):
        # check that self is not a frozendict
        ...
        oldsetitem(self, key, value)
    ...
    dict.__setitem__ = newsetitem
Problem: if you implement a frozendict type inheriting from dict in Python, it is still possible to call dict methods (e.g. dict.__setitem__()). To fix this issue, pysandbox removes all dict methods modifying the dict: __setitem__, __delitem__, pop, etc. This is a problem because untrusted code cannot use these methods on valid dict created in the sandbox.
You can redefine dict.__setitem__.
Ah? It doesn't work here.
    >>> dict.__setitem__ = lambda key, value: None
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can't set attributes of built-in/extension type 'dict'
Victor
01.03.12 11:11, Victor Stinner написав(ла):
You can redefine dict.__setitem__. Ah? It doesn't work here.
    >>> dict.__setitem__ = lambda key, value: None
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can't set attributes of built-in/extension type 'dict'
Hmm, yes, it's true. It was too presumptuous of me to believe that you had not considered such a simple approach. But I will try to suggest another approach. `frozendict` inherits from `dict`, but the data is stored not in the parent but in an internal dictionary. Even if dict.__setitem__ is used, it will have no visible effect.
    class frozendict(dict):
        def __init__(self, values={}):
            self._values = dict(values)
        def __getitem__(self, key):
            return self._values[key]
        def __setitem__(self, key, value):
            raise TypeError("expect dict, got frozendict")
        ...
    >>> a = frozendict({1: 2, 3: 4})
    >>> a[1]
    2
    >>> a[5]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in __getitem__
    KeyError: 5
    >>> a[5] = 6
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 7, in __setitem__
    TypeError: expect dict, got frozendict
    >>> dict.__setitem__(a, 5, 6)
    >>> a[5]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in __getitem__
    KeyError: 5
But I will try to suggest another approach. `frozendict` inherits from `dict`, but data is not stored in the parent, but in the internal dictionary. And even if dict.__setitem__ is used, it will have no visible effect.
    class frozendict(dict):
        def __init__(self, values={}):
            self._values = dict(values)
        def __getitem__(self, key):
            return self._values[key]
        def __setitem__(self, key, value):
            raise TypeError("expect dict, got frozendict")
        ...
I would like to implement frozendict in C to be able to pass it to PyDict_GetItem(), PyDict_SetItem() and PyDict_DelItem(). With such a Python implementation, you would get a surprising result:
    d = frozendict()
    dict.__setitem__(d, 'x', 1)  # this is what Python does internally when it expects a dict (e.g. in ceval.c for __builtins__)
    'x' in d  # => False
(Python is not supposed to use the PyDict API if the object is a dict subclass, but PyObject_Get/SetItem.) Victor
A builtin frozendict type "compatible" with the PyDict C API is very convenient for pysandbox because using this type for core features like builtins requires very few modifications. For example, using frozendict for __builtins__ only requires modifying 3 lines in frameobject.c.
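The surprising result can be reproduced with a runnable variant of the class quoted above. This is a sketch under one assumption: the elided part of the class ("...") also redirects __contains__ to the internal dictionary. The name ShadowFrozenDict is hypothetical. A write through the base class lands in the parent dict storage, so the two views of the same object disagree.

```python
class ShadowFrozenDict(dict):
    """dict subclass that keeps its real data in a separate internal dict."""

    def __init__(self, values=()):
        super().__init__()            # the parent dict storage stays empty
        self._values = dict(values)   # the data actually served to readers

    def __getitem__(self, key):
        return self._values[key]

    def __contains__(self, key):      # assumed to be part of the elided methods
        return key in self._values

    def __setitem__(self, key, value):
        raise TypeError("expect dict, got frozendict")


d = ShadowFrozenDict({"a": 1})

# What code using the dict API directly would effectively do:
dict.__setitem__(d, "x", 1)

seen_by_python_code = "x" in d                 # consults the internal dict
seen_by_dict_api = dict.__contains__(d, "x")   # consults the parent storage
```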
See the frozendict_builtins.patch attached to the issue #14162. Last version: http://bugs.python.org/file24690/frozendict_builtins.patch Victor
A frozendict type is a common request from users and there are various implementations.
ISTM, this request is never from someone who has a use case.
One of my colleagues recently implemented his own frozendict class (with the "frozendict" name ;-)). He tried to implement something like PEP 351, not a generic freeze() function but a specialized function for his use case (only supporting list/tuple and dict/frozendict, if I remember correctly). It reminds me of the question: why does Python not provide a frozendict type? Even if it is not possible to write a perfect freeze() function, it looks like some developers need this sort of function, and I hope that frozendict would be a first step in the right direction. Ruby has a freeze method. On a dict, it provides the same behaviour as frozendict: the mapping cannot be modified anymore, but values are still mutable. http://ruby-doc.org/core-1.9.3/Object.html#method-i-freeze
Many experienced Python users simply forget that we have a frozenset type. We don't get bug reports or feature requests about the type.
I used it in my previous job to declare the access control list (ACL) on services provided by an XML-RPC object. To be honest, set could also be used, but I chose frozenset to ensure that my colleagues wouldn't try to modify it without understanding the consequences of such a change. It was not protection against evil hackers from the Internet, but from my colleagues :-) Sorry, I didn't find any bug in frozenset :-) My usage was just to declare a frozenset and then check if an item is in the set, and it works pretty well!
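That usage pattern is tiny. A sketch with made-up method names (ALLOWED_METHODS, is_allowed and the entries are all hypothetical, not from the original code base):

```python
# Hypothetical ACL: the set of method names a service exposes over XML-RPC.
ALLOWED_METHODS = frozenset({"ping", "status", "get_report"})


def is_allowed(method_name):
    """Return True if the requested method is in the access control list."""
    return method_name in ALLOWED_METHODS
```

Because a frozenset has no add() or remove(), a colleague cannot extend the list by accident at runtime; changing it means editing the declaration itself.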
P.S. The one advantage I can see for frozensets and frozendicts is that we have an opportunity to optimize them once they are built (optimizing insertion order to minimize collisions, increasing or decreasing density, eliminating dummy entries, etc). That being said, the same could be accomplished for regular sets and dicts by the addition of an optimize() method.
You can also implement more optimizations in Python peephole or PyPy JIT because the mapping is constant and so you can do the lookup at compilation, instead of doing it at runtime. Dummy example:
---
config = frozendict(debug=False)
if config['debug']:
    enable_debug()
---
config['debug'] is always False and so you can just drop the call to enable_debug() while compiling this code. It would avoid the need of a preprocessor in some cases (especially conditional code, like the C #ifdef). Victor
On Feb 29, 2012, at 4:23 PM, Victor Stinner wrote:
One of my colleagues recently implemented his own frozendict class (with the "frozendict" name ;-)
I write new collection classes all the time. That doesn't mean they warrant inclusion in the library or builtins. There is a use case for ListenableSets and ListenableDicts -- do we need them in the library? I think not. How about case insensitive variants? I think not. There are tons of recipes on ASPN and on PyPI. That doesn't make them worth adding in to the core group of types. As core developers, we need to place some value on language compactness and learnability. The language has already gotten unnecessarily fat -- it is the rare Python programmer who knows set operations on dict views, new-style formatting, abstract base classes, contextlib/functools/itertools, how the with-statement works, how super() works, what properties/staticmethods/classmethods are for, differences between new and old-style classes, Exception versus BaseException, weakreferences, __slots__, chained exceptions, etc. If we were to add another collections type, it would need to be something that powerfully adds to the expressivity of the language. Minor variants on what we already have just makes that language harder to learn and remember but not providing much of a payoff in return. Raymond
On 01.03.2012 02:45, Raymond Hettinger wrote:
On Feb 29, 2012, at 4:23 PM, Victor Stinner wrote:
One of my colleagues recently implemented his own frozendict class (with the "frozendict" name ;-)
I write new collection classes all the time. That doesn't mean they warrant inclusion in the library or builtins. There is a use case for ListenableSets and ListenableDicts -- do we need them in the library? I think not. How about case insensitive variants? I think not. There are tons of recipes on ASPN and on PyPI. That doesn't make them worth adding in to the core group of types.
+1. Georg
Il 01 marzo 2012 02:45, Raymond Hettinger <raymond.hettinger@gmail.com> ha scritto:
On Feb 29, 2012, at 4:23 PM, Victor Stinner wrote:
One of my colleagues recently implemented his own frozendict class (with the "frozendict" name ;-)
I write new collection classes all the time. That doesn't mean they warrant inclusion in the library or builtins. There is a use case for ListenableSets and ListenableDicts -- do we need them in the library? I think not. How about case insensitive variants? I think not. There are tons of recipes on ASPN and on PyPI. That doesn't make them worth adding in to the core group of types.
As core developers, we need to place some value on language compactness and learnability. The language has already gotten unnecessarily fat -- it is the rare Python programmer who knows set operations on dict views, new-style formatting, abstract base classes, contextlib/functools/itertools, how the with-statement works, how super() works, what properties/staticmethods/classmethods are for, differences between new and old-style classes, Exception versus BaseException, weakreferences, __slots__, chained exceptions, etc.
If we were to add another collections type, it would need to be something that powerfully adds to the expressivity of the language. Minor variants on what we already have just makes that language harder to learn and remember but not providing much of a payoff in return.
Raymond
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com
+1 --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/
Actually I find the frozendict concept quite useful. We also have an implementation in our framework, and we use it, for instance, in the http request object, for parsed arguments and parsed forms, whose values should never be modified once parsed. Of course everybody can live without it, but given how easy it is to implement, I think it's OK to have it. +1. On 2012-02-29, at 8:45 PM, Raymond Hettinger wrote:
On Feb 29, 2012, at 4:23 PM, Victor Stinner wrote:
One of my colleagues recently implemented his own frozendict class (with the "frozendict" name ;-)
I write new collection classes all the time. That doesn't mean they warrant inclusion in the library or builtins. There is a use case for ListenableSets and ListenableDicts -- do we need them in the library? I think not. How about case insensitive variants? I think not. There are tons of recipes on ASPN and on PyPI. That doesn't make them worth adding in to the core group of types.
As core developers, we need to place some value on language compactness and learnability. The language has already gotten unnecessarily fat -- it is the rare Python programmer who knows set operations on dict views, new-style formatting, abstract base classes, contextlib/functools/itertools, how the with-statement works, how super() works, what properties/staticmethods/classmethods are for, differences between new and old-style classes, Exception versus BaseException, weakreferences, __slots__, chained exceptions, etc.
If we were to add another collections type, it would need to be something that powerfully adds to the expressivity of the language. Minor variants on what we already have just makes that language harder to learn and remember but not providing much of a payoff in return.
Raymond
On 1 March 2012 12:37, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Actually I find the frozendict concept quite useful. We also have an implementation in our framework, and we use it, for instance, in the http request object, for parsed arguments and parsed forms, whose values should never be modified once parsed.
The question isn't so much whether it's useful, as whether it's of sufficiently general use to warrant putting it into the core language (not even the stdlib, but the C core!). The fact that you have an implementation of your own, actually indicates that not having it in the core didn't cause you any real problems. Remember - the bar for core acceptance is higher than just "it is useful". I'm not even sure I see a strong enough case for frozendict being in the standard library yet, let alone in the core. Paul.
Raymond Hettinger wrote:
On Feb 27, 2012, at 10:53 AM, Victor Stinner wrote:
A frozendict type is a common request from users and there are various implementations.
ISTM, this request is never from someone who has a use case. Instead, it almost always comes from "completers", people who see that we have a frozenset type and think the core devs missed the ObviousThingToDo(tm). Frozendicts are trivial to implement, so that is why there are various implementations (i.e. the implementations are more fun to write than they are to use).
They might be trivial for *you*, but the fact that people keep asking for help writing a frozendict, or stating that their implementation sucks, demonstrates that for the average Python coder they are not trivial at all. And the implementations I've seen don't seem to be so much fun as *tedious*. E.g. google on "python frozendict" and the second link is from somebody who had tried for "a couple of days" and is still not happy: http://python.6.n6.nabble.com/frozendict-td4377791.html You may dismiss him as a "completer", but what is asserted without evidence can be rejected without evidence, and so we may just as well declare that he has a brilliantly compelling use-case, if only we knew what it was... <wink> I see one implementation on ActiveState that has at least one serious problem, reported by you: http://code.activestate.com/recipes/414283-frozen-dictionaries/ So I don't think we can dismiss frozendict as "trivial". -- Steven
On Wednesday 29 February 2012 20:17:05 Raymond Hettinger wrote:
On Feb 27, 2012, at 10:53 AM, Victor Stinner wrote:
A frozendict type is a common request from users and there are various implementations.
ISTM, this request is never from someone who has a use case. Instead, it almost always comes from "completers", people who see that we have a frozenset type and think the core devs missed the ObviousThingToDo(tm). Frozendicts are trivial to implement, so that is why there are various implementations (i.e. the implementations are more fun to write than they are to use).
The frozenset type covers a niche case that is nice-to-have but *rarely* used. Many experienced Python users simply forget that we have a frozenset type. We don't get bug reports or feature requests about the type. When I do Python consulting work, I never see it in a client's codebase. It does occasionally get discussed in questions on StackOverflow but rarely gets offered as an answer (typically on variants of the "how do you make a set-of-sets" question). If Google's codesearch were still alive, we could add another datapoint showing how infrequently this type is used.
Here are my real-world use cases. Not for security, but for safety and performance reasons (I've built my own RODict and ROList modeled after dictproxy):
- Global, but immutable containers, e.g. as class members
- Caching. My data container objects (say, resultsets from a db or something) usually inherit from list or dict (sometimes also set) and are cached heavily. In order to ensure that they are not modified (accidentally), I have two choices: deepcopy or immutability. deepcopy is so expensive that it's often cheaper to just leave out the cache. So I use immutability. (oh well, the objects are further restricted with __slots__)
I agree, these are not general purpose issues, but they are not *rare*, I'd think. nd
Here are my real-world use cases. Not for security, but for safety and performance reasons (I've built my own RODict and ROList modeled after dictproxy):
- Global, but immutable containers, e.g. as class members
I attached type_final.patch to the issue #14162 to demonstrate how frozendict can be used to implement a "read-only" type. Last version: http://bugs.python.org/file24696/type_final.patch Example:
    >>> class FinalizedType:
    ...     __final__ = True
    ...     attr = 10
    ...     def hello(self):
    ...         print("hello")
    ...
    >>> FinalizedType.attr = 12
    TypeError: 'frozendict' object does not support item assignment
    >>> FinalizedType.hello = print
    TypeError: 'frozendict' object does not support item assignment
(instances still have a mutable dict) My patch checks for the __final__ class attribute, but the conversion from dict to frozendict may be done by a function or a type method. Creating a read-only type is a different issue; it's just another example of frozendict usage. Victor
On Thursday 01 March 2012 14:07:10 Victor Stinner wrote:
Here are my real-world use cases. Not for security, but for safety and performance reasons (I've built my own RODict and ROList modeled after dictproxy):
- Global, but immutable containers, e.g. as class members
I attached type_final.patch to the issue #14162 to demonstrate how frozendict can be used to implement a "read-only" type. Last version: http://bugs.python.org/file24696/type_final.patch
Oh, hmm. I rather meant something like that:
"""
class Foo:
    some_mapping = frozendict(
        blah=1, blub=2
    )
or as a variant:
def zonk(some_default=frozendict(...)):
    ...
or simply a global object:
baz = frozendict(some_immutable_mapping)
"""
I'm not sure about your final types. I'm using __slots__ = () for such things (?) nd
Here are my real-world use cases. Not for security, but for safety and performance reasons (I've built my own RODict and ROList modeled after dictproxy):
- Global, but immutable containers, e.g. as class members
I attached type_final.patch to the issue #14162 to demonstrate how frozendict can be used to implement a "read-only" type. Last version: http://bugs.python.org/file24696/type_final.patch
Oh, hmm. I rather meant something like that:
"""
class Foo:
    some_mapping = frozendict(
        blah=1, blub=2
    )
or as a variant:
def zonk(some_default=frozendict(...)):
    ...
or simply a global object:
baz = frozendict(some_immutable_mapping)
"""
Ah yes, frozendict is useful for such cases.
I'm not sure about your final types. I'm using __slots__ = () for such things
You can still replace an attribute value if a class defines __slots__:
class A: ... __slots__=('x',) ... x = 1 ... A.x=2 A.x 2
Victor
On Thursday 01 March 2012 15:54:01 Victor Stinner wrote:
I'm not sure about your final types. I'm using __slots__ = () for such things
You can still replace an attribute value if a class defines __slots__:
class A:
... __slots__=('x',) ... x = 1 ...
A.x=2 A.x
2
Ah, ok, I missed that. It should be fixable with a metaclass. Not very nicely, though. nd
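One way the metaclass fix could look is sketched below. This is only an illustration of the idea, not something from the patch or from the RODict code; the names FinalMeta and Settings are hypothetical. The metaclass rejects attribute assignment and deletion on the class itself, which __slots__ alone does not prevent.

```python
class FinalMeta(type):
    """Metaclass that blocks setting or deleting attributes on the class itself."""

    def __setattr__(cls, name, value):
        raise TypeError(f"cannot set {name!r} on read-only class {cls.__name__}")

    def __delattr__(cls, name):
        raise TypeError(f"cannot delete {name!r} on read-only class {cls.__name__}")


class Settings(metaclass=FinalMeta):
    __slots__ = ()   # instances get no __dict__
    x = 1            # class attribute defined at class creation time


try:
    Settings.x = 2
except TypeError as exc:
    class_write_error = exc
else:
    class_write_error = None
```

As with the other Python-level approaches in this thread, it is a safety net rather than a security boundary: type.__setattr__(Settings, "x", 2) still goes around it.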
On Thu, Mar 1, 2012 at 7:29 PM, André Malo <nd@perlig.de> wrote:
- Caching. My data container objects (say, resultsets from a db or something) usually inherit from list or dict (sometimes also set) and are cached heavily. In order to ensure that they are not modified (accidentally), I have two choices: deepcopy or immutability. deepcopy is so expensive that it's often cheaper to just leave out the cache. So I use immutability. (oh well, the objects are further restricted with __slots__)
Speaking of caching - functools.lru_cache currently has to do a fair bit of work in order to correctly cache keyword arguments. It's obviously a *solvable* problem even without frozendict in the collections module (it just stores the dict contents as a sorted tuple of 2-tuples), but it would still be interesting to compare the readability, speed and memory consumption differences of a version of lru_cache that used frozendict to cache the keyword arguments instead. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
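The "sorted tuple of 2-tuples" idea can be sketched as follows. This is a simplified illustration and not the actual functools.lru_cache implementation; make_key, cached and area are hypothetical names, and the cache is unbounded.

```python
import functools


def make_key(args, kwargs):
    """Build a hashable cache key; keyword arguments become a sorted tuple of 2-tuples."""
    return args, tuple(sorted(kwargs.items()))


def cached(func):
    """Unbounded memoizing decorator keyed on positional and keyword arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = make_key(args, kwargs)
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []  # records every real invocation, to show when the cache is hit


@cached
def area(width, height=1):
    calls.append((width, height))
    return width * height


first = area(2, height=3)
second = area(2, height=3)   # served from the cache, no new entry in calls
```

With a hashable frozendict, the key could hold the keyword mapping directly instead of a sorted tuple, which is the comparison suggested above.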
01.03.12 11:29, André Malo написав(ла):
- Caching. My data container objects (say, resultsets from a db or something) usually inherit from list or dict (sometimes also set) and are cached heavily. In order to ensure that they are not modified (accidentally), I have two choices: deepcopy or immutability. deepcopy is so expensive that it's often cheaper to just leave out the cache. So I use immutability. (oh well, the objects are further restricted with __slots__)
This is the first rational use of frozendict that I see. However, a deep copy is still necessary to create the frozendict. For this case, I believe it would be better to "freeze" the dict in place and then copy-on-write it.
On Thursday 01 March 2012 15:17:35 Serhiy Storchaka wrote:
01.03.12 11:29, André Malo написав(ла):
- Caching. My data container objects (say, resultsets from a db or something) usually inherit from list or dict (sometimes also set) and are cached heavily. In order to ensure that they are not modified (accidentally), I have two choices: deepcopy or immutability. deepcopy is so expensive that it's often cheaper to just leave out the cache. So I use immutability. (oh well, the objects are further restricted with __slots__)
This is the first rational use of frozendict that I see. However, a deep copy is still necessary to create the frozendict. For this case, I believe it would be better to "freeze" the dict in place and then copy-on-write it.
In my case it's actually a half one. The data mostly comes from memcache ;) I'm populating the object and then I'm done with it. People wanting to modify it, need to copy it, yes. OTOH usually a shallow copy is enough (here). Funnily my ROList actually provides a "sorted" method instead of "sort" in order to create a sorted copy of the list. nd
01.03.12 16:47, André Malo написав(ла):
On Thursday 01 March 2012 15:17:35 Serhiy Storchaka wrote:
This is the first rational use of frozendict that I see. However, a deep copy is still necessary to create the frozendict. For this case, I believe it would be better to "freeze" the dict in place and then copy-on-write it.
In my case it's actually a half one. The data mostly comes from memcache ;) I'm populating the object and then I'm done with it. People wanting to modify it, need to copy it, yes. OTOH usually a shallow copy is enough (here).
What if people modify nested dicts?
    a = frozendict({1: {2: 3}})
    b = a.copy()
    c = a.copy()
    assert b[1][2] == 3
    c[1][2] = 4
    assert b[1][2] == 4
You need to copy the incoming dict in depth.
    def frozencopy(value):
        if isinstance(value, list):
            return tuple(frozencopy(x) for x in value)
        if isinstance(value, dict):
            return frozendict((frozencopy(k), frozencopy(v)) for k, v in value.items())
        return value  # I'm lucky
And when a client wants to modify the result in depth, it should call "unfrozencopy". Using frozendict is profitable only when multiple clients read the result but do not modify it. Copy-on-write would help in all cases and would simplify the code. But this is a topic for python-ideas, sorry.
* Serhiy Storchaka wrote:
01.03.12 16:47, André Malo написав(ла):
On Thursday 01 March 2012 15:17:35 Serhiy Storchaka wrote:
This is the first rational use of frozendict that I see. However, a deep copy is still necessary to create the frozendict. For this case, I believe it would be better to "freeze" the dict in place and then copy-on-write it.
In my case it's actually a half one. The data mostly comes from memcache ;) I'm populating the object and then I'm done with it. People wanting to modify it, need to copy it, yes. OTOH usually a shallow copy is enough (here).
What if people modify dicts in deep?
That's the "here" part. They can't [1]. These objects are typically ROLists of RODicts. Maybe nested deeper, but all RO* or other immutable types. I cheated, by deepcopying always in the cache, but defining __deepcopy__ for those RO* objects as "return self". nd [1] Well, an attacker could, because it's still based on regular dicts and lists. But that's why it's not a security feature, but a safety net (here). -- "Solides und umfangreiches Buch" -- aus einer Rezension <http://pub.perlig.de/books.html#apache2>
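The __deepcopy__ trick described above might look like this. It is a guess at the shape of such a class, not the actual RODict code; the class body and the sample data are hypothetical. The cache can call copy.deepcopy() unconditionally, and read-only objects are simply shared instead of being copied.

```python
import copy


class RODict(dict):
    """Read-only dict (safety net only) that deepcopy treats as already immutable."""

    __slots__ = ()

    def _readonly(self, *args, **kwargs):
        raise TypeError("RODict is read-only")

    __setitem__ = __delitem__ = __ior__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly

    def __deepcopy__(self, memo):
        # Nothing can change through the public API, so sharing is safe here.
        return self


cached_rows = [RODict(id=1, name="a"), RODict(id=2, name="b")]

# The outer (mutable) list is copied, the read-only rows are shared.
served_rows = copy.deepcopy(cached_rows)
```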
participants (25)
- Alex Gaynor
- André Malo
- Antoine Pitrou
- Chris Angelico
- Dirkjan Ochtman
- Eli Bendersky
- Georg Brandl
- Giampaolo Rodolà
- Guido van Rossum
- Jim J. Jewett
- M.-A. Lemburg
- Mark Janssen
- Mark Shannon
- Nick Coghlan
- Paul Moore
- R. David Murray
- Raymond Hettinger
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- Tres Seaver
- Victor Stinner
- Victor Stinner
- Xavier Morel
- Yury Selivanov