Make HAMT available to python script

Hi, HAMT is a very useful immutable mapping type. Currently CPython use it internally to implement contextvar. Considering immutable data structure is very useful I hope we can make it available to python script(maybe via collections module). Immutable data structures are fundamental parts of our project. Currently we use a full-featured python immutable data library called pyrsistent. Pyrsistent is very powerful, however the map type in it is implemented in python script not in C. It becomes slow when the data set is relatively large. On the other hand, CPython now already has an immutable mapping type in it. I think maybe it’s a good idea to make it public? Many projects can benefit from it I believe. Here is a talk given by the author of javascript immutable-js library explain why immutable data structures are powerful: https://www.youtube.com/watch?v=I7IdS-PbEgI Pyristent: https://github.com/tobgu/pyrsistent What do you think? Cheers, Kai

Hi, In 2019, Yury Selivanov, who added HAMT and contextvars to Python, wrote PEP 603 "Adding a frozenmap type to collections": https://peps.python.org/pep-0603/ Sadly, the PEP was stuck in discussions: * https://discuss.python.org/t/pep-603-adding-a-frozenmap-type-to-collections/... * https://discuss.python.org/t/pep-603-frozenmap-vs-my-frozendict/2473 I don't think that differences between dict and frozenmap was a blocker issue. In fact, if I recall correctly, there was no blocker issue. Anyway, Yury is also maintaining the "immutables" 3rd party module which implements frozenmap: https://pypi.org/project/immutables/ Victor On Fri, Apr 1, 2022 at 11:37 AM zhang kai <kylerzhang11@gmail.com> wrote:
Hi,
HAMT is a very useful immutable mapping type. Currently CPython use it internally to implement contextvar. Considering immutable data structure is very useful I hope we can make it available to python script(maybe via collections module).
Immutable data structures are fundamental parts of our project. Currently we use a full-featured python immutable data library called pyrsistent. Pyrsistent is very powerful, however the map type in it is implemented in python script not in C. It becomes slow when the data set is relatively large.
On the other hand, CPython now already has an immutable mapping type in it. I think maybe it’s a good idea to make it public? Many projects can benefit from it I believe.
Here is a talk given by the author of javascript immutable-js library explain why immutable data structures are powerful: https://www.youtube.com/watch?v=I7IdS-PbEgI
Pyristent: https://github.com/tobgu/pyrsistent
What do you think?
Cheers, Kai _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2WYPX44W... Code of Conduct: http://python.org/psf/codeofconduct/
-- Night gathers, and now my watch begins. It shall not end until my death.

You may want to check PEP 603, which more or less proposes this (the author of the pep is the author of the HAMT code) check https://peps.python.org/pep-0603/ Alternatively, there is already a pypi package with this code: https://pypi.org/project/immutables/ Regards from cloudy London, Pablo On Fri, 1 Apr 2022, 10:34 zhang kai, <kylerzhang11@gmail.com> wrote:
Hi,
HAMT is a very useful immutable mapping type. Currently CPython use it internally to implement contextvar. Considering immutable data structure is very useful I hope we can make it available to python script(maybe via collections module).
Immutable data structures are fundamental parts of our project. Currently we use a full-featured python immutable data library called pyrsistent. Pyrsistent is very powerful, however the map type in it is implemented in python script not in C. It becomes slow when the data set is relatively large.
On the other hand, CPython now already has an immutable mapping type in it. I think maybe it’s a good idea to make it public? Many projects can benefit from it I believe.
Here is a talk given by the author of javascript immutable-js library explain why immutable data structures are powerful: https://www.youtube.com/watch?v=I7IdS-PbEgI
Pyristent: https://github.com/tobgu/pyrsistent
What do you think?
Cheers, Kai _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2WYPX44W... Code of Conduct: http://python.org/psf/codeofconduct/

Thanks Victor and Pablo. I will check the discussion of PEP 603. It's a little weird to use the immutables library when it's code in already in CPython but I'm glad it's an option. Kai On Fri, Apr 1, 2022 at 6:14 PM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
You may want to check PEP 603, which more or less proposes this (the author of the pep is the author of the HAMT code) check https://peps.python.org/pep-0603/
Alternatively, there is already a pypi package with this code:
https://pypi.org/project/immutables/
Regards from cloudy London, Pablo
On Fri, 1 Apr 2022, 10:34 zhang kai, <kylerzhang11@gmail.com> wrote:
Hi,
HAMT is a very useful immutable mapping type. Currently CPython use it internally to implement contextvar. Considering immutable data structure is very useful I hope we can make it available to python script(maybe via collections module).
Immutable data structures are fundamental parts of our project. Currently we use a full-featured python immutable data library called pyrsistent. Pyrsistent is very powerful, however the map type in it is implemented in python script not in C. It becomes slow when the data set is relatively large.
On the other hand, CPython now already has an immutable mapping type in it. I think maybe it’s a good idea to make it public? Many projects can benefit from it I believe.
Here is a talk given by the author of javascript immutable-js library explain why immutable data structures are powerful: https://www.youtube.com/watch?v=I7IdS-PbEgI
Pyristent: https://github.com/tobgu/pyrsistent
What do you think?
Cheers, Kai _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2WYPX44W... Code of Conduct: http://python.org/psf/codeofconduct/

On 4/1/2022 11:48 AM, zhang kai wrote:
Thanks Victor and Pablo. I will check the discussion of PEP 603. It's a little weird to use the immutables library when it's code in already in CPython but I'm glad it's an option.
The main difference is that 'immutables' offers you a stable/versioned interface to use it, while the one that's in CPython is an internal implementation detail. If one day we find a better design, we can just switch to it, while 'immutables' probably can't. If we've exposed as a public interface in the core runtime, it's much more complicated. (For what it's worth, the major thing that held up contextvars in the first place was justifying why it needed a new data structure that wasn't already in the core runtime. So we didn't adopt it lightly, and making sure we kept the freedom to change it was an important compromise.) There are plenty of other things in this same category, and because we want to keep things as stable as possible while also improving performance and reliability, we have to keep pretty tight limits on what we promise will remain stable. Most of our discussions are about finding this balance ;) Cheers, Steve

Hi, Steve. Thanks for your detailed explanation. Indeed I already saw the API discussion in PEP 603. It's much easier to make the decision in a third-party library. I think we will be fine with the immutables library. On Fri, Apr 1, 2022 at 7:05 PM Steve Dower <steve.dower@python.org> wrote:
On 4/1/2022 11:48 AM, zhang kai wrote:
Thanks Victor and Pablo. I will check the discussion of PEP 603. It's a little weird to use the immutables library when it's code in already in CPython but I'm glad it's an option.
The main difference is that 'immutables' offers you a stable/versioned interface to use it, while the one that's in CPython is an internal implementation detail. If one day we find a better design, we can just switch to it, while 'immutables' probably can't. If we've exposed as a public interface in the core runtime, it's much more complicated.
(For what it's worth, the major thing that held up contextvars in the first place was justifying why it needed a new data structure that wasn't already in the core runtime. So we didn't adopt it lightly, and making sure we kept the freedom to change it was an important compromise.)
There are plenty of other things in this same category, and because we want to keep things as stable as possible while also improving performance and reliability, we have to keep pretty tight limits on what we promise will remain stable. Most of our discussions are about finding this balance ;)
Cheers, Steve

On Fri, Apr 1, 2022 at 4:06 AM Steve Dower <steve.dower@python.org> wrote:
The main difference is that 'immutables' offers you a stable/versioned interface to use it, while the one that's in CPython is an internal implementation detail. If one day we find a better design, we can just switch to it, while 'immutables' probably can't. If we've exposed as a public interface in the core runtime, it's much more complicated.
I don't understand the issue here: If we expose a "frozendict" as built in python object then only the API of that object needs to remain stable, not the implementation. And it seems that's an API that is already clearly defined. + 1 from me -- just the other day I was wishing it was there. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, 2 Apr 2022 at 02:30, Christopher Barker <pythonchb@gmail.com> wrote:
On Fri, Apr 1, 2022 at 4:06 AM Steve Dower <steve.dower@python.org> wrote:
The main difference is that 'immutables' offers you a stable/versioned interface to use it, while the one that's in CPython is an internal implementation detail. If one day we find a better design, we can just switch to it, while 'immutables' probably can't. If we've exposed as a public interface in the core runtime, it's much more complicated.
I don't understand the issue here:
If we expose a "frozendict" as built in python object then only the API of that object needs to remain stable, not the implementation.
And it seems that's an API that is already clearly defined.
+ 1 from me -- just the other day I was wishing it was there.
There would presumably need to be be a C API as well, and that would probably expose more of the implementation unless handled carefully. Without seeing an actual implementation, it's hard to know for sure. Paul

On 02Apr2022 0925, Paul Moore wrote:
On Sat, 2 Apr 2022 at 02:30, Christopher Barker <pythonchb@gmail.com> wrote:
On Fri, Apr 1, 2022 at 4:06 AM Steve Dower <steve.dower@python.org> wrote:
The main difference is that 'immutables' offers you a stable/versioned interface to use it, while the one that's in CPython is an internal implementation detail. If one day we find a better design, we can just switch to it, while 'immutables' probably can't. If we've exposed as a public interface in the core runtime, it's much more complicated.
I don't understand the issue here:
If we expose a "frozendict" as built in python object then only the API of that object needs to remain stable, not the implementation.
And it seems that's an API that is already clearly defined.
+ 1 from me -- just the other day I was wishing it was there.
There would presumably need to be be a C API as well, and that would probably expose more of the implementation unless handled carefully. Without seeing an actual implementation, it's hard to know for sure.
Yes, and the request in this thread was "expose the HAMT", not "create a new data type that uses it under the covers". My opposition to the new data type was that it *wasn't* frozendict, primarily because we decided that dicts had to preserve insertion order. So it was proposed as "frozenmap", but was more like a dict than map(). It's also slightly slower for lookups (as well as technically not O(1), though close enough for moderate N), and the primary benefit is reduced memory usage when you mutate it (but it's frozen...? ;) ). All of that kind of adds up to make it a distraction that's likely worse than a dict subclass with __setitem__ overridden (if that's the behaviour you want). It's a fantastic data structure for when you need it, but it's one that deserves to be researched and understood before simply dropping it into your code. An optional package is a perfectly good place for it. Cheers, Steve

All of that kind of adds up to make it a distraction that's likely worse than a dict subclass with __setitem__ overridden (if that's the behaviour you want). It's a fantastic data structure for when you need it, but it's one that deserves to be researched and understood before simply dropping it into your code. An optional package is a perfectly good place for it.
I agree with this. I just want to explain a little bit about why we want to mutate an immutable map. Immutable data structures have some nice features. Because of its immutable nature, immutable data structures are inherently thread-safe so much safer to use in a multi-thread environment. And it can be much faster to find the differences between two immutable data structures because we can replace != with is not , therefore quickly rule out all unchanged data. The immutable nature is actually very powerful when you want to change a map(especially with nested maps in it) often, then need to find the differences in other places. It sometimes reminds me of git. With immutable data, we can keep every snapshot of data history in memory. Reduced memory usage is also important. I think immutable map and frozendict are actually two different data types because they have different intentions. The author of immutable.js gives a great talk about how immutable data structure is implemented and why it’s powerful. React.js actually relies a lot on immutable data structures. Here is the youtube link to the talk: <https://www.youtube.com/watch?v=I7IdS-PbEgI&t=1005s> https://www.youtube.com/watch?v=I7IdS-PbEgI&t=1005s There is also a CppCon talk about immutable data structure: <https://www.youtube.com/watch?v=sPhpelUfu8Q&t=412s> https://www.youtube.com/watch?v=sPhpelUfu8Q&t=412s

On Sat, Apr 2, 2022 at 6:30 AM Steve Dower <steve.dower@python.org> wrote:
+ 1 from me -- just the other day I was wishing it was there.
There would presumably need to be be a C API as well, and that would probably expose more of the implementation unless handled carefully. Without seeing an actual implementation, it's hard to know for sure.
Sure -- as i understand it, there is increasing effort to define the various levels and stabilities of the C API -- it doesn't seem an intractable problem to define a new type as having an unstable API for a period of time. And as HAMT is there, it's gotten at least a little use. And if we want to hammer out the details, giving it wider use would be a way to do that.
Yes, and the request in this thread was "expose the HAMT", not "create a new data type that uses it under the covers".
OK, sure, but we were also pointed to PEP 603 -- which seems to have stalled. I'm supportive of the general idea -- even if a few details need to be resolved. My opposition to the new data type was that it *wasn't* frozendict,
primarily because we decided that dicts had to preserve insertion order. So it was proposed as "frozenmap", but was more like a dict than map().
That's the challenge of naming anything ;-) For my part, yes, I'd rather an immutable Mapping that is as much like dict as possible. I just went and read the whole thread in discuss, and this one quote reflects my take as well: """ I’m not sure whether the specific implementation and API of this frozenmap is a good one, but some kind of immutable, hashable mapping seems to be commonly requested and commonly re-invented. """ Then there's Raymond's point: "But the PEP is about something different – it is algorithm focused." So yes, I get what the issue is now. Then there is the end of the thread: "I think the hold-up is Yury having time to continue pushing the PEP forward." So it's stalled out, rather than being rejected due to any intractable issues. For my part: I think there is substantial demand for an immutable Mapping type. And having it in the stdlib would be a real benefit -- as it happens, last week I wanted it not for my own code, but for a little prototype of a "anonymous names tuple" as discussed on python-ideas -- no idea if that is going anywhere, but there have got to be multiple potential uses in the stdlib. Another thing to keep in mind is that for most uses, the exact performance characteristics really don't matter much. For me, that points to an implementation that mimics the dict API (i.e. preserves order). If someone needs an implementation that has specific performance characteristics that are different, and doesn't need key order preservation, then THAT can be a special case. Which, IIUC, is what the current HAMT in cPython is about. Anyway, it would be nice for someone to take up the mantle of getting an immutable Mapping into the stdlib. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, Apr 02, 2022 at 09:28:55AM -0700, Christopher Barker wrote:
Anyway, it would be nice for someone to take up the mantle of getting an immutable Mapping into the stdlib.
Here is a "FrozenMapping" class that could be added to collections. I have tested it ~~extensively~~ for nearly two minutes and haven't been able to break it, so it must be good *wink* ``` from collections.abc import Mapping from types import MappingProxyType class FrozenMapping(Mapping): __slots__ = ('_proxy', '_hash') def __new__(cls, *args, **kwargs): a = dict(*args, **kwargs) obj = super().__new__(cls) obj._proxy = MappingProxyType(a) obj._hash = None return obj def __getitem__(self, key): return self._proxy[key] def __iter__(self): yield from self._proxy def __len__(self): return len(self._proxy) def copy(self): return self # We're immutable. def __reversed__(self): return reversed(self._proxy) def __or__(self, other): return type(self)(self._proxy | other) def __ror__(self, other): return type(self)(other | self._proxy) def __hash__(self): if self._hash is None: self._hash = hash(frozenset(self.items())) return self._hash def __repr__(self): items = ', '.join(['%r: %r' % t for t in self.items()]) return '%s({%s})' % (type(self).__name__, items) ``` I haven't considered pickling, or deep-copying. I don't think there is any way to get access to the underlying dict and modify it, except perhaps via ctypes, so I think it is as immutable as it is possible to get from Python code. Feel free to take that as a challenge to break it. Does anyone want to champion adding this to collections? I guess it will need at least a short PEP to justify that there are use-cases for frozen mappings. -- Steve

On Sun, 3 Apr 2022, 12:58 pm Steven D'Aprano, <steve@pearwood.info> wrote:
I haven't considered pickling, or deep-copying. I don't think there is any way to get access to the underlying dict and modify it, except perhaps via ctypes, so I think it is as immutable as it is possible to get from Python code. Feel free to take that as a challenge to break it.
MappingProxyType does expose the underlying mapping to the other operand during type coercion for binary operators (and trying to prevent that causes enough problems that we decided the problem wasn't worth fixing): https://bugs.python.org/issue43838 For a lot of use cases, MappingProxyType is already a "good enough" immutable dict, but it does have a definite discoverability problem. Cheers, Nick.

PEP 603 adds a frozenmap type to collections, implemented as HAMT. For a read-only *dict*, I proposed PEP 416 "Add a frozendict builtin type" in 2012. It was rejected. https://peps.python.org/pep-0416/ The outcome of this PEP was the addition of the types.MappingProxy type (previously, it already existed but it was somehow hidden: type(int.__dict__)). Victor

Thanks for the reminder Victor. For a read-only *dict*, I proposed PEP 416 "Add a frozendict builtin
type" in 2012. It was rejected. https://peps.python.org/pep-0416/
Note that that was ten years ago -- so maybe worth revisiting. The outcome of this PEP was the addition of the types.MappingProxy
type (previously, it already existed but it was somehow hidden: type(int.__dict__)).
It's interesting to read the justification for the rejection, among other things: "According to Raymond Hettinger, use of frozendict is low." Then it goes on to provide examples. And of course, the MappingProxy type is used for class dicts, and exposed for other code to use. However, what strikes me is that the list of reasons we don't need an immutable dict is jsu that -- it isn't strictly necessary -- which is not the same as it not being useful. And the MappingProxy is heavier weight and can't be hashed (i.e. used as a key in a dict or put in a set) In the last ten years, we have seen this brought up over and over again -- there really are quite a few use cases. I would love to see PEP 416 reconsidered. (note: it would need a touch of editing to cover the fact that dicts are now ordered) -CHB
Victor _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IQMBELHL... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
participants (8)
-
Christopher Barker
-
Nick Coghlan
-
Pablo Galindo Salgado
-
Paul Moore
-
Steve Dower
-
Steven D'Aprano
-
Victor Stinner
-
zhang kai