anyone need a frozenset or bytearray literal?

Completely obvious what it does, but it irritates my aesthetic sensibilities every time I see:

    frozenset({spam, eggs})

Why? Because I assume that under the hood this creates a set of spam and eggs, then calls frozenset to copy it into a new frozenset object, before the original set is garbage collected. Wasteful. This is in fact what happens in CPython 3.7 today.

I'd love to avoid this. I have no rational measured reason to believe it even matters (thus seeding this on python-ideas and not elsewhere), even though it would technically speed up frozenset creation.

(a) Detecting frozenset({}) as syntax to encode a frozenset in the Python bytecode would be somewhat magical. It could break the person unfortunate enough to monkeypatch out the frozenset builtin (really? don't do that!).

(b) Abusing the concept of letter prefixes, as we already have for strings, on {} syntax would be possible but not at all obvious to a reader: f{} or c{} or r{} perhaps. But then someone would want a frozendict.

(c) Adding a .freeze() method to sets which would raise an exception if the set's refcount were > 1 and would mutate the type of the set object into a frozenset object in place. Refcount assertions are bad; not all VMs need refcounts. The concept of a method that can mutate the type of the underlying object in place is... unpythonic, even though it is technically possible to implement within CPython.

I'm -1 on all of my ideas above. But I figured I'd toss them out there as food for thought for the concept.

We lived for years without even having a set literal in the first place. So this isn't a big deal.

frozenset is not the only base type that lacks a literal, so that loading values into it involves creating an intermediate throwaway object. bytearray is another:

    bytearray(b'short lived bytes object')

I was going to suggest complex was in this boat as well, but upon investigation we appear to do constant folding (or amazing parsing) on that, so 0.3+0.6j does turn into a single LOAD_CONST instead of two consts and an addition. Nice! Not that I expect practical code to use complex numbers.

-gps
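For anyone who wants to check the two bytecode claims above, here is a small sketch using the standard dis module. The exact opcode names vary between CPython versions, so this is only an illustration: it looks for a BUILD_SET instruction (the temporary set) and checks whether the folded complex value shows up among the code object's constants. The filename "<example>" and the variable names are placeholders.

```python
import dis

# Compile the expression without running it; spam and eggs stay unresolved names.
code = compile("frozenset({spam, eggs})", "<example>", "eval")
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# True when the bytecode builds a temporary set before calling frozenset.
builds_temporary_set = "BUILD_SET" in opnames

# For the complex case, look for the already-added value among the constants.
folded = compile("0.3+0.6j", "<example>", "eval")
complex_is_folded = (0.3 + 0.6j) in folded.co_consts
print(builds_temporary_set, complex_is_folded)
```

This only inspects what the compiler emits; it says nothing about how much time the extra copy actually costs.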

On Thu, 12 Jul 2018 at 01:26, Gregory P. Smith <greg@krypto.org> wrote:
I completely get your pain; the copy seems like a waste of resources. However, I think making an optimisation at the C level is better than introducing a literal, because Python is a general-purpose language and most applications don't need frozenset or bytearray. A literal would clutter the base elements one must know and raise questions for beginners.

So yes, I'm +1 on your (a) solution if it is purely a C-level optimisation. Your (c) solution is an idea too. Again, as a C-level optimisation only, because usability-wise, what's wrong with a copy of a hash table?

The problem with the f{} syntax is that the letters come from nowhere. frozenset{'hello', 'world'} would be a better syntax, but it makes {} look like a slicing operator, the way frozenset['hello', 'world'] would call frozenset.__getitem__(('hello', 'world')).

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

On Wed, Jul 11, 2018 at 4:45 PM Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
Neat optimization. I hadn't considered that. We do know for sure it is a builtin type at that point. If that were implemented, bytes objects could gain a to_bytearray() (along the lines of the int.to_bytes() API) method that could be optimized away in literal circumstances. -gps

On Thu, Jul 12, 2018 at 10:13 AM, Gregory P. Smith <greg@krypto.org> wrote:
Be careful: a bytearray is mutable, so this isn't open to very many optimizations. A .freeze() method on sets would allow a set display to become a frozenset "literal", stored as a constant on the corresponding function object, the way a tuple is; but that's safe because the frozenset doesn't need to concern itself with identity, only value. Example:

    def f(x):
        a = (1, 2, 3)  # can be optimized
        b = (x, 4, 5)  # cannot
        c = [6, 7, 8]  # cannot

Disassemble this or look at f.__code__.co_consts and you'll see (1, 2, 3) as a single constant; but the others have to be built.

+1 on set.freeze(); +0 on bytes.to_bytearray().

ChrisA
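The inspection suggested above can be written out as follows. This is a sketch rather than a statement about any particular CPython release: the opcode names checked here (BUILD_TUPLE, BUILD_LIST) are the ones I would expect to see, and newer compilers may also stash helper constants for the list, so only the tuple constant is treated as the interesting result.

```python
import dis


def f(x):
    a = (1, 2, 3)  # all-constant tuple: stored once in co_consts
    b = (x, 4, 5)  # depends on x: built on every call
    c = [6, 7, 8]  # list is mutable: a fresh object on every call


consts = f.__code__.co_consts
tuple_is_constant = (1, 2, 3) in consts

# Collect the opcodes used so we can see what is still built at run time.
opnames = {ins.opname for ins in dis.get_instructions(f)}
builds_tuple_at_runtime = "BUILD_TUPLE" in opnames
builds_list_at_runtime = "BUILD_LIST" in opnames

print(consts)
print(tuple_is_constant, builds_tuple_at_runtime, builds_list_at_runtime)
```

The list case is the one that matters for the bytearray discussion: even when every element is a constant, a new mutable object has to be created each time.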

On Thu, 12 Jul 2018 at 02:24, Chris Angelico <rosuav@gmail.com> wrote:
{1,2,7}.freeze() or {1,2,7}.to_frozenset() seem very natural, and if this can be optimized to avoid the copy, it's perfect.

For bytearray, one use case would be to optimise bytearray([1,2,7,2]) into something like [1,2,7,2].to_bytearray().

About bytes, one could have (1,2,7,2).to_bytes() instead of bytes((1,2,7,2)), because b'\x01\x02\x07\x02' is long and boring.

What about variables in the values? {1,2,x}.freeze() should work too. bytes((1,2,7,x)) cannot be written as a b string, and it creates a copy.

I know of many use cases for frozenset({...}) and I think a hack along those lines is fine -- but what's the common use for bytearray(b"...") or bytearray((...))? Almost invariably a bytearray is created empty or as a given number of zeros. I certainly wouldn't want to burden the tuple type with a to_bytearray() method, the types are unrelated (unlike set and frozenset). On Wed, Jul 11, 2018 at 6:03 PM, Robert Vanden Eynde < robertvandeneynde@hotmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On Wed, Jul 11, 2018 at 6:13 PM Guido van Rossum <guido@python.org> wrote:
Agreed, bytearray(b'...') should be way less common. I don't immediately have a use for that beyond merely disliking the copy from a temporary bytes object and the gc behind the scenes. I could find a practical use for it in MicroPython, where RAM is extremely limited, but that VM could already implement such a compile-time optimization on its own. The concept of the optimization that'd be required just seemed similar to that of frozenset to me. frozenset is the one that led me down this train of thought, as I was looking at code declaring a bunch of constants. -gps

Hi all,

While we are at it, could {"a":1, "b":2}.freeze() perhaps create a MappingProxyType?

I've a gist at https://gist.github.com/stephanh42/d277170dd8a3a2f026c272a4fda15396 with a stand-alone freeze function which attempts to convert objects to a read-only version, and dict -> MappingProxyType is one of the transforms.

Stephan

2018-07-12 7:33 GMT+02:00 Gregory P. Smith <greg@krypto.org>:
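For readers who have not used it, here is a short sketch of what types.MappingProxyType gives you today, without any freeze() method. It is not the code from the gist; the variable names are made up for illustration. One point worth noticing is that the proxy is a read-only view of the dict, not a frozen copy, so it behaves differently from what frozenset does for a set.

```python
from types import MappingProxyType

settings = {"a": 1, "b": 2}
frozen = MappingProxyType(settings)  # read-only view, no copy is made

try:
    frozen["a"] = 10  # the proxy has no item assignment
    write_rejected = False
except TypeError:
    write_rejected = True

# Changes made through the original dict are still visible via the proxy.
settings["c"] = 3
sees_later_changes = "c" in frozen

print(write_rejected, sees_later_changes, dict(frozen))
```

So a dict display turned straight into a proxy would only be effectively immutable if nothing else kept a reference to the underlying dict.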

12.07.18 08:33, Gregory P. Smith wrote:
You can't avoid this since bytearray is mutable. The constant bytes argument can be shared, but the content of a new bytearray needs to be copied.

    a = b'abc'
    b = bytearray(b'abc')  # should make a copy
    c = bytearray(b'abc')  # should make a copy
    b[0] = 0
    assert c[0] == 97

However, it is possible to apply to bytearray the same optimization that was made in BytesIO. The bytearray object can use an internal mutable bytes object for storing its content instead of a raw array. The constructor can save a reference to the passed bytes object; this is an O(1) operation. bytes(bytearray) could just return a reference to that bytes object; it is O(1) too. Any mutating operation should check the refcount and make a copy if it is not 1. This will complicate the code, and I'm not sure if it is worth it.
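To make the copy-on-write idea above concrete, here is a toy pure-Python sketch. It is not how CPython's bytearray or BytesIO is implemented, and it swaps the refcount check for a simpler rule: the object keeps a reference to the original bytes until the first write, then makes its private copy. The class name LazyByteArray and its methods are hypothetical.

```python
class LazyByteArray:
    """Toy copy-on-write buffer (hypothetical, for illustration only)."""

    def __init__(self, source: bytes):
        self._shared = source  # O(1): keep a reference, no copy yet
        self._own = None       # private mutable copy, created on first write

    def __getitem__(self, index):
        data = self._own if self._own is not None else self._shared
        return data[index]

    def __setitem__(self, index, value):
        if self._own is None:
            # The deferred O(n) copy happens only when a write occurs.
            self._own = bytearray(self._shared)
            self._shared = None
        self._own[index] = value

    def to_bytes(self) -> bytes:
        if self._own is None:
            return self._shared  # O(1) while the buffer is untouched
        return bytes(self._own)


src = b"abc"
b = LazyByteArray(src)
c = LazyByteArray(src)
same_object_before_write = b.to_bytes() is src
b[0] = 0
print(same_object_before_write, b[0], c[0], src)
```

A real implementation would also have to handle conversions back to bytes after a write, which is where the refcount check in the message above comes in.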

12.07.18 02:25, Gregory P. Smith wrote:
It is just an implementation detail that set and frozenset use the same internal representation and differ only in their types. But it is possible to implement a more efficient representation for frozensets. I have some ideas and could implement them if Raymond allows.

On Wed, Jul 11, 2018 at 04:25:56PM -0700, Gregory P. Smith wrote:
+1 to the idea of a new set method, freeze, returning a frozenset, and allowing the interpreter to optimize calls like:

    {a, b, c}.freeze()

to skip making the temporary set.

I seem to have a vague recollection that there was already a CPython optimization in place that substitutes a frozenset for certain set displays... ah yes, here it is:

    py> dis.dis("x in {2, 3}")
      1           0 LOAD_NAME                0 (x)
                  3 LOAD_CONST               2 (frozenset({2, 3}))
                  6 COMPARE_OP               6 (in)
                  9 RETURN_VALUE

Nice!

--
Steve
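The same check can be done without reading the disassembly by eye. This sketch compiles the membership test and looks for a frozenset among the code object's constants; the opcode names and offsets printed by dis.dis differ between CPython versions, so the listing above should be read as one version's output rather than a fixed format.

```python
import dis

# Compile only; x is never looked up because the code is not executed.
code = compile("x in {2, 3}", "<example>", "eval")

# Any frozenset here was created by the compiler, not by the source text.
frozen_consts = [c for c in code.co_consts if isinstance(c, frozenset)]
print(frozen_consts)

dis.dis(code)
```

This works because the set display is only ever used for a membership test, so nothing can observe whether it is a set or a frozenset.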

participants (8)
- Chris Angelico
- Gregory P. Smith
- Guido van Rossum
- Jelle Zijlstra
- Robert Vanden Eynde
- Serhiy Storchaka
- Stephan Houben
- Steven D'Aprano