Proposal to change List Sequence Repetition (*) so it is not useless for Mutable Objects

Currently, the (*) operator on a list duplicates the list by creating multiple references to the same objects. While this works intuitively for immutable objects (like [True] * 5), since those references are simply replaced when a list element is assigned to, it makes the operator nigh unusable for mutable objects.

The most obvious case is when the operator is nested, like this:

    arr = [[True] * 5] * 5

This does not create a matrix-like arrangement of the immutable truth value, but instead creates a list of 5 references to the same inner list, such that a following assignment like

    arr[2][3] = False

will not change just that one index, but the 4th element of every list in the outer list.

This also makes sequence construction using a mutable type a problem. For example, assume a class Foo:

    class Foo:
        def __init__(self):
            self.val = True
        def set(self):
            self.val = False
        def __repr__(self):
            return str(self.val)

If I then use sequence repetition to create a list of these like so:

    arr = [Foo()] * 5

this will create a list of references to the same Foo instance, making the list construction itself effectively meaningless. Running the set() method on any of the instances in the list is the same as running it on all the instances in the list.

It is my opinion that the sequence repetition operator should be modified to make copies of the objects it is repeating, rather than copying references alone. I believe this would both be more intuitive from a semantic point of view and more useful for the developer.

This would change the operator in a way that is mostly unseen in current usage ([5] * 3 would still result in [5, 5, 5]) while treating mutable nesting in a way that is more understandable from the apparent intent of the syntax construction.
Reference regarding previous discussion: https://bugs.python.org/issue27135
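For anyone who wants to reproduce the behaviour described in this message, here is a minimal sketch. The variable names `shared`, `column` and `foos` are illustrative and do not come from the original post.

```python
# Nested repetition: the outer list holds five references to ONE inner list.
arr = [[True] * 5] * 5
arr[2][3] = False

shared = all(row is arr[0] for row in arr)   # every row is the same object
column = [row[3] for row in arr]             # index 3 changed in "every" row


class Foo:
    def __init__(self):
        self.val = True

    def set(self):
        self.val = False

    def __repr__(self):
        return str(self.val)


# Foo() is evaluated once; the single instance is then referenced five times.
foos = [Foo()] * 5
foos[0].set()

print(shared)   # True
print(column)   # [False, False, False, False, False]
print(foos)     # [False, False, False, False, False]
```

The `is` comparison is what shows that the rows are aliases rather than equal-but-distinct lists.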

On 31.05.2016 07:27, Matthew Tanous wrote:
Some questions:

* How would you determine whether a list element is mutable or not?
* How would you copy the elements?
* For which object types would you want to change the behavior?

I agree that the repeat operator can sometimes create confusing and unwanted object structures if not used correctly, but its main purpose is to repeat the already existing objects, not to copy them, so the current behavior is still conceptually correct.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, May 31 2016)
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/

I think this would be better addressed by an extension to list comprehensions, e.g.

    arr = [[True times 5] times 5]

where [x times n] would be equivalent to [x for i in range(n)].

Since this would evaluate x afresh for each element, it would avoid all the thorny issues involved in trying to automatically deep-copy arbitrary objects.

--
Greg
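The `times` syntax above is only a proposal and does not exist in Python. The comprehension it would be equivalent to works today, and a small sketch of it (with `rows` and `cols` as illustrative names) shows why re-evaluating the expression avoids the aliasing:

```python
rows, cols = 5, 5

# The inner list expression is evaluated once per outer iteration,
# so each row is a distinct list object.
arr = [[True for _ in range(cols)] for _ in range(rows)]
arr[2][3] = False

distinct_rows = len({id(row) for row in arr})

print(distinct_rows)   # 5
print(arr[2])          # [True, True, True, False, True]
print(arr[0])          # [True, True, True, True, True]
```

No copying is involved here at all: the rows are separate because they were constructed separately.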

Maybe I'm not following, but I don't see there being anything but a potential overhead penalty from copying immutable objects. If the object is immutable, copying a reference to it and copying the object itself seem transparently "identical" in terms of future use.

I acknowledge that copying the objects is a potential issue, but I think this would be solved by making the sequence repetition operator functionally equivalent to the list comprehension, such that [x()] * 5 is the same, semantically, as [x() for i in range(5)]. Alternatively, these objects could be copied in the same manner as the deepcopy functionality, although this solution may not be the best way to do it.

Ostensibly, I don't see why this wouldn't apply to all collection objects that use the sequence repetition operator (lists, tuples, etc.) to create a sequence.

I agree with your description of the current behavior as "to repeat the currently existing objects". But it seems to me that except for some extremely special cases, this limits it to immutable objects, where (somewhat ironically) there is no functional difference between repeating the objects and copying them other than slight differences in memory usage.

On 5/31/16 1:46 AM, M.-A. Lemburg wrote:

On 01.06.2016 08:38, Matthew Tanous wrote:
What you are describing is a duplication mechanism, not a repeat mechanism, so you are essentially reassigning the meaning of seq * number. I don't think that's in line with the way we handle backwards compatibility in Python.

Please note that there are indeed valid use cases for repeating even mutable types, namely when you don't intend to mutate the contents of the objects, but are only interested in producing a readable data structure with repeated entries, e.g. for iteration or use as a multi-dimensional constant in calculations.

The duplication mechanism you have in mind can be implemented using a list comprehension, for example:

    import copy
    arr = [copy.deepcopy([True] * 5) for i in range(5)]

It's probably better to add a copy.duplicate() API of sorts than to try to change the * operator on built-in sequences.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jun 01 2016)
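Note that `copy.duplicate()` is only floated as an idea in the message above; no such function exists in the `copy` module. A rough sketch of what a helper in that spirit might look like, built on the deepcopy comprehension from the message (the name `duplicate` and its signature are assumptions):

```python
import copy


def duplicate(obj, n):
    """Hypothetical helper: return a list of n independent deep copies of obj."""
    return [copy.deepcopy(obj) for _ in range(n)]


arr = duplicate([True] * 5, 5)
arr[2][3] = False

print(arr[2])             # [True, True, True, False, True]
print(arr[0])             # [True, True, True, True, True]
print(arr[0] is arr[1])   # False
```

Deep copying inherits the usual caveats of `copy.deepcopy`: objects that cannot be copied will raise, and objects with custom `__deepcopy__` behaviour decide for themselves what a copy means.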

Matthew Tanous <mtanous22@...> writes:
Hi Matthew,

I agree that when I was starting out with Python this tripped me up a couple of times, but as Marc-Andre pointed out, it is not necessarily easy to solve in a generic way.

As a point of reference, I currently tend to do this via generators

    arr = list(Foo() for i in range(5))

or list comprehensions

    arr = [Foo() for i in range(5)]

Note that this works for both the mutable and the immutable case.

Kind Regards,
Lorenz

I can only agree here. Even today, despite knowing the fact, it's causing some headaches in some cases.
How about raising an exception if mutable objects are in the list? It's a pretty big backward-incompatible change, but:

Pros:
- it "works" in the general case
- it avoids hidden bugs
- it avoids a transition period between new and old behavior
- it is possible to propose using a list comprehension instead in the exception message

Cons:
- backward incompatible (but changing the behavior of * is backward incompatible anyway)
- maybe there are useful use cases of duplicating the reference?

Note that the same arguments could be made for mutable default arguments, but an actual use case there is to create a cache variable.

Joseph

On Tue, May 31, 2016 at 01:36:31PM +0000, Joseph Martinot-Lagarde wrote:
-1 It's a gratuitous breakage that cannot solve the problem, because you cannot tell in advance which objects are mutable and which are not. The best you can do is recognise known built-ins. ("list is mutable, tuple is not, except when it contains a mutable item, but mymodule.MySequence may or may not be, there's no way to tell in advance.")
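To illustrate why mutability is hard to detect, here is a sketch of one tempting heuristic, hashability. This heuristic is an assumption chosen for illustration, not something proposed in the thread, and the names `looks_immutable` and `Point` are hypothetical.

```python
def looks_immutable(obj):
    """Naive check: treat anything hashable as immutable (this is unreliable)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class Point:
    """Ordinary user class: hashable by default, yet freely mutable."""

    def __init__(self, x):
        self.x = x


p = Point(1)
results = {
    "list": looks_immutable([1, 2]),
    "tuple of ints": looks_immutable((1, 2)),
    "tuple holding a list": looks_immutable((1, [2])),
    "user instance": looks_immutable(p),
}
p.x = 2  # mutating the object the heuristic classified as immutable

print(results)
# {'list': False, 'tuple of ints': True,
#  'tuple holding a list': False, 'user instance': True}
```

The last entry is the problem case: the instance passes the check and is then mutated on the next line, so the heuristic gives the wrong answer for ordinary user-defined classes.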
> - maybe there are useful use cases of duplicating the reference ?
Absolutely. That's a standard way of grouping items taken from an iterable:

    py> it = iter("hello world!")
    py> for a, b, c in zip(*[it]*3):  # groups of three
    ...     print(a, b, c)
    ...
    h e l
    l o
    w o r
    l d !

This wouldn't work if the iterators were three independent copies.

--
Steve
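A small sketch contrasting the shared-iterator idiom above with what three independent iterators would produce (the names `shared` and `independent` are illustrative):

```python
text = "hello world!"

# One iterator referenced three times: zip pulls three consecutive
# items from the same stream for each output tuple.
it = iter(text)
shared = list(zip(*[it] * 3))

# Three separate iterators: each one starts from the beginning,
# so every tuple just repeats a single character.
independent = list(zip(*[iter(text) for _ in range(3)]))

print(shared)
# [('h', 'e', 'l'), ('l', 'o', ' '), ('w', 'o', 'r'), ('l', 'd', '!')]
print(independent[:2])
# [('h', 'h', 'h'), ('e', 'e', 'e')]
```

The grouping works only because `*` repeats the reference; with copies the idiom would silently degrade to the second result.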

On 5/31/2016 10:16 AM, Steven D'Aprano wrote:
-1 also. As Steven has pointed out several times in other threads, Python is consistent in not copying objects unless requested. Thus

    a = [1,2,3]
    b = a  # a reference copy, not an object copy, as some expect
    a[1] = 4
    print(b[1])
    # 4 -- a surprise for those who expect '=' to copy the list object

Sequence * is an elaboration of the same issue.

--
Terry Jan Reedy
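For comparison, a sketch of what "copying when requested" looks like with the standard `copy` module, next to plain assignment (the variable names are illustrative):

```python
import copy

a = [[1, 2], [3, 4]]
alias = a                  # another name for the same list object
shallow = copy.copy(a)     # new outer list, inner lists still shared
deep = copy.deepcopy(a)    # new outer list and new inner lists

a[0][0] = 99               # mutate a shared inner list
a.append([5, 6])           # mutate the outer list

print(alias)     # [[99, 2], [3, 4], [5, 6]]
print(shallow)   # [[99, 2], [3, 4]]
print(deep)      # [[1, 2], [3, 4]]
```

Sequence repetition behaves like the `alias` case for each repeated element: no copy is made unless the code asks for one.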

-1 as well. I have used the generator multiplication many times before. The use of the same object helps a lot in certain cases. Changing that would cause confusion among those who know the phenomenon and would break compatibility.

Regarding your matrix example, maybe we can make use of the new '@' operator for copying new objects, what do you say?

--
Bar Harel

On Tue, May 31, 2016 at 8:01 PM Terry Reedy <tjreedy@udel.edu> wrote:

I could see the use of the operator '@' to construct a matrix, perhaps. It is more limited though (essentially a 2D version of sequence repetition). Semantically, it would seem to me intuitive that [Foo()] * N would create a list of N Foos, not a list of N references to the same Foo. I can see that people who are used to how it actually works now would have issues with such a change in the semantics, however. Didn't consider the zip(*[it]*3) case being broken, I admit.

participants (11)
- Bar Harel
- Greg Ewing
- Joseph Martinot-Lagarde
- Lorenz
- M.-A. Lemburg
- Matthew Tanous
- Michael Selik
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy
- Yongsheng Cheng