[issue23406] interning and list comprehension leads to unexpected behavior

New submission from Abraham Smith: Some students were working on matrix routines for practice. The following code:
L = [ [0]*3 ]*3 for i in range(3): ... for j in range(3): ... if i==j: L[i][j]=1
was expected to return [[1,0,0],[0,1,0],[0,0,1]] but it returned [[1,1,1],[1,1,1],[1,1,1]] because the list [0]*3 was being interned silently, so all three rows were the same memory! To see this, I did
map(id, L) [139634871681464, 139634871681464, 139634871681464]
On the other hand
M=[ [ 0 for i in range(3) ] for j in range(3) ] does not intern: map(id, L) [139634871631672, 139634871681608, 139634871681680]
so the above loop works as expected. This is true in both python 2.7 and 3.4. This is very confusing to users! If this intern behavior with [0]*3 is intended, it should be documented more clearly, because this is something that new students of python might encounter right away when playing with the language's list methods. I didn't see any reference to interning in the discussion of lists in the standard library reference. Moreover, I also could not find any reference to the automatic interning of mutable objects, such as lists. Personally, I cannot see any reason to silently and automatically intern a mutable object; however, if this behavior is really desired, it should be documented. ---------- assignee: docs@python components: Documentation messages: 235520 nosy: Abraham.Smith, docs@python priority: normal severity: normal status: open title: interning and list comprehension leads to unexpected behavior versions: Python 2.7, Python 3.4 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Abraham Smith added the comment: (Obviously, there's a copy/paste mistake in the second case; it should read map(id, M).) ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Georg Brandl added the comment: There is no interning going on. Multiplying lists just copies references. This is not so surprising if you consider that the case may be simple for nested lists, but what about ``[a] * 3`` with some arbitrary object "a"? Copying (or even deep copying) that object is usually not wanted, and impossible in general. This is also documented here (see especially note 2 below the table): https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-l... You're right though that this might be good to mention in the tutorial, as it comes up every now and then. I'll leave the issue open to discuss that. ---------- nosy: +georg.brandl _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Steven D'Aprano added the comment: This is already a FAQ: https://docs.python.org/2/faq/programming.html#how-do-i-create-a-multidimens... I guess this bites every beginning Python programmer, but it's a natural, and desirable, consequence of Python's object model and the fact that * does not copy the list items. ---------- nosy: +steven.daprano resolution: -> not a bug _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Abraham Smith added the comment: Thanks for the helpful responses and correcting my misunderstanding. Regarding improved documentation, I see now that the table at https://docs.python.org/2/library/stdtypes.html#id2 indeed says "shallow copies"; however, the footnote seems to bury the lede. Perhaps the footnote should be expanded to either link to the FAQ entry or provide an abbreviated version of it. The FAQ entry is actually very good, but I would guess that most readers (like me) skip the FAQs and jump straight to the library reference. Internet users have been trained for 20 years to believe that FAQs are full of useless, snarky answers to questions at a much shallower level, like "what do I do with a .tar.gz file?". The fact that Python's FAQ is extremely well-written and helpful is a pleasant surprise, but a surprise none-the-less. ---------- resolution: not a bug -> _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Changes by Berker Peksag <berker.peksag@gmail.com>: ---------- keywords: +easy stage: -> needs patch type: -> enhancement versions: +Python 3.5 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Matheus Vieira Portela added the comment: Does anyone else think the note should be expanded? For me, it seems to be pretty accurate although it may indeed be confusing to beginners. If anything, I can work on rewriting it to be more explanatory. ---------- nosy: +matheus.v.portela _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Steven D'Aprano added the comment: Which note are you referring to? There are at least two mentioned in this thread, the FAQ and a footnote in the docs for stdtypes. If you're referring to the table of operations just below these: https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-l... https://docs.python.org/3/library/stdtypes.html#common-sequence-operations where the docs say: s * n, n * s n shallow copies of s concatenated I think that could be worded better. It is too easy to misread it as saying that the items of s are copied (as I just did now, despite knowing that they aren't). I would word it: repeat s n times and concatenate which matches the common name of * as the sequence repetition operator, and avoids using the word prone to misinterpretation, "copy". Given how error-prone sequence repetition is, I'd add an example directly in the table: for example, [x]*3 returns [x, x, x] (note that x is not copied). ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

R. David Murray added the comment: Interesting. I usually start with a project's FAQ because I find they usually give me an overview of the project and an indication of its quality :) The footnote looks very explanatory to me already (complete with examples). The 'confusing to beginners' text could be made a link to the FAQ, though. ---------- nosy: +r.david.murray _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Matheus Vieira Portela added the comment: I was referring to the table of operations. So, what if I replace "n shallow copies of s concatenated" by "repeat s n times and concatenate (to create a multidimensional list, refer to [link to FAQ])"? ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

R. David Murray added the comment: I agree that the table entry could be made more precise. I would suggest replacing the table entry with "equivalent to adding s to itself n times". This formulation serves to explain *why* the multiply operation works the way it does:
a = [1, []] b = a * 4 c = a + a + a + a b [1, [], 1, [], 1, [], 1, []] c [1, [], 1, [], 1, [], 1, []] a.append(2) a [1, [], 2] b [1, [], 1, [], 1, [], 1, []] c [1, [], 1, [], 1, [], 1, []] a[1].append(3) a [1, [3], 2] b [1, [3], 1, [3], 1, [3], 1, [3]] c [1, [3], 1, [3], 1, [3], 1, [3]]
I don't think it is appropriate to put an example in the table; IMO that belongs in the footnote where it currently is. You could hyperlink the table entry to the FAQ entry, though. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Matheus Vieira Portela added the comment: I'm attaching a patch to update the stdtypes.rst documentation according to our discussion. I've replaced the table explanation to "equivalent to adding s to itself n times" and added a link to the FAQ entry. I'm not sure, however, where to put the FAQ link. Should it be directly in the table? In this patch, I've put it after the last line of note #2. ---------- keywords: +patch Added file: http://bugs.python.org/file40321/issue23406_doc_stdtypes.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Martin Panter added the comment: Left some review comments. The positioning of the link to the FAQ entry seems sensible to me, just that the markup could be better :) ---------- nosy: +martin.panter stage: needs patch -> patch review versions: +Python 3.6 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Matheus Vieira Portela added the comment: Applying review comments. Now, there is an internal link to the FAQ entry on multidimensional lists. ---------- Added file: http://bugs.python.org/file40337/issue23406_doc_stdtypes_and_faq.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

R. David Murray added the comment: Looks good to me. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Martin Panter added the comment: I was about to commit this, but I think I was wrong about one of my comments. The list that is multiplied is indeed shallow-copied. E.g. “[x] * 1” copies the list, but the reference to x is the same. I propose to commit the second patch, except to revert one of the sentences to the first patch, so it reads: Note also that the copies are shallow; nested structures are not copied but referenced. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

R. David Murray added the comment: What is actually happening is that the *contents* of the list are copied, but the list itself is not. This is a consequence of the definition in terms of +. So, yes, that is a shallow copy, but not quite in the sense that mylist.copy() is a shallow copy, since the references to the contents of s get appended to the list being constructed by *, not a new list that is a "copy" of s. You are correct that "s is only referenced" is not really accurate. But how about "Note that the contents of the *s* object are not copied, they are referenced multiple times". I think that highlights the source of the confusion: that the *contents* are not copied. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

R. David Murray added the comment: Sorry, I meant "references to the content are copied" in my first sentence there. This just goes to show why this is complicated to explain :) ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

R. David Murray added the comment: Or better, "items in the list *s* are not copied..." ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Martin Panter added the comment: That works for me. Here is a new patch using David’s new wording. ---------- Added file: http://bugs.python.org/file40369/issue23406_doc_stdtypes.v3.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

R. David Murray added the comment: Looks good to me. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Roundup Robot added the comment: New changeset 6c222848badd by Martin Panter <vadmium> in branch '2.7': Issue #23406: Clarify documentation on multiplying a sequence https://hg.python.org/cpython/rev/6c222848badd New changeset 57f8c7ad7782 by Martin Panter <vadmium> in branch '3.4': Issue #23406: Clarify documentation on multiplying a sequence https://hg.python.org/cpython/rev/57f8c7ad7782 New changeset f624b7fd3b83 by Martin Panter <vadmium> in branch '3.5': Issue #23406: Merge 3.4 into 3.5 https://hg.python.org/cpython/rev/f624b7fd3b83 New changeset d2b3c7c5ef02 by Martin Panter <vadmium> in branch 'default': Issue #23406: Merge 3.5 into 3.6 https://hg.python.org/cpython/rev/d2b3c7c5ef02 ---------- nosy: +python-dev _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Changes by Martin Panter <vadmium+py@gmail.com>: ---------- resolution: -> fixed stage: patch review -> resolved status: open -> closed _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________

Roundup Robot added the comment: New changeset 2640a4630e9e by Martin Panter <vadmium> in branch '3.5': Issue #23406: Remove specific line number from susp-ignored.csv https://hg.python.org/cpython/rev/2640a4630e9e New changeset 48f1e9a47301 by Martin Panter <vadmium> in branch 'default': Issue #23406: Merge 3.5 into 3.6 https://hg.python.org/cpython/rev/48f1e9a47301 ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23406> _______________________________________
participants (8)
-
Abraham Smith
-
Berker Peksag
-
Georg Brandl
-
Martin Panter
-
Matheus Vieira Portela
-
R. David Murray
-
Roundup Robot
-
Steven D'Aprano