I had the good fortune to lead a three-day immersive training in Python for scientist-technicians working with astronomical data from Hubble.

The focus was basic, core Python, although the workaday environment relies heavily on NumPy and PyFITS (also matplotlib).

Students found the following behavior somewhat confusing:
>>> def f(arg, y=[]):
...     y.append(arg)
...     return y
...
>>> r = f(10)
>>> r
[10]
>>> r = f(11)
>>> r
[10, 11]
>>> r.append(100)
>>> f(12)
[10, 11, 100, 12]
Here's one of the explanations for this behavior:

http://mail.python.org/pipermail/python-list/2007-March/1116072.html

In the above example, y is bound to a mutable object at the time the function def is evaluated, not at runtime each time the function is called. Once this object starts filling with data, it doesn't go out of scope but persists between function calls -- even though "y" is not in the globals namespace (it "lives" in the function).

It gets worse:

>>> f(11, [])
[11]
>>> f(12, [])
[12]
>>> f(13)
[10, 11, 100, 12, 13]
>>> f(14)
[10, 11, 100, 12, 13, 14]
One would think the empty list passed in as y would "override" and/or "reinitialize" y to the empty list. Instead, the passed-in argument is bound to y only for the duration of that call, then goes out of scope. Meanwhile, the object assigned to y at the time of evaluation is still there in the background, ready for duty if no second argument is passed.

>>> r = f(12, [])
>>> r
[12]
>>> r = f(r)
>>> r
[10, 11, 100, 12, 13, 14, [12]]
A metaphor I use is the "guard at the castle gate".

At the time of evaluation (when the def statement is executed and the function object is created -- whether or not the function ever gets called), objects get stored in a closet at the gate entrance, and those parameters assigned to defaults will always get those same objects out of the closet, dust them off, and use them whenever nothing gets passed for the parameter to "run with" at the time the function is called.

If nothing is handed over at "call time" then use whatever you've got in the closet, is the rule.

If the default object is mutable and nothing is passed in, then the guard at the gate (the parameter in question) is bound to the "closet object" and does whatever work with that object, returning it to the closet when done.

There's no re-evaluation or re-initialization of the default object at the time the function is called, so if it's mutable and stuff got added or changed, then it returns to the closet in its changed form. It does not "revert" to some evaluation-time value.
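The "closet" in this metaphor can be inspected directly. What follows is a minimal sketch, assuming Python 3, where a function's default values are stored in a tuple on its __defaults__ attribute. The name append_to is only an illustrative stand-in for f above.

```python
def append_to(arg, y=[]):
    """Append arg to y; the default list is created once, when def runs."""
    y.append(arg)
    return y

# The default objects live in a tuple attached to the function object.
closet_list = append_to.__defaults__[0]
assert closet_list == []

append_to(10)
append_to(11)
# The very same list object has been mutated in place.
assert append_to.__defaults__[0] is closet_list
assert closet_list == [10, 11]

# Passing an explicit list leaves the closet object untouched.
assert append_to(12, []) == [12]
assert closet_list == [10, 11]
```

Nothing here resets the default between calls; the only way the closet object changes is by being mutated.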
>>> del r
>>> def f(arg, y=[]):
...     global r
...     y.append(arg)
...     r = y
...     return y
...
>>> r
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    r
NameError: name 'r' is not defined
>>> r = f(10)
>>> r
[10]
>>> r = f(11)
>>> r
[10, 11]
>>> r = []
>>> r = f(11)
>>> r
[10, 11, 11]
In the above case, r has been made a global variable inside the function. Assigning the empty list to r merely rebinds it externally, however; it does not affect y's default object in the closet. When the function is run with no argument for y, r is rebound within the function to our growing default list object.

>>> f(9)
[10, 11, 11, 9]
>>> r
[10, 11, 11, 9]
>>> f(12)
[10, 11, 11, 9, 12]
>>> r
[10, 11, 11, 9, 12]
>>> r = []
>>> f(12)
[10, 11, 11, 9, 12, 12]
Again, it does no good to set r to the empty list with the expectation of reaching the y default in the castle closet. r is simply being rebound and is on the receiving end for y, which, in getting no arguments, simply reverts to using the growing list. At the end of the function call, however, r is bound to the same object as y (because of r = y, with r declared global), so we do have an opportunity to affect the y default object...
>>> r[0]=999
>>> r
[999, 11, 11, 9, 12, 12]
>>> f(12)
[999, 11, 11, 9, 12, 12, 12]
Ta dah!
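The difference between rebinding a name and mutating the object it refers to can be checked with the is operator. This is a small sketch, assuming Python 3 and its __defaults__ attribute; the names grow and alias are illustrative and do not come from the session above.

```python
def grow(arg, y=[]):
    y.append(arg)
    return y

alias = grow(10)                 # alias now names the default list itself
assert alias is grow.__defaults__[0]

alias = []                       # rebinding: alias names a brand-new list
assert alias is not grow.__defaults__[0]
assert grow(11) == [10, 11]      # the default list was never touched

alias = grow(12)                 # bound to the default list again
alias[0] = 999                   # mutation: seen through every name for the object
assert grow(13) == [999, 11, 12, 13]
```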
>>> r.pop(0)
999
>>> r.pop(0)
11
>>> r.pop(0)
11
>>> r.pop(0)
9
>>> r.pop(0)
12
>>> r.pop(0)
12
>>> r.pop(0)
12
The closet object has now had its members popped. We're back to an empty list to start with, thanks to runtime operations:
>>> f(9)
[9]
>>> f(9)
[9, 9]

So what's a quick way to empty a list without rebinding? We don't want to pop everything or remove items one by one. Nor do we want to end up with a bunch of None objects.

Here's our global r:

>>> r
[9, 9]
Check it out with a new global: we're able to delete slices:
>>> test
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> del test[0]
>>> test
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> del test[1:5]
>>> test
[1, 6, 7, 8, 9]
... so that must mean we're able to delete all members of a list, without rebinding to a new list object:
>>> del test[:]
>>> test
[]
Experimenting with our function some more:
>>> f(11)
[9, 9, 11]
>>> f(12)
[9, 9, 11, 12]
>>> f(13)
[9, 9, 11, 12, 13]
So this is r before:
>>> r
[9, 9, 11, 12, 13]
And this is r after:
>>> del r[:]
>>> r
[]

... meaning y's default object in the castle gate closet has likewise been emptied out, given r is bound to it.

>>> f(13)
[13]
>>> f(13)
[13, 13]

... like starting over eh?

This is safe syntax BTW:

>>> del [][:]

In other words, it's safe to delete "all members" from an empty list without first checking to see if the list is empty or not.

>>> r = []
>>> del r[:]
>>> r
[]
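For comparison, here is a sketch of three ways to "empty" a list, assuming Python 3. Note that list.clear() was added in Python 3.3 and was not available when this thread was written. Only the in-place forms affect other names bound to the same object; the names items and other are illustrative.

```python
items = [1, 2, 3]
other = items                # second name for the same list object
before = id(items)

del items[:]                 # in place: delete every element via a full slice
assert items == [] and other == [] and id(items) == before

items.extend([4, 5])
items.clear()                # in place: same effect as del items[:]
assert other == [] and id(items) == before

items.extend([6, 7])
items = []                   # rebinding: items now names a different list
assert other == [6, 7]       # the original object still holds its contents
assert items is not other
```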
Note that there's really no need to declare r global in order to reach in and change y, provided there's binding going on at the time of return:
>>> del r
>>> def f(arg, y=[]):
...     y.append(arg)
...     return y
...
>>> f(10)
[10]
>>> f(11)
[10, 11]
>>> f(12)
[10, 11, 12]
>>> r
Traceback (most recent call last):
  File "<pyshell#148>", line 1, in <module>
    r
NameError: name 'r' is not defined
>>> r = f(13)
>>> r
[10, 11, 12, 13]
>>> del r[:]
>>> r
[]
>>> f(10)
[10]
>>> f(11)
[10, 11]
Kirby
They have every right to be confused. Who would guess that Python evaluates the defaults for optional arguments only once, when the function is defined? Heck, I've used Python for years and I wouldn't have been able to tell you off the top of my head how Python handles this code.

That reminds me of a similar gotcha I've unfortunately run into on more than one occasion.

# initialize a to be a list of 5 empty lists
a = 5*[[]]
# push a value onto the first list.
a[0].append(1)

What's a?

--Mark

(P.S. Stuff like this drives me nuts, which is why I find myself gravitating more and more to languages with a richer set of immutable data structures, e.g., Clojure/Scala/etc., where both of these gotchas would become a non-issue. I would love to see more immutable datatypes in Python other than numbers/strings/tuples, although realistically, I'd say that's unlikely to happen.)
Ouch, that's bizarre.

_______________________________________________
Edu-sig mailing list
Edu-sig@python.org
http://mail.python.org/mailman/listinfo/edu-sig
Mark Engelberg wrote:
> That reminds me of a similar gotcha I've unfortunately run into on more than one occasion.
>
> # initialize a to be a list of 5 empty lists
> a = 5*[[]]
> # push a value onto the first list.
> a[0].append(1)
>
> What's a?
The result actually makes sense, although I did guess it wrong. :>(
>>> a = b = [1,2,3]
>>> c = [a,b]  # a list with two references to the same object
>>> id(c[0])
3626848
>>> id(c[1])
3626848
>>> c[0][0] = 5
>>> a
[5, 2, 3]

What is b?

Would you rather have Python do something different?

When taking shortcuts like a = 5*[[]], never trust, always verify.

-- Dave
In the 5*[[]] example, the issue is again mutable objects. (list * int) does a shallow copy: the new list holds repeated references to the same element objects. This is the only thing that makes sense. There is no universal clone operation that you could use instead of the shallow copy.

Issues with mutable objects are not going to go away just because we wish they would. I like to be able to write an initialization with immutable objects, like 5 * [None], and would not want to give that up because of the gotchas with mutable objects.

What one probably intends by 5 * [ [ ] ] is

[ [ ] for i in range(5)]

and this only works because [ ] is a literal, with a new list created each time through the list comprehension.

Andy

--
Andrew N. Harrington
Director of Academic Programs
Computer Science Department
Loyola University Chicago
512B Lewis Towers (office)
Snail mail to Lewis Towers 416
820 North Michigan Avenue
Chicago, Illinois 60611
http://www.cs.luc.edu/~anh
Phone: 312-915-7982
Fax: 312-915-7998
gpd@cs.luc.edu for graduate administration
upd@cs.luc.edu for undergrad administration
aharrin@luc.edu as professor
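Andy's contrast between list repetition and a list comprehension can be sketched as follows, assuming Python 3. The names shared, distinct and slots are illustrative only.

```python
shared = 5 * [[]]                     # five references to one inner list
distinct = [[] for _ in range(5)]     # five separate inner lists

shared[0].append(1)
distinct[0].append(1)

assert shared == [[1], [1], [1], [1], [1]]
assert distinct == [[1], [], [], [], []]

# Identity confirms what happened.
assert all(inner is shared[0] for inner in shared)
assert distinct[0] is not distinct[1]

# With an immutable element the shared reference is harmless.
slots = 5 * [None]
slots[0] = "x"                        # rebinds one slot; None itself is not mutated
assert slots == ["x", None, None, None, None]
```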
On Fri, Apr 23, 2010 at 2:41 PM, David MacQuigg <macquigg@ece.arizona.edu> wrote:
> Would you rather have Python do something different?

My own preference is that I would like 5*[[]] to be syntactic sugar for:

[ [ ] for i in range(5)]

I find this to be more intuitive, and I believe this is what most people incorrectly assume the syntax expands to.

If that's the way it worked, literals would always create fresh copies each time through the loop, but if you still want shared behavior, you could do it explicitly with something like:

x = []
a = 5*x

Anyway, every language has its share of gotchas. It's just that this is one I've gotten burned by more than once.
Mark Engelberg wrote:
> On Fri, Apr 23, 2010 at 2:41 PM, David MacQuigg wrote:
>> Would you rather have Python do something different?
>
> My own preference is that I would like 5*[[]] to be syntactic sugar for:
> [ [ ] for i in range(5)]
Then what about 5*[x], 5*[[x]], 5*[[3,x]], ... where x itself can be a list, a list of lists, or any complex object? How deep would you go in creating distinct objects? Would you want similar behavior for dictionaries, or is this just something special for lists? What about other operators? Is [x] + [x] no longer equivalent to 2*[x]? We need a clear, concise rule for when to generate distinct objects, and when to keep the current, and more efficient behavior - multiple references to the same object.
> I find this to be more intuitive, and I believe this is what most people incorrectly assume the syntax expands to.

The important thing is not that our initial guess is right in every odd case, but that the syntax be simple and consistent, so we get it right if we think about it. My initial guess was wrong, but that is OK with me, because 1) I could have figured it out if I had taken the time, and 2) I usually don't take the time to figure these things out. I just pop an example in the interpreter, and see if I get what I want.

The seemingly bizarre behavior of a = 5*[[]] is not so bizarre if you really understand the relationship between variables and objects in Python. It's simple, but it's different from other languages. Our students start with C, so when I explain Python, I really emphasize the difference.

http://ece.arizona.edu/~edatools/ece175/Lecture/python-variables.htm

> If that's the way it worked, literals would always create fresh copies each time through the loop, but if you still want shared behavior, you could do it explicitly with something like:
>
> x = []
> a = 5*x
That changes the meaning of * for list objects from its current and very useful semantics of "extend this list", and five times empty is still empty.
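Two of the points in this reply can be checked quickly in the interpreter. A sketch, assuming Python 3, with x as an arbitrary empty list chosen for illustration:

```python
x = []

# Repeating an empty list yields an empty list, not five references to x.
assert 5 * x == []

# Wrapping x in a list first gives five references to the same object.
a = 5 * [x]
assert len(a) == 5 and all(item is x for item in a)

# Under the current semantics, [x] + [x] and 2*[x] are equivalent:
# both hold two references to x.
plus = [x] + [x]
times = 2 * [x]
assert plus == times
assert plus[0] is plus[1] is times[0] is times[1]
```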
> Anyway, every language has its shares of gotchas.

Agreed. I think of language design as laying a carpet in an odd-shaped room (the real world of problems we need to solve). A good design like Python will be very smooth in the middle, and the wrinkles will be at the edges. I've never needed to construct a list of lists, identical in value, but distinct objects in memory, so this is an edge case for me.

Bill's example of accidental scoping is a much more troublesome gotcha. Even if you are not pushing the edges, a simple misspelling, coupled with Python's complex implicit scoping rules, can result in an error not detected by the interpreter or even by pychecker.

If I were designing a language, external variables would always be explicitly declared, and the rule would be simple. An external variable refers to the nearest enclosing scope in which that variable is assigned a value. No difference if the enclosing scope is a class or another function or the module itself. No need for a 'global' keyword that actually means "module level", not truly global.

-- Dave
Here are some challenges to test students' understanding of how Python handles objects in memory. Scroll down one line at a time and see if you can guess True or False on each equality or identity test.

### More on copying of complex objects

>>> from copy import copy, deepcopy  # functions for shallow and deep copies
>>> x
<__main__.X object at 0x14cab50>

>>> a = 2*[[x]]
>>> a
[[<__main__.X object at 0x14cab50>], [<__main__.X object at 0x14cab50>]]
>>> a[0] == a[1]  # equal in value
True
>>> a[0] is a[1]  # same object
True

>>> a = [[x] for i in range(2)]  # trick to make distinct objects
>>> a = [copy([x]) for i in range(2)]  # same, but more clear
>>> a
[[<__main__.X object at 0x14cab50>], [<__main__.X object at 0x14cab50>]]
>>> a[0] is a[1]
False

>>> b = a  # new label for same object
>>> b is a
True

>>> b = a[:]  # common Python idiom for shallow copy
>>> b = copy(a)  # same, but more clear
>>> b
[[<__main__.X object at 0x14cab50>], [<__main__.X object at 0x14cab50>]]
>>> b == a
True
>>> b is a
False
>>> b[0] is a[0]
True
>>> b[0] is b[1]
False

>>> b = deepcopy(a)  # clone everything
>>> b
[[<__main__.X object at 0x14d1d90>], [<__main__.X object at 0x14d1d90>]]
>>> b == a  # careful, object x has no "value"
            # if an object has no value, it can't be == to anything but itself
False

>>> b[0] is a[0]
False
>>> b[0] is b[1]
False

>>> b[0][0]
<__main__.X object at 0x14d1d90>
>>> b[0][0] == a[0][0]
False
>>> b[0][0] == b[1][0]
True
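The False results for == above come from X not defining __eq__, so equality falls back to identity. The class X is not shown in the session; the sketch below assumes it is a plain class with no methods, and adds a hypothetical class Y with value-based equality for contrast (Python 3).

```python
from copy import deepcopy

class X:
    """Assumed stand-in for the X in the session: no __eq__ defined."""

class Y:
    """Hypothetical class that compares by value."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Y) and self.value == other.value

x = X()
a = [[x] for _ in range(2)]
b = deepcopy(a)
assert b != a                    # the cloned X is a new object, and == means "is" here
assert b[0][0] is b[1][0]        # deepcopy cloned x once and reused that clone

y = Y(3)
c = [[y] for _ in range(2)]
d = deepcopy(c)
assert d == c                    # Y compares by value, so the clone is equal
assert d[0][0] is not c[0][0]    # yet it is still a distinct object
```

Whether a deep copy compares equal to its original therefore depends on how the contained classes define equality, not on the copy itself.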
--
David MacQuigg, PhD
Research Associate, ECE Department, University of Arizona
email: macquigg at ece.arizona.edu
phone: USA 520-721-4583
9320 East Mikelyn Lane, Tucson, Arizona 85710
http://purl.net/macquigg
Default mutables are very confusing indeed. The recommended procedure is to assign the default to be None, then check for that value in the function and do the assignment there, such as:

def myFun(myList=None):
    if myList is None:
        myList = []
    ...

Equally confusing is the example of replicating a mutable, as Mark showed.

To round out the top three, which are the ones that get even my best students, look at the below:

myVar = 27

def myFun():
    print myVar
    myVar += 1
    return myVar

print myFun()

What happens when you run this?

>>>bill<<<
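The None-default idiom Bill recommends can be sketched as follows in Python 3. The function name collect is illustrative and does not come from the thread.

```python
def collect(arg, y=None):
    """Append arg to y, creating a fresh list on each call when y is omitted."""
    if y is None:          # None is immutable, so sharing it as a default is harmless
        y = []
    y.append(arg)
    return y

assert collect(10) == [10]
assert collect(11) == [11]          # no state carried over between calls

mine = [1, 2]
assert collect(3, mine) is mine     # an explicit list is still used in place
assert mine == [1, 2, 3]
```

The test uses "is None" rather than "== None" because identity cannot be overridden by the argument's own equality method.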
On Fri, Apr 23, 2010 at 11:46 AM, Bill Punch <punch@cse.msu.edu> wrote:
> To round out the top three, which are the ones that get even my best students, look at the below:
>
> myVar = 27
>
> def myFun():
>     print myVar
>     myVar += 1
>     return myVar
>
> print myFun()
>
> What happens when you run this?

To fully appreciate Bill's example, I think it's worth pointing out that this version would behave exactly as expected:

myVar = 27

def myFun():
    print myVar
    # myVar += 1
    return myVar

print myFun()
On Fri, Apr 23, 2010 at 3:24 PM, Mark Engelberg <mark.engelberg@alumni.rice.edu> wrote:
> To fully appreciate Bill's example, I think it's worth pointing out that this version would behave exactly as expected:
>
> myVar = 27
>
> def myFun():
>     print myVar
>     # myVar += 1
>     return myVar
>
> print myFun()
Also, for contrast:

>>> myVar = [27]
>>> def myFun():
...     print myVar
...     myVar[0] += 1
...     return myVar
...
>>> myFun()
[27]
[28]

The above works because we're not referencing an unbound local, but are instead mutating a global. No need to declare myVar global as there could be no other legal meaning. Functions look outside the local scope for bindings.

>>> def myFun():
...     print myVar
...     myVar = 1
...     return myVar

The above crashes when called because the print references an unbound local. It's a local because of the local binding (don't call it "rebinding") -- can't be the same as the global in that case, so no sense asking to print it before it has a value.

>>> def myFun():
...     global myVar
...     print myVar
...     myVar = 1
...     return myVar
...
>>> myFun()
[32]
1
>>> myVar
1

The above works because the print statement accesses the global. Said global is then rebound. Of course the return is not required for the change to occur at the global level.

>>> def myFun():
...     myVar = 1
...     print myVar
...     return myVar

No problem.... local myVar is already bound when print is encountered. The global myVar is unchanged.

>>> myVar = [27]
>>> myVar
[27]
>>> def myFun():
...     print myVar
...     myVar = []
...     return myVar

The above crashes when called. It's the assignment to myVar inside the function body that makes it a local to this function's scope, not the fact that it's bound to a mutable. Can't print a local before the binding occurs.

Kirby
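The rule behind these examples is that an assignment to a name anywhere in a function body makes that name local for the whole body, unless it is declared global. Here is a Python 3 sketch of the three cases; the function names are illustrative, and the failure surfaces as UnboundLocalError.

```python
counter = 27

def read_only():
    return counter            # no assignment in the body: looks up the global

def read_then_assign():
    value = counter           # fails: counter is local because of the next line
    counter = value + 1
    return counter

def declared_global():
    global counter
    counter += 1              # rebinds the module-level name
    return counter

assert read_only() == 27

try:
    read_then_assign()
except UnboundLocalError as exc:
    error = exc
else:
    error = None

assert isinstance(error, UnboundLocalError)
assert declared_global() == 28
assert counter == 28
```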
participants (6)
- Andrew Harrington
- Bill Punch
- Chris Stromberger
- David MacQuigg
- kirby urner
- Mark Engelberg