function defaults and an empty() builtin

During a code review I got asked a question about a "pythonic" idiom I've been asked about before. The code was like this:

    def func(optional=None):
        if optional is None:
            optional = []

The question was why the optional value wasn't set to an empty list in the first place. The answer is that Really Bad Things can happen if someone actually goes and manipulates that empty list, because all future callers will see the modified version.

I don't think this defensive programming practice is yet passe - I can think of lots of unit tests that wouldn't trigger bad behavior. You would have to intentionally provoke it by adding some unit tests to be sure.

What would make my life a little easier is a builtin container named "empty()" that emulates all builtin containers and raises an exception for any add/subtract manipulations. Something like:

    class empty():
        def _bad_user(self, *args):
            raise ValueError("empty objects are empty")
        append = pop = __getitem__ = add = setdefault = __ior__ = __iand__ = _bad_user

        def _empty(self, *args):
            return []
        items = keys = values = get = _empty

It would return nothing when asked for something and raise a ValueError when any attempt is made to add/remove items.

-Jack
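For readers who have not hit the gotcha, here is a minimal sketch of what "all future callers will see the modified version" means in practice. The two function names are made up for illustration and do not come from the code under review.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the def statement runs,
    # and the same object is reused by every call that omits 'bucket'.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # The 'is None' idiom: build a new list on each call that omits 'bucket'.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")
print(first is second)    # True: both names refer to the shared default
print(second)             # ['a', 'b']

print(append_fresh("a"))  # ['a']
print(append_fresh("b"))  # ['b']
```

The shared list only shows up as a problem on the second call, which is why a test that calls the function once does not catch it.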

On Fri, 20 May 2011 02:46:14 pm Jack Diederich wrote:
Assuming that this behaviour is not intended. However, I agree that, in general, the behaviour of mutable defaults in Python is a Gotcha.
Er, yes... how is that different from any other behaviour, good or bad? You have to write the unit tests to test the behaviour you want to test for, or else it won't be tested.
I don't think that this idea will actually be as useful as you think it will, but in any case, why does it need to be a built-in?

    def func(optional=empty()): ...

works just as well whether empty is built-in or not.

But as I said, I don't think this will fly. What's the point? If you don't pass an argument for optional, and get a magic empty list, your function will raise an exception as soon as it tries to do something with the list. To my mind, that makes it rather useless. If you want the function to raise an exception if the default value is used, surely it's better to just make the argument non-optional.

But perhaps I've misunderstood something.

-- Steven D'Aprano

On 2011-05-20, at 07:57 , Steven D'Aprano wrote:
That Jack's object would be an empty, immutable collection. Not an arbitrary object. The idea is rather similar to Java's Collections.empty* (emptySet, emptyMap and emptyList), which could be fine solutions to this issue indeed. There is already `frozenset` for sets, but there is no way to instantiate an immutable list or dict in Python right now, as far as I know (tuples don't work as several mixed list&tuple operations yield an error, and I don't like using tuples as sequences personally). The ability to either make a collection (list or dict, maybe via separate functions) immutable or to create a special immutable empty variant thereof would work nicely.

On Fri, 20 May 2011 04:37:30 pm you wrote:
Yes, I get that, but what's the point? What's an actual use-case for it? What's the point of having an immutable collection that has the same methods as a list, but raises an exception if you use them? Most importantly, why single out an *empty* immutable list for special treatment, instead of providing a general immutable list type?

It seems to me that all this suggested pattern does is use a too-clever and round-about way of turning a buggy function into an exception for the caller, possibly a long way from where the error actually exists. I can't think of any reason I would use this special empty() value as a default instead of either:

- fix the function to not use the same default list; or
- if using a default value causes problems, don't use a default value

[...]
These are two different issues. Being able to freeze an object would be handy, but a special dedicated empty immutable list strikes me as completely pointless. -- Steven D'Aprano

On 2011-05-20, at 13:54 , Steven D'Aprano wrote:
I'm guessing the point is to be able to avoid the `if collection is None` dance when the collection is not *supposed* to be modified: an immutable collection would immediately raise on modification, acting as a precondition/invariant and ensuring mutation is not introduced on the original collection.

[...] paragraphs of my comment. But in Jack's case, I'm guessing it's because the Python bug of collections-as-default-values is most generally encountered with empty collections.

[...] that, but it may not even be visible as a bug, even though data is corrupted, and manifest itself as an even harder to track memory leak). By making that default-empty-collection immutable, mutations of the default-argument collection become obvious (they blow up), and the function can be fixed. It's much easier to track this down than a strange memory leak.

On 2011-05-20, at 13:54 , Steven D'Aprano wrote:
An immutable empty collection can be a system-wide singleton, and extremely cheap to use. It makes for a good default value or default object member when you expect the collection to never be modified. Using `collections.empty_list` is also more readable and clearer than, say, `collections.freeze([])`.

I share Steve's puzzlement as to the intended use case. To get value from the magic empty immutable list, you will have to explicitly test that calling your function with the default value does the right thing. But if you're writing an explicit test, having that test call the function *twice* to confirm correct use of the 'is None' idiom will work just as well.

There are limits to how much we can help people that don't test their code.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
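The "call it twice" check Nick describes could look roughly like the sketch below. The helper `default_is_fresh` and both sample functions are hypothetical names invented here, and the check assumes a function that appends its first argument to the optional list and returns that list.

```python
def collect(item, seen=None):
    # Written with the 'is None' idiom.
    if seen is None:
        seen = []
    seen.append(item)
    return seen


def collect_buggy(item, seen=[]):
    # Mutates the shared default.
    seen.append(item)
    return seen


def default_is_fresh(func):
    """Call func twice without the optional argument and compare results."""
    first = list(func(1))   # snapshot, in case the second call mutates it
    second = func(2)
    return first == [1] and list(second) == [2]


print(default_is_fresh(collect))        # True
print(default_is_fresh(collect_buggy))  # False
```

A single call would pass for both functions; only the second call exposes the state carried over in the default.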

On 2011-05-20, at 17:03 , Nick Coghlan wrote:
It is no different than adding `assert` calls in the code.
There are 17 functions or methods with list default parameters and 133 with dict default parameters in the Python standard library. Surely some of them legitimately make use of a mutable default parameter as some kind of process-wide cache or accumulator, but I would doubt the majority does (why would SMTP.sendmail need to accumulate data in its mail_options parameter across runs?) Do you know for sure that no mutation of these 150+ parameters will ever be introduced, that all of these functions and methods are sufficiently tested, called often enough that the introduction of a mutation of the default parameter in themselves or one of their callees would *never* be able to pass muster?
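The thread does not say how these counts were produced. One way to get comparable numbers, offered purely as an assumption and not as Masklinn's method, is to walk the syntax tree with the standard `ast` module and report list or dict literals used as defaults. The sample source below is made up for the demonstration.

```python
import ast


def mutable_defaults(source, filename="<string>"):
    """Yield (function name, line number, kind) for list/dict literal defaults."""
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            continue
        # Positional defaults plus keyword-only defaults (None means "no default").
        defaults = list(node.args.defaults)
        defaults += [d for d in node.args.kw_defaults if d is not None]
        for default in defaults:
            if isinstance(default, ast.List):
                kind = "list"
            elif isinstance(default, ast.Dict):
                kind = "dict"
            else:
                continue
            yield getattr(node, "name", "<lambda>"), default.lineno, kind


# Hypothetical sample source, not copied from the standard library.
sample = '''
def send(sender, recipients, message, options=[], extra=[]):
    pass

def lookup(key, cache={}):
    pass

def safe(key, cache=None):
    pass
'''

for name, lineno, kind in mutable_defaults(sample):
    print(name, lineno, kind)
```

This only sees literal `[]` and `{}` defaults; calls such as `list()` or `dict()` would need an extra check, and the scan says nothing about whether a given default is ever mutated.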

It seems to me that a better way of doing this is:

    def func(optional_list=[]):
        optional_list = freeze(optional_list)

That is, if I expect the list to be immutable when it's empty, why wouldn't I expect it to be immutable when it's not empty? The only case where the immutability of the empty list matters is if there's a bug that changes the list (or a called function changes its signature when that wasn't expected). Wouldn't it be worth protecting against that when the list isn't empty as well?

Of course PEP 351 was rejected so there is no freeze() builtin. (Although personally, I don't agree with all the arguments against it. For example, I don't think that a frozen dict has to be hashable. I also think that immutable objects are useful in unit testing, where they allow me to easily be sure that a passed-in dict isn't changed by a function.)

Anyway, in the case of a list I suspect that this is pretty close to what you want:

    def func(optional_list=[]):
        optional_list = tuple(optional_list)

--- Bruce

On 2011-05-20, at 21:10 , Bruce Leban wrote:
Absolutely, but the mutation of the default parameters seems to be the main problem (historically): it's a memory leak, and it's a global data corruption, where modifying a provided parameter is a local data corruption (unless the object passed in is global of course).

Ideally, you could just add a decorator or an annotation doing that for you without additional work and name mutation within the function.

[...] lists and tuples does not work). Plus, it does not help with dicts, which can expose the same issue.

On Fri, May 20, 2011 at 12:27 PM, Masklinn <masklinn@masklinn.net> wrote:
Agreed, but that wasn't how I interpreted Jack's question. I think there are two issues (with the first one the one that I think Jack was targeting):

(1) I want to make sure a parameter is immutable so I don't accidentally change it (and none of the functions I call can do that). Akin to declaring a parameter const in C-like languages. For example, is_ip_blocked(ip_address, blocked_ip_list). (That's not just ip_address in blocked_ip_list if the list contains CIDR addresses.) We could add code to make the values immutable or use annotations:

    @const
    def func(x : const, y : const = []):
        pass

Personally, I like declaring the contract that a parameter is not being modified explicitly and I would like a shallow freeze() function.

(2) The gotcha that the default value is the same value every time rather than a new value. Lots of ways to deal with this but none of them work without educating people how the feature works. For example:

    @copy
    def func(x : copy, y : copy = []):
        pass

--- Bruce
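Bruce's `@copy` decorator is only named, not defined, in the message. A rough sketch of one possible reading follows, assuming a decorator (called `fresh_defaults` here, a made-up name) that deep-copies every default the caller did not override. It ignores the annotation part of the proposal.

```python
import copy
import functools
import inspect


def fresh_defaults(func):
    """Give each call its own copy of any default the caller did not override."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, param in signature.parameters.items():
            # Fill in omitted parameters with a private copy of their default.
            if name not in bound.arguments and param.default is not param.empty:
                bound.arguments[name] = copy.deepcopy(param.default)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@fresh_defaults
def collect(item, seen=[]):
    seen.append(item)
    return seen


print(collect(1))        # [1]
print(collect(2))        # [2]
print(collect(3, [0]))   # [0, 3]
```

The copy costs something on every call, and explicitly passed arguments are still shared with the caller, so this addresses issue (2) only.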

On Fri, May 20, 2011 at 1:53 PM, Bruce Leban <bruce@leapyear.org> wrote:
One bad solution to both would be to have the language enforce that default values cannot be of mutable type. Though not a valid solution, that idea highlights the only two reasons I can see for using mutable defaults:

- caching across function calls
- having your default be of the same type as your expected argument (a documentation of sorts)

The idiom of using None that Jack originally described is an acceptable alternative to using mutable defaults. It identifies no expectations on the type of the argument. It explicitly indicates that None is a valid argument. It implies that it will be special-cased in the function/class. It is easy to be consistent using None regardless of the expected type of the argument.

The advantage of Jack's original proposal is that in cases where you are not modifying the argument you won't need to plug the correct object in for None, so that if statement he included would not be necessary.

-eric

Masklinn wrote:
Absolutely, but the mutation of the default parameters seems to be the main problem (historically):
The most common problem regarding default parameters is mutation of them by functions which *are* supposed to modify the parameter. IMO you're trying to solve an almost-nonexistent problem.
it's a global data corruption, where modifying a provided parameter is a local data corruption
It's just as damaging, though -- the program still produces incorrect results. -- Greg

Please end this thread. The original empty() proposal is clearly not working, and no modification of it is going to work. The recommended pattern is very clear and matches the Zen of Python: Explicit is better than implicit. -- --Guido van Rossum (python.org/~guido)

On 5/20/2011 3:10 PM, Bruce Leban wrote:
If a function only needs a read-only sequence, it should not require, or be said to require, a list: "def func(optional_seq = ()):". If it only iterates through an input collection, it should only require an iterable: "def func(optional_iter=()): it = iter(optional_iter)". If you are paranoid and want the function to raise on any attempt to do much of anything with the input, replace '()' with 'iter()'.

-- Terry Jan Reedy
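A small sketch of the tuple-default suggestion, with function names invented for illustration. It shows that reading and iterating work unchanged, while an accidental mutation of the default fails immediately.

```python
def total_length(words=()):
    # Only reads the input, so an empty tuple is a safe shared default.
    return sum(len(word) for word in words)


def first_word(words=()):
    # Only iterates, so any iterable (list, tuple, generator) is accepted.
    it = iter(words)
    return next(it, None)


def careless(words=()):
    # A later edit that mutates the argument fails loudly on the default.
    words.append("oops")
    return words


print(total_length())               # 0
print(total_length(["ab", "c"]))    # 3
print(first_word(iter(["x", "y"]))) # x

try:
    careless()
except AttributeError as exc:
    print("mutation rejected:", exc)
```

The error is an ordinary AttributeError from tuple, so no new builtin is needed for the sequence case.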

Masklinn wrote:
In this scenario:

    def func(mylist=empty()):
        do_some_stuff_with_mylist

mylist is the empty() object, you *know* func() does not modify mylist, you are wrong (heh) and it does... but your program always calls func() with an actual list -- how is empty() going to save you then? Hint: it won't.

And if you're thinking a unittest would catch that -- yes it would, but it would also catch it without empty() (make a copy first, call the func(), compare afterwards -- different? Mutation!)

~Ethan~
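Ethan's copy, call, compare check can be written in a few lines. The helper name `mutates_argument` and the two sample functions are assumptions made up for this sketch.

```python
import copy


def mutates_argument(func, argument):
    """Return True if calling func(argument) changes argument."""
    before = copy.deepcopy(argument)   # make a copy first
    func(argument)                     # call the function
    return argument != before          # compare afterwards


def well_behaved(mylist):
    return sorted(mylist)              # builds a new list


def misbehaving(mylist):
    mylist.sort()                      # sorts the caller's list in place
    return mylist


print(mutates_argument(well_behaved, [3, 1, 2]))  # False
print(mutates_argument(misbehaving, [3, 1, 2]))   # True
```

This works for any argument the test passes in, empty or not, which is the point Ethan is making.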

On 2011-05-20, at 21:12 , Ethan Furman wrote:
It will not until one day func() is called without an actual list, and then you get a clear and immediate error instead of silent data corruption, or a memory leak, which are generally the result of an improperly mutated default argument collection and much harder to spot. As I wrote in part of the message you quoted (but ignored), empty() acts as an assertion. The assertion is that the default parameter will never be modified, and if the assertion fails, an error is generated. That's it. That's a pretty common bug in Python, and it solves it. No more, and no less.

On Sat, 21 May 2011 05:10:25 am Masklinn wrote:
I wouldn't call it a memory leak. As I understand it, a memory leak is normally understood to mean that your program is assigning memory in such a way that neither you, nor the compiler, can free it, not that you merely haven't noticed that you're assigning memory. Since the default value is exposed, either the function or the caller can free that memory.
You are simply wrong there. There is no reason to imagine that the caller will *immediately* attempt to modify the result:

    y = func()  # returns immutable empty list
    y.append(None)

The attempt to mutate y might not happen until much later, in some distant part of the code, in another function, or module, or thread, or even another process. There is no limit to how distant in time or space the exception could be. It's a landmine waiting to blow up, not an assertion.

    y = func()
    # ... much later
    data = {'key': y}
    # ... much later still
    params.update(data)
    response = connect('something', params)

    def connect(x, params):
        a = params.get('key', [])
        a.append('something')

When connect fails, there's nothing to associate the error with the mistake of calling func() without supplying an argument.

But note that calling func() without an argument is supposed to be legal. Why is it a mistake? It's only a mistake because func exposes internal data to the caller, and then compounds that bug by punishing the caller for inadvertently modifying that internal data rather than not exposing it in the first place. This is *astonishingly* awful design.

[...]
That's it. That's a pretty common bug in Python, and it solves it. No more, and no less.
This doesn't solve the problem, it just creates a new one. If people can't remember to use the "if default is None" idiom, what makes you think they will remember to use empty()? And if they do remember, they're just disguising their bug as the caller's mistake. -- Steven D'Aprano

On Sat, 21 May 2011 04:15:18 am you wrote:
It's not the caller's responsibility to avoid mangling the internals of the function. It is the function's responsibility to avoid exposing those internals.

This suggestion seems crazy to me. Let me try to explain from the point of view of the caller. Suppose I call func and get a list back:

    x = func(a)

So I can treat x as a list, because that's what it is:

    x.append(None)

But if I fail to pass an argument, and the default empty() is used, I get something that looks like a list:

    y = func()
    hasattr(y, "append")  # returns True

but blows up when I try to use it:

    y.append(None)  # raise an exception

All because the function author doesn't want me modifying the return result. And why does the author care what I do with the result? Because he's exposing the default function value in such a way that the caller can mangle it. If it causes problems when the function returns the default value, stop returning the default value! Don't push the burden onto the caller by dropping a landmine into their code.

Now you can "fix" this, for some definition of "fix", by documenting the fact that not passing the argument will result in something other than a list:

"If you don't pass an argument, and use the default, then you will get back an immutable empty sequence that has the same API as a list but that will raise an exception if you try to mutate it."

This is downright awful API design. As the caller, I simply don't care about the function author's difficulties in ensuring that the default value is not modified. That's Not My Problem. Fix your own buggy code. (Not that it is actually difficult: the idiom for mutable default values is two simple lines.) What Is My Problem is that rather than fix his function, the author has dumped the problem in my lap. Now I have this immutable empty sequence that is useless to me. I either have to detect it and change it myself:

    result = func(*args)  # args could be empty
    if result is empty():
        # Fix stupid design flaw in func
        result = []

or I have to remember to never, under any circumstances, call func() without supplying an argument.

[...]
Fine, you've discovered 150 potentially buggy functions in the standard library. If the authors didn't remember to use the default=None idiom in their functions, what makes you think that they'd remember to use default=empty() instead?

This suggested idiom is counterproductive. The function author doesn't save any work -- he still has to remember not to write default=[] in his functions. The author's burden is increased, because now he has to choose between three idioms instead of two:

    # use this when default is like a cache
    default=[]

    # use this when you need to mutate default within the function
    default=None
    if default is None: default = []

    # use this when you want to return the default value but don't want
    # the caller to mutate it
    default=empty()

And the caller's burden is increased, because now he has to deal with this immutable list instead of a real list.

-- Steven D'Aprano

On 5/20/2011 2:15 PM, Masklinn wrote:
There are 17 functions or methods with list default parameters and 133 with dict default parameters in the Python standard library.
Could you give the re or whatever that you used to find these? I might want to look and possibly change a few.
I suspect that nearly all of these uses are for read-only inputs. In Python 3, () could replace [] in such cases, as tuples now have all the read-only sequence methods (they once had no methods). Of course, even that does not protect against perhaps crazy code like

    if input: <mutate it>  # skips empty args, default or not

That suggests that all functions that are supposed to only read an input sequence should be tested with tuples. Actually, if only an iterable is needed, then such should be tested with non-sequence iterables. That is actually very easy to produce: for instance, iter((1,2,3)). Thinking about it more, if the only use of an arg is to iterate through key,value pairs, then the default could be 'iter({})' instead of '{}' to better document the usage.
No. However, anyone qualified for push access to the central source should know that defaults should be treated as read-only unless documented otherwise. This is especially true for {}. So I consider it a somewhat paranoid worry, in the absence of cases where revisers *have* introduced mutation where not present before. That aside, developers have and are improving the test suite. That was the focus of the recent post-PyCon sprint. It continues with a test improvement most every day. If you want to join us volunteers to improve tests further, please do. -- Terry Jan Reedy

On Fri, May 20, 2011 at 11:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The use case isn't very fancy, it's to have a generic empty iterable as a place holder in function defs instead of doing the "if x is None" dance. Unit tests make using a real empty iterable less likely to trigger bad behavior, but because the behavior of real empty iterables in function defs is tricky and non-intuitive, unit tests for some of those functions would need some extra boilerplate. That might not be a bad tradeoff compared to adding extra "if x is None" checks for each optional arg to a function.

FYI, here is the code that triggered the query. The "if None" check is mostly habit with a small dose of pedagogical reinforcement for other devs that would read it.

    def query_sphinx(search_text, include=None, exclude=None):
        if include is None:
            include = {}
        if exclude is None:
            exclude = {}
        query = sphinxapi.client()
        for field, values in include.items():
            query.SetFilter(field, values)
        for field, values in exclude.items():
            query.SetFilter(field, values, exclude=True)
        return query.query(search_text)

-Jack

On 5/20/2011 7:10 PM, Jack Diederich wrote:
The use case isn't very fancy, it's to have a generic empty iterable
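If () and frozenset() are not generic enough for *you*, try this:

    def empty():
        # A bare return ends the generator at once. (The original used
        # 'raise StopIteration', which is a RuntimeError inside a
        # generator since Python 3.7, per PEP 479.)
        return
        yield  # unreachable, but makes empty() a generator function

    emp = empty()
    for i in emp: print('something')
    for i in emp: print('something')
    # prints nothing, both times.

Actually, iter(()), iter([]), iter({}) behave the same when iterated.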
as a place holder in function defs instead of doing the "if x is None"
That idiom is for a completely different use case: when one wants a new empty *mutable* on every call, that will be filled with values and, typically, returned.
Since you are not mutating include and exclude, there is no point to this noise. In fact, I consider it wrong because it actually *misleads* other devs who would expect something put into each of them and returned. The proper way to write this is

    def query_sphinx(search_text, include={}, exclude={}):

which documents that the parameters should be dicts (or similar) and that they are read only. Your version implies that it would be ok to write to them, which is wrong.
-- Terry Jan Reedy

On 5/21/2011 5:48 PM, Terry Reedy wrote:
Here is a back-compatible rewrite that expands the domain for 'include' and 'exclude' to iterables of key-value pairs. It both documents and ensures that query_sphinx() will do nothing but iterate through key-value pairs from the last two args.

    def query_sphinx(search_text, include=iter({}.values()),
                     exclude=iter({}.values())):
        if isinstance(include, dict):
            include = include.items()
        if isinstance(exclude, dict):
            exclude = exclude.items()
        query = sphinxapi.client()
        # include and exclude are iterables here, so iterate them directly
        for field, values in include:
            query.SetFilter(field, values)
        for field, values in exclude:
            query.SetFilter(field, values, exclude=True)
        return query.query(search_text)

Lifting effectively constant expressions out of a loop is a standard technique. In this case, the 'loop' is whatever would cause repeated calls to the function without explicit args. The small define-time cost of the extra calls would eventually be saved at runtime by reusing the dict_valueiterators instead of creating equivalent ones over and over.

-- Terry Jan Reedy

Masklinn wrote:
If the function can't proceed properly without an actual parameter, why supply a default? Make it required, and then the function will blow up when it's called without one.

I suppose there could be a case where one is going to iterate through a collection, and useful work may still happen if said collection is empty, and one is feeling too lazy to create an empty one on the spot where the function is called and so relies on the immutable empty default... but if one knows all that one should be able to not call any mutating methods.

But the original problem is that an empty list is used as the default because an actual list is expected. I think the problem has been misunderstood -- it's not *if* the list gets modified, but *when* -- so you would have the same dance, only instead of None, you're now saying

    if default == empty(): default = []

So you haven't saved a thing, and still don't really get the purpose behind mutable defaults.

~Ethan~

Masklinn wrote:
Yes, I am aware. And the point of providing an empty list as a default is so you have a list to add things to -- so what have you gained by providing an empty frozen list as a default? Seems to me all you have now is a built-in time bomb -- every call = a blow up.
So what happens when you provide a *real* list, that is to be modified? Not modify it? Or have code that is constantly checking to see if it's okay to modify the list because it might be the immutable empty() object? ~Ethan~

On 2011-05-20, at 20:56 , Ethan Furman wrote:
-- so what have you gained by providing an empty frozen list as a default? Seems to me all you have now is a built-in time bomb -- every call = a blow up.
See above, your assumption is flawed and all reasoning following it is nonsense.
[...] collections they were provided as parameters (which is the vast majority of functions, really).
Not modify it? Or have code that is constantly checking to see if it's okay to modify the list because it might be the immutable empty() object?
This "objection" is absolutely nonsensical. Please cease.

On 2011-05-20, at 21:22 , Ethan Furman wrote:
Or do you mean accumulating in a global or class instance? Accumulating can be done in anything.
And why would you iterate over an empty list?
Because you're iterating period, and that it's an empty list has no influence on your behavior. You'll simply do nothing during your iteration, because the iteration count will be 0. Why special-case empty lists when there is no need to?
Same with dict, `get` works on empty dicts as well as on any other such collection.
If you have an example from the stdlib I'd love to see it (seriously -- I'm always up for learning something).
Mailcap does that line 170: `subst` takes an empty list as a default parameter, forwards that parameter to `findparam` which iterates on the list to try and find the param. If it can't find the param in the list, it simply returns an empty string. An empty list is simply a case where it will never find the param, and it will Just Work. No need to create a special case. Have you really never done such a thing?

On Sat, 21 May 2011 05:18:30 am Masklinn wrote:
Why special-case empty lists when there is no need to?
That's a remarkable statement. What is empty() except a special case for empty lists?

If you and Jack are serious about this proposal, it would require at least two such functions, emptylist and emptydict, not just empty(). And even if you are right that it solves the problem of default=[] (which you aren't, but for the sake of the argument let's pretend), it doesn't solve the general issue of mutable defaults.

As I said, having a freeze() function that creates an immutable list might be a good idea, although not for the default argument issue. (I'm not entirely sure how that differs from tuple, but that's another issue...) But special casing a frozen empty list seems silly, and the use-case given by the OP, and defended by you, is actively harmful.

-- Steven D'Aprano

On 21/05/2011 04:19, Steven D'Aprano wrote:
Or all collections could have a "mutable" attribute which, once it has been set to False, can never subsequently be reset to True. Then you could merge lists and tuples into a single type, ditto sets and frozen sets, and you get immutable dictionaries as well. Plus a considerable simplification of the language.
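To make the one-way "mutable" switch concrete, here is a deliberately incomplete sketch built on a list subclass. The class name `FreezableList` is hypothetical, only a few mutating methods are guarded, and nothing like this exists in the language; it illustrates the idea rather than a workable design.

```python
class FreezableList(list):
    """Hypothetical list with a 'mutable' flag that can only go True -> False."""

    def __init__(self, *args):
        super().__init__(*args)
        self._mutable = True

    @property
    def mutable(self):
        return self._mutable

    @mutable.setter
    def mutable(self, value):
        if value and not self._mutable:
            raise TypeError("a frozen list cannot be made mutable again")
        self._mutable = bool(value)

    def _check(self):
        if not self._mutable:
            raise TypeError("list is frozen")

    # Only a few mutators are guarded here; a real type would need all of
    # them (extend, insert, pop, remove, sort, +=, *=, and so on).
    def append(self, item):
        self._check()
        super().append(item)

    def __setitem__(self, index, value):
        self._check()
        super().__setitem__(index, value)

    def __delitem__(self, index):
        self._check()
        super().__delitem__(index)


items = FreezableList([1, 2])
items.append(3)          # allowed while mutable
items.mutable = False    # one-way switch

try:
    items.append(4)
except TypeError as exc:
    print("rejected:", exc)
```

Even in this small form the sketch shows the cost of the idea: every mutating operation has to consult the flag.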

On 20May2011 15:11, Terry Reedy <tjreedy@udel.edu> wrote:
| On 5/20/2011 1:51 PM, Masklinn wrote:
|
| I am as puzzled as other people.
|
| >empty() is both an empty list (because the code iterates over a list
| >for instance, or maps it, or what have you) and an assertion that
| >this list is *not* to be modified.
|
| So use () as the default. It has all the methods of [] except for
| the mutation methods.

You're missing the point. This thread is about providing a complex solution to a common problem. Your technique of providing a simple solution to the problem doesn't help the thread persist.

[ Hmm, I see my random sig quoter has hit the money again:-) Truly, the quote below was pot luck! ]

Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/

If you can keep your head while all those about you are losing theirs, perhaps you don't understand the situation.
- Paul Wilson <Paul_Wilson.DBS@dbsnotes.dbsoftware.com>

On Thu, May 19, 2011 at 9:46 PM, Jack Diederich <jackdied@gmail.com> wrote:
return nothing when asked for something and raise a ValueError when any attempt is made to add/remove items.
Couldn't you just use the empty immutable version for whatever type the optional might be? For sequences, use (). For sets, use frozenset(). For dicts, use ... oh. Crap. -- Daniel Stutzbach
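For the dict case, later Python versions offer a partial answer that was not available when this thread was written: `types.MappingProxyType`, added in Python 3.3, wraps a dict in a read-only view. A minimal sketch, where the constant `EMPTY_DICT` and the function `query_filters` are made-up names:

```python
from types import MappingProxyType

# Hypothetical module-level constant: a read-only view of an empty dict.
EMPTY_DICT = MappingProxyType({})


def query_filters(include=EMPTY_DICT, exclude=EMPTY_DICT):
    # Reading works exactly as it does with a plain dict.
    return sorted(include.items()), sorted(exclude.items())


print(query_filters())
print(query_filters(include={"lang": ["en"]}))

try:
    EMPTY_DICT["key"] = "value"
except TypeError as exc:
    print("mutation rejected:", exc)
```

The proxy is not a dict subclass, so code that checks `isinstance(x, dict)` will treat it differently from a real dict.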

On Fri, 20 May 2011 02:46:14 pm Jack Diederich wrote:
Assuming that this behaviour is not intended. However, I agree that, in general, the behaviour of mutable defaults in Python is a Gotcha.
Er, yes... how is that different from any other behaviour, good or bad? You have to write the unit tests to test the behaviour you want to test for, or else it won't be tested.
I don't think that this idea will actually be as useful as you think it will, but in any case, why does it need to be a built-in? def func(optional=empty()): ... works just as well whether empty is built-in or not. But as I said, I don't think this will fly. What's the point? If you don't pass an argument for optional, and get a magic empty list, your function will raise an exception as soon as it tries to do something with the list. To my mind, that makes it rather useless. If you want the function to raise an exception if the default value is used, surely it's better to just make the argument non-optional. But perhaps I've misunderstood something. -- Steven D'Aprano

On 2011-05-20, at 07:57 , Steven D'Aprano wrote:
That Jack's object would be an empty, immutable collection. Not an arbitrary object. The idea is rather similar to Java's Collections.empty* (emptySet, emptyMap and emptyList), which could be fine solutions to this issue indeed. There is already `frozenset` for sets, but there is no way to instantiate an immutable list or dict in Python right now, as far as I know (tuples don't work as several mixed list&tuple operations yield an error, and I don't like using tuples as sequences personally). The ability to either make a collection (list or dict, maybe via separate functions) immutable or to create a special immutable empty variant thereof would work nicely.

On Fri, 20 May 2011 04:37:30 pm you wrote:
Yes, I get that, but what's the point? What's an actual use-case for it? What's the point of having an immutable collection that has the same methods as a list, but raises an exception if you use them? Most importantly, why single out an *empty* immutable list for special treatment, instead of providing a general immutable list type? It seems to me that all this suggested pattern does is use a too-clever and round-about way of turning a buggy function into an exception for the caller, possibly a long way from where the error actually exists. I can't think of any reason I would use this special empty() value as a default instead of either: - fix the function to not use the same default list; or - if using a default value causes problems, don't use a default value [...]
These are two different issues. Being able to freeze an object would be handy, but a special dedicated empty immutable list strikes me as completely pointless. -- Steven D'Aprano

On 2011-05-20, at 13:54 , Steven D'Aprano wrote:
I'm guessing the point is to be able to avoid the `if collection is None` dance when the collection is not *supposed* to be modified: an immutable collection would immediately raise on modification, acting as a precondition/invariant and ensuring mutation is not introduced on the original collection. paragraphs of my comment. But in Jack's case, I'm guessing it's because the Python bug of collections-as-default-values is most generally encountered with empty collections. that, but it may not even be visible as a bug — even though data is corrupted — and manifest itself as an even harder to track memory leak). By making that default-empty-collection immutable, mutations of the default-argument collection become obvious (they blow up), and the function can be fixed. It's much easier to track this down than a strange memory leak. On 2011-05-20, at 13:54 , Steven D'Aprano wrote:
An immutable empty collection can be a system-wide singleton, and extremely cheap to use. It makes for a good default value or default object member when you expect the collection to never be modified. Using `collections.empty_list` is also more readable and clearer than, say, `collections.freeze([])`.
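A minimal sketch of what such a shared, immutable empty list could look like. The name `EmptyList` and the function `total` are hypothetical; nothing like `collections.empty_list` exists in the standard library, and this is only one possible shape for the idea.

```python
from collections.abc import Sequence


class EmptyList(Sequence):
    """Hypothetical read-only empty sequence (not a stdlib type)."""

    _instance = None

    def __new__(cls):
        # One shared instance is enough: an immutable empty value never changes.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __len__(self):
        return 0

    def __getitem__(self, index):
        if isinstance(index, slice):
            return self  # any slice of an empty sequence is still empty
        raise IndexError("EmptyList has no items")

    def _refuse(self, *args, **kwargs):
        raise TypeError("EmptyList is immutable")

    # Every mutating list method fails loudly instead of changing shared state.
    append = extend = insert = pop = remove = clear = sort = reverse = _refuse
    __setitem__ = __delitem__ = __iadd__ = _refuse


def total(values=EmptyList()):
    # Reading the default is fine; a stray values.append(...) would raise.
    return sum(values)
```

Iteration, `len()` and membership tests behave as they would on `[]`, while any attempt at mutation raises at the point where it happens.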

I share Steve's puzzlement as to the intended use case. To get value from the magic empty immutable list, you will have to explicitly test that calling your function with the default value does the right thing. But if you're writing an explicit test, having that test call the function *twice* to confirm correct use of the 'is None' idiom will work just as well. There are limits to how much we can help people that don't test their code. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
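The "call it twice" test described above can be sketched as follows. The functions `collect`, `collect_buggy` and `leaks_default` are made-up names for illustration only.

```python
def collect(item, bucket=None):
    """Uses the 'is None' idiom: a fresh list on every defaulted call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def collect_buggy(item, bucket=[]):
    """Mutates the shared default list, so state leaks between calls."""
    bucket.append(item)
    return bucket


def leaks_default(func):
    """Call func twice with the default; report whether the first call leaked."""
    func("a")
    second = func("b")
    return second != ["b"]
```

A single call cannot tell the two functions apart; the second call is what exposes the shared default.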

On 2011-05-20, at 17:03 , Nick Coghlan wrote:
It is no different than adding `assert` calls in the code.
There are 17 functions or methods with list default parameters and 133 with dict default parameters in the Python standard library. Surely some of them legitimately make use of a mutable default parameter as some kind of process-wide cache or accumulator, but I would doubt the majority does (why would SMTP.sendmail need to accumulate data in its mail_options parameter across runs?) Do you know for sure that no mutation of these 150+ parameters will ever be introduced, that all of these functions and methods are sufficiently tested, called often enough that the introduction of a mutation of the default parameter in themselves or one of their callees would *never* be able to pass muster?
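The message does not say how these counts were produced. One way to look for such parameters, offered here only as a sketch, is to walk a module's syntax tree with the standard `ast` module. The helper name `mutable_defaults` is hypothetical, and it only catches literal `[]`, `{}` and set displays, not calls such as `list()`.

```python
import ast


def mutable_defaults(source):
    """Yield (function name, line number) for list/dict/set literal defaults."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            # kw_defaults holds None for keyword-only args without a default.
            defaults = list(node.args.defaults)
            defaults += [d for d in node.args.kw_defaults if d is not None]
            for default in defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    yield getattr(node, "name", "<lambda>"), default.lineno
```

Running it over each file's text (for example `mutable_defaults(open(path).read())`) gives a list of candidates to review by hand; whether a given hit is a bug still depends on whether the function mutates the parameter.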

It seems to me that a better way of doing this is:
    def func(optional_list=[]):
        optional_list = freeze(optional_list)
That is, if I expect the list to be immutable when it's empty, why wouldn't I expect it to be immutable when it's not empty? The only case where the immutability of the empty list matters is if there's a bug that changes the list (or a called function changes its signature when that wasn't expected). Wouldn't it be worth protecting against that when the list isn't empty as well? Of course PEP 351 was rejected so there is no freeze() builtin. (Although personally, I don't agree with all the arguments against it. For example, I don't think that a frozen dict has to be hashable. I also think that immutable objects are useful in unit testing, where they allow me to easily be sure that a passed-in dict isn't changed by a function.) Anyway, in the case of a list I suspect that this is pretty close to what you want:
    def func(optional_list=[]):
        optional_list = tuple(optional_list)
--- Bruce Latest blog post: http://www.vroospeak.com Your social security number is a very poor password Learn how to hack web apps: http://j.mp/gruyere-security (learn how to write buggy Python too)

On 2011-05-20, at 21:10 , Bruce Leban wrote:
Absolutely, but the mutation of the default parameters seems to be the main problem (historically): it's a memory leak, and it's a global data corruption, where modifying a provided parameter is a local data corruption (unless the object passed in is global of course). Ideally, you could just add a decorator or an annotation doing that for you without additional work and name mutation within the function. lists and tuples does not work). Plus, it does not help with dicts, which can expose the same issue.

On Fri, May 20, 2011 at 12:27 PM, Masklinn <masklinn@masklinn.net> wrote:
Agreed, but that wasn't how I interpreted Jack's question. I think there are two issues (with the first one the one that I think Jack was targeting): (1) I want to make sure a parameter is immutable so I don't accidentally change it (and none of the functions I call can do that). Akin to declaring a parameter const in C-like languages. For example, is_ip_blocked(ip_address, blocked_ip_list). (That's not just ip_address in blocked_ip_list if the list contains CIDR addresses.) We could add code to make the values immutable or use annotations:
    @const
    def func(x : const, y : const = []): pass
Personally, I like declaring the contract that a parameter is not being modified explicitly and I would like a shallow freeze() function. (2) The gotcha that the default value is the same value every time rather than a new value. Lots of ways to deal with this but none of them work without educating people about how the feature works. For example:
    @copy
    def func(x : copy, y : copy = []): pass
--- Bruce
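The `@copy` idea is only outlined in the message, so the following is a rough sketch of one way it might behave rather than the design Bruce had in mind. It ignores the per-parameter annotations and simply hands the function a shallow copy of every default that the caller did not override. The name `fresh_defaults` is hypothetical.

```python
import copy
import functools
import inspect


def fresh_defaults(func):
    """Give each call its own shallow copy of any default value it relies on."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, param in sig.parameters.items():
            has_default = param.default is not inspect.Parameter.empty
            if has_default and name not in bound.arguments:
                bound.arguments[name] = copy.copy(param.default)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@fresh_defaults
def add(item, bucket=[]):
    bucket.append(item)  # safe: bucket is a per-call copy when defaulted
    return bucket
```

Arguments that the caller passes explicitly are left alone, so a function that is meant to fill a caller-supplied list still does so.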

On Fri, May 20, 2011 at 1:53 PM, Bruce Leban <bruce@leapyear.org> wrote:
One bad solution to both would be to have the language enforce that default values cannot be of mutable type. Though not a valid solution, that idea highlights the only two reasons I can see for using mutable defaults: - caching across function calls - having your default be of the same type as your expected argument (a documentation of sorts) The idiom of using None that Jack originally described is an acceptable alternative to using mutable defaults. It identifies no expectations on the type of the argument. It explicitly indicates that None is a valid argument. It implies that it will be special-cased in the function/class. It is easy to be consistent using None regardless of the expected type of the argument. The advantage of Jack's original proposal is that in cases where you are not modifying the argument you won't need to plug the correct object in for None, so the if statement he included would not be necessary. -eric

Masklinn wrote:
Absolutely, but the mutation of the default parameters seems to be the main problem (historically):
The most common problem regarding default parameters is mutation of them by functions which *are* supposed to modify the parameter. IMO you're trying to solve an almost-nonexistent problem.
it's a global data corruption, where modifying a provided parameter is a local data corruption
It's just as damaging, though -- the program still produces incorrect results. -- Greg

Please end this thread. The original empty() proposal is clearly not working, and no modification of it is going to work. The recommended pattern is very clear and matches the Zen of Python: Explicit is better than implicit. -- --Guido van Rossum (python.org/~guido)

On 5/20/2011 3:10 PM, Bruce Leban wrote:
If a function only needs a read-only sequence, it should not require, or be said to require, a list. "def func(optional_seq = ()):". If it only iterates through an input collection, it should only require an iterable: "def func(optional_iter=()): it = iter(optional_iter)". If you are paranoid and want the function to raise on any attempt to do much of anything with the input, replace '()' with 'iter()'. -- Terry Jan Reedy
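The three levels of strictness described above can be sketched like this. The function names are invented for illustration, and `iter(())` is used as the concrete spelling of an empty iterator default.

```python
def count_items(optional_seq=()):
    # A tuple default supports len(), indexing and iteration, but not mutation.
    return len(optional_seq)


def first_or_none(optional_iter=()):
    # Only iteration is needed, so any iterable is an acceptable argument.
    it = iter(optional_iter)
    return next(it, None)


def doubled(optional_iter=iter(())):
    # An empty iterator default supports nothing except being iterated.
    return [x * 2 for x in optional_iter]
```

Because the default iterator is already exhausted, reusing it across calls is harmless: it yields nothing every time.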

Masklinn wrote:
In this scenario: def func(mylist=empty()): do_some_stuff_with_mylist mylist is the empty() object, you *know* func() does not modify mylist, you are wrong (heh) and it does... but your program always calls func() with an actual list -- how is empty() going to save you then? Hint: it won't. And if you're thinking a unittest would catch that -- yes it would, but it would also catch it without empty() (make a copy first, call the func(), compare afterwards -- different? Mutation!) ~Ethan~
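The copy, call, compare check mentioned above can be sketched as a small helper. The name `mutates_argument` and the two sample functions are hypothetical.

```python
import copy


def mutates_argument(func, argument):
    """Return True if calling func(argument) changed argument's contents."""
    before = copy.deepcopy(argument)
    func(argument)
    return argument != before


def read_only(values):
    return sum(values)


def sneaky(values):
    values.append(0)  # hidden mutation of the caller's list
    return sum(values)
```

This catches mutation of a real list passed by the caller, which is the case an empty() default would never see.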

On 2011-05-20, at 21:12 , Ethan Furman wrote:
It will not until one day func() is called without an actual list, and then you get a clear and immediate error instead of silent data corruption, or a memory leak, which are generally the result of an improperly mutated default argument collection and much harder to spot. As I wrote in part of the message you quoted (but ignored), empty() acts as an assertion. The assertion is that the default parameter will never be modified, and if the assertion fails, an error is generated. That's it. That's a pretty common bug in Python, and it solves it. No more, and no less.

On Sat, 21 May 2011 05:10:25 am Masklinn wrote:
I wouldn't call it a memory leak. As I understand it, a memory leak is normally understood to mean that your program is assigning memory in such a way that neither you, nor the compiler, can free it, not that you merely haven't noticed that you're assigning memory. Since the default value is exposed, either the function or the caller can free that memory.
You are simply wrong there. There is no reason to imagine that the caller will *immediately* attempt to modify the result: y = func() # returns immutable empty list y.append(None) The attempt to mutate y might not happen until much later, in some distant part of the code, in another function, or module, or thread, or even another process. There is no limit to how distant in time or space the exception could be. It's a landmine waiting to blow up, not an assertion. y = func() # ... much later data = {'key': y} # ... much later still params.update(data) response = connect('something', params) def connect(x, params): a = params.get('key', []) a.append('something') When connect fails, there's nothing to associate the error with the mistake of calling func() without supplying an argument. But note that calling func() without an argument is supposed to be legal. Why is it a mistake? It's only a mistake because func exposes internal data to the caller, and then compounds that bug by punishing the caller for inadvertently modifying that internal data rather than not exposing it in the first place. This is *astonishingly* awful design. [...]
That's it. That's a pretty common bug in Python, and it solves it. No more, and no less.
This doesn't solve the problem, it just creates a new one. If people can't remember to use the "if default is None" idiom, what makes you think they will remember to use empty()? And if they do remember, they're just disguising their bug as the caller's mistake. -- Steven D'Aprano

On Sat, 21 May 2011 04:15:18 am you wrote:
It's not the caller's responsibility to avoid mangling the internals of the function. It is the function's responsibility to avoid exposing those internals. This suggestion seems crazy to me. Let me try to explain from the point of view of the caller. Suppose I call func and get a list back: x = func(a) So I can treat x as a list, because that's what it is: x.append(None) But if I fail to pass an argument, and the default empty() is used, I get something that looks like a list: y = func() hasattr(y, "append") # returns True but blows up when I try to use it: y.append(None) # raise an exception All because the function author doesn't want me modifying the return result. And why does the author care what I do with the result? Because he's exposing the default function value in such a way that the caller can mangle it. If it causes problems when the function returns the default value, stop returning the default value! Don't push the burden onto the caller by dropping a landmine into their code. Now you can "fix" this, for some definition of "fix", by documenting the fact that not passing the argument will result in something other than a list: "If you don't pass an argument, and use the default, then you will get back an immutable empty sequence that has the same API as a list but that will raise an exception if you try to mutate it." This is downright awful API design. As the caller, I simply don't care about the function author's difficulties in ensuring that the default value is not modified. That's Not My Problem. Fix your own buggy code. (Not that it is actually difficult: the idiom for mutable default values is two simple lines.) What Is My Problem is that rather than fix his function, the author has dumped the problem in my lap. Now I have this immutable empty sequence that is useless to me. 
I either have to detect it and change it myself: result = func(*args) # args could be empty if result is empty(): # Fix stupid design flaw in func result = [] or I have to remember to never, under any circumstances, call func() without supplying an argument. [...]
Fine, you've discovered 150 potentially buggy functions in the standard library. If the authors didn't remember to use the default=None idiom in their functions, what makes you think that they'd remember to use default=empty() instead? This suggested idiom is counterproductive. The function author doesn't save any work -- he still has to remember not to write default=[] in his functions. The author's burden is increased, because now he has to choose between three idioms instead of two: # use this when default is like a cache default=[] # use this when you need to mutate default within the function default=None if default is None: default = [] # use this when you want to return the default value but don't want # the caller to mutate it default=empty() And the caller's burden is increased, because now he has to deal with this immutable list instead of a real list. -- Steven D'Aprano

On 5/20/2011 2:15 PM, Masklinn wrote:
There are 17 functions or methods with list default parameters and 133 with dict default parameters in the Python standard library.
Could you give the re or whatever that you used to find these? I might want to look and possibly change a few.
I suspect that nearly all of these uses are for read-only inputs. In Python 3, () could replace [] in such cases, as tuples now have all the read-only sequence methods (they once had no methods). Of course, even that does not protect against perhaps crazy code like
    if input: <mutate it> # skips empty args, default or not
That suggests that all functions that are supposed to only read an input sequence should be tested with tuples. Actually, if only an iterable is needed, then such functions should be tested with non-sequence iterables. Those are actually very easy to produce: for instance, iter((1,2,3)). Thinking about it more, if the only use of an arg is to iterate through key,value pairs, then the default could be 'iter({})' instead of '{}' to better document the usage.
No. However, anyone qualified for push access to the central source should know that defaults should be treated as read-only unless documented otherwise. This is especially true for {}. So I consider it a somewhat paranoid worry, in the absence of cases where revisers *have* introduced mutation where not present before. That aside, developers have and are improving the test suite. That was the focus of the recent post-PyCon sprint. It continues with a test improvement most every day. If you want to join us volunteers to improve tests further, please do. -- Terry Jan Reedy

On Fri, May 20, 2011 at 11:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The use case isn't very fancy, it's to have a generic empty iterable as a place holder in function defs instead of doing the "if x is None" dance. Unit tests make using a real empty iterable less likely to trigger bad behavior, but because the behavior of real empty iterables in function defs is tricky and non-intuitive, unit tests for some of those functions would need some extra boilerplate. That might not be a bad tradeoff compared to adding extra "if x is None" checks for each optional arg to a function. FYI, here is the code that triggered the query. The "if None" check is mostly habit with a small dose of pedagogical reinforcement for other devs that would read it. def query_sphinx(search_text, include=None, exclude=None): if include is None: include = {} if exclude is None: exclude = {} query = sphinxapi.client() for field, values in include.items(): query.SetFilter(field, values) for field, values in exclude.items(): query.SetFilter(field, values, exclude=True) return query.query(search_text) -Jack

On 5/20/2011 7:10 PM, Jack Diederich wrote:
The use case isn't very fancy, it's to have a generic empty iterable
If () and frozenset() are not generic enough for *you*, try this:
    def empty():
        return
        yield
    emp = empty()
    for i in emp: print('something')
    for i in emp: print('something')
    # prints nothing, both times.
Actually, iter(()), iter([]), iter({}) behave the same when iterated.
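A sketch of the always-empty iterator idea for current Python versions. Under PEP 479 (the default behaviour since Python 3.7), an explicit `raise StopIteration` inside a generator body surfaces as a RuntimeError, so the sketch ends the generator with a bare `return` instead.

```python
def empty():
    """Generator function that yields nothing."""
    return
    yield  # unreachable, but its presence makes this a generator function


emp = empty()
first_pass = [i for i in emp]   # the generator finishes immediately
second_pass = [i for i in emp]  # an exhausted generator stays exhausted
```

Both passes produce an empty list, which is why a single such object can safely be shared as a default.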
as a place holder in function defs instead of doing the "if x is None"
That idiom is for a completely different use case: when one wants a new empty *mutable* on every call, that will be filled with values and, typically, returned.
Since you are not mutating include and exclude, there is no point to this noise. In fact, I consider it wrong because it actually *misleads* other devs who would expect something put into each of them and returned. The proper way to write this is def query_sphinx(search_text, include={}, exclude={}): which documents that the parameters should be dicts (or similar) and that they are read only. Your version implies that it would be ok to write to them, which is wrong.
-- Terry Jan Reedy

On 5/21/2011 5:48 PM, Terry Reedy wrote:
Here is a back-compatible rewrite that expands the domain for 'include' and 'exclude' to iterables of key-value pairs. It both documents and ensures that query_sphinx() will do nothing but iterate through key-value pairs from the last two args.
    def query_sphinx(search_text, include=iter({}.values()), exclude=iter({}.values())):
        if isinstance(include, dict): include = include.items()
        if isinstance(exclude, dict): exclude = exclude.items()
        query = sphinxapi.client()
        for field, values in include:
            query.SetFilter(field, values)
        for field, values in exclude:
            query.SetFilter(field, values, exclude=True)
        return query.query(search_text)
Lifting effectively constant expressions out of a loop is a standard technique. In this case, the 'loop' is whatever would cause repeated calls to the function without explicit args. The small define-time cost of the extra calls would eventually be saved at runtime by reusing the dict_valueiterators instead of creating equivalent ones over and over. -- Terry Jan Reedy
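To try the filter-handling logic without a Sphinx server, the same pattern can be exercised against a stand-in object. `FakeQuery` and `apply_filters` are hypothetical names for this sketch; `FakeQuery` only records calls and makes no claim about the real sphinxapi client beyond the `SetFilter(field, values, exclude=...)` call shape used in the thread.

```python
class FakeQuery:
    """Stand-in that records SetFilter calls; not the sphinxapi client."""

    def __init__(self):
        self.filters = []

    def SetFilter(self, field, values, exclude=False):
        self.filters.append((field, values, exclude))


def apply_filters(query, include=(), exclude=()):
    # Accept either a dict or any iterable of (field, values) pairs.
    if isinstance(include, dict):
        include = include.items()
    if isinstance(exclude, dict):
        exclude = exclude.items()
    for field, values in include:
        query.SetFilter(field, values)
    for field, values in exclude:
        query.SetFilter(field, values, exclude=True)
    return query
```

With empty-tuple defaults the function is read-only by construction, and callers may pass a dict, a list of pairs or a generator.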

Masklinn wrote:
If the function can't proceed properly without an actual parameter, why supply a default? Make it required, and then the function will blow up when it's called without one. I suppose there could be a case where one is going to iterate through a collection, and useful work may still happen if said collection is empty, and one is feeling too lazy to create an empty one on the spot where the function is called and so relies on the immutable empty default... but if one knows all that one should be able to not call any mutating methods. But the original problem is that an empty list is used as the default because an actual list is expected. I think the problem has been misunderstood -- it's not *if* the list gets modified, but *when* -- so you would have the same dance, only instead of None, you're now saying if default == empty(): default = [] So you haven't saved a thing, and still don't really get the purpose behind mutable defaults. ~Ethan~

Masklinn wrote:
Yes, I am aware. And the point of providing an empty list as a default is so you have a list to add things to -- so what have you gained by providing an empty frozen list as a default? Seems to me all you have now is a built-in time bomb -- every call = a blow up.
So what happens when you provide a *real* list, that is to be modified? Not modify it? Or have code that is constantly checking to see if it's okay to modify the list because it might be the immutable empty() object? ~Ethan~

On 2011-05-20, at 20:56 , Ethan Furman wrote:
-- so what have you gained by providing an empty frozen list as a default? Seems to me all you have now is a built-in time bomb -- every call = a blow up. See above, your assumption is flawed and all reasoning following it is nonsense.
collections they were provided as parameters (which is the vast majority of functions, really).
Not modify it? Or have code that is constantly checking to see if it's okay to modify the list because it might be the immutable empty() object? This "objection" is absolutely nonsensical. Please cease.

On 2011-05-20, at 21:22 , Ethan Furman wrote:
Or do you mean accumulating in a global or class instance? Accumulating can be done in anything.
And why would you iterate over an empty list? Because you're iterating period, and that it's an empty list has no influence on your behavior. You'll simply do nothing during your iteration, because the iteration count will be 0. Why special-case empty lists when there is no need to?
Same with dict, `get` works on empty dicts as well as on any other such collection.
If you have an example from the stdlib I'd love to see it (seriously -- I'm always up for learning something).
Mailcap does that line 170: `subst` takes an empty list as a default parameter, forwards that parameter to `findparam` which iterates on the list to try and find the param. If it can't find the param in the list, it simply returns an empty string. An empty list is simply a case where it will never find the param, and it will Just Work. No need to create a special case. Have you really never done such a thing?

On Sat, 21 May 2011 05:18:30 am Masklinn wrote:
Why special-case empty lists when there is no need to?
That's a remarkable statement. What is empty() except a special case for empty lists? If you and Jack are serious about this proposal, it would require at least two such functions, emptylist and emptydict, not just empty(). And even if you are right that it solves the problem of default=[] (which you aren't, but for the sake of the argument let's pretend), it doesn't solve the general issue of mutable defaults. As I said, having a freeze() function that creates an immutable list might be a good idea, although not for the default argument issue. (I'm not entirely sure how that differs from tuple, but that's another issue...) But special casing a frozen empty list seems silly, and the use-case given by the OP, and defended by you, is actively harmful. -- Steven D'Aprano

On 21/05/2011 04:19, Steven D'Aprano wrote:
Or all collections could have a "mutable" attribute which, once it has been set to False, can never subsequently be reset to True. Then you could merge lists and tuples into a single type, ditto sets and frozen sets, and you get immutable dictionaries as well. Plus a considerable simplification of the language.
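A rough sketch of how the one-way "mutable" attribute might behave for a list. `FreezableList` is a hypothetical class written for illustration; it is not an existing type and does not attempt the merge of lists and tuples suggested above.

```python
class FreezableList(list):
    """List with a 'mutable' flag that can be switched off but never back on."""

    def __init__(self, *args):
        super().__init__(*args)
        self._mutable = True

    @property
    def mutable(self):
        return self._mutable

    @mutable.setter
    def mutable(self, value):
        if value and not self._mutable:
            raise ValueError("a frozen list cannot be made mutable again")
        self._mutable = bool(value)


def _guard(name):
    original = getattr(list, name)

    def guarded(self, *args, **kwargs):
        if not self._mutable:
            raise TypeError("list is frozen; %s() is not allowed" % name)
        return original(self, *args, **kwargs)

    guarded.__name__ = name
    return guarded


# Wrap every mutating list method so that it checks the flag first.
for _name in ("append", "extend", "insert", "pop", "remove", "clear", "sort",
              "reverse", "__setitem__", "__delitem__", "__iadd__", "__imul__"):
    setattr(FreezableList, _name, _guard(_name))
```

The flag here is enforced only by convention (the `_mutable` attribute can still be reached directly), so this shows the interface rather than a real guarantee.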

On 20May2011 15:11, Terry Reedy <tjreedy@udel.edu> wrote: | On 5/20/2011 1:51 PM, Masklinn wrote: | | I am as puzzled as other people. | | >empty() is both an empty list (because the code iterates over a list | >for instance, or maps it, or what have you) and an assertion that | >this list is *not* to be modified. | | So use () as the default. It has all the methods of [] except for | the mutation methods. You're missing the point. This thread is about providing a complex solution to a common problem. Your technique of providing a simple solution to the problem doesn't help the thread persist. [ Hmm, I see my random sig quoter has hit the money again:-) Truly, the quote below was pot luck! ] Cheers, -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/ If you can keep your head while all those about you are losing theirs, perhaps you don't understand the situation. - Paul Wilson <Paul_Wilson.DBS@dbsnotes.dbsoftware.com>

On Thu, May 19, 2011 at 9:46 PM, Jack Diederich <jackdied@gmail.com> wrote:
return nothing when asked for something and raise a ValueError when any attempt is made to add/remove items.
Couldn't you just use the empty immutable version for whatever type the optional might be? For sequences, use (). For sets, use frozenset(). For dicts, use ... oh. Crap. -- Daniel Stutzbach
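For readers on later Python versions: `types.MappingProxyType`, added in Python 3.3 after this thread, gives a read-only view of a dict and can fill the gap noted above. A minimal sketch, with `EMPTY_DICT` and `describe` as made-up names:

```python
from types import MappingProxyType

# Read-only view over an empty dict; item assignment raises TypeError.
EMPTY_DICT = MappingProxyType({})


def describe(options=EMPTY_DICT):
    # Reading works exactly as it would on a dict.
    return sorted(options.items())
```

The proxy supports `items()`, `keys()`, `values()`, `get()` and `in`, so read-only code does not need to change.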
participants (13)
- Bruce Leban
- Cameron Simpson
- Daniel Stutzbach
- Eric Snow
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Jack Diederich
- Masklinn
- Nick Coghlan
- Rob Cliffe
- Steven D'Aprano
- Terry Reedy