I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
Ram.
Ram Rachum wrote:
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
Sometimes you might find that the list you're summing is empty. Because 'sum' is most often used with numbers, the default sum of a list is 0. If you want to sum a list of non-numbers, provide a suitable start value. For example, to sum a list of lists a suitable start value is []:
>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]
I agree that it would be nice if the start value could just be omitted, but then what should 'sum' return if the list is empty? If sum([1, 2]) returned 3, then I'd want sum([]) to return 0. If sum([[1], [2]]) returned [1, 2], then I'd want sum([]) to return []. Unfortunately, I can't have it both ways.
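The behaviour described above can be checked with a short Python 3 sketch. It uses only the built-in `sum`; the variable names are just for illustration.

```python
# With no start value, the implicit start is 0, so an empty iterable sums to 0.
empty_total = sum([])

# With an explicit start value, the empty sum is that start value.
empty_list_total = sum([], [])

# Summing lists needs a list as start, because 0 + [0, 1] is a TypeError.
flattened = sum([[0, 1], [2, 3]], [])

try:
    sum([[0, 1], [2, 3]])
    default_start_failed = False
except TypeError:
    default_start_failed = True

print(empty_total, empty_list_total, flattened, default_start_failed)
```

The two empty cases show the dilemma: a single default cannot be both 0 and [].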
Sometimes you might find that the list you're summing is empty. Because 'sum' is most often used with numbers, the default sum of a list is 0. If you want to sum a list of non-numbers, provide a suitable start value. For example, to sum a list of lists a suitable start value is []:
>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]
I agree that it would be nice if the start value could just be omitted, but then what should 'sum' return if the list is empty?
I see the problem. I think a good solution would be to tell the user, "If you want `sum` to be able to handle an empty list, you must supply `start`." Users that want to add up a (possibly empty) sequence of numbers will have to specify `start`.
If start is supplied, it will work like it does now. If start isn't supplied, it will add up all the elements without adding any `start` to them.
What do you think?
Ram Rachum wrote:
Sometimes you might find that the list you're summing is empty. Because 'sum' is most often used with numbers, the default sum of a list is 0. If you want to sum a list of non-numbers, provide a suitable start value. For example, to sum a list of lists a suitable start value is []:
>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]
I agree that it would be nice if the start value could just be omitted, but then what should 'sum' return if the list is empty?
I see the problem. I think a good solution would be to tell the user, "If you want `sum` to be able to handle an empty list, you must supply `start`." Users that want to add up a (possibly empty) sequence of numbers will have to specify `start`.
If start is supplied, it will work like it does now. If start isn't supplied, it will add up all the elements without adding any `start` to them.
What do you think?
(sorry, pressed wrong key)
There is a choice between these two variants:
a) require start for non-numerical sequences
b) require start for possibly empty sequences
I don't have a preference for either, so for compatibility's sake I would vote to keep the current one, which is a). It also stands to reason that buggy usage in case b) is harder to detect, since the common case will not uncover the bug (the sequence being nonempty), while for case a) it does.
Georg
-- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
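The bug-detection argument can be illustrated with a rough sketch. `sum_variant_b` and `total_length_buggy` are hypothetical names invented here; variant (b) is emulated with `functools.reduce`, which is only an assumption about how such a `sum` might behave.

```python
from functools import reduce
import operator


def sum_variant_b(iterable):
    """Hypothetical variant (b): no implicit start, so empty input is an error."""
    return reduce(operator.add, iterable)


def total_length_buggy(words):
    # Bug under variant (b): the author forgot that `words` may be empty.
    return sum_variant_b(len(w) for w in words)


# Common case: a non-empty input hides the bug.
common_case = total_length_buggy(["ab", "cde"])

# Rare case: only an empty input exposes it.
try:
    total_length_buggy([])
    variant_b_raised = False
except TypeError:
    variant_b_raised = True

# Current behaviour (a): forgetting start for non-numbers fails on the
# first ordinary, non-empty call.
try:
    sum([[1], [2]])
    variant_a_raised = False
except TypeError:
    variant_a_raised = True
```

Under (b) the mistake survives until an empty sequence shows up; under (a) it is caught immediately.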
There is a choice between these two variants:
a) require start for non-numerical sequences
b) require start for possibly empty sequences
I don't have a preference for either, so for compatibility's sake I would vote to keep the current one, which is a). It also stands to reason that buggy usage in case b) is harder to detect, since the common case will not uncover the bug (the sequence being nonempty), while for case a) it does.
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
Ram.
Ram Rachum wrote:
There is a choice between these two variants:
a) require start for non-numerical sequences
b) require start for possibly empty sequences
I don't have a preference for either, so for compatibility's sake I would vote to keep the current one, which is a). It also stands to reason that buggy usage in case b) is harder to detect, since the common case will not uncover the bug (the sequence being nonempty), while for case a) it does.
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
If the objects can be summed, shouldn't there also be a zero object? Does anyone have an example when that's not possible?
MRAB <python@...> writes:
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
If the objects can be summed, shouldn't there also be a zero object? Does anyone have an example when that's not possible?
You're right MRAB, probably almost every object type that has a concept of "addition" will have a concept of a zero element.
BUT, that zero object has to be created by the user of `sum`, and that has two problems:
1. The user might not know beforehand which type of object he's adding. Even within the same type there might be problems. What happens when the user is using `sum` to add a bunch of vectors, and he doesn't know beforehand what the dimensions of the vectors are? How will he know if his zero element should be Vector([0, 0]) or Vector([0, 0, 0])?
2. A smaller problem: The user has to actually create that zero object now, and for some objects the definition might be lengthy, adding needless complexity to the code.
Also, using the `start` has some overhead, for creating the zero object and calling __add__.
Ram.
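A small sketch of the vector problem follows. The `Vector` class is hypothetical and written only for this illustration (the thread does not define one); `functools.reduce` stands in for a `sum` that adds no start value.

```python
from functools import reduce
import operator


class Vector:
    """Hypothetical minimal vector type, only for illustration."""

    def __init__(self, components):
        self.components = list(components)

    def __add__(self, other):
        if len(self.components) != len(other.components):
            raise ValueError("dimension mismatch")
        return Vector(a + b for a, b in zip(self.components, other.components))

    def __eq__(self, other):
        return isinstance(other, Vector) and self.components == other.components


vectors = [Vector([1, 2, 3]), Vector([4, 5, 6])]

# A hand-made zero has to match the dimension of the data, so it must be
# derived from the data before summing.
zero = Vector([0] * len(vectors[0].components))
total_with_zero = sum(vectors, zero)

# Adding the elements to each other needs no zero object at all.
total_without_zero = reduce(operator.add, vectors)
```

Both totals are equal here, but only the second works without inspecting the first element to build a zero.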
2009/12/5 Ram Rachum <cool-rr@cool-rr.com>:
MRAB <python@...> writes:
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
If the objects can be summed, shouldn't there also be a zero object? Does anyone have an example when that's not possible?
You're right MRAB, probably almost every object type that has a concept of "addition" will have a concept of a zero element.
BUT, that zero object has to be created by the user of `sum`, and that has two problems:
1. The user might not know from beforehand which type of object he's adding. Even within the same type there might be problems. What happens when the user is using `sum` to add a bunch of vectors, and he doesn't know from beforehand what the dimensions of the vectors are? How will he know if his zero element should be Vector([0, 0]) or Vector([0, 0, 0])
Ugly, but works:
itr = iter(sequence)
sum(itr, next(itr))
This is actually a good example in favor of not requiring a start value.
Vitor Bosshard wrote:
2009/12/5 Ram Rachum <cool-rr@cool-rr.com>:
MRAB <python@...> writes:
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
If the objects can be summed, shouldn't there also be a zero object? Does anyone have an example when that's not possible?
You're right MRAB, probably almost every object type that has a concept of "addition" will have a concept of a zero element.
BUT, that zero object has to be created by the user of `sum`, and that has two problems:
1. The user might not know from beforehand which type of object he's adding. Even within the same type there might be problems. What happens when the user is using `sum` to add a bunch of vectors, and he doesn't know from beforehand what the dimensions of the vectors are? How will he know if his zero element should be Vector([0, 0]) or Vector([0, 0, 0])
Ugly, but works:
itr = iter(sequence)
sum(itr, next(itr))
Or, for sequences:
sum(islice(seq, 1, None), seq[0])
which clearly communicates the need for a non-empty sequence.
Georg
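Both idioms can be written as small Python 3 helpers. This is a sketch; the function names are made up for illustration. Note that `islice(seq, 1, None)` is the form that skips the first element, whereas `islice(seq, 1)` would yield only the first element.

```python
from itertools import islice


def sum_from_first(iterable):
    """Use the first element as the start value; works for any iterable."""
    it = iter(iterable)
    first = next(it)  # raises StopIteration if the iterable is empty
    return sum(it, first)


def sum_sequence(seq):
    """Same idea for indexable sequences."""
    # seq[0] raises IndexError if the sequence is empty.
    return sum(islice(seq, 1, None), seq[0])


lists_total = sum_from_first([[1], [2], [3]])
tuples_total = sum_sequence([(1,), (2,), (3,)])
```

Neither helper can return a value for an empty input, which is the trade-off discussed in this thread.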
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
If the objects can be summed, shouldn't there also be a zero object?
Use a single universal zero object that works for everything. Here's an example from my earlier post:
>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'
Raymond
Raymond Hettinger wrote:
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
If the objects can be summed, shouldn't there also be a zero object?
Use a single universal zero object that works for everything. Here's an example from my earlier post:
>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'
I would not have expected this to work, as it does not match "The iterable's items are normally numbers, and are not allowed to be strings." It appears that it is the start value that may not be a string. I suggest a doc fix in http://bugs.python.org/issue7447
FWIW, sum was designed for summing numbers at C speed. I think it probably is as good a compromise as we can get. It is easy to program any other exact behavior one wants, and summing user objects is going to go at Python speed anyway. Certainly, none of the suggested alterations strike me as worth breaking code.
Terry Jan Reedy
["Terry Reedy"]
FWIW, sum was designed for summing numbers at C speed. I think it probably is as good a compromise as we can get. It is easy to program any other exact behavior one wants, and summing user objects is going to go at Python speed anyway. Certainly, none of the suggested alterations strike me as worth breaking code.
Wisely spoken. Raymond
Ram Rachum wrote:
I prefer (b). The problem with requiring `start` for sequences of non-numerical objects is that you now have to go out and create a "zero object" of the same type as your other objects. The object class might not even have a concept of a "zero object".
class _AdditiveIdentity(object):
    def __add__(self, other):
        return other
    __radd__ = __add__
AdditiveIdentity = _AdditiveIdentity()
total = sum(itr, start=AdditiveIdentity)
if total is AdditiveIdentity:
    pass  # Iterable was empty
else:
    pass  # we got a real result
(Raymond already posted along these lines, but I wanted to point out that by making the identity object a singleton you can save the cost of repeated instantiation and simplify the after-the-fact check for an empty iterable)
The other philosophical point here is one Guido has expressed several times in the past: "In general, the type of a return value should not depend on the *value* of an argument" (although the different numeric types tend to blur together a bit in this specific context)
With only a default value, sum() could return entirely different types based on whether or not the sequence was empty. With a start value, on the other hand, the type returned must at least be one that is compatible under addition with the start value. You can subvert that a bit through the use of a universal additive identity, but it holds short of that.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
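The singleton idea can be wrapped in a helper, sketched below for Python 3. `sum_or_default` and its `default` parameter are hypothetical names introduced here, not part of any library. The start value is passed positionally, which the built-in `sum` accepts in all versions.

```python
class _AdditiveIdentity:
    """Singleton that returns the other operand unchanged under addition."""

    def __add__(self, other):
        return other

    __radd__ = __add__

    def __repr__(self):
        return "AdditiveIdentity"


AdditiveIdentity = _AdditiveIdentity()


def sum_or_default(iterable, default=None):
    """Hypothetical helper: sum without a typed zero, `default` if empty."""
    total = sum(iterable, AdditiveIdentity)
    # If the identity object comes back, nothing was added to it.
    return default if total is AdditiveIdentity else total


numbers_total = sum_or_default([1, 2, 3])
lists_total = sum_or_default([[1], [2]])
empty_total = sum_or_default([])
```

The return type here depends on whether the iterable was empty, which is exactly the property the quoted design guideline warns about.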
Georg Brandl wrote:
Ram Rachum wrote:
Sometimes you might find that the list you're summing is empty. Because 'sum' is most often used with numbers, the default sum of a list is 0. If you want to sum a list of non-numbers, provide a suitable start value. For example, to sum a list of lists a suitable start value is []:
>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]
I agree that it would be nice if the start value could just be omitted, but then what should 'sum' return if the list is empty?
I see the problem. I think a good solution would be to tell the user, "If you want `sum` to be able to handle an empty list, you must supply `start`." Users that want to add up a (possibly empty) sequence of numbers will have to specify `start`.
If start is supplied, it will work like it does now. If start isn't supplied, it will add up all the elements without adding any `start` to them.
What do you think?
(sorry, pressed wrong key)
There is a choice between these two variants:
a) require start for non-numerical sequences
b) require start for possibly empty sequences
I don't have a preference for either, so for compatibility's sake I would vote to keep the current one, which is a). It also stands to reason that buggy usage in case b) is harder to detect, since the common case will not uncover the bug (the sequence being nonempty), while for case a) it does.
True, providing start will ensure that the result is of the correct class, instead of it sometimes being an int, causing a TypeError later on.
On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr@cool-rr.com> wrote:
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
In your proposed implementation, sum([]) would be undefined. -- André Engels, andreengels@gmail.com
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr@cool-rr.com> wrote:
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max. George
2009/12/5 George Sakkis <george.sakkis@gmail.com>:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr@cool-rr.com> wrote:
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
And in that case the special string handling could also be dropped?
>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This behaviour is quite bothersome. Sum can handle arbitrary objects in theory (as long as they define the correct special methods, etc.), but it gratuitously raises an exception on strings. This behaviour is also inconsistent with the following:
>>> sum(["a","b"])
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    sum(["a","b"])
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Where sum actually tries to add "a" to the default value of 0.
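The two different failures can be reproduced side by side. This is a sketch for CPython 3; `error_message` is a hypothetical helper, and the exact wording of the messages may differ between versions.

```python
def error_message(func):
    """Call func and return the TypeError text, or None if nothing was raised."""
    try:
        func()
    except TypeError as exc:
        return str(exc)
    return None


# An explicit string start triggers the dedicated check for strings.
msg_string_start = error_message(lambda: sum(["a", "b"], ""))

# With no start value, the failure comes from evaluating 0 + "a".
msg_default_start = error_message(lambda: sum(["a", "b"]))

# The approach that the first error message recommends.
joined = "".join(["a", "b"])

print(msg_string_start)
print(msg_default_start)
```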
Vitor Bosshard wrote:
2009/12/5 George Sakkis <george.sakkis@gmail.com>:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr@cool-rr.com> wrote:
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
And in that case the special string handling could also be dropped?
>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This behaviour is quite bothersome. Sum can handle arbitrary objects in theory (as long as they define the correct special methods, etc.), but it gratuitously raises an exception on strings.
This seems to be an instance where the "practicality" Zen rule beats the "special cases" rule :)
Georg
2009/12/5 Georg Brandl <g.brandl@gmx.net>:
Vitor Bosshard wrote:
2009/12/5 George Sakkis <george.sakkis@gmail.com>:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr@cool-rr.com> wrote:
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
And in that case the special string handling could also be dropped?
>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This behaviour is quite bothersome. Sum can handle arbitrary objects in theory (as long as they define the correct special methods, etc.), but it gratuitously raises an exception on strings.
This seems to be an instance where the "practicality" Zen rule beats the "special cases" rule :)
It might be more accurate to say "hand-holding" instead of practicality (and it doesn't even catch all errors it's meant to). I'm not so sure that's special enough ;-) Vitor
On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algorias@gmail.com> wrote:
And in that case the special string handling could also be dropped?
>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This behaviour is quite bothersome. Sum can handle arbitrary objects in theory (as long as they define the correct special methods, etc.), but it gratuitously raises an exception on strings. This behaviour is also inconsistent with the following:
>>> sum(["a","b"])
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    sum(["a","b"])
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Where sum actually tries to add "a" to the default value of 0.
sum is defined by repeatedly adding each number in a sequence. As each number is usually of constant size, and the size of total grows logarithmically, this is O(n log n) (but due to implementation coarseness it usually isn't distinguished from O(n)).
Concatenation however grows the total's size very quickly. You instead get a performance of O(n**2). Same result, wrong algorithm.
It would be possible to special case strings, but why? The programmer should know what algorithm they're using and what complexity class it has, so they can pick the right one (''.join(seq) in this case). IOW, handling arbitrary objects is an illusion.
For another example of why the programmer needs to understand the algorithmic complexity of the operations they're using, and why the language should value performance consistency and not just correct output, see ABC's usage of rational numbers: http://python-history.blogspot.com/2009/03/problem-with-integer-division.htm...
-- Adam Olsen, aka Rhamphoryncus
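The quadratic cost can be made concrete with a simple counting model. This is a sketch under a stated assumption: every `total + s` copies all of `total` plus `s` into a new string. Real interpreters may optimise some cases, so the numbers show the growth rate, not measured timings. Both function names are hypothetical.

```python
def chars_copied_by_repeated_concat(strings):
    """Model: each `total + s` builds a new string of len(total) + len(s)."""
    copied = 0
    total_len = 0
    for s in strings:
        total_len += len(s)
        copied += total_len  # the whole new total is written out again
    return copied


def chars_copied_by_join(strings):
    """Model: join sizes the result once and copies each character once."""
    return sum(len(s) for s in strings)


pieces = ["x"] * 1000
concat_cost = chars_copied_by_repeated_concat(pieces)
join_cost = chars_copied_by_join(pieces)
print(concat_cost, join_cost)
```

For n one-character pieces the model gives n(n+1)/2 copies for repeated concatenation against n for join.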
2009/12/5 Adam Olsen <rhamph@gmail.com>:
On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algorias@gmail.com> wrote:
And in that case the special string handling could also be dropped?
>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]
This behaviour is quite bothersome. Sum can handle arbitrary objects in theory (as long as they define the correct special methods, etc.), but it gratuitously raises an exception on strings. This behaviour is also inconsistent with the following:
>>> sum(["a","b"])
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    sum(["a","b"])
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Where sum actually tries to add "a" to the default value of 0.
sum is defined by repeatedly adding each number in a sequence. As each number is usually constant, and the size of total grows logarithmically, this is O(n log n) (but due to implementation coarseness it usually isn't distinguished from O(n)).
Concatenation however grows the total's size very quickly. You instead get a performance of O(n**2). Same result, wrong algorithm.
It would be possible to special case strings, but why? The programmer should know what algorithm they're using and what complexity class it has, so they can pick the right one (''.join(seq) in this case). IOW, handling arbitrary objects is an illusion.
I think you misunderstood my point. Sorry if I wasn't clear enough in my original message. I understand the performance characteristics of repeated concatenation vs str.join. I just wonder why the language goes out of its way to catch this particular occurrence of bad code, given there are plenty of ways to misuse sum or any other builtin for that matter. A newbie is more likely to get n**2 performance by using a for loop than sum:
final = ""
for s in strings:
    final += s
Should python refuse to compile the above snippet? The answer is an emphatic "no".
On Sat, Dec 5, 2009 at 12:19, Vitor Bosshard <algorias@gmail.com> wrote:
I think you misunderstood my point. Sorry if I wasn't clear enough in my original message. I understand the performance characteristics of repeated concatenation vs str.join. I just wonder why the language goes out of its way to catch this particular occurrence of bad code, given there are plenty of ways to misuse sum or any other builtin for that matter. A newbie is more likely to get n**2 performance by using a for loop than sum:
final = ""
for s in strings:
    final += s
Should python refuse to compile the above snippet? The answer is an emphatic "no".
All the individual operations there are fine. It's the composition that's wrong. Adding a sanity check would require recognizing that pattern, and changing the semantics of an individual operation based on what surrounds it. Not a nice thing to do. sum() is already a single operation (regardless of how it's implemented), so it doesn't have that problem. -- Adam Olsen, aka Rhamphoryncus
George Sakkis writes:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
There's no justification for trying to make 'min' and 'sum' consistent. The sum of an empty list of numbers is a well-defined *number*, namely 0, but the max of an empty list of numbers is a well-defined *non-number*, namely "minus infinity".
The real question is "what harm is done by preferring the (well-defined) sum of an empty list of numbers over the (well-defined) empty sums of lists and/or strings?" Then, if there is any harm, "can the situation be improved by having no useful default for empty lists of any type?" Finally, "is it worth breaking existing code to ensure equal treatment of different types?"
My guess is that the answers are "very little", "hardly at all", and "emphatically no."<wink>
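The difference between the built-ins is easy to check in Python 3. In this sketch, the `default=` keyword of `max` (available since Python 3.4) is used to supply the "minus infinity" mentioned above; choosing `float("-inf")` for it is an assumption made for illustration.

```python
empty = []

# The empty sum is the additive identity for numbers.
sum_of_empty = sum(empty)

# max has no in-band answer for an empty iterable, so it raises.
try:
    max(empty)
    max_raised = False
except ValueError:
    max_raised = True

# The caller has to supply the out-of-band value explicitly.
max_with_default = max(empty, default=float("-inf"))
```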
On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
George Sakkis writes:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
There's no justification for trying to make 'min' and 'sum' consistent. The sum of an empty list of numbers is a well-defined *number*, namely 0, but the max of an empty list of numbers is a well-defined *non-number*, namely "minus infinity".
The real question is "what harm is done by preferring the (well-defined) sum of an empty list of numbers over the (well-defined) empty sums of lists and/or strings?" Then, if there is any harm, "can the situation be improved by having no useful default for empty lists of any type?" Finally, "is it worth breaking existing code to ensure equal treatment of different types?"
My guess is that the answers are "very little", "hardly at all", and "emphatically no."<wink>
Agreed that there is little harm in preferring numbers over other types when it comes to empty sequences, but the more important question is "should the start argument be used even if the sequence is *not* empty?". The OP doesn't think so and I agree. George
2009/12/5 George Sakkis <george.sakkis@gmail.com>:
Agreed that there is little harm in preferring numbers over other types when it comes to empty sequences, but the more important question is "should the start argument be used even if the sequence is *not* empty?". The OP doesn't think so and I agree.
In that case, "default" would be a more appropriate name than "start". That change of concept is a potential break in compatibility. How often is the start argument given as a non-zero value? Not all that often I suppose, but it's still a valid use-case. Ergo, the start argument should never be omitted if it was explicitly set.
On Sat, Dec 5, 2009 at 8:39 PM, Vitor Bosshard <algorias@gmail.com> wrote:
2009/12/5 George Sakkis <george.sakkis@gmail.com>:
Agreed that there is little harm in preferring numbers over other types when it comes to empty sequences, but the more important question is "should the start argument be used even if the sequence is *not* empty?". The OP doesn't think so and I agree.
In that case, "default" would be a more appropriate name than "start". That change of concept is a potential break in compatibility. How often is the start argument given as a non-zero value? Not all that often I suppose, but it's still a valid use-case. Ergo, the start argument should never be omitted if it was explicitly set.
OK, I see the different semantics of 'start' and 'default' and the use cases for each, but at the end of the day there should be a way (preferably the default) that, given a sequence [x1, ..., xN], one can compute "x1+...+xN" instead of "start+x1+...+xN".
George
George Sakkis <george.sakkis@gmail.com> wrote:
On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
George Sakkis writes:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
There's no justification for trying to make 'min' and 'sum' consistent. The sum of an empty list of numbers is a well-defined *number*, namely 0, but the max of an empty list of numbers is a well-defined *non-number*, namely "minus infinity".
The real question is "what harm is done by preferring the (well-defined) sum of an empty list of numbers over the (well-defined) empty sums of lists and/or strings?" Then, if there is any harm, "can the situation be improved by having no useful default for empty lists of any type?" Finally, "is it worth breaking existing code to ensure equal treatment of different types?"
My guess is that the answers are "very little", "hardly at all", and "emphatically no."<wink>
Agreed that there is little harm in preferring numbers over other types when it comes to empty sequences, but the more important question is "should the start argument be used even if the sequence is *not* empty?". The OP doesn't think so and I agree.
Or perhaps, the *default* start value should not be used if it doesn't match in type the first element of a non-empty sequence. An explicitly specified start value should still be used even if the sequence is *not* empty. Bill
Bill Janssen wrote:
George Sakkis <george.sakkis@gmail.com> wrote:
On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
George Sakkis writes:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
There's no justification for trying to make 'min' and 'sum' consistent. The sum of an empty list of numbers is a well-defined *number*, namely 0, but the max of an empty list of numbers is a well-defined *non-number*, namely "minus infinity".
The real question is "what harm is done by preferring the (well-defined) sum of an empty list of numbers over the (well-defined) empty sums of lists and/or strings?" Then, if there is any harm, "can the situation be improved by having no useful default for empty lists of any type?" Finally, "is it worth breaking existing code to ensure equal treatment of different types?"
My guess is that the answers are "very little", "hardly at all", and "emphatically no."<wink>
Agreed that there is little harm in preferring numbers over other types when it comes to empty sequences, but the more important question is "should the start argument be used even if the sequence is *not* empty?". The OP doesn't think so and I agree.
Or perhaps, the *default* start value should not be used if it doesn't match in type the first element of a non-empty sequence. An explicitly specified start value should still be used even if the sequence is *not* empty.
Currently if start is None then the result is None if the sequence is empty, but it raises a TypeError otherwise. Would it break any existing code if it was this instead:
sum(sequence, start=0)
If start is None then it's omitted from the summation, unless the sequence is empty, in which case the result is None.
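A rough pure-Python sketch of this proposal is shown below. `sum_proposed` is a hypothetical name; it only models the suggested semantics and is not how the built-in `sum` behaves.

```python
from functools import reduce
import operator


def sum_proposed(iterable, start=0):
    """Hypothetical sketch: start=None means 'add no start value'."""
    if start is not None:
        # Any explicit or default start behaves like the built-in.
        return sum(iterable, start)
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return None  # empty input with start=None
    # Add the remaining items to the first one, with no extra start value.
    return reduce(operator.add, it, first)


default_total = sum_proposed([1, 2])
empty_default = sum_proposed([])
empty_none = sum_proposed([], None)
lists_total = sum_proposed([[1], [2]], None)
```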
On Sat, Dec 5, 2009 at 11:23, George Sakkis <george.sakkis@gmail.com> wrote:
On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
George Sakkis writes:
On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels@gmail.com> wrote:
In your proposed implementation, sum([]) would be undefined.
Which would make it consistent with min/max.
There's no justification for trying to make 'min' and 'sum' consistent. The sum of an empty list of numbers is a well-defined *number*, namely 0, but the max of an empty list of numbers is a well-defined *non-number*, namely "minus infinity".
The real question is "what harm is done by preferring the (well-defined) sum of an empty list of numbers over the (well-defined) empty sums of lists and/or strings?" Then, if there is any harm, "can the situation be improved by having no useful default for empty lists of any type?" Finally, "is it worth breaking existing code to ensure equal treatment of different types?"
My guess is that the answers are "very little", "hardly at all", and "emphatically no."<wink>
Agreed that there is little harm in preferring numbers over other types when it comes to empty sequences, but the more important question is "should the start argument be used even if the sequence is *not* empty?". The OP doesn't think so and I agree.
Only sometimes adding the start value makes it more fragile. If you have Foo() objects that aren't compatible with int and you do sum([Foo(), Foo()]) you get a Foo() back. If your sequence then happens to be empty you do sum([]) and get an int back. The result is likely to be used in a context that's not compatible with int either. Better always fail and require an explicit start if you need it. -- Adam Olsen, aka Rhamphoryncus
[Ram Rachum]
I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip adding any start value if none is specified?
Once the API has been released, it is difficult to change without breaking code.
This current behavior is preventing me from using `sum` to add up a bunch of non-number objects.
You have plenty of options:
* use sum() as designed and supply your own Zero object as a start (see below)
* use reduce(operator.add, s)
* write a simple for-loop to do summing
It's not like summing is a hard task. There's nothing in your situation that would warrant changing the behavior of a published API where sum(s) is defined even when s is of length zero or one.
Raymond
>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz pdq'], Zero())
'xyz pdq'
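The other two options from the list can be sketched as follows for Python 3, where `reduce` lives in `functools`. `add_all` is a hypothetical name for the hand-written loop.

```python
from functools import reduce
import operator

items = [[1], [2, 3], [4]]

# Option: reduce with operator.add; no start value is involved.
via_reduce = reduce(operator.add, items)


def add_all(iterable):
    """Hand-written loop that adds the items to each other."""
    it = iter(iterable)
    total = next(it)  # raises StopIteration if the iterable is empty
    for item in it:
        # Plain + (not +=) so the first item is never mutated in place.
        total = total + item
    return total


via_loop = add_all(items)
```

Both raise on an empty input, so a caller who needs an empty result still has to decide what it should be.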
participants (12)
- Adam Olsen
- Andre Engels
- Bill Janssen
- Georg Brandl
- George Sakkis
- MRAB
- Nick Coghlan
- Ram Rachum
- Raymond Hettinger
- Stephen J. Turnbull
- Terry Reedy
- Vitor Bosshard