restriction on sum: intentional bug?

Tim Chase python.list at tim.thechases.com
Fri Oct 16 15:40:23 EDT 2009


Stephen Hansen wrote:
> There really is just a right way versus a wrong way to join strings
> together; using + is always the wrong way. Sometimes that level of 'wrong'
> is so tiny that no one cares, like if you are using it to join together two
> small strings. But when joining together a sequence of strings, the wrong
> amplifies to become /clearly/ wrong.

Then I'm fine with sum() being smart enough to recognize this 
horrid case and do the "right" thing by returning ''.join() 
instead.  If sum() were limited to ints/floats, as some 
array/numpy functions explicitly claim to be, that would be a 
case of "oh, we only handle these specific things and nothing 
else".  But sum() is defined over "things that have an __add__ 
method", and strings have an __add__ method, making this 
breakage purely for breakage's sake.

   >>> class W:
   ...     def __init__(self, s):
   ...             self.s = s
   ...     def __add__(self, other):
   ...             return W(self.s + other.s)
   ...     def __repr__(self): return "<W(%r)>" % self.s
   ...     def __str__(self): return self.s
   ...
   >>> lst = [W('hello'), W('world'), W('foo')]
   >>> print(sum(lst, W('')))
   helloworldfoo

It's not an error (that it *can't* be done)...it's just plain 
ornery :)
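
For contrast, a minimal sketch of what the built-in does with plain 
strings.  The variable names are only for illustration, and it assumes 
an interpreter where sum() rejects a string start value with a 
TypeError, as CPython does:

```python
words = ["hello", "world", "foo"]

# sum() refuses a string start value outright...
try:
    sum(words, "")
    rejected = False
except TypeError:
    rejected = True

# ...even though ''.join() and plain + both work on the same data.
joined = "".join(words)
added = words[0] + words[1] + words[2]

print(rejected)
print(joined == added)
```

The wrapper class above gets through only because sum() checks the 
type of the start value, not whether the addition is cheap.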

>>  count = 0
>>  for i in range(1000000):
>>    if i % 1000: count += 1
>>
>> instead of specifying the step-size?  Or even forcing me to precompute this
>> constant value for `count` because looping is inefficient in this case?
>
> That comparison is apples to... rocket launchers.
> 
> The case with sum has nothing at all to do with the above example or it
> maybe one day trying to "force" you into doing one thing or the other in the
> name of Efficiency-- or start going down some data-hiding road.

For sum() to error out because strings are a special-case of 
inefficiency, the above loop should error out too because it's 
much more efficient to just say

   count = 999000
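
A quick check of that constant, as a sketch: of the 1,000,000 values 
in the range, exactly 1,000 are multiples of 1000, so the loop 
increments count 1,000,000 - 1,000 = 999,000 times.  The name 
`precomputed` is just for illustration.

```python
# Brute-force version of the loop quoted above.
count = 0
for i in range(1000000):
    if i % 1000:  # true for every i that is not a multiple of 1000
        count += 1

# Closed form: all values minus the multiples of 1000.
precomputed = 1000000 - 1000000 // 1000

print(count == precomputed)
```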

Refusing to run the "for" loop version because it's dumb would 
be absurd, which is exactly why I feel the sum() case is dumb.  
If I have performance 
problems because I'm sum()ing strings when I should be 
''.join()ing them, it's my responsibility to read the docs on 
sum() and see that's a foolish thing for me to be doing.  But 
don't tell me I *can't* do dumb things.

> Yes, sum() is doing some "hand holding" here, but only in one specific case:
> because it's -always-wrong- to use it in that case.

What's always wrong is giving me an *error* when the semantics 
are perfectly valid.  I don't care if the implementation is

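   def sum(iterable, default=0):
     # Python 2 spelling; on Python 3, test against str instead of basestring
     if isinstance(default, basestring):
       # keep the start value so sum(seq, 'x') still behaves like 'x' + ...
       return default + ''.join(iterable)
     else:
       result = default
       for item in iterable:
         result = result + item   # plain __add__, never mutates `default`
       return result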

to do the "right" thing of performing __add__ on all the elements 
of the iterable unless it's a string.  If you want to 
special-case strings to perform a ''.join() then go right ahead.

> The "consenting adults" argument sort of applies, sure. But these general
> principles aren't absolutes. None of them are. In this case, someone decided
> that it was way too easy for someone to NOT know that this is wrong and make
> a mistake.

Among consenting adults, it's not "wrong".  You'll just discover 
there are better ways when your sum() becomes a hot-spot for CPU 
cycles.
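
If you want to see where that hot-spot might come from, here is a 
rough timing sketch.  It assumes functools.reduce with operator.add as 
a stand-in for what sum() would do on strings (the built-in refuses 
them), and the helper names, sizes, and repeat counts are arbitrary.  
The numbers will vary by interpreter and machine, so none are quoted 
here.

```python
import operator
import timeit
from functools import reduce

pieces = ["x" * 10] * 5000  # arbitrary test data

def add_all(strings):
    # Repeated __add__ starting from '', standing in for sum() on strings.
    return reduce(operator.add, strings, "")

def join_all(strings):
    # The approach the error message steers you toward.
    return "".join(strings)

# Both approaches build the same string; only the cost can differ.
same = add_all(pieces) == join_all(pieces)

t_add = timeit.timeit(lambda: add_all(pieces), number=20)
t_join = timeit.timeit(lambda: join_all(pieces), number=20)
print("repeated +: %.4fs   ''.join: %.4fs" % (t_add, t_join))
```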

Just a burr in my boots.

-tkc





