restriction on sum: intentional bug?
Tim Chase
python.list at tim.thechases.com
Fri Oct 16 15:40:23 EDT 2009
Stephen Hansen wrote:
> There really is just a right way versus a wrong way to join strings
> together; using + is always the wrong way. Sometimes that level of 'wrong'
> is so tiny that no one cares, like if you are using it to join together two
> small strings. But when joining together a sequence of strings, the wrong
> amplifies to become /clearly/ wrong.
Then I'm fine with sum() being smart enough to recognize this
horrid case and do the "right" thing by returning ''.join()
instead. If sum() were limited to int/floats like some
array/numpy functions explicitly claim, that would be an "oh, we
only handle these specific things and nothing else". But sum()
is defined over "things that have an __add__ method", and strings
have an __add__ method, making this breakage purely for
breakage's sake.
>>> class W:
...     def __init__(self, s):
...         self.s = s
...     def __add__(self, other):
...         return W(self.s + other.s)
...     def __repr__(self): return "<W(%r)>" % self.s
...     def __str__(self): return self.s
...
>>> lst = [W('hello'), W('world'), W('foo')]
>>> print sum(lst, W(''))
helloworldfoo
It's not an error (that it *can't* be done)...it's just plain
ornery :)
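For contrast, here is a small sketch of the behaviour in question, written to run on Python 3 (the session above is Python 2). The built-in refuses a str start value outright, while ''.join() handles the same list. The variable names are only illustrative.

```python
words = ['hello', 'world', 'foo']

# sum() rejects a str start value with a TypeError instead of concatenating.
try:
    sum(words, '')
except TypeError as exc:
    error = exc
else:
    error = None

# The spelling that sum() tells you to use instead.
joined = ''.join(words)

print(type(error).__name__)
print(joined)
```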
>> count = 0
>> for i in range(1000000):
>>     if i % 1000: count += 1
>>
>> instead of specifying the step-size? Or even forcing me to precompute this
>> constant value for `count` because looping is inefficient in this case?
>
> That comparison is apples to... rocket launchers.
>
> The case with sum has nothing at all to do with the above example or it
> maybe one day trying to "force" you into doing one thing or the other in the
> name of Efficiency-- or start going down some data-hiding road.
For sum() to error out because strings are a special-case of
inefficiency, the above loop should error out too because it's
much more efficient to just say
count = 999000
To look at the "for" loop version and tell me that's dumb is
exactly why I feel the sum() case is dumb. If I have performance
problems because I'm sum()ing strings when I should be
''.join()ing them, it's my responsibility to read the docs on
sum() and see that's a foolish thing for me to be doing. But
don't tell me I *can't* do dumb things.
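As a quick check of the constant, the quoted loop and the precomputed value can be compared directly. This is only a sketch; the name `precomputed` is mine.

```python
# The quoted loop: count every i in range(1000000) that is NOT a multiple of 1000.
count = 0
for i in range(1000000):
    if i % 1000:
        count += 1

# 1000 of the 1000000 values are multiples of 1000 (0, 1000, ..., 999000),
# so the answer can be computed without looping at all.
precomputed = 1000000 - 1000000 // 1000

print(count == precomputed)
```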
> Yes, sum() is doing some "hand holding" here, but only in one specific case:
> because it's -always-wrong- to use it in that case.
What's always wrong is giving me an *error* when the semantics
are perfectly valid. I don't care if the implementation is
def sum(iterable, default=0):
    # Python 2: basestring covers both str and unicode
    if isinstance(default, basestring):
        return ''.join(iterable)
    else:
        result = default
        for item in iterable:
            result += item
        return result
to do the "right" thing of performing __add__ on all the elements
of the iterable unless it's a string. If you want to
special-case strings to perform a ''.join() then go right ahead.
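A runnable sketch of that idea for Python 3, where str replaces basestring. The name `lenient_sum` is hypothetical, and this is not how the built-in is implemented. Unlike the outline above, it keeps the start value as a prefix in the string case, and it uses `result + item` rather than `+=` so a mutable start value such as a list is not modified in place.

```python
def lenient_sum(iterable, start=0):
    """Hypothetical sum() variant: join strings, add everything else."""
    if isinstance(start, str):
        # Strings: hand the work to join, keeping the start value as a prefix.
        return start + ''.join(iterable)
    result = start
    for item in iterable:
        # Plain binary add, so __add__ is all the items need to support.
        result = result + item
    return result


print(lenient_sum(['hello', 'world', 'foo'], ''))
print(lenient_sum([1, 2, 3]))
```

With this, the string case works rather than raising, and numbers or any other type with an __add__ method behave as they do with the built-in.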
> The "consenting adults" argument sort of applies, sure. But these general
> principles aren't absolutes. None of them are. In this case, someone decided
> that it was way too easy for someone to NOT know that this is wrong and make
> a mistake.
Among consenting adults, it's not "wrong". You'll just discover
there are better ways when your sum() becomes a hot-spot for CPU
cycles.
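If you do want to find out whether it has become a hot spot, something like the following timeit sketch would tell you. The helper names and the list size are arbitrary choices of mine. I am not quoting numbers because the gap depends on the interpreter and the data: CPython can sometimes optimize repeated string concatenation, so measure on your own setup.

```python
import timeit

words = ['x'] * 10000


def concat_by_add(items):
    # Repeated +, which is roughly what summing strings would amount to.
    result = ''
    for item in items:
        result = result + item
    return result


def concat_by_join(items):
    # The recommended spelling.
    return ''.join(items)


add_seconds = timeit.timeit(lambda: concat_by_add(words), number=20)
join_seconds = timeit.timeit(lambda: concat_by_join(words), number=20)
print("add:  %.4f s" % add_seconds)
print("join: %.4f s" % join_seconds)
```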
Just a burr in my boots.
-tkc