[Python-ideas] The non-obvious nature of str.join (was Re: sum(...) limitation)
Stephen J. Turnbull
stephen at xemacs.org
Tue Aug 12 07:51:19 CEST 2014
Nathaniel Smith writes:
> I don't have any data here, but I bet people who know about str.join
> (even for its natural use cases like ", ".join(...)) outnumber the
> people who know that sum() takes a second argument by a very large
> factor.
This is easy to fix, it now occurs to me. Allow types with __add__ to
provide an optional __sum__ method, and give the numeric ABC a default
__sum__ implementation. (It would be nice if it could check for
floats and restart with fsum if one is encountered. And of course,
there may be other ABCs with __add__ that could get a default __sum__.)
Then sum could be just
    def sum(itr, start=0):
        if start == 0:
            itr = iter(itr)
            start = next(itr)
        return start.__sum__(itr)
and
    class str(...):
        def __sum__(self, itr):
            return self + ''.join(itr)    # probably can be optimized
Is it really worth it, though?
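A minimal runnable sketch of the dispatch idea, under stated assumptions: since the built-in str cannot be given a new method, a hypothetical subclass SumStr stands in for it, and the hypothetical proto_sum stands in for sum. It also uses a None sentinel instead of comparing start with 0, and falls back to a plain left fold when the type has no __sum__. None of these names exist in Python.

```python
class SumStr(str):
    """Hypothetical str subclass that opts in to the proposed __sum__ hook."""

    def __sum__(self, itr):
        # Delegate the bulk of the work to str.join.
        return SumStr(self + "".join(itr))


def proto_sum(itr, start=None):
    """Hypothetical sum() that dispatches to __sum__ when the type has one."""
    itr = iter(itr)
    if start is None:
        try:
            start = next(itr)
        except StopIteration:
            return 0  # mirror the built-in: sum([]) == 0
    hook = getattr(type(start), "__sum__", None)
    if hook is not None:
        return hook(start, itr)
    # No hook: fall back to an ordinary left fold with +.
    result = start
    for x in itr:
        result = result + x
    return result


joined = proto_sum([SumStr("a"), "b", "c"])   # goes through SumStr.__sum__
total = proto_sum([1, 2, 3])                  # int has no hook, uses the fold
```

Note that the type of the first operand alone decides which __sum__ runs, which is exactly why the homogeneous case is the easy one.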
> But practically speaking, how would this work? In general str.join and
> sum have different semantics.
sum(iter_of_str) currently doesn't have semantics. The semantics that
proponents of sum() seem to expect is precisely ''.join(iter_of_str).
Where's the problem?
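For reference, a short check of the current behaviour in CPython: summing strings raises TypeError both with the default start and with an explicit '' start, while ''.join gives the concatenation people expect. The helper try_sum is only an illustration device.

```python
words = ["spam", "eggs", "ham"]


def try_sum(items, *start):
    """Return sum(items, *start), or the TypeError it raises."""
    try:
        return sum(items, *start)
    except TypeError as exc:
        return exc


no_start = try_sum(words)        # 0 + "spam" is not defined
with_start = try_sum(words, "")  # CPython refuses a str start explicitly
joined = "".join(words)          # the semantics people seem to expect
```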
> What happens if we descend deep into the iterable and then discover
> a non-string (that might nonetheless still have a + operator)?
We lose, er, an exception is raised. Why is that a problem? I think
most people who want a polymorphic sum() expect it to accept a
homogeneous iterable as the first argument. I don't think they have
expectations that sum will be equivalent to
    def new_sum(it, start=0):    # compatible signature ;-)
        it = iter(it)
        result = start or next(it)
        for x in it:
            result = result + x
        return result
for heterogeneous iterables. Among other things, how do you decide
the appropriate return type? start's? That of next(iter(it))? The
"most important" of the types in it? Ask for a BDFL pronouncement at
each invocation?
I suppose you could ask that functions that operate on iterables be
partially applicable in the sense that if they *do* raise on the
"wrong" type, the exception should provide a partial result, the
oddball operand, and an iterable containing the unconsumed operands
as attributes. Then the __sum__ method could handle heterogeneous
operands if it wants to.
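One way such a "partially applicable" failure could look, as a sketch only: the exception class PartialSumError, its attribute names, and the function strict_sum are all hypothetical and invented for this illustration.

```python
class PartialSumError(TypeError):
    """Hypothetical exception carrying the state of an interrupted sum."""

    def __init__(self, partial, oddball, rest):
        super().__init__(
            f"cannot add {type(oddball).__name__} to {type(partial).__name__}"
        )
        self.partial = partial   # result accumulated so far
        self.oddball = oddball   # operand that could not be added
        self.rest = rest         # iterator over the unconsumed operands


def strict_sum(it, start=0):
    """Left fold with + that reports where it stopped on a TypeError."""
    it = iter(it)
    result = start
    for x in it:
        try:
            result = result + x
        except TypeError:
            raise PartialSumError(result, x, it) from None
    return result


try:
    strict_sum([1, 2, "three", 4])
except PartialSumError as exc:
    caught = exc
    # A caller could decide what to do with the oddball, then resume:
    resumed = strict_sum(exc.rest, exc.partial + len(exc.oddball))
```

Here the caller chose to count the oddball string by its length before resuming, which is an arbitrary policy; the point is only that the exception carries enough state to make such a choice.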
Note that partial_sum + oddball may have a different type from the
expected one even if it works. This seems like a recipe for bugs to
me. Are there use cases for such heterogeneous sums?
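A small illustration of that type drift with standard numeric types, using a plain left fold (functools.reduce with operator.add) as a stand-in for a polymorphic sum. Every addition succeeds, yet the result type depends on which operands happen to appear.

```python
import operator
from fractions import Fraction
from functools import reduce


def fold(items):
    """Plain left fold with +, standing in for a polymorphic sum."""
    return reduce(operator.add, items)


all_ints = fold([1, 2])                         # stays int
with_fraction = fold([1, Fraction(1, 2)])       # becomes Fraction
with_float = fold([1, Fraction(1, 2), 0.5])     # becomes float
```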
The only exception that might be pretty safe would be a case where you
can coerce the oddball to the partial result's type. But in the
salient case of str, pretty much every x has a str(x). I don't think
that an optimized version of:
    def new_sum(iter, start):
        expected_type = type(start)
        result = start
        for x in iter:
            try:
                result = result + x
            except TypeError:
                result = result + expected_type(x)
        return result
is really what we want when type(start) == str, so it probably
shouldn't be the default, and probably not when type(start) is numeric,
either.
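To make the concern concrete, here is the coercing loop restated under the hypothetical name coercing_sum (renamed only so the example is self-contained) and applied to mixed inputs. With a str start, nearly anything is silently stringified; with an int start, a numeric-looking string is quietly converted.

```python
def coercing_sum(it, start):
    """Left fold that coerces oddball operands to type(start) on TypeError."""
    expected_type = type(start)
    result = start
    for x in it:
        try:
            result = result + x
        except TypeError:
            result = result + expected_type(x)
    return result


# str(x) exists for pretty much every x, so nothing is ever rejected.
stringified = coercing_sum(["a", 1, None], "")

# int("2") succeeds, so the string is absorbed into the numeric total.
numeric = coercing_sum([1, "2", 3], 0)
```

In both cases no exception reaches the caller, so a stray operand of the wrong type goes unnoticed.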