[Python-ideas] The non-obvious nature of str.join (was Re: sum(...) limitation)

Stephen J. Turnbull stephen at xemacs.org
Tue Aug 12 07:51:19 CEST 2014


Nathaniel Smith writes:

 > I don't have any data here, but I bet people who know about str.join
 > (even for its natural use cases like ", ".join(...)) outnumber the
 > people who know that sum() takes a second argument by a very large
 > factor.

This is easy to fix, it now occurs to me.  Allow types with __add__ to
provide an optional __sum__ method, and give the numeric ABC a default
__sum__ implementation.  (It would be nice if it could check for
floats and restart with fsum if one is encountered.  And of course,
there may be other ABCs with __add__ that could get a default __sum__.)

Then sum could be just

    def sum(itr, start=0):
        if start == 0:
            itr = iter(itr)
            start = next(itr)
        return start.__sum__(itr)

and

    class str(...):

        def __sum__(self, itr):
            return self + ''.join(itr)    # probably can be optimized
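
For concreteness, here is a rough runnable sketch of that protocol.
Builtin types can't grow a __sum__ method from pure Python, so the
names SumStr and proto_sum are hypothetical stand-ins, and the
fallback to a plain left fold for types without __sum__ is my own
assumption, not part of the proposal above.

```python
from functools import reduce
import operator


class SumStr(str):
    """Hypothetical str subclass; the builtin str has no __sum__ method."""

    def __sum__(self, itr):
        # One join over the remaining items instead of repeated concatenation.
        return SumStr(self + "".join(itr))


def proto_sum(itr, start=0):
    """Hypothetical sum() that delegates to a __sum__ method when one exists."""
    itr = iter(itr)
    if start == 0:
        # No explicit start: use the first item, or 0 for an empty iterable.
        try:
            start = next(itr)
        except StopIteration:
            return 0
    summer = getattr(type(start), "__sum__", None)
    if summer is not None:
        return summer(start, itr)
    # Assumed fallback for types without __sum__: plain left fold with +.
    return reduce(operator.add, itr, start)


words = proto_sum([SumStr("a"), "b", "c"])   # dispatches to SumStr.__sum__
numbers = proto_sum([1, 2, 3])               # int has no __sum__, so it folds
```

Note that dispatch is on the type of the first operand only, which is
exactly where the heterogeneous-iterable questions come from.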

Is it really worth it, though?

 > But practically speaking, how would this work? In general str.join and
 > sum have different semantics.

sum(iter_of_str) currently doesn't have semantics: it just raises
TypeError.  The semantics that proponents of sum() seem to expect are
precisely those of ''.join(iter_of_str).  Where's the problem?
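
A quick check of the status quo, as I understand CPython's behavior.
The helper try_sum is a hypothetical name used only to capture the
exception type.

```python
parts = ["spam", "eggs", "ham"]


def try_sum(items, *start):
    """Return sum()'s result, or the name of the exception type it raises."""
    try:
        return sum(items, *start)
    except TypeError as exc:
        return type(exc).__name__


no_start = try_sum(parts)        # 0 + "spam" is not defined
str_start = try_sum(parts, "")   # sum() rejects a str start value outright
joined = "".join(parts)          # the result people seem to expect
```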

 > What happens if we descend deep into the iterable and then discover
 > a non-string (that might nonetheless still have a + operator)?

We lose, er, an exception is raised.  Why is that a problem?  I think
most people who want a polymorphic sum() expect it to accept a
homogeneous iterable as the first argument.  I don't think they have
expectations that sum will be equivalent to

    def new_sum(it, start=0):        # compatible signature ;-)
        it = iter(it)
        result = start or next(it)
        for x in it:
            result = result + x
        return result

for heterogeneous iterables.  Among other things, how do you decide
the appropriate return type?  start's?  That of next(iter(it))?  The
"most important" of the types in it?  Ask for a BDFL pronouncement at
each invocation?

I suppose you could ask that functions that operate on iterables be
partially applicable in the sense that if they *do* raise on the
"wrong" type, the exception should provide a partial result, the
oddball operand, and an iterable containing the unconsumed operands
as attributes.  Then the __sum__ method could handle heterogeneous
operands if it wants to.
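
A minimal sketch of what such an exception might look like.  The
names PartialSumError and resumable_sum are hypothetical, as is the
choice to subclass TypeError; nothing like this exists in the stdlib.

```python
class PartialSumError(TypeError):
    """Hypothetical exception carrying the state of an interrupted sum."""

    def __init__(self, partial, oddball, remaining):
        super().__init__(
            f"cannot add {type(oddball).__name__!r} "
            f"to {type(partial).__name__!r}"
        )
        self.partial = partial        # result accumulated so far
        self.oddball = oddball        # the operand that could not be added
        self.remaining = remaining    # iterator over the unconsumed operands


def resumable_sum(it, start=0):
    it = iter(it)
    result = start
    for x in it:
        try:
            result = result + x
        except TypeError:
            raise PartialSumError(result, x, it) from None
    return result


# The caller, not sum, decides what to do with the oddball.
try:
    total = resumable_sum([1, 2, "3", 4])
except PartialSumError as exc:
    caught = exc
    total = resumable_sum(exc.remaining, exc.partial + int(exc.oddball))
```

Here the caller chooses to coerce with int() and resume; a __sum__
method could make a different choice, or simply re-raise.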

Note that partial_sum + oddball may have a different type from the
expected one even if it works.  This seems like a recipe for bugs to
me.  Are there use cases for such heterogeneous sums?
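
To illustrate with the numeric tower (fold is a hypothetical name for
a plain left fold with +): a single oddball that "works" silently
changes the type of the whole result.

```python
from fractions import Fraction


def fold(it, start=0):
    """Plain left fold with +, no type checks at all."""
    result = start
    for x in it:
        result = result + x
    return result


all_ints = fold([1, 2, 3])
with_float = fold([1, 2, 0.5, 3])                # one float among ints
with_fraction = fold([1, 2, Fraction(1, 2), 3])  # one Fraction among ints

result_types = [type(v).__name__ for v in (all_ints, with_float, with_fraction)]
```

The three results have types int, float, and Fraction respectively,
determined entirely by what happened to be in the iterable.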

The only exception that might be pretty safe would be a case where you
can coerce the oddball to the partial result's type.  But in the
salient case of str, pretty much every x has a str(x).  I don't think
that an optimized version of:

    def new_sum(it, start):          # 'it' avoids shadowing builtin iter
        expected_type = type(start)
        result = start
        for x in it:
            try:
                result = result + x
            except TypeError:
                result = result + expected_type(x)
        return result

is really what we want when type(start) == str, so it probably
shouldn't be the default, and probably not when type(start) is
numeric, either.
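
To see why, here is the same coercing loop run on two mixed inputs.
The function is renamed coercing_sum (a hypothetical name) so that it
doesn't collide with the earlier new_sum.

```python
def coercing_sum(items, start):
    """Left fold with +, coercing any oddball to type(start) on TypeError."""
    expected_type = type(start)
    result = start
    for x in items:
        try:
            result = result + x
        except TypeError:
            result = result + expected_type(x)
    return result


# With a str start, every oddball is silently stringified.
as_text = coercing_sum(["a", 1, None, ["b"]], "")

# With an int start, numeric-looking strings are silently parsed.
as_number = coercing_sum([1, "2", "3"], 0)
```

as_text comes out as "a1None['b']" and as_number as 6, with no hint
that anything unusual was in the input.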


