[Python-ideas] Another attempt at a sum() alternative: the concatenation protocol

Sergey sergemp at mail.ru
Wed Jul 17 17:03:50 CEST 2013


On Jul 16, 2013 Oscar Benjamin wrote:

> On 16 July 2013 07:50, Nick Coghlan wrote:
>> I haven't been following the sum() threads fully, but something Ron
>> suggested gave me an idea for a concatenation API and protocol. I
>> think we may also be able to use a keyword-only argument to solve the
>> old string.join vs str.join problem in a more intuitive way.
>>
>>     def concat(start, iterable, *, interleave=None):
>>         try:
>>             build = start.__concat__
>>         except AttributeError:
>>             result = start
>>             if interleave is None:
>>                 for x in iterable:
>>                     result += x
>>             else:
>>                 for x in iterable:
>>                     result += interleave
>>                     result += x
>>         else:
>>             result = build(iterable, interleave=interleave)

(I assume there is a `return result` at the end.)

That's an interesting idea. It is somewhat similar to my suggestion #4,
the one with the awful name __init_concatenable_sequence_from_iterable__.

Two questions about this idea:

* What is obj.__concat__ expected to mean? E.g.
    class X:
      def __add__(self, other):
        # returns a new object that is the sum of `self` and `other`
  But:
    class X:
      def __concat__(self, <what_is_here?>):
        <what is it expected to return?>

* What should happen for mixed lists? I.e. the code:
    concat(["str1", "str2", "str3"])
  looks rather obvious, but what about:
    concat(["string", some_object, some_other_object])
  Would it raise an error or not?
  If not, what type would the result of such an operation be?
  What if `some_object` is somehow "concatenable" with a
  string, while the string has no idea how to concat `some_object`?
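
To make the mixed-list question concrete, here is a minimal sketch of what
the `+=` fallback branch alone would do today. It deliberately leaves out
the `__concat__` lookup, since its semantics are exactly what is being
asked about. The `Tagged` class is hypothetical and exists only for this
illustration, and `start` is passed explicitly as in the quoted signature.

```python
def concat(start, iterable, *, interleave=None):
    # Fallback branch of the quoted sketch only, plus the assumed
    # `return result`. No __concat__ lookup is attempted here.
    result = start
    for x in iterable:
        if interleave is not None:
            result += interleave
        result += x
    return result


class Tagged:
    """Hypothetical object that knows how to append itself to a str."""

    def __init__(self, text):
        self.text = text

    def __radd__(self, other):
        # str does not know about Tagged, so Python falls back to this.
        return other + "<" + self.text + ">"


mixed = concat("", ["string", Tagged("a"), Tagged("b")])

try:
    concat("", ["string", 42])
    failed = False
except TypeError:
    # int is not "concatenable" with str in either direction
    failed = True
```

So with plain `+=` the answer depends entirely on `__radd__` of the
right-hand object: the result type is whatever that method returns, and
anything without it raises TypeError. A `__concat__` method on the start
object would have to decide all of this by itself.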

> The sum() threads have highlighted one and only one problem which is
> that people are often using (or at least suggesting to use) sum() in
> order to concatenate sequences even though it has quadratic
> performance for this. The stdlib already has a solution for this:
> chain. No one in the sum threads has raised any issue with using chain
> (or chain.from_iterable) except to argue that it is not widely used.

I did. Here's one of the issues.

Imagine a type that somehow modifies the items it stores: removes
duplicates, sorts them, or something else, e.g.:
  class aset(set):
      def __add__(self, other):
          # wrap the union so the result is an aset and keeps __add__
          return aset(self|other)

Now we have some code:
  list_of_sets = [ aset(["item1","item2","item3"]) ] * 1000
  [...]
  for i in sum(list_of_sets, aset()):
      deal_with(i)

If you replace `sum` with `chain`, you get something like:
  for i in chain.from_iterable(list_of_sets):
      deal_with(i)

Which works (that's the worst part) but produces the WRONG result: the
loop now visits every item of every set, duplicates included, instead of
only the unique items.

This example makes `chain` an error-prone replacement for `sum`. It does
not make `chain` bad; if you understand what you are doing, you're free to
use `chain`. It just makes `chain` a poor general-purpose replacement.
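
A minimal runnable version of the comparison, as a sketch. The union is
wrapped in `aset(...)` here so that every intermediate result is still an
`aset`; the names `via_sum` and `via_chain` are mine, added only for this
illustration.

```python
from itertools import chain


class aset(set):
    def __add__(self, other):
        # Keep the result an aset so repeated addition keeps working.
        return aset(self | other)


list_of_sets = [aset(["item1", "item2", "item3"])] * 1000

# sum() goes through aset.__add__, so duplicates collapse.
via_sum = list(sum(list_of_sets, aset()))

# chain.from_iterable() only iterates; it never calls __add__.
via_chain = list(chain.from_iterable(list_of_sets))

print(len(via_sum))    # 3
print(len(via_chain))  # 3000
```

Both loops run without any error, but the `chain` version calls
deal_with() 3000 times instead of 3.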
