[Python-ideas] Another attempt at a sum() alternative: the concatenation protocol

Ron Adam ron3200 at gmail.com
Tue Jul 16 14:59:42 CEST 2013



On 07/16/2013 06:06 AM, Oscar Benjamin wrote:
> On 16 July 2013 11:37, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>>
>> On 16 Jul, 2013, at 12:21, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>>
>>> On 16 July 2013 07:50, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>>> I haven't been following the sum() threads fully, but something Ron
>>>> suggested gave me an idea for a concatenation API and protocol. I
>>>> think we may also be able to use a keyword-only argument to solve the
>>>> old string.join vs str.join problem in a more intuitive way.
>>>
>>> The sum() threads have highlighted one and only one problem which is
>>> that people are often using (or at least suggesting to use) sum() in
>>> order to concatenate sequences even though it has quadratic
>>> performance for this. The stdlib already has a solution for this:
>>> chain. No one in the sum threads has raised any issue with using chain
>>> (or chain.from_iterable) except to argue that it is not widely used.
>>>
>>> If people are using sum() to concatenate lists then this should be
>>> taken not as evidence that a new solution needs to be found but as
>>> evidence that chain is not sufficiently well-known. The obvious
>>> solution to that is not to implement a new protocol but to make the
>>> existing solution more well known i.e. move chain.from_iterable to
>>> builtins and rename it (the obvious choice being concat).

Yes, currently chain is the best way to do this.  And no, concat would not 
be a good name for a relocated chain unless it were also wrapped in a 
constructor that returns an object instead of an iterator.  That isn't the 
idea being suggested.
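To make the comparison concrete, here is a minimal sketch of the two approaches. It contrasts sum() with a list start value, which builds a new list at every step and so is quadratic overall, against itertools.chain.from_iterable, which lazily yields the items and can be wrapped in list() to get a concrete object back. The two helper names are made up for illustration and are not part of any proposal.

```python
from itertools import chain


def concat_with_sum(lists):
    # sum() evaluates start + x repeatedly; each + copies the accumulated
    # list, so the total work grows quadratically with the number of items.
    return sum(lists, [])


def concat_with_chain(lists):
    # chain.from_iterable yields the elements lazily (an iterator, not a
    # list); wrapping it in list() gives a concrete object in linear time.
    return list(chain.from_iterable(lists))


lists = [[1, 2], [3], [4, 5, 6]]
print(concat_with_sum(lists))    # [1, 2, 3, 4, 5, 6]
print(concat_with_chain(lists))  # [1, 2, 3, 4, 5, 6]
```

The list() wrapper is the "constructor" step: without it, callers get an iterator rather than a list.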


>>>>     def concat(start, iterable, *, interleave=None):
>>>>         try:
>>>>             build = start.__concat__
>>>>         except AttributeError:
>>>>             result = start
>>>>             if interleave is None:
>>>>                 for x in iterable:
>>>>                     result += x
>>>>             else:
>>>>                 for x in iterable:
>>>>                     result += interleave
>>>>                     result += x
>>>>         else:
>>>>             result = build(iterable, interleave=interleave)
>>>
>>> That doesn't seem like a very nice signature e.g.:
>>>
>>>    concat(lines[0], lines[1:], interleave='\n')
>>>
>>> is not as good as
>>>
>>>     '\n'.join(lines)

That will still work, and concat wouldn't be used to join lines like this, 
although I think a lot of people may complain about that.  Concatenation 
("concat") is associated fairly strongly with strings, so it would be 
surprising if something with that name didn't handle strings.  But this is 
a nitpick, and we may be able to come up with a better name that doesn't 
have that baggage.
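For illustration, here is a minimal runnable sketch of the quoted dispatch. It assumes the proposed `__concat__(iterable, *, interleave=None)` hook, which is hypothetical and not an existing Python protocol. `Buffer` is an invented list subclass that opts into the hook. The quoted snippet ends without returning `result`, so the sketch adds that. The interleave behaviour follows the quoted fallback loop: the start value acts as the first piece, and the separator goes before every item from the iterable.

```python
class Buffer(list):
    """Hypothetical list subclass that opts into the proposed __concat__ hook."""

    def __concat__(self, iterable, *, interleave=None):
        # Copy once, then extend in place: linear time, and self is not mutated.
        # Mirrors the quoted fallback: interleave goes before *every* item,
        # with self acting as the first piece.
        result = Buffer(self)
        for x in iterable:
            if interleave is not None:
                result.extend(interleave)
            result.extend(x)
        return result


def concat(start, iterable, *, interleave=None):
    # Same dispatch as the quoted code, plus an explicit return.
    try:
        build = start.__concat__
    except AttributeError:
        # Fallback: repeated +=. Note that for a mutable start such as a
        # plain list, this modifies the start object in place.
        result = start
        if interleave is None:
            for x in iterable:
                result += x
        else:
            for x in iterable:
                result += interleave
                result += x
    else:
        result = build(iterable, interleave=interleave)
    return result


print(concat(Buffer([1]), [[2, 3], [4]], interleave=[0]))  # [1, 0, 2, 3, 0, 4]
print(concat(Buffer(), [[2], [3]], interleave=[0]))        # [0, 2, 0, 3]
```

The second call shows the leading-separator effect of an empty start value under these semantics.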

>>> It's worse with an iterator:
>>>
>>>     it = iter(iterable)
>>>     try:
>>>         start = next(it)
>>>     except StopIteration:
>>>         result = ''
>>>     else:
>>>         result = concat(start, it, interleave=sep)
>>>
>>> Or have I misunderstood?
>>
>> concat('', iterable, interleave=sep) should work.
>
> Not with the code as shown. The result would be prepended with sep.

It would be a TypeError.

The part you are misunderstanding is that this all depends on whether or 
not a builtin version of this can be significantly faster than chain, 
and/or whether there are enough use cases where it would be beneficial.
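Whether a builtin would be faster is an empirical question. A rough starting point is to time sum() against chain.from_iterable with timeit. This is only a sketch: the data sizes and repeat counts below are arbitrary, the helper names are made up, and no particular timings are claimed here.

```python
import timeit
from itertools import chain


def make_data(n_lists, list_len=10):
    # n_lists small lists that will be flattened into one list
    return [list(range(list_len)) for _ in range(n_lists)]


def bench(n_lists, number=3):
    # Return total seconds for `number` runs of each approach.
    data = make_data(n_lists)
    t_sum = timeit.timeit(lambda: sum(data, []), number=number)
    t_chain = timeit.timeit(lambda: list(chain.from_iterable(data)), number=number)
    return t_sum, t_chain


if __name__ == "__main__":
    for n in (100, 1000, 3000):
        t_sum, t_chain = bench(n)
        print(f"{n:>5} lists: sum={t_sum:.4f}s  chain={t_chain:.4f}s")
```

A builtin concat would need to beat the chain column by a meaningful margin to be worth adding.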

Ideas like this don't just get in automatically, they still need to be 
"worth it".

Cheers,
    Ron


