[Python-ideas] Use the plus operator to concatenate iterators
Steven D'Aprano
steve at pearwood.info
Wed Aug 5 04:15:37 CEST 2015
On Wed, Aug 05, 2015 at 12:22:51AM +0000, Grayson, Samuel Andrew wrote:
> Concatenation is the most fundamental operation that can be done on iterators.
Surely "get next value" is the most fundamental operation that can be
done on iterators. Supporting concatenation is not even part of the
definition of iterator.
But having said that, concatenation does make sense as an iterator
method. Python chooses to make that a function, itertools.chain, rather
than a method or operator. See below.
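To make the "get next value" point concrete, here is a minimal sketch of
what the iterator protocol does and does not provide. The variable names
are only for illustration.

```python
it = iter([1, 2, 3])

# "Get next value" is the core of the protocol.
first = next(it)
second = next(it)

# The other half of the protocol: an iterator returns itself from iter().
returns_self = iter(it) is it

# Concatenation is not part of the protocol, so + is not supported here.
try:
    it + iter([4, 5, 6])
except TypeError as exc:
    error = exc
else:
    error = None
```

Here `error` ends up holding a TypeError, since list iterators define
neither __add__ nor __radd__.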
> In fact, we already do that with lists.
>
> [1, 2, 3] + [4, 5, 6]
> # evaluates to [1, 2, 3, 4, 5, 6]
>
> I propose:
>
> iter([1, 2, 3]) + iter([4, 5, 6])
> # evaluates to something like itertools.chain(iter([1, 2, 3]), iter([4, 5, 6]))
> # equivalent to iter([1, 2, 3, 4, 5, 6])
I don't entirely dislike this. I'm not a big fan of Python's choice to
use + for concatenation, but the principle of supporting concatenation
for iterators makes sense.
But, "iterator" isn't a type in Python, it is a protocol, so there are a
whole lot of *different* types that count as iterators, and as far as I
can see, they don't share any common superclass apart from object
itself. I count at least nine in the builtins alone:
range_iterator, list_iterator, tuple_iterator, str_iterator,
set_iterator, dict_keyiterator, dict_valueiterator, dict_itemiterator,
generator
(These are distinct from the types range, list, tuple, etc.)
and custom-made iterators don't have to inherit from any special class,
they just need to obey the protocol.
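As a rough illustration of how disparate these types are, the sketch
below collects a few iterators and inspects their classes. Countdown is
a hypothetical custom iterator written for this example; it is not part
of any library.

```python
class Countdown:
    """Hypothetical custom iterator; it inherits only from object."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


samples = [
    iter(range(3)),
    iter([1]),
    iter((1,)),
    iter({1}),
    iter({}),
    (x for x in ()),
    Countdown(3),
]

# Every sample has a different concrete type...
distinct_types = {type(s) for s in samples}

# ...and each of those types derives directly from object and nothing else.
only_object = all(type(s).__mro__ == (type(s), object) for s in samples)
```

With no shared base class among them, there is no single place where an
__add__ method could live.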
So where would you put the __add__ and __radd__ methods?
The usual Pythonic solution to the problem of where to put a method that
needs to operate on a lot of disparate types with no shared superclass
is to turn it into a function. We already have that: itertools.chain.
A bonus with chain is that there is no need to manually convert each
argument to an iterator first; it does that for you:
chain(this, that, another)
versus
iter(this) + iter(that) + iter(another)
Another bonus with chain() is that you can start using it *right now*,
rather than waiting another two years for Python 3.6.
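A short sketch of chain() accepting arbitrary iterables without any
explicit iter() calls. The values bound to this, that and another are
made up for the example.

```python
from itertools import chain

this = [1, 2, 3]   # a list
that = (4, 5)      # a tuple
another = "ab"     # a string

# chain() calls iter() on each argument itself.
combined = list(chain(this, that, another))

# chain.from_iterable() does the same for an iterable of iterables.
flattened = list(chain.from_iterable([this, that, another]))
```

Both `combined` and `flattened` come out as [1, 2, 3, 4, 5, 'a', 'b'].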
> There is some python2 code where:
>
> a = dict(zip('abcd', range(4)))
> isinstance(a.values(), list)
> alphabet = a.keys() + a.values()
>
> In python2, this `alphabet` becomes a list of all values and keys
>
> In current python3, this raises:
>
> TypeError: unsupported operand type(s) for +: 'dict_keys' and 'dict_values'
>
> But in my proposal, it works just fine. `alphabet` becomes an iterator
> over all values and keys (similar to the python2 case).
dict_keys and dict_values are not iterators, they are views onto the
dict (dict_keys is set-like, dict_values is not), and concatenating them
does not make sense.
The Python 2 equivalent of Python 3's `a.keys() + a.values()` is
a.viewkeys() + a.viewvalues()
which also raises TypeError, as it should.
Or to put it another way, the Python 3 equivalent of the Python 2 code
is this:
list(a.keys()) + list(a.values())
Either way, since dict keys and values aren't iterators, any change to
the iterator protocol or support for iterator concatenation won't change
them.
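For reference, a minimal Python 3 sketch of the above, reusing the dict
from the quoted example. The chain() variant at the end is my own
addition for the case where a lazy iterator is wanted rather than a
list.

```python
from itertools import chain

a = dict(zip('abcd', range(4)))

# Views are iterable, but they are not iterators: they have no __next__.
keys_is_iterator = hasattr(a.keys(), '__next__')

# Concatenating the views with + raises TypeError.
try:
    a.keys() + a.values()
except TypeError:
    raised = True
else:
    raised = False

# The Python 3 equivalent of the Python 2 code builds lists explicitly.
alphabet_list = list(a.keys()) + list(a.values())

# A lazy alternative that works today, with no new operator needed.
alphabet_lazy = chain(a.keys(), a.values())
```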
--
Steve