[Python-ideas] Add a .chunks() method to sequences

Erik python at lucidity.plus.com
Fri May 5 06:29:47 EDT 2017

Hi Nick,

On 05/05/17 08:29, Nick Coghlan wrote:
> And then given the proposed str.splitgroups() on the one hand, and the
> existing memoryview.cast() on the other, offering
> itertools.itergroups() as a corresponding building block specifically
> for working with streams of regular data would make sense to me -
> that's a standard approach in time-division multiplexing protocols,
> and it also shows up in areas like digital audio processing as well
> (where you're often doing things like shuffling incoming data chunks
> into FFT buffers)

It looks to me like your "itertools.itergroups()" is similar to 
more_itertools.chunked() - with at least one obvious change, see below(*).

If anyone wants to pursue this (or any itertools) enhancement, then 
please be aware of the following thread (and in particular the message 
being linked to - and the bug and discussion that it is replying to):


I have been told off for bringing this up already, but I do it again in 
direct response to your suggestion because it seems there is a bar to 
getting something included in itertools and something like "chunked()" 
has already failed to make it. The thing to do is probably to talk 
directly to Raymond to see if there's an acceptable solution first 
before too much work is put into something that may be rejected as being 
too high level.

A possible solution might be a C version of those parts of 
"more_itertools" where people would find a speedup useful (the 
more_itertools package would defer to those built-ins if they exist on 
the version of Python it's executing on, and otherwise use its existing 
implementation as a fallback). I am not suggesting implementing the 
_whole_ of more_itertools in C - it's quite large now.

(*) I had implemented itertools.chunked in C before (also for audio 
processing, as it happens) and one thing that I didn't like is the way 
strings get unpacked:

 >>> tuple(more_itertools.chunked("foo bar baz", 2))
(['f', 'o'], ['o', ' '], ['b', 'a'], ['r', ' '], ['b', 'a'], ['z'])

If the chunked/itergroups method checked for the presence of a 
__chunks__ or similar dunder method in the source sequence which returns 
an iterator, then the string class could efficiently yield substrings 
rather than individual characters which then had to be wrapped in a list 
or tuple (which I think is what you wanted itergroups() to do):

 >>> tuple(itertools.chunked("foo bar baz", 2))
('fo', 'o ', 'ba', 'r ', 'ba', 'z')
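
As a rough sketch of that dispatch: the __chunks__ name comes from the suggestion above, but the signature (taking the chunk size n) and the ChunkableStr stand-in are assumptions, since str has no such method today.

```python
from itertools import islice


def _fallback(iterable, n):
    """What more_itertools.chunked() does: collect items into lists."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def chunked(source, n):
    """Use the source's own __chunks__ (hypothetical protocol) if it has one."""
    hook = getattr(type(source), "__chunks__", None)
    if hook is not None:
        return hook(source, n)
    return _fallback(source, n)


class ChunkableStr(str):
    """Stand-in for a str that implements the hypothetical protocol."""

    def __chunks__(self, n):
        # Slicing yields substrings directly, no per-character unpacking.
        return (self[i:i + n] for i in range(0, len(self), n))


print(tuple(chunked(ChunkableStr("foo bar baz"), 2)))
print(tuple(chunked("foo bar baz", 2)))  # plain str takes the fallback path
```

The lookup goes through type(source), mirroring how other dunder methods are resolved on the class rather than the instance.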

Similarly, for objects which _represent_ a lot of data but do not 
actually hold those data literally (for example, range objects or even 
memoryviews), the returned chunks can also be representations of the 
data (subranges or subviews) and not the actual rendered data. For 
example, the existing:

 >>> range(10)
range(0, 10)
 >>> tuple(more_itertools.chunked(range(10), 3))
([0, 1, 2], [3, 4, 5], [6, 7, 8], [9])

could, with a __chunks__ method on range, instead give:

 >>> tuple(more_itertools.chunked(range(10), 3))
(range(0, 3), range(3, 6), range(6, 9), range(9, 10))

Obviously, with those short strings and ranges one could argue that 
there's no point, but the principle of doing it this way scales better 
than the version that collects all of the data in lists - for things 
like chunks of some sort of "view" object, you would still only have the 
actual data stored once in the original object.
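
To illustrate the point about views: slicing a range or a memoryview already returns another range or memoryview, so a chunker built on slicing never copies the data. The helper name view_chunks below is made up for this sketch.

```python
def view_chunks(seq, n):
    """Yield slices of seq; for range/memoryview these are views, not copies."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


print(tuple(view_chunks(range(10), 3)))

buf = bytearray(b"abcdefgh")
parts = list(view_chunks(memoryview(buf), 3))
buf[0] = ord("X")        # change the original buffer...
print(bytes(parts[0]))   # ...and the first chunk sees the change
```

Each memoryview chunk still refers to buf, so the bytes exist only once however many chunks are outstanding.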

I suppose that one thing to consider is what happens when an iterator is 
passed to the chunked() function. An iterator could have a __chunks__ 
method which returned chunks of the source sequence from the existing 
point in the iteration. However, such an iterator would behave 
differently from one that _doesn't_ have a __chunks__ method: in the 
second case the iterator would be consumed by the fall-back code, which 
just does what more_itertools.chunked() does now, but in the first it 
would not.
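
A small sketch of that edge case, under the same assumed __chunks__(n) signature; SeqIter is a hypothetical iterator written only to show the difference in consumption.

```python
from itertools import islice


def chunked(source, n):
    """Dispatch to a hypothetical __chunks__ hook, else consume the iterable."""
    hook = getattr(type(source), "__chunks__", None)
    if hook is not None:
        return hook(source, n)
    it = iter(source)
    return iter(lambda: list(islice(it, n)), [])


class SeqIter:
    """Hypothetical sequence iterator that also offers __chunks__."""

    def __init__(self, seq):
        self._seq = seq
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        item = self._seq[self._pos]
        self._pos += 1
        return item

    def __chunks__(self, n):
        # Chunks start at the current position but do not advance it.
        return (self._seq[i:i + n] for i in range(self._pos, len(self._seq), n))


smart = SeqIter("abcdef")
next(smart)                        # skip 'a'
print(list(chunked(smart, 2)))     # chunks from 'b' onwards
print(next(smart))                 # still at 'b': not consumed

plain = iter("abcdef")
next(plain)
print(list(chunked(plain, 2)))     # same data, as lists
print(next(plain, "exhausted"))    # the fallback used everything up
```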

Perhaps there is a precedent for that particular edge case with 
iterators in a different context.

Hope that helps,
