[Python-Dev] [Python-ideas] itertools.chunks(iterable, size, fill=None)

Steven D'Aprano steve at pearwood.info
Thu Jul 5 17:57:17 CEST 2012

anatoly techtonik wrote:
> On Wed, Jul 4, 2012 at 9:31 PM, Terry Reedy <tjreedy at udel.edu> wrote:

>> A sliding window for a generic iterable requires a deque or ring buffer
>> approach that is quite different from the zip-longest -- grouper approach.
> That's why I'd like to drastically reduce the scope of proposal.
> itertools doesn't seem to be the best place anymore. How about
> sequence method?
>    string.chunks(size)  -> ABC DEF G
>    list.chunks(size) -> [A,B,C], [D,E,F],[G]


This is a fairly trivial problem to solve, and there are many variations on 
it. Many people will not find the default behaviour helpful, and will need to 
write their own. Why complicate the API for all sequence types with this?

I don't believe that we should enshrine one variation as a built-in method, 
without any evidence that it is the most useful or common variation. Even if 
there is one variation far more useful than the others, that doesn't 
necessarily mean we ought to make it a builtin method unless it is a 
fundamental sequence operation, has wide applicability, and is genuinely hard 
to write. I don't believe chunking meets *any* of those criteria, let alone 
all three.

Not every six-line function needs to be a builtin.
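
To illustrate how little code is involved, here is a minimal sketch of two 
common variations. The names chunks and grouper, and the choice of which 
variant pads the last group, are my assumptions for illustration, not a 
proposed API. The second follows the grouper recipe from the itertools 
documentation.

```python
from itertools import zip_longest


def chunks(seq, size):
    """Split a sequence into consecutive slices of at most `size` items.

    Works only on sliceable sequences (str, list, tuple); the last chunk
    may be shorter than `size`.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    # Slicing preserves the sequence type: str -> str, list -> list.
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def grouper(iterable, size, fill=None):
    """Group any iterable into tuples of exactly `size` items.

    The last tuple is padded with `fill` rather than left short.
    """
    # The same iterator repeated `size` times, so each tuple pulls
    # `size` consecutive items from it.
    args = [iter(iterable)] * size
    return zip_longest(*args, fillvalue=fill)
```

Which of the two is the "right" default depends entirely on the caller: 
whether the input is a sequence or an arbitrary iterable, and whether a short 
final chunk should be kept, padded or dropped.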

I believe that splitting a sequence (or a string) into fixed-size chunks is 
more of a programming exercise than a genuinely useful tool. That does not 
mean that there are never any real use cases for splitting into fixed-size 
chunks, only that this is a function that *seems* more useful in theory than 
it turns out to be in practice.

Compare this with more useful sequence/iteration tools, like (say) zip. You 
can hardly write a hundred lines of code without using zip at least once. But 
I bet you can write tens of thousands of lines of code without needing to 
split sequences into fixed chunks like this.

Besides, the name "chunks" is more general than how you are using it. For 
example, I consider chunking to be splitting a sequence up at various 
delimiters or separators, not at fixed character positions. E.g. "the third 
word of item two of the fourth line" is a chunk.

This fits better with the non-programming use of the term chunk or chunking, 
and has precedent in Apple's HyperTalk language, which literally allowed you 
to talk about words, items and lines of text, each of which is described as a 
chunk.
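
As a rough sketch of what I mean by delimiter-based chunking, the expression 
"the third word of item two of the fourth line" could be written in Python 
along these lines. The function name chunk and its 1-based arguments are 
hypothetical, and I am assuming that items are separated by commas, words by 
whitespace and lines by newlines.

```python
def chunk(text, line, item, word):
    """Return word `word` of item `item` of line `line` (all 1-based).

    Assumes lines end with newlines, items are comma-separated and
    words are whitespace-separated. Raises IndexError if any of the
    requested parts does not exist.
    """
    line_text = text.splitlines()[line - 1]
    item_text = line_text.split(",")[item - 1]
    return item_text.split()[word - 1]


text = "first\nsecond\nthird\none two, red green blue, last"
# The third word of item two of the fourth line:
print(chunk(text, line=4, item=2, word=3))
```

Note that nothing here depends on fixed positions; every boundary is 
determined by a delimiter.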

This might be a good candidate for a utility module made up of assorted useful 
functions, but not for the string and sequence APIs.

