[Python-Dev] [Python-ideas] itertools.chunks(iterable, size, fill=None)
Steven D'Aprano
steve at pearwood.info
Thu Jul 5 17:57:17 CEST 2012
anatoly techtonik wrote:
> On Wed, Jul 4, 2012 at 9:31 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> A sliding window for a generic iterable requires a deque or ring buffer
>> approach that is quite different from the zip-longest -- grouper approach.
>
> That's why I'd like to drastically reduce the scope of proposal.
> itertools doesn't seem to be the best place anymore. How about
> sequence method?
>
> string.chunks(size) -> ABC DEF G
> list.chunks(size) -> [A,B,C], [C,D,E],[G]
-1
This is a fairly trivial problem to solve, and there are many variations on
it. Many people will not find the default behaviour helpful, and will need to
write their own. Why complicate the API for all sequence types with this?
I don't believe that we should enshrine one variation as a built-in method,
without any evidence that it is the most useful or common variation. Even if
there is one variation far more useful than the others, that doesn't
necessarily mean we ought to make it a builtin method unless it is a
fundamental sequence operation, has wide applicability, and is genuinely hard
to write. I don't believe chunking meets *any* of those criteria, let alone
all three.
Not every six-line function needs to be a builtin.
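As a rough illustration of how short such a function can be, here is a minimal sketch for sequences that support len() and slicing. The name chunks and the decision to leave the final chunk short rather than pad it are assumptions made for this example, not a proposed API. It is only one of the many possible variations.

```python
def chunks(seq, size):
    """Yield successive slices of seq, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(seq), size):
        # Slicing preserves the input type: str in, str out; list in, list out.
        yield seq[start:start + size]
```

With this sketch, list(chunks("ABCDEFG", 3)) gives ['ABC', 'DEF', 'G']. Callers who want padding, overlapping windows, or support for arbitrary iterators would each need a different variation.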
I believe that splitting a sequence (or a string) into fixed-size chunks is
more of a programming exercise problem than a genuinely useful tool. That does
not mean that there are never any real use cases for splitting into fixed-size
chunks, only that this is a function that *seems* more useful in theory than
it turns out to be in practice.
Compare this with more useful sequence/iteration tools, like (say) zip. You
can hardly write a hundred lines of code without using zip at least once. But
I bet you can write tens of thousands of lines of code without needing to
split sequences into fixed chunks like this.
Besides, the name "chunks" is more general than how you are using it. For
example, I consider chunking to be splitting a sequence up at various
delimiters or separators, not at fixed character positions. E.g. "the third
word of item two of the fourth line" is a chunk.
This fits more with the non-programming use of the term chunk or chunking, and
has precedent in Apple's HyperTalk language, which literally allowed you to
talk about words, items and lines of text, each of which is described as a chunk.
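A minimal Python sketch of that delimiter-based notion of a chunk follows. The function name chunk and its parameters are hypothetical, and the delimiters are assumptions for this example: lines end at newlines, items are separated by commas, and words are separated by whitespace.

```python
def chunk(text, line, item, word):
    """Return one word of one item of one line, using 1-based positions."""
    the_line = text.splitlines()[line - 1]      # lines are newline-delimited
    the_item = the_line.split(",")[item - 1]    # items: comma-delimited (assumed)
    return the_item.split()[word - 1]           # words: whitespace-delimited


text = "one\ntwo\nthree\nalpha beta, red green blue, gamma"

# "the third word of item two of the fourth line"
result = chunk(text, line=4, item=2, word=3)
print(result)  # prints: blue
```

Note that the chunk boundaries here depend entirely on the content, not on fixed positions, which is the distinction being drawn above.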
This might be a good candidate for a utility module made up of assorted useful
functions, but not for the string and sequence APIs.
--
Steven