Re: [Python-ideas] [Python-Dev] itertools.chunks(iterable, size, fill=None)

On Wed, Jul 4, 2012 at 9:31 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I actually meant that there is a problem to propose a patch in the sense of getting checkout, working on a diff, sending it by attaching to bug tracker as developer guide says.
Can you figure out from the code what this stuff does? It doesn't give chunks of strings.
That's not the problem. People always want something extra. The problem that we don't have a real wish distribution. If 1000 people want chunks and 1 wants groups with exception - we still account these as equal variants. Therefore my idea is deliberately limited to "string to chunks" user story, and SO implementation proposal.
Right. That's the top answer on SO that people think should be in stdlib. Great we are talking about the same thing actually.
This is deducible from seqchunk_odd(s, size)
Also deducible from seqchunk_odd(s, size)
I now don't even think that fill value is needed as argument. if len(chunk) < size: chunk.extend( [fill] * ( size - len(chunk)) )
I just need to chunk strings and sequences. Generic API is too complex without counting all usecases and iterating over them.
That's why I'd like to drastically reduce the scope of proposal. itertools doesn't seem to be the best place anymore. How about sequence method? string.chunks(size) -> ABC DEF G list.chunks(size) -> [A,B,C], [C,D,E],[G] If somebody needs a keyword argument - this can come later without breaking compatibility.

anatoly techtonik wrote:
On Wed, Jul 4, 2012 at 9:31 PM, Terry Reedy <tjreedy@udel.edu> wrote:
-1 This is a fairly trivial problem to solve, and there are many variations on it. Many people will not find the default behaviour helpful, and will need to write their own. Why complicate the API for all sequence types with this? I don't believe that we should enshrine one variation as a built-in method, without any evidence that it is the most useful or common variation. Even if there is one variation far more useful than the others, that doesn't necessarily mean we ought to make it a builtin method unless it is a fundamental sequence operation, has wide applicability, and is genuinely hard to write. I don't believe chunking meets *any* of those criteria, let alone all three. Not every six line function needs to be a builtin. I believe that splitting a sequence (or a string) into fixed-size chunks is more of a programming exercise problem than a genuinely useful tool. That does not mean that there is never any real use-cases for splitting into fixed-size chunks, only that this is the function that *seems* more useful in theory than it turns out in practice. Compare this with more useful sequence/iteration tools, like (say) zip. You can hardly write a hundred lines of code without using zip at least once. But I bet you can write tens of thousands of lines of code without needing to split sequences into fixed chunks like this. Besides, the name "chunks" is more general than how you are using it. For example, I consider chunking to be splitting a sequence up at a various delimiters or separators, not at fixed character positions. E.g. "the third word of item two of the fourth line" is a chunk. This fits more with the non-programming use of the term chunk or chunking, and has precedence in Apple's Hypertalk language, which literally allowed you to talk about words, items and lines of text, each of which are described as chunks. This might be a good candidate for a utility module made up of assorted useful functions, but not for the string and sequence APIs. -- Steven

anatoly techtonik wrote:
On Wed, Jul 4, 2012 at 9:31 PM, Terry Reedy <tjreedy@udel.edu> wrote:
-1 This is a fairly trivial problem to solve, and there are many variations on it. Many people will not find the default behaviour helpful, and will need to write their own. Why complicate the API for all sequence types with this? I don't believe that we should enshrine one variation as a built-in method, without any evidence that it is the most useful or common variation. Even if there is one variation far more useful than the others, that doesn't necessarily mean we ought to make it a builtin method unless it is a fundamental sequence operation, has wide applicability, and is genuinely hard to write. I don't believe chunking meets *any* of those criteria, let alone all three. Not every six line function needs to be a builtin. I believe that splitting a sequence (or a string) into fixed-size chunks is more of a programming exercise problem than a genuinely useful tool. That does not mean that there is never any real use-cases for splitting into fixed-size chunks, only that this is the function that *seems* more useful in theory than it turns out in practice. Compare this with more useful sequence/iteration tools, like (say) zip. You can hardly write a hundred lines of code without using zip at least once. But I bet you can write tens of thousands of lines of code without needing to split sequences into fixed chunks like this. Besides, the name "chunks" is more general than how you are using it. For example, I consider chunking to be splitting a sequence up at a various delimiters or separators, not at fixed character positions. E.g. "the third word of item two of the fourth line" is a chunk. This fits more with the non-programming use of the term chunk or chunking, and has precedence in Apple's Hypertalk language, which literally allowed you to talk about words, items and lines of text, each of which are described as chunks. This might be a good candidate for a utility module made up of assorted useful functions, but not for the string and sequence APIs. -- Steven
participants (2)
-
anatoly techtonik
-
Steven D'Aprano