New subject: [Python-Dev] itertools.chunks(iterable, size, fill=None)

July 5, 2012

On Wed, Jul 4, 2012 at 9:31 PM, Terry Reedy <tjreedy@udel.edu> wrote:
...
On 7/4/2012 5:57 AM, anatoly techtonik wrote:
...
On Fri, Jun 29, 2012 at 11:32 PM, Georg Brandl <g.brandl@gmx.net> wrote:
...
...
Anatoly, so far there were no negative votes -- would you care to go
another step and propose a patch?
Was about to say "no problem",
Did you read that there *are* strong negative votes? And that this idea has
been rejected before? I summarized the objections in my two responses and
pointed to the tracker issues. One of the objections is that there are 4
different things one might want if the sequence length is not an even
multiple of the chunk size. Your original 'idea' did not specify.
I actually meant that proposing a patch is a problem for me, in the sense
of getting a checkout, working on a diff, and attaching it to the bug
tracker as the developer guide says.
...
...
For now, the best thing I can do (I won't even risk mentioning anything
about 3.3) is to copy and paste code from the docs here:
from itertools import izip_longest
def chunks(iterable, size, fill=None):
    """Split an iterable into fixed-length blocks"""
    # chunks('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * size
    return izip_longest(fillvalue=fill, *args)
Python ideas is about Python 3 ideas. Please post Python 3 code.
This is actually a one-liner:
return zip_longest(*[iter(iterable)]*size, fillvalue=fill)
We don't generally add such to the stdlib.
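Since the list asks for Python 3 code, here is a sketch of the same docs-based recipe ported to Python 3, where `izip_longest` became `itertools.zip_longest`. The name `chunks` is only the name from the proposal, not an existing stdlib API. Note that it still yields tuples, not substrings.

```python
from itertools import zip_longest


def chunks(iterable, size, fill=None):
    """Split an iterable into fixed-length tuples, padding the last with fill."""
    # The *same* iterator object is repeated `size` times, so zip_longest
    # pulls `size` consecutive items into each output tuple.
    args = [iter(iterable)] * size
    return zip_longest(*args, fillvalue=fill)


print(list(chunks('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```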
Can you figure out from the code what this stuff does?
It doesn't give chunks of strings.
...
...
BTW, this doesn't work as expected (at least for strings). Expected is:
   chunks('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'
got:
   chunks('ABCDEFG', 3, 'x') --> ('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
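Getting the 'expected' strings out of the grouper only takes joining each tuple back together. A minimal sketch, assuming the input is a string and `fill` is a one-character string (or `''` to leave the last chunk short); `str_chunks` is a hypothetical name:

```python
from itertools import zip_longest


def str_chunks(s, size, fill):
    """Group a string into fixed-length substrings, padding the last one."""
    args = [iter(s)] * size
    # zip_longest yields tuples of characters; ''.join turns each back into a str.
    return [''.join(group) for group in zip_longest(*args, fillvalue=fill)]


print(str_chunks('ABCDEFG', 3, 'x'))
# ['ABC', 'DEF', 'Gxx']
```

With `fill=''` the same call returns `['ABC', 'DEF', 'G']`.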
One of the problems with idea of 'add a chunker' is that there are at least
a dozen variants that different people want.
That's not the problem. People always want something extra. The
problem is that we don't have a real distribution of what people want.
If 1000 people want chunks and 1 wants groups that raise an exception,
we still count these as equal variants.

Therefore my idea is deliberately limited to the "string to chunks" user
story and the implementation proposed on Stack Overflow (SO).
...
I discussed the return-type issue in my responses. I showed how to get the
'expected' response above using grouper, but also suggested that it is the
wrong basis for splitting strings. Repeated slicing makes more sense for
concrete sequence types.
def seqchunk_odd(s, size):
    # include odd size left over
    for i in range(0, len(s), size):
        yield s[i:i+size]
print(list(seqchunk_odd('ABCDEFG', 3)))
#
['ABC', 'DEF', 'G']
Right. That's the top answer on SO, the one people think should be in
the stdlib. Great, we are actually talking about the same thing.
...
def seqchunk_even(s, size):
    # only include even chunks
    for i in range(0, size*(len(s)//size), size):
        yield s[i:i+size]
print(list(seqchunk_even('ABCDEFG', 3)))
#
['ABC', 'DEF']
This is deducible from seqchunk_odd(s, size)
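To illustrate the claim, the even-only behaviour can be layered on top of `seqchunk_odd` by discarding the short leftover chunk. `seqchunk_even_from_odd` is a hypothetical helper name used only for this sketch:

```python
def seqchunk_odd(s, size):
    # include odd size left over
    for i in range(0, len(s), size):
        yield s[i:i+size]


def seqchunk_even_from_odd(s, size):
    # Keep only full-length chunks; at most the last chunk can be short.
    return (chunk for chunk in seqchunk_odd(s, size) if len(chunk) == size)


print(list(seqchunk_even_from_odd('ABCDEFG', 3)))
# ['ABC', 'DEF']
```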
...
def strchunk_fill(s, size, fill):
    # fill odd chunks
    q, r = divmod(len(s), size)
    even = size * q
    for i in range(0, even, size):
        yield s[i:i+size]
    if r:  # a short leftover chunk exists
        yield s[even:] + fill * (size - r)
print(list(strchunk_fill('ABCDEFG', 3, 'x')))
#
['ABC', 'DEF', 'Gxx']
Also deducible from seqchunk_odd(s, size)
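Likewise, a sketch of the fill behaviour built on `seqchunk_odd`, assuming `s` is a string and `fill` is a string to be repeated; `strchunk_fill_from_odd` is a hypothetical name. Full chunks get `fill * 0`, i.e. an empty string, so only the short last chunk is padded:

```python
def seqchunk_odd(s, size):
    # include odd size left over
    for i in range(0, len(s), size):
        yield s[i:i+size]


def strchunk_fill_from_odd(s, size, fill):
    # Only the final chunk can be short; pad it with copies of `fill`.
    for chunk in seqchunk_odd(s, size):
        yield chunk + fill * (size - len(chunk))


print(list(strchunk_fill_from_odd('ABCDEFG', 3, 'x')))
# ['ABC', 'DEF', 'Gxx']
```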
...
Because the 'fill' value is necessarily a sequence for strings,
strchunk_fill would only work for lists and tuples if the fill value were
either required to be given as a tuple or list of length 1 or if it were
internally converted inside the function. Skipping that for now.
With the fill version written on top of the even version, it is easy to
select among the three behaviors by modifying the fill version.
def strchunk(s, size, fill=NotImplemented):
    # fill the odd chunk only if a fill value was given
    q, r = divmod(len(s), size)
    even = size * q
    for i in range(0, even, size):
        yield s[i:i+size]
    if r and fill is not NotImplemented:
        yield s[even:] + fill * (size - r)
print(*strchunk('ABCDEFG', 3))
print(*strchunk('ABCDEFG', 3, ''))
print(*strchunk('ABCDEFG', 3, 'x'))
#
ABC DEF
ABC DEF G
ABC DEF Gxx
I no longer think the fill value is even needed as an argument; the
last chunk can be padded afterwards:
if len(chunk) < size:
    chunk.extend([fill] * (size - len(chunk)))
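In context, the caller-side padding might look like the following minimal sketch, assuming the chunks are lists (strings have no `extend`) and using the hypothetical helper `list_chunks`:

```python
def list_chunks(seq, size):
    """Slice seq into lists of length `size`; only the last may be shorter."""
    items = list(seq)
    return [items[i:i+size] for i in range(0, len(items), size)]


size, fill = 3, None
parts = list_chunks('ABCDEFG', size)
# Caller-side padding: the caller decides whether, and with what, to fill.
if parts and len(parts[-1]) < size:
    parts[-1].extend([fill] * (size - len(parts[-1])))

print(parts)
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', None, None]]
```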
...
I already described how something similar could be done by checking each
grouper output tuple for a fill value, but that requires that the fill value
be a sentinel that could not otherwise appear in the tuple. One could modify
grouper to fill with a private object() and check the last item of each
group for that sentinel and act accordingly (delete, truncate, or replace).
A generic API needs some thought, though.
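A sketch of the sentinel variant described above. The name `grouper_mode` and its `mode` values ('delete', 'truncate', 'replace') are assumptions made for illustration, not a proposed API. Because `zip_longest` only pads the final group, and pads it from the end, checking the last item of each group is enough:

```python
from itertools import zip_longest

_SENTINEL = object()  # private fill marker; cannot collide with user data


def grouper_mode(iterable, size, mode='truncate', fill=None):
    """Group items into tuples of `size`, handling a short last group.

    mode='truncate' keeps the short group without padding,
    mode='delete' drops it, mode='replace' pads it with `fill`.
    """
    if mode not in ('truncate', 'delete', 'replace'):
        raise ValueError(f'unknown mode: {mode!r}')
    args = [iter(iterable)] * size

    def groups():
        for group in zip_longest(*args, fillvalue=_SENTINEL):
            # Only the final group can end with the sentinel.
            if group[-1] is _SENTINEL:
                if mode == 'delete':
                    return
                if mode == 'truncate':
                    group = tuple(x for x in group if x is not _SENTINEL)
                else:  # mode == 'replace'
                    group = tuple(fill if x is _SENTINEL else x for x in group)
            yield group

    return groups()


print(list(grouper_mode('ABCDEFG', 3)))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
print(list(grouper_mode('ABCDEFG', 3, 'replace', 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```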
I just need to chunk strings and sequences. A generic API is too complex
to design without counting all the use cases and iterating over them.
...
An issue I did not previously mention is that people sometimes want
overlapping chunks rather than contiguous disjoint chunks. The slice
approach trivially adapts to that.
def seqlap(s, size):
    for i in range(len(s)-size+1):
        yield s[i:i+size]
print(*seqlap('ABCDEFG', 3))
#
ABC BCD CDE DEF EFG
A sliding window for a generic iterable requires a deque or ring buffer
approach that is quite different from the zip_longest-based grouper approach.
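For comparison, a minimal deque-based sliding window that works on any iterable, including one-shot iterators; `sliding_window` is a hypothetical name here. A `deque` with `maxlen=size` discards its oldest item on each append:

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, size):
    """Yield overlapping tuples of `size` consecutive items from any iterable."""
    it = iter(iterable)
    # Prime the window with the first size-1 items; maxlen makes the deque
    # drop its oldest item automatically once it is full.
    window = deque(islice(it, size - 1), maxlen=size)
    for item in it:
        window.append(item)
        yield tuple(window)


print(*(''.join(w) for w in sliding_window('ABCDEFG', 3)))
# ABC BCD CDE DEF EFG
```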
That's why I'd like to drastically reduce the scope of the proposal.
itertools doesn't seem to be the best place anymore. How about a
sequence method?

   string.chunks(size) -> ABC DEF G
   list.chunks(size)   -> [A,B,C], [D,E,F], [G]

If somebody needs a keyword argument, it can be added later without
breaking compatibility.
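Built-in types cannot be given new methods from Python code, so the proposed semantics can only be sketched on subclasses. `ChunkableStr` and `ChunkableList` below are hypothetical, illustrative names, not proposed APIs:

```python
class ChunkableStr(str):
    """Hypothetical str subclass illustrating the proposed chunks(size)."""

    def chunks(self, size):
        # Contiguous, non-overlapping slices; the last one may be shorter.
        return [self[i:i+size] for i in range(0, len(self), size)]


class ChunkableList(list):
    """Hypothetical list subclass illustrating the proposed chunks(size)."""

    def chunks(self, size):
        # Same slicing rule as above, applied to a list.
        return [self[i:i+size] for i in range(0, len(self), size)]


print(ChunkableStr('ABCDEFG').chunks(3))
# ['ABC', 'DEF', 'G']
print(ChunkableList('ABCDEFG').chunks(3))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```

Slicing these subclasses returns plain `str` and `list` objects, so the chunks are ordinary strings and lists.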

Re: [Python-ideas] [Python-Dev] itertools.chunks(iterable, size, fill=None)

anatoly techtonik

Steven D'Aprano
