
    def chunks(total, step):
        assert total >= step
        while total > step:
            yield step
            total -= step
        if total:
            yield total

    >>> list(chunks(12, 4))
    [4, 4, 4]
    >>> list(chunks(13, 4))
    [4, 4, 4, 1]

I'm not sure how appropriate "chunks" is as a name for such a function.
Anyway, I wrote it because in a unit test I had to create a file of a precise size, like this:

    FILESIZE = (10 * 1024 * 1024) + 423  # 10MB and 423 bytes
    with open(TESTFN, 'wb') as f:
        for csize in chunks(FILESIZE, 262144):
            f.write(b'x' * csize)

Now I wonder, would it make sense to have something like this in the itertools module?

--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/

+1 Very useful function.
-- Carlo Pires

On 06/04/2013 13:50, Giampaolo Rodolà wrote:
> def chunks(total, step):
>     assert total >= step
>     while total > step:
>         yield step;
>         total -= step;
>     if total:
>         yield total

Why shouldn't total be less than step?

    def chunks(total, step):
        while total >= step:
            yield step
            total -= step
        if total > 0:
            yield total
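To see what dropping the assert changes in practice, here is a small sketch that runs both versions side by side. The names chunks_original and chunks_relaxed are made up for this comparison and do not appear in the thread.

```python
def chunks_original(total, step):
    # Version from the first post: refuses total < step.
    assert total >= step
    while total > step:
        yield step
        total -= step
    if total:
        yield total


def chunks_relaxed(total, step):
    # Version without the assert: a short total becomes one short chunk.
    while total >= step:
        yield step
        total -= step
    if total > 0:
        yield total


for total in (12, 13, 4):
    # Both versions agree whenever total >= step.
    assert list(chunks_original(total, 4)) == list(chunks_relaxed(total, 4))

print(list(chunks_relaxed(3, 4)))  # [3]
try:
    list(chunks_original(3, 4))
except AssertionError:
    print("original version rejects total < step")
```

The two only differ for total < step, where the relaxed version yields the whole total as a single chunk.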

2013/4/6 MRAB <python@mrabarnett.plus.com>:
> Why shouldn't total be less than step?
>
> def chunks(total, step):
>     while total >= step:
>         yield step
>         total -= step
>     if total > 0:
>         yield total

I agree the assert statement can be removed.

--- Giampaolo

Isn't it just

    divmod(total, step)

João Bernardo

2013/4/6 João Bernardo <jbvsmo@gmail.com>:
> Isn't it just
>
> divmod(total, step)

Not really:

    >>> list(chunks(13, 4))
    [4, 4, 4, 1]
    >>> divmod(13, 4)
    (3, 1)

Literally, chunks() keeps yielding 'step' until 'total' is reached and makes sure the last yielded item has the correct remainder.

--- Giampaolo

On Sat, Apr 6, 2013 at 2:53 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
> 2013/4/6 João Bernardo <jbvsmo@gmail.com>:
>> Isn't it just
>>
>> divmod(total, step)
>
> Not really:
>
> >>> list(chunks(13, 4))
> [4, 4, 4, 1]
> >>> divmod(13, 4)
> (3, 1)

I think what João means is you can do:

    def chunks(total, step):
        a, b = divmod(total, step)
        return [step]*a + [b]

    >>> chunks(13, 4)
    [4, 4, 4, 1]

Or, to avoid necessarily constructing the list all at once:

    import itertools

    def chunks(total, step):
        a, b = divmod(total, step)
        return itertools.chain(itertools.repeat(step, a), [b])

    >>> list(chunks(13, 4))
    [4, 4, 4, 1]

Nathan
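One detail worth checking in the divmod versions: when total is an exact multiple of step the remainder is 0, and appending [b] unconditionally adds a trailing 0 chunk. Below is a sketch of a variant that skips the empty remainder. The name chunk_sizes is picked for this example only, it is not something proposed in this message.

```python
import itertools


def chunk_sizes(total, step):
    # divmod gives the number of full chunks and the leftover size.
    full, rest = divmod(total, step)
    # Only append the leftover when there is one.
    tail = [rest] if rest else []
    return itertools.chain(itertools.repeat(step, full), tail)


print(list(chunk_sizes(13, 4)))  # [4, 4, 4, 1]
print(list(chunk_sizes(12, 4)))  # [4, 4, 4]
```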

A function like this is useful, but I don't agree with the name. This name implies to me that it actually yields chunks and not chunk sizes. Maybe call it chunk_sizes? I don't know.

Also I find myself often writing helper functions like these:

    def chunked(sequence, size):
        i = 0
        while True:
            j = i
            i += size
            chunk = sequence[j:i]
            if not chunk:
                return
            yield chunk

    def chunked_stream(stream, size):
        while True:
            chunk = stream.read(size)
            if not chunk:
                return
            yield chunk

Maybe these functions should be in the stdlib? Too trivial?

From: Mathias Panzenböck <grosser.meister.morti@gmx.net>
Sent: Saturday, April 6, 2013 9:19 PM

> Also I find myself often writing helper functions like these:
>
> def chunked(sequence,size):
>     i = 0
>     while True:
>         j = i
>         i += size
>         chunk = sequence[j:i]
>         if not chunk:
>             return
>         yield chunk
The grouper function in the itertools recipes does the same thing, except that it works for any iterable, not just sequences (and it fills out the last group with an optional fillvalue).
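For readers who don't have the recipe at hand, here is a sketch of grouper along the lines of the itertools documentation recipe, in Python 3 spelling (on Python 2 the import would be izip_longest instead). It shows the padding behaviour that distinguishes it from the slicing helper.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    # n references to the SAME iterator: zip_longest pulls one item from each
    # in turn, so every output tuple holds n consecutive items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Works on any iterable, and pads the last group with fillvalue.
print(list(grouper("ABCDEFG", 3, "x")))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

Note that it yields tuples rather than slices of the original type, and the last group is padded rather than shortened.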
> def chunked_stream(stream,size):
>     while True:
>         chunk = stream.read(size)
>         if not chunk:
>             return
>         yield chunk
This is just iter(partial(stream.read, size), '').
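A quick sketch of the two-argument iter() form on an in-memory stream. One assumption to flag: the sentinel has to match what read() returns at end of file, so for a binary stream on Python 3 it is b'' rather than ''. The BytesIO object here just stands in for a real file.

```python
import io
from functools import partial

stream = io.BytesIO(b"x" * 10)

# iter(callable, sentinel) keeps calling stream.read(4) until the call
# returns the sentinel. A binary stream returns b'' at EOF; a text stream
# would return '' instead.
sizes = [len(chunk) for chunk in iter(partial(stream.read, 4), b"")]
print(sizes)  # [4, 4, 2]
```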
> Maybe these functions should be in the stdlib? Too trivial?

I personally agree that grouper, and some of the other itertools recipes, should actually be included in the module, so you could just import itertools and call grouper instead of having to copy the 3 lines of code into dozens of different programs. But I personally deal with that by just installing more-itertools off PyPI.

As for the original suggestion:

> On 04/06/2013 02:50 PM, Giampaolo Rodolà wrote:
>> def chunks(total, step):
>>     assert total >= step
>>     while total > step:
>>         yield step;
>>         total -= step;
>>     if total:
>>         yield total

I honestly don't think this is very useful. For one thing, if you really need it, it's equivalent to a trivial genexp:

    (min(step, total - chunkstart) for chunkstart in range(0, total, step))

For another, I think most obvious uses for it would be better done at a higher level. For example:

>> FILESIZE = (10 * 1024 * 1024) + 423 # 10MB and 423 bytes
>> with open(TESTFN, 'wb') as f:
>>     for csize in chunks(FILESIZE, 262144):
>>         f.write(b'x' * csize)

First, is the memory cost of f.write(b'x' * FILESIZE) really an issue for your program? If so, aren't you better off creating an mmap and filling it with x? And if you want to do it with itertools, can't you just chunk repeat(b'x') instead of explicitly generating the lengths and multiplying them?

Besides, the logic here is actually a bit hidden. You create a FILESIZE which is 10MB and 423 bytes, and then you use a function that writes the 10MB in groups of 256KB and then writes the 423 bytes. Why not just keep it simple?

    with open(TESTFN, 'wb') as f:
        for _ in range(10 * 1024 // 256):
            f.write(b'x' * (256*1024))
        f.write(b'x' * 423)
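The generator expression mentioned above can be checked against the file size from the original example. In this sketch, sizes_via_range is a name chosen for illustration only.

```python
def sizes_via_range(total, step):
    # Each chunk starts at a multiple of step; the last one is cut short
    # by however much is left before total.
    return (min(step, total - start) for start in range(0, total, step))


FILESIZE = (10 * 1024 * 1024) + 423  # 10MB and 423 bytes
sizes = list(sizes_via_range(FILESIZE, 262144))

print(len(sizes), sizes[-1])  # 41 423
assert sum(sizes) == FILESIZE
```

That is 40 full chunks of 256KB followed by the 423-byte remainder, which is the same split the two-step "keep it simple" loop writes out by hand.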

On 07/04/2013 10:51, Andrew Barnert wrote:
> I personally agree that grouper, and some of the other itertools recipes, should actually be included in the module, so you could just import itertools and call grouper instead of having to copy the 3 lines of code into dozens of different programs. But I personally deal with that by just installing more-itertools off PyPI.

For those who aren't aware there's always https://pypi.python.org/pypi/itertools_recipes/0.1 or even https://pypi.python.org/pypi/more-itertools/2.2

--
If you're using GoogleCrap™ please read this http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence

On 06/04/13 23:50, Giampaolo Rodolà wrote:
> def chunks(total, step):
>     assert total >= step
>     while total > step:
>         yield step;
>         total -= step;
>     if total:
>         yield total
> [...]
> Now I wonder, would it make sense to have something like this into itertools module?

Since it doesn't operate on iterators, I don't think it belongs in itertools. It can also be implemented like this:

    def chunks(total, step):
        a, b = divmod(total, step)
        for i in range(a):
            yield step
        if b:
            yield b

which is probably also less likely to go wrong if you pass float arguments.

-- Steven
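One possible reading of the remark about floats, sketched below: the repeated-subtraction loop silently accepts floats and accumulates rounding error, while the divmod version stops with an error because range() does not take a float count. The names chunks_loop and chunks_divmod are labels added here to tell the two versions apart.

```python
import math


def chunks_loop(total, step):
    # Repeated-subtraction version from earlier in the thread.
    while total >= step:
        yield step
        total -= step
    if total > 0:
        yield total


def chunks_divmod(total, step):
    # divmod-based version from the message above.
    a, b = divmod(total, step)
    for i in range(a):
        yield step
    if b:
        yield b


# The loop accepts floats, but rounding error builds up with each
# subtraction, so the final chunk may be a tiny leftover.
loop_result = list(chunks_loop(1.0, 0.1))
print(loop_result)
assert math.isclose(sum(loop_result), 1.0)

# The divmod version fails fast: range() does not accept a float count.
try:
    list(chunks_divmod(1.0, 0.1))
except TypeError as exc:
    print("TypeError:", exc)
```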

On Sat, 6 Apr 2013 14:50:16 +0200, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
> def chunks(total, step):
>     assert total >= step
>     while total > step:
>         yield step;
>         total -= step;
>     if total:
>         yield total
>
> >>> chunks(12, 4)
> [4, 4, 4]
> >>> chunks(13, 4)
> [4, 4, 4, 1]
>
> I'm not sure how appropriate "chunks" is as a name for such a function. Anyway, I wrote that because in a unit test I had to create a file of a precise size, like this:
>
> FILESIZE = (10 * 1024 * 1024) + 423 # 10MB and 423 bytes
> with open(TESTFN, 'wb') as f:
>     for csize in chunks(FILESIZE, 262144):
>         f.write(b'x' * csize)

This doesn't sound very useful to me, actually. range() already does what you want, except for the "last chunk" thing.

Regards

Antoine.

On Sat, Apr 6, 2013 at 3:50 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
> def chunks(total, step):
>     assert total >= step
>     while total > step:
>         yield step;
>         total -= step;
>     if total:
>         yield total
>
> >>> chunks(12, 4)
> [4, 4, 4]
> >>> chunks(13, 4)
> [4, 4, 4, 1]
>
> I'm not sure how appropriate "chunks" is as a name for such a function.

This name is better reserved for chunking actual data rather than indexes:
http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

> Now I wonder, would it make sense to have something like this into itertools module?

-- anatoly t.
participants (12)
- anatoly techtonik
- Andrew Barnert
- Antoine Pitrou
- Carlo Pires
- Giampaolo Rodolà
- João Bernardo
- Mark Lawrence
- Mathias Panzenböck
- MRAB
- Nathan Schneider
- Serhiy Storchaka
- Steven D'Aprano