Mailman 3 itertools.chunks(iterable, size, fill=None) - Python-ideas

itertools.chunks(iterable, size, fill=None)

anatoly techtonik

June 26, 2012

8:03 a.m.

Now that Python 3 is all about iterators (which is a killer feature for Python users according to StackOverflow - http://stackoverflow.com/questions/tagged/python), would it be nice to introduce more first-class functions to work with them? One function, to be exact, to split a string into chunks:

itertools.chunks(iterable, size, fill=None)

This is the 33rd most voted Python question on SO - http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

P.S. CC'ing python-dev@ to notify about the thread in python-ideas.

Georg Brandl

June 2012

8:39 a.m.

On 26.06.2012 10:03, anatoly techtonik wrote:

...

+1. This is already a recipe in the itertools docs (see grouper() on http://docs.python.org/library/itertools#recipes), but it is so often requested (and used) that it is a very good candidate for a stdlib function. Georg

Tal Einat

10:34 a.m.

On Tue, Jun 26, 2012 at 11:03 AM, anatoly techtonik <techtonik@gmail.com> wrote:

...

+1 When working with iterators I have needed this often, and have implemented a similar utility function in many projects. As an example, this is a basic building block in my RunningCalcs[1] module.

- Tal Einat

[1] http://bitbucket.org/taleinat/runningcalcs/src/5bf8816d944b/RunningCalcs.py#...

Joao S. O. Bueno

12:42 p.m.

On 26 June 2012 07:34, Tal Einat <taleinat@gmail.com> wrote:

...

On Tue, Jun 26, 2012 at 11:03 AM, anatoly techtonik <techtonik@gmail.com> wrote:

...

...
itertools.chunks(iterable, size, fill=None)

What about

itertools.chunks(iterable, size=None, separator=None, fill=None)

Requiring at least one of size or separator to be set? This would also work for the "for x in text.split('\n')" case.

js -><-

Simon Sapin

12:58 p.m.

Le 26/06/2012 14:42, Joao S. O. Bueno a écrit :

...

I think that splitting an iterable on some separators or on a chunk size are two completely different functions. Having the same function do either is a bit confusing and I don’t see the benefit. Or is there a use case in passing both parameters? What would it do then, end the chunk after `size` elements or at `separator`, whichever comes first?

Regards,
-- Simon Sapin
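The distinction drawn above can be sketched as two separate helpers. This is only an illustration: the names `chunks_by_size` and `split_on` are hypothetical and were not proposed in the thread, and returning lists for each block is an arbitrary choice.

```python
from itertools import islice


def chunks_by_size(iterable, size):
    """Yield lists of up to `size` items; the last list may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def split_on(iterable, separator):
    """Yield lists of the items found between occurrences of `separator`."""
    block = []
    for item in iterable:
        if item == separator:
            yield block
            block = []
        else:
            block.append(item)
    yield block  # trailing block, possibly empty (mirrors str.split)


print(list(chunks_by_size("ABCDEFG", 3)))
print(list(split_on("ab\ncd\n", "\n")))
```

Keeping the two behaviours apart avoids having to define what happens when both a size and a separator are given.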

Georg Brandl

8:32 p.m.

On 26.06.2012 10:03, anatoly techtonik wrote:

...

Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch? Georg

Mike Graham

9:36 p.m.

On Fri, Jun 29, 2012 at 4:32 PM, Georg Brandl <g.brandl@gmx.net> wrote:

...

so far there were no negative votes

As far as I know, Raymond Hettinger is the itertools maintainer and he has repeatedly objected to this idea in the past (e.g. http://bugs.python.org/issue6021 ). Hopefully we can get his input again. Mike

Terry Reedy

10:57 p.m.

On 6/29/2012 5:36 PM, Mike Graham wrote:

...

See my other post on this thread. I think people should really spend a few minutes researching before posting repeated ideas. In this case, read the itertools doc to see if such a function exists. It does, in the recipes, as 'grouper'. Has anyone ever before proposed that the recipe be added as a function? Search the tracker for 'itertools grouper'. Besides #6021, there are also

http://bugs.python.org/issue1643 'Add group() to itertools'

Raymond: "Sorry, I'm not interested in adding this to the module. Discussions to-date on the subject seem to show more interest in playing with grouper variants than in actual use cases. While the recipe given in the docs is somewhat opaque, it runs at C-speed (zero trips around the eval-loop) and it is encapsulated in a re-usable function. Writing this in C does nothing to improve the situation. Also, when people like to play with variants, there is no general agreement on useful requirements (like fill-in behavior or raising an exception on uneven length inputs). Trying to write option to meet all needs (n=2, step=1) makes the code more difficult to learn and use -- see several variants in Alex's Python Cookbook. Another issue is that we have to be very selective about adding tools to the module. Each addition makes the overall toolset harder to use -- it is better to have a good set of basic building blocks."

http://bugs.python.org/issue13095 "Support for splitting lists/tuples into chunks"

Raymond: "These have been rejected before. There is always a trade-off in adding tools such as this -- it can take more time to learn and remember them than to write a trivial piece of code to do it yourself. Another issue is that people tend to disagree on how to handle an odd sized left-over group -- different use cases require different handling. We're trying to keep the core toolset reasonably small so that python remains simple and learnable. That raises the threshold for adding new tools."

There are now 18 itertools, up from the original. Grouper, or any generic function, may not be the best for what one wants with a list. I proposed in my other post that we *do* need a new doc section or how-to on this topic. (I am working on an outline.) -- Terry Jan Reedy

Terry Reedy

9:09 p.m.

On 6/29/2012 4:32 PM, Georg Brandl wrote:

...

On 26.06.2012 10:03, anatoly techtonik wrote:

...
Now that Python 3 is all about iterators (which is a user killer feature for Python according to StackOverflow - http://stackoverflow.com/questions/tagged/python) would it be nice to introduce more first class functions to work with them? One function to be exact to split string into chunks.

Nothing special about strings.

...

...
itertools.chunks(iterable, size, fill=None)

This is a renaming of itertools.grouper in 9.1.2. Itertools Recipes. You should have mentioned this. I think of 'blocks' rather than 'chunks', but I notice several SO questions with 'chunk(s)' in the title.

...

...
Which is the 33rd most voted Python question on SO - http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

I am curious how you get that number. I do note that there are about 15 other Python SO questions that seem to be variations on the theme. There might be more if 'blocks' and 'groups' were searched for.

...

Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch?

That is because Raymond H. is not reading either list right now ;-) Hence the Cc:. Also because I did not yet respond to a vague, very incomplete idea.

From Raymond's first message on http://bugs.python.org/issue6021 , add grouper:

"This has been rejected before.

* It is not a fundamental itertool primitive. The recipes section in the docs shows a clean, fast implementation derived from zip_longest().

* There is some debate on a correct API for odd lengths. Some people want an exception, some want fill-in values, some want truncation, and some want a partially filled-in tuple. That alone is reason enough not to set one behavior in stone.

* There is an issue with having too many itertools. The module taken as a whole becomes more difficult to use as new tools are added."

---
This is not to say that the question should not be re-considered. Given the StackOverflow experience in addition to that of the tracker and python-list (and maybe python-ideas), a special exception might be made in relation to points 1 and 3.

---
In regard to point 2: many 'proposals', including Anatoly's, neglect this detail. But the function has to do *something* when seqlen % grouplen != 0. So an 'idea' is not really a concrete programmable proposal until 'something' is specified.

Exception -- not possible for an itertool until the end of the iteration (see below). To raise immediately for sequences, one could wrap grouper.

    def exactgrouper(sequence, k):  # untested
        if len(sequence) % k:
            raise ValueError('Sequence length {} must be a multiple of group length {}'.format(len(sequence), k))
        else:
            # hypothetical: grouper is currently only a recipe in the docs
            return itertools.grouper(sequence, k)

Of course, sequences can also be directly sequentially sliced (but should the result be an iterable or sequence of blocks?). But we do not have a seqtools module and I do not think there should be another method added to the seq protocol.

Fill -- grouper always does this, with a default of None.

Truncate, Remainder -- grouper (zip_longest) cannot directly do this and no recipes are given in the itertools docs. (More could be, see below.)

Discussions on python-list give various implementations either for sequences or iterables. For the latter, one approach is "it = iter(iterable)" followed by repeated islice of the first n items. Another is to use a sentinel for the 'fill' to detect a final incomplete block (tuple for grouper).

    def grouper_x(n, iterable):  # untested
        sentinel = object()
        for g in grouper(n, iterable, sentinel):
            if g[-1] is not sentinel:
                yield g
            else:
                pass  # to truncate
                # or: yield g[:g.index(sentinel)]  for remainder
                # or: raise ValueError  for delayed exception

---
The above discussion of point 2 touches on point 4, which Raymond neglected in the particular message above but which has come up before: What are the allowed input and output types? An idea is not a programmable proposal until the domain, range, and mapping are specified.

Possible inputs are a specific sequence (string, for instance), any sequence, any iterable. Possible outputs are a sequence or iterator of sequence or iterator. The various python-list and StackOverflow posts and questions ask for various combinations. zip_longest and hence grouper takes any iterable and returns an iterator of tuples. (An iterator of maps might be more useful as a building block.) This is not what one usually wants with string input, for instance, nor with range input. To illustrate:

    import itertools as it

    def grouper(n, iterable, fillvalue=None):
        "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
        args = [iter(iterable)] * n
        return it.zip_longest(*args, fillvalue=fillvalue)

    print(*(grouper(3, 'ABCDEFG', 'x')))  # probably not wanted
    print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
    #
    # ('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
    # ABC DEF Gxx

--
What to do? One could easily write 20 different functions. So more thought is needed before adding anything. -1 on the idea as is.

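To make the four odd-length behaviours concrete, here is a minimal sketch of the islice approach mentioned above, with the policy chosen by a parameter. The name `grouped`, the `incomplete` parameter and its values are assumptions made for this illustration; nothing like it was proposed for itertools in this thread.

```python
from itertools import islice


def grouped(iterable, n, incomplete="fill", fillvalue=None):
    """Yield tuples of n items; `incomplete` picks the policy for a short tail.

    "fill" pads with fillvalue, "keep" yields the short tuple as-is,
    "drop" truncates, "raise" raises ValueError once the tail is reached.
    """
    if incomplete not in ("fill", "keep", "drop", "raise"):
        raise ValueError("unknown policy: {!r}".format(incomplete))
    it = iter(iterable)
    while True:
        block = tuple(islice(it, n))
        if not block:
            return
        if len(block) == n:
            yield block
        elif incomplete == "fill":
            yield block + (fillvalue,) * (n - len(block))
        elif incomplete == "keep":
            yield block
        elif incomplete == "raise":
            raise ValueError(
                "final block has {} of {} items".format(len(block), n))
        # "drop": yield nothing for the short tail


for policy in ("fill", "keep", "drop"):
    print(policy, list(grouped("ABCDEFG", 3, policy, "x")))
```

Note that the "raise" policy can only fire after the complete blocks have been produced, which is the delayed-exception limitation described above for arbitrary iterables.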
For the doc, I think it would be helpful here and in most module subchapters if there were a subchapter table of contents at the top (under 9.1 in this case). Even though just 2 lines here (currently, but see below), it would let people know that there *is* a recipes section. After the appropriate tables, mention that there are example uses in the recipe section. Possibly add similar tables in the recipe section. Another addition could be a new subsection on grouping (chunking) that would discuss post-processing of grouper (as discussed above), as well as other recipes, including ones specific to strings and sequences. It would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking, or Chunking Sequences and Iterables". The synonyms will help external searching. A toc would let people who have found this doc know to look for this at the bottom. -- Terry Jan Reedy

Stefan Behnel

July 2012

5:44 a.m.

Terry Reedy, 30.06.2012 23:09:

...

+1, the recipes are interesting enough to be highly visible, both as usage examples and to solve actual problems.

...

If it really is such an important use case for so many people, I agree that it's worth special casing it in the docs. It's not a trivial algorithmic step from a sequential iterable to a grouped iterable. Stefan

Raymond Hettinger

7:07 a.m.

On Jun 30, 2012, at 10:44 PM, Stefan Behnel wrote:

...

I'm not too keen on adding a section like this to the itertools docs. Instead, I would be open adding "further reading" section with external links to interesting iterator writeups in blogs, cookbooks, stack overflow answers, wikis, etc. If one of you wants to craft an elegant blog post on "Grouping, Blocking, or Chunking Sequences and Iterables", I would be happy to link to it. Raymond

anatoly techtonik

1:36 p.m.

Before anything else I must apologize for significant lags in my replies. I cannot hold all of them in my head at once, so I reply one by one as I go, trying not to miss a single point. It would be much easier to do this in a unified interface for threaded discussions, but for now neither Mailman nor GMail has such a capability. And when the amount of text turns out to be too big, I spend a lot of time trying to squeeze it down, and then it becomes pointless to send at all. Now back to the topic:

On Sun, Jul 1, 2012 at 12:09 AM, Terry Reedy <tjreedy@udel.edu> wrote:

...

On 6/29/2012 4:32 PM, Georg Brandl wrote:

...
On 26.06.2012 10:03, anatoly techtonik wrote:

...
Now that Python 3 is all about iterators (which is a user killer feature for Python according to StackOverflow - http://stackoverflow.com/questions/tagged/python) would it be nice to introduce more first class functions to work with them? One function to be exact to split string into chunks.

Nothing special about strings.

It seemed so, but it just appeared that grouper recipe didn't work for me.

...

...
...
itertools.chunks(iterable, size, fill=None)

This is a renaming of itertools.grouper in 9.1.2. Itertools Recipes. You should have mentioned this. I think of 'blocks' rather than 'chunks', but I notice several SO questions with 'chunk(s)' in the title.

I guess `block` gives too low a signal/noise ratio in search results. That's probably why it is also called chunks in other languages, where `block` stands for something else (I speak of Ruby blocks).

...

...
...
Which is the 33rd most voted Python question on SO -

http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

I am curious how you get that number. I do note that there are about 15 other Python SO questions that seem to be variations on the theme. There might be more if 'blocks' and 'groups' were searched for.

It's easy:
1. Go to http://stackoverflow.com/
2. Search [python]
3. Click the `votes` tab
4. Choose `30 per page` at the bottom
5. Jump to the second page; there it is, 4th from the top: http://stackoverflow.com/questions/tagged/python?page=2&sort=votes&pagesize=30

As for duplicates - feel free to mark them as such. SO allows everybody to do this (unlike Roundup).

...

...
Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch?

That is because Raymond H. is not reading either list right now ;-) Hence the Cc:. Also because I did not yet respond to a vague, very incomplete idea.

From Raymond's first message on http://bugs.python.org/issue6021 , add grouper:

"This has been rejected before.

I quite often see such arguments and I can't stand to repeat that these are not arguments. It is good to know, but when people use that as a reason to close tickets - that's just disgusting. To Raymond's credit, he cares to explain.

...

* It is not a fundamental itertool primitive. The recipes section in the docs shows a clean, fast implementation derived from zip_longest().

What is the definition of 'fundamental primitive'? To me the fact that top answer for chunking strings on SO has 2+ times more votes than itertools versions is a clear 5 sigma indicator that something is wrong with this Standard model without chunks boson.

...

* There is some debate on a correct API for odd lengths. Some people want an exception, some want fill-in values, some want truncation, and some want a partially filled-in tuple. That alone is reason enough not to set one behavior in stone.

use case 3.1: odd lengths exception (CHOOSE ONE)
  1. I see that no itertools function throws exceptions, so check manually:
       len(iterable) / float(size) == len(iterable) // float(size)
  2. Explicitly:
       - itertools.chunks(iterable, size, fill=None)
       + itertools.chunks(iterable, size, fill=None, exception=False)

use case 3.2: fill-in value. It is here (SOLVED).

use case 3.3: truncation. No itertools function supports truncation, so do it manually:
       chunks(iter, size)[:len(iter)//size]

use case 4: partially filled-in tuple. What should be there?

...

...
...
chunks('ABCDEFG', 3, 'x') |

More replies and workarounds to some of the raised points are below.

...

* There is an issue with having too many itertools. The module taken as a whole becomes more difficult to use as new tools are added."

There can be only two reasons to that: * chosen basis is bad (many functions that are rarely used or easily emulated) * basis is good, but insufficient, because iterators universe is more complicated than we think

...

This is not to say that the question should not be re-considered. Given the StackOverflow experience in addition to that of the tracker and python-list (and maybe python-ideas), a special exception might be made in relation to points 1 and 3.

--[offtopic about Python enhancements / proposals feedback]--
Yes, without SO I probably wouldn't have triggered this at all, because the tracker doesn't help with raising importance - there are no votes, no feature proposals, no "stars". And what I "like" the most is that very "nice" resolution status - "committed/rejected" - which doesn't say anything at all. Python list? I try not to disrupt the frequency there. Python ideas? Too low a participation level for gathering signals. There are many people who read and support, but don't want to reply (they don't want to stand out, or are just lazy). There are many outside who don't want to be subscribed at all. There are 2000+ people spending time at Python conferences all over the world each year, yet we see only a couple of reactions for every Python idea here. Quite often there are mistakes and omissions that would be nice to correct, and you can't. So StackOverflow really helps here, but it is a Q&A tool, which is still much better than MLs that are solely for chatting, brainstorming and all the crazy reading / writing stuff. They don't help to develop ideas collaboratively. Quite often I am just lost in the amount of text to handle.
--[/offtopic]--

...

It regard to point 2: many 'proposals', including Anatoly's, neglect this detail. But the function has to do *something* when seqlen % grouplen != 0. So an 'idea' is not really a concrete programmable proposal until 'something' is specified.

Exception -- not possible for an itertool until the end of the iteration (see below). To raise immediately for sequences, one could wrap grouper.

    def exactgrouper(sequence, k):  # untested
        if len(sequence) % k:
            raise ValueError('Sequence length {} must be a multiple of group length {}'.format(len(sequence), k))
        else:
            return itertools.grouper(sequence, k)

Right. An iterator is not a sequence, because it doesn't know the length of its sequence. The method should not belong to itertools at all then. Python 3 has definitely become more complicated. I'd prefer to keep this separate from iterator stuff, but it seems harder with every iteration.
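A runnable reading of the quoted exactgrouper idea, as a sketch only: it assumes `grouper` is defined locally from the docs recipe (it is not a function in itertools), and the name `exact_grouper` and the argument order are choices made for this example.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """The docs recipe: tuples of n items, padding the last with fillvalue."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def exact_grouper(sequence, k):
    """Raise immediately unless len(sequence) is a multiple of k."""
    if len(sequence) % k:
        raise ValueError(
            "Sequence length {} must be a multiple of group length {}".format(
                len(sequence), k))
    return grouper(sequence, k)


print(list(exact_grouper("ABCDEF", 3)))
```

The up-front check relies on len(), so it works for sequences such as strings and lists but fails with TypeError for a plain iterator, which is the distinction being discussed here.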

...

Of course, sequences can also be directly sequentially sliced (but should the result be an iterable or sequence of blocks?). But we do not have a seqtools module and I do not think there should be another method added to the seq protocol.

I'd expect strings chunked into strings and lists into lists. Don't want to know anything about protocols.

...

Fill -- grouper always does this, with a default of None.

Truncate, Remainder -- grouper (zip_longest) cannot directly do this and no recipes are given in the itertools docs. (More could be, see below.)

Discussions on python-list give various implementations either for sequences or iterables. For the latter, one approach is "it = iter(iterable)" followed by repeated islice of the first n items. Another is to use a sentinel for the 'fill' to detect a final incomplete block (tuple for grouper).

    def grouper_x(n, iterable):  # untested
        sentinel = object()
        for g in grouper(n, iterable, sentinel):
            if g[-1] is not sentinel:
                yield g
            else:
                pass  # to truncate
                # or: yield g[:g.index(sentinel)]  for remainder
                # or: raise ValueError  for delayed exception

We need a simple function to split a sequence into chunks(). Now we are faced with the problem of applying that technique to a sequence of infinite length, when the last element of an infinite sequence is encountered. You might be thinking now that this is a reduction to absurdity. But I'd say it is an exit from the trap. Mathematically this problem can't be solved. I am not ignoring your solution - I think it's quite feasible, but isn't it an overcomplication? I mean 160 people out of 149 who upvoted the question are pretty happy with an answer that just outputs the last chunk as-is: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

    chunks('ABCDEFG', 3) --> 'ABC' 'DEF' 'G'

And it is quite a nice solution to me, because you're free to do anything you'd like if you expect your data to be odd:

    for chunk in chunks('ABCDEFG', size):
        if len(chunk) < size:
            raise Tail

You can make a helper iterator out of it too.
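The "last chunk as-is" behaviour described above can be sketched with plain slicing. This assumes the input is a sequence (it needs len() and slicing), and `Tail` and `strict_chunks` are hypothetical names standing in for the exception and the helper iterator mentioned in the message.

```python
def chunks(seq, size):
    """Yield successive slices of `seq`; the last one may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


class Tail(Exception):
    """Hypothetical exception signalling an incomplete final chunk."""


def strict_chunks(seq, size):
    """Helper iterator: like chunks(), but raise Tail on a short chunk."""
    for chunk in chunks(seq, size):
        if len(chunk) < size:
            raise Tail(chunk)
        yield chunk


print(list(chunks("ABCDEFG", 3)))
print(list(chunks([1, 2, 3, 4, 5], 2)))
```

Because slicing preserves the input type, strings come back as strings and lists as lists, without the function knowing anything about either.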

...

--- The above discussion of point 2 touches on point 4, which Raymond neglected in the particular message above but which has come up before: What are the allowed input and output types? An idea is not a programmable proposal until the domain, range, and mapping are specified.

Domain? Mapping? I am not ignoring existing knowledge and experience. I just don't want to complicate and don't see appropriate `import usecase` in current context, so I won't try to guess what this means. in string -> out list of strings in list -> out list of lists

...

Possible inputs are a specific sequence (string, for instance), any sequence, any iterable. Possible outputs are a sequence or iterator of sequence or iterator. The various python-list and stackoverflow posts questions asks for various combinations. zip_longest and hence grouper takes any iterable and returns an iterator of tuples. (An iterator of maps might be more useful as a building block.) This is not what one usually wants with string input, for instance, nor with range input. To illustrate:

All right. Got it. Sequences have a length and can be sliced with [i:j]; an iterator can't be sliced (and hence no chunks can be made). So this function doesn't belong in itertools - it is a missing string or sequence method. We can't have a chunk with an iterator, because an iterator over a string decomposes it into a group of pieces with no reverse function. We can have a group and then join the group into something. But this requires knowledge of the appropriate join() function for the iterator, and is probably not efficient. As there is no such function (it must be that Mapping you referenced above), the recomposition into chunks is impossible.
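The "group, then join" step described above can be illustrated by letting the caller supply the recomposition function. This is a sketch under that assumption only; `chunks_via` and its `combine` parameter are hypothetical names, and whether such an interface is desirable is exactly what the thread is debating.

```python
from itertools import count, islice


def chunks_via(iterable, size, combine=tuple):
    """Yield combine(block) for each run of up to `size` items."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield combine(block)


# Strings recomposed from a string iterator, given the right join function.
print(list(chunks_via(iter("ABCDEFG"), 3, "".join)))
# Lazily, from an endless iterator: only the first three blocks are taken.
print(list(islice(chunks_via(count(), 2, list), 3)))
```

The function itself cannot guess the right `combine` for an arbitrary iterator, so the caller has to pass it, which is the extra knowledge the message refers to.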

...

    import itertools as it

    def grouper(n, iterable, fillvalue=None):
        "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
        args = [iter(iterable)] * n
        return it.zip_longest(*args, fillvalue=fillvalue)

    print(*(grouper(3, 'ABCDEFG', 'x')))  # probably not wanted
    print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
    #
    # ('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
    # ABC DEF Gxx

-- What to do? One could easily write 20 different functions. So more thought is needed before adding anything. -1 on the idea as is.

I've learned a new English type of argument - "straw man" (I used to call this "hijacking"). This -1 doesn't belong to original idea. It belongs to proposal of itertools.chunks() with a long list of above points and completely different user stories (i.e. not "split string into chunks"). I hope you still +1 with 160 people on SO that think Python needs an easy way to chunk sequences.

...

For the doc, I think it would be helpful here and in most module subchapters if there were a subchapter table of contents at the top (under 9.1 in this case). Even though just 2 lines here (currently, but see below), it would let people know that there *is* a recipes section. After the appropriate tables, mention that there are example uses in the recipe section. Possibly add similar tables in the recipe section.

Unfortunately, it turned out that grouper() is not chunks(). Given a string as input, it delivers a list of lists of chars instead of a list of chunks.

...

Another addition could be a new subsection on grouping (chunking) that would discuss post-processing of grouper (as discussed above), as well as other recipes, including ones specific to strings and sequences. It would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking, or Chunking Sequences and Iterables". The synonyms will help external searching. A toc would let people who have found this doc know to look for this at the bottom.

This makes matters pretty ugly. In an ideal language there should be less docs, not more.

Steven D'Aprano

4:09 p.m.

New subject: [Python-Dev] itertools.chunks(iterable, size, fill=None)

anatoly techtonik wrote:

...

Yes. I don't think this is particularly significant. Have a look at some of the questions with roughly the same number of votes:

#26 "How can I remove (chomp) a newline in Python?" 176 votes
#33 "How do you split a list into evenly sized chunks in Python?" 149 votes
#36 "Accessing the index in Python for loops" 144 votes

Being 33rd most voted question doesn't really mean much.

By the way, why is this discussion going to both python-dev and python-ideas?

-- Steven

Stephen J. Turnbull

5:26 a.m.

Annoying cross-post trimmed. anatoly techtonik writes:

...

Before anything else I must apologize for significant lags in my replies.

No, that's nothing you need to apologize for. Taking time to formulate a decent reply is common courtesy, and commendable. We're really *not* in a hurry here; Python is going to be around for a few more decades at least. What you need to apologize for is the major faux pas of cross-posting. Please cut it out, and in particular trim your own address list when replying, or use Reply-To (and maybe Mail-Followup-To) to redirect others' replies to an appropriate list. IMO, the right list for this discussion is python-ideas, but at the very least choose one. and again he writes:

...

This is a terrible idea for python-dev and python-ideas. While it is frustrating to get a "been there, done that, rejected with extreme prejudice" reply, and there's no question that searching the archives is a hit-and-mostly-miss kind of thing because of the difficulty of choosing good search terms, it's really not that costly to come back with "I'm sorry, I couldn't find the thread". Rather than spend effort on writing a FAQ that would rather quickly turn into a monstrosity hardly more easy to search than the archives themselves, and almost never be read, we should devote any effort to improving the capability for searching archives (and the wiki and the issue tracker). The problem with trying to put everything into the FAQ is that it's a terribly unrewarding thing to do. Most of the stuff you will write will be ignored and profit nobody. It really needs to be selected, by somebody with taste and knowledge of user needs. Presumably O'Reilly or somebody has a "Python Hacks" book -- go buy it, which will encourage the author to keep up the good work. If they don't, why don't you propose it to them, and write it yourself? Then you can measure how good an idea it is by your royalties. Or if you don't feel like writing it yourself, maybe you can convince O'Reilly to come up with a big enough advance to interest somebody like Terry Reedy or Steven d'Aprano or even Raymond Hettinger or David Beazley. Regards,

Mike Meyer

6 a.m.

"Stephen J. Turnbull" <stephen@xemacs.org> wrote:

...

Python-dev I'm not familiar enough with to formulate an opinion, but I think it could work well for python-ideas. While better search tools always help, a mail list is mostly unorganized, and a wiki needs a purpose (possibly provided by a person or small group of people who take on the responsibility of maintaining the thing), or it becomes an even less organized mess. A FAQ, on the other hand, *has* a purpose. It also has a natural source of new material in the list. Python.org already has multiple FAQs with an integrated search engine (that are hopefully integrated into the python.org search engine), and last time I looked (admittedly long ago), a tool for maintaining FAQs. While I don't think I'm qualified to write answers for python-ideas FAQs, I believe I can decide whether or not something is appropriate as a python-ideas FAQ entry. If the tool for maintaining them isn't a memory error and is still in use, I'd certainly be willing to do that. But that's the easy job. The trick is getting the list members to play along. If they provide what most people consider the definitive answer a couple of times, then the next time the question comes up, write it up as a FAQ entry and submit it as such as well as to the list. The time after that, just provide a pointer to the FAQ entry. This worked very well for the FreeBSD FAQ, which is a fairly large document. They didn't even have a spiffy tool for maintaining it, but had to take submissions and add or fix markup and then check them in to the document tree.

-- Sent from my Android tablet. Please excuse my swyping.

Stephen J. Turnbull

9:22 a.m.

Mike Meyer writes:

...

Python-dev I'm not familiar enough with to formulate an opinion, but I think it could work well for python-ideas.

You're missing the point. Improving the FAQ for python-*list* is a great idea, but Mr. techtonik proposes a FAQ for python-*ideas*. IOW, if the question is actually on-topic, the answers are reasons why something isn't going to ever be in Python. True, often the explanation of why a proposed feature is inappropriate is of the form "this three-line function does the job", but if the question is really a FAQ, it presumably has been asked multiple times on python-list, and should be picked up there.

...

This worked very well for the FreeBSD FAQ, which is a fairly large document.

No, it didn't. The FreeBSD FAQ is quite obviously not oriented to telling developers what isn't going to go into the next version of FreeBSD and why not. If people want to move ahead with this discussion, it really ought to move to python-list IMO.

Mike Meyer

3:38 p.m.

On Fri, 06 Jul 2012 18:22:03 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:

...

Mike Meyer writes:

...
Python-dev I'm not familiar enough with to formulate an opinion, but I think it could work well for python-ideas.

You're missing the point. Improving the FAQ for python-*list* is a great idea, but Mr. techtonik proposes a FAQ for python-*ideas*. IOW, if the question is actually on-topic, the answers are reasons why something isn't going to ever be in Python.

That point I got.

...

True, often the explanation of why a proposed feature is inappropriate is of the form "this three-line function does the job", but if the question is really a FAQ, it presumably has been asked multiple times on python-list, and should be picked up there.

That I didn't, and it's fair enough. But the onus to fix this *still* falls on the people who are answering the same question over and over. They need to start submitting the answers to the FAQ (there is a history & design FAQ that would seem to be an appropriate place for them) rather than just to the list.

...

...
This worked very well for the FreeBSD FAQ, which is a fairly large document.

No, it didn't. The FreeBSD FAQ is quite obviously not oriented to telling developers what isn't going to go into the next version of FreeBSD and why not.

If you're claiming that the FreeBSD FAQ technique didn't work well for it, then I disagree. That the FreeBSD FAQ has a different orientation is true, but not clearly relevant.

...

If people want to move ahead with this discussion, it really ought to move to python-list IMO.

Not in mine. The problem is repeated questions on -ideas. My proposed solution has to be implemented by -ideas readers. They need to start submitting frequently typed answers to the appropriate FAQ, not just to the list. <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Stephen J. Turnbull

6:53 p.m.

Mike Meyer writes:

...

That I didn't, and it's fair enough. But the onus to fix this *still* falls on the people who are answering the same question over and over. They need to start submitting the answers to the FAQ (there is a history & design FAQ that would seem to be an appropriate place for them) rather than just to the list.

I disagree that it's appropriate. The History & Design FAQ isn't intended to be encyclopedic, and I don't see that making it so would be useful. The point is that if "This has been rejected before" is sufficient answer, the proponent withdraws, everybody happy. And if not, they have to go to the archives anyway. They need to answer the defects brought up in the list discussion, point by point, or they're not going to get a hearing.

...

If you're claiming that the FreeBSD FAQ technique didn't work well for it, then I disagree.

No, I'm claiming that AFAICS the FreeBSD FAQ didn't even try to fix the problem that you and anatoly perceive.

...

The problem is repeated questions on -ideas.

Indeed!<wink/> But I think the solution is to encourage asking those questions on python-list first.[1] The experts there are both willing to answer such questions, and have a better sense of FAQ-iness (relatively few users interested in the question will propose a general solution). And they have an incentive to document answers if they notice a FAQ. If no suitable idiom appears, then work up a change proposal and post to -ideas.

Footnotes:

[1] Yes, I'm being mildly ironic; people typically won't know it's a FAQ until they ask. Nevertheless, asking on python-list before posting to either python-ideas or python-dev should be encouraged ("unless you're Dutch"<wink/>).

anatoly techtonik

9:57 a.m.

On Fri, Jun 29, 2012 at 11:32 PM, Georg Brandl <g.brandl@gmx.net> wrote:

...

Was about to say "no problem", but in fact there is one. Sorry for whining from my side and thanks for nudging. The mere thought that the simple task of copy/pasting relevant code from http://docs.python.org/library/itertools.html?highlight=itertools#recipes will require a few hours of waiting for a download (still not everybody has high-speed internet) makes me switch to other, less time-consuming tasks before getting around to it. These tasks become more important in a few hours, and basically I've been through this many times before. It then becomes quite hard to switch back.

I absolutely don't mind someone else being credited for the idea, because ideas are usually worthless without implementation. It will be interesting to design how the process could work in a separate thread. For now the best thing I can do (I don't even risk mentioning anything about 3.3) is to copy/paste code from the docs here:

    from itertools import izip_longest  # Python 2 name; zip_longest in Python 3

    def chunks(iterable, size, fill=None):
        """Split an iterable into blocks of fixed-length"""
        # chunks('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * size
        return izip_longest(fillvalue=fill, *args)

BTW, this doesn't work as expected (at least for strings). Expected is:

    chunks('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'

got:

    chunks('ABCDEFG', 3, 'x') --> ('A' 'B' 'C') ('D' 'E' 'F') ('G' 'x' 'x')

Needs more round tuits definitely.

anatoly techtonik

September 2012

6:27 a.m.

I've run into the necessity of implementing chunks() again. Here is the code I've made from scratch.

    def chunks(seq, size):
        '''Cut sequence into chunks of given size. If `seq` length is
        not divisible by `size` without remainder, last chunk will
        have length less than size.

        >>> list( chunks([1,2,3,4,5,6,7], 3) )
        [[1, 2, 3], [4, 5, 6], [7]]
        '''
        endlen = len(seq)//size
        # full chunks are built element by element, so they are always lists
        for i in range(endlen):
            yield [seq[i*size+n] for n in range(size)]
        # the leftover chunk is a slice, so it keeps the type of `seq`
        if len(seq) % size:
            yield seq[endlen*size:]

-- anatoly t.

On Fri, Jun 29, 2012 at 11:32 PM, Georg Brandl <g.brandl@gmx.net> wrote:

...

Miki Tebeka

2:42 p.m.

See the "grouper" example in http://docs.python.org/library/itertools.html On Friday, August 31, 2012 11:28:33 PM UTC-7, anatoly techtonik wrote:

...

anatoly techtonik

11:36 a.m.

On Sat, Sep 1, 2012 at 5:42 PM, Miki Tebeka <miki.tebeka@gmail.com> wrote:

...

See the "grouper" example in http://docs.python.org/library/itertools.html

As was discussed before, the problem is visibility of the solution, not the implementation. If we can divide the core Python API into levels where 0 is the least important and 10 is the most, then `chunks` should be a level above where it is now. -- anatoly t.

Stephen J. Turnbull

9:01 a.m.

anatoly techtonik writes:

...

Well, no, it's apparently not. You should be well aware of the solution since you were one of the most ardent posters in this thread the last time it came up.[1] Yet you say "I had to *re*implement chunks". IOW, the implementations which you were already aware of were inappropriate for your situation. That suggests that no, there are no generic solutions suitable for the stdlib yet, and you personally aren't convinced that any of the implementations belong in your own private library, either. You really need to get over those humps before you have a case for a "higher-visibility" placement of any particular implementation.

Footnotes:

[1] And I think you cross-posted that time, too, but that's another issue. In any case, please stop cross-posting. Pick one or the other. (IMHO, this discussion belongs here on -ideas (or maybe on python-list), not on python-dev. Or submit an issue and a patch and discuss it there.)

MRAB

4:39 p.m.

On 01/09/2012 07:27, anatoly techtonik wrote:

...

Here's a lazy version:

    def chunks(seq, size):
        '''Cut sequence into chunks of given size. If `seq` length is
        not divisible by `size` without remainder, last chunk will
        have length less than size.

        >>> list( chunks([1,2,3,4,5,6,7], 3) )
        [[1, 2, 3], [4, 5, 6], [7]]
        '''
        if size < 1:
            raise ValueError("chunk size less than 1")
        it = iter(seq)
        try:
            while True:
                chunk = []
                for _ in range(size):
                    chunk.append(next(it))
                yield chunk
        except StopIteration:
            # input ran out mid-chunk: emit whatever was collected
            if chunk:
                yield chunk

Michele Lacchia

5:16 p.m.

+1 for the lazy version. Why not use itertools.islice instead of the innermost for loop?

MRAB

7:02 p.m.

On 01/09/2012 18:16, Michele Lacchia wrote:

...

+1 for the lazy version. Why not use itertools.islice instead of the innermost for loop?

OK, here's a lazy version using islice:

    from itertools import islice

    def chunks(seq, size):
        '''Cut sequence into chunks of given size. If `seq` length is
        not divisible by `size` without remainder, last chunk will
        have length less than size.

        >>> list( chunks([1,2,3,4,5,6,7], 3) )
        [[1, 2, 3], [4, 5, 6], [7]]
        '''
        if size < 1:
            raise ValueError("chunk size less than 1")
        it = iter(seq)
        while True:
            # islice pulls at most `size` items from the shared iterator
            chunk = list(islice(it, 0, size))
            if not chunk:
                break
            yield chunk
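To make the point of the lazy versions concrete, here is a minimal sketch (not code from the thread; the name `lazy_chunks` is made up for illustration) that applies the same islice approach to inputs that have no `len()`, including an infinite iterator. A `len()`-based implementation could not handle either case.

```python
from itertools import count, islice


def lazy_chunks(iterable, size):
    """Yield lists of up to `size` items from any iterable (hypothetical helper)."""
    if size < 1:
        raise ValueError("chunk size less than 1")
    it = iter(iterable)
    while True:
        # Pull at most `size` items; an empty list means the input is exhausted.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# count(1) never ends, so only a lazy implementation can consume it.
first_three = list(islice(lazy_chunks(count(1), 3), 3))
print(first_three)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A generator has no len(), yet the short final chunk still comes out.
letters = list(lazy_chunks((c for c in "ABCDEFG"), 3))
print(letters)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
```

Note that the chunks are always lists here, whatever the input type, which is exactly the input/output question raised elsewhere in the thread.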

Georg Brandl

8:32 p.m.

On 26.06.2012 10:03, anatoly techtonik wrote:

...

Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch? Georg

Mike Graham

9:36 p.m.

On Fri, Jun 29, 2012 at 4:32 PM, Georg Brandl <g.brandl@gmx.net> wrote:

...

so far there were no negative votes

Terry Reedy

June 2012

10:57 p.m.

On 6/29/2012 5:36 PM, Mike Graham wrote:

...

Terry Reedy

9:09 p.m.

On 6/29/2012 4:32 PM, Georg Brandl wrote:

...

On 26.06.2012 10:03, anatoly techtonik wrote:

...
Now that Python 3 is all about iterators (which is a user killer feature for Python according to StackOverflow - http://stackoverflow.com/questions/tagged/python) would it be nice to introduce more first class functions to work with them? One function to be exact to split string into chunks.

Nothing special about strings.

...

...
itertools.chunks(iterable, size, fill=None)

...

...
Which is the 33th most voted Python question on SO - http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

...

Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch?

Stefan Behnel

July 2012

5:44 a.m.

Terry Reedy, 30.06.2012 23:09:

...

+1, the recipes are interesting enough to be highly visible, both as usage examples and to solve actual problems.

...

Raymond Hettinger

7:07 a.m.

On Jun 30, 2012, at 10:44 PM, Stefan Behnel wrote:

...

anatoly techtonik

1:36 p.m.

...

On 6/29/2012 4:32 PM, Georg Brandl wrote:

...
On 26.06.2012 10:03, anatoly techtonik wrote:

...
Now that Python 3 is all about iterators (which is a user killer feature for Python according to StackOverflow - http://stackoverflow.com/questions/tagged/python) would it be nice to introduce more first class functions to work with them? One function to be exact to split string into chunks.

Nothing special about strings.

It seemed so, but it turned out that the grouper recipe didn't work for me.

...

...
...
itertools.chunks(iterable, size, fill=None)

This is a renaming of itertools.grouper in 9.1.2. Itertools Recipes. You should have mentioned this. I think of 'blocks' rather than 'chunks', but I notice several SO questions with 'chunk(s)' in the title.

I guess `block` gives too low a signal/noise ratio in search results. That's probably why it is also called chunks in other languages, where `block` stands for something else (I am speaking of Ruby blocks).

...

...
...
Which is the 33th most voted Python question on SO -

http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

I am curious how you get that number. I do note that there are about 15 other Python SO questions that seem to be variations on the theme. There might be more if 'blocks' and 'groups' were searched for.

...

...
Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch?

That is because Raymond H. is not reading either list right now ;-) Hence the Cc:. Also because I did not yet respond to a vague, very incomplete idea.

From Raymond's first message on http://bugs.python.org/issue6021 , add grouper:

"This has been rejected before.

...

* It is not a fundamental itertool primitive. The recipes section in the docs shows a clean, fast implementation derived from zip_longest().

...

* There is some debate on a correct API for odd lengths. Some people want an exception, some want fill-in values, some want truncation, and some want a partially filled-in tuple. That alone is reason enough not to set one behavior in stone.

...

...
...
chunks('ABCDEFG', 3, 'x') |

More replies and workarounds to some of the raised points are below.

...

* There is an issue with having too many itertools. The module taken as a whole becomes more difficult to use as new tools are added."

...

This is not to say that the question should not be re-considered. Given the StackOverflow experience in addition to that of the tracker and python-list (and maybe python-ideas), a special exception might be made in relation to points 1 and 3.

...

In regard to point 2: many 'proposals', including Anatoly's, neglect this detail. But the function has to do *something* when seqlen % grouplen != 0. So an 'idea' is not really a concrete programmable proposal until 'something' is specified.

Exception -- not possible for an itertool until the end of the iteration (see below). To raise immediately for sequences, one could wrap grouper.

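    def exactgrouper(sequence, k):  # untested
        if len(sequence) % k:
            raise ValueError('Sequence length {} must be a multiple '
                             'of group length {}'.format(len(sequence), k))
        else:
            # itertools.grouper is the proposed function; today grouper
            # exists only as a recipe in the itertools docs
            return itertools.grouper(sequence, k)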

...

Of course, sequences can also be directly sequentially sliced (but should the result be an iterable or sequence of blocks?). But we do not have a seqtools module and I do not think there should be another method added to the seq protocol.

I'd expect strings chunked into strings and lists into lists. Don't want to know anything about protocols.
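The expectation stated here (strings in, strings out; lists in, lists out) can be met for sequences by plain slicing, since a slice keeps the type of its input. A minimal sketch, assuming the input supports `len()` and slicing; the name `chunks_seq` is hypothetical and not from the thread:

```python
def chunks_seq(seq, size):
    """Yield consecutive slices of `seq`, each at most `size` items long.

    Works only for sequences (anything with len() and slicing),
    not for arbitrary iterators.
    """
    if size < 1:
        raise ValueError("chunk size less than 1")
    for start in range(0, len(seq), size):
        # Slicing past the end is safe and yields the short final chunk.
        yield seq[start:start + size]


print(list(chunks_seq("ABCDEFG", 3)))        # ['ABC', 'DEF', 'G']
print(list(chunks_seq([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

The trade-off is the one discussed in this thread: this form preserves the container type but gives up on iterators, while the zip_longest-based recipe accepts any iterable but always produces tuples.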

...

Fill -- grouper always does this, with a default of None.

Truncate, Remainder -- grouper (zip_longest) cannot directly do this and no recipes are given in the itertools docs. (More could be, see below.)

Discussions on python-list give various implementations either for sequences or iterables. For the latter, one approach is "it = iter(iterable)" followed by repeated islice of the first n items. Another is to use a sentinel for the 'fill' to detect a final incomplete block (tuple for grouper).

    def grouper_x(n, iterable):  # untested
        sentinel = object()
        for g in grouper(n, iterable, sentinel):
            if g[-1] is not sentinel:
                yield g
            else:
                pass  # pass to truncate
                # yield g[:g.index(sentinel)] for remainder
                # raise ValueError for delayed exception
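The four behaviours for an incomplete final group (fill, truncate, remainder, delayed exception) can be put side by side using the sentinel technique just described. This is only an illustrative sketch: the function name `grouper_policy`, the `policy` parameter, and its string values are assumptions made for this example, not an API proposed in the thread or present in itertools.

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel marking padded slots (hypothetical name)

_POLICIES = ("fill", "truncate", "remainder", "strict")


def grouper_policy(iterable, n, policy="fill", fillvalue=None):
    """Group `iterable` into n-tuples; `policy` decides the last short group."""
    if n < 1:
        raise ValueError("group size less than 1")
    if policy not in _POLICIES:
        raise ValueError("unknown policy: {!r}".format(policy))
    # n references to one iterator, so zip_longest consumes n items per tuple.
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_MISSING):
        if group[-1] is not _MISSING:
            yield group  # complete group
            continue
        # Only the final group can contain the sentinel.
        partial = tuple(x for x in group if x is not _MISSING)
        if policy == "fill":
            yield partial + (fillvalue,) * (n - len(partial))
        elif policy == "remainder":
            yield partial
        elif policy == "strict":
            # Raised only once the end of the input has been reached.
            raise ValueError("input length is not a multiple of {}".format(n))
        # policy == "truncate": drop the partial group silently


for name in ("fill", "truncate", "remainder"):
    print(name, list(grouper_policy("ABCDEFG", 3, name, "x")))
```

With "strict", the complete groups are still yielded before the ValueError appears, which is the "not possible for an itertool until the end of the iteration" caveat made above.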

...

--- The above discussion of point 2 touches on point 4, which Raymond neglected in the particular message above but which has come up before: What are the allowed input and output types? An idea is not a programmable proposal until the domain, range, and mapping are specified.

...

Possible inputs are a specific sequence (string, for instance), any sequence, any iterable. Possible outputs are a sequence or iterator of sequences or iterators. The various python-list and Stack Overflow questions ask for various combinations. zip_longest and hence grouper take any iterable and return an iterator of tuples. (An iterator of maps might be more useful as a building block.) This is not what one usually wants with string input, for instance, nor with range input. To illustrate:

...

    import itertools as it

    def grouper(n, iterable, fillvalue=None):
        "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
        args = [iter(iterable)] * n
        return it.zip_longest(*args, fillvalue=fillvalue)

    print(*(grouper(3, 'ABCDEFG', 'x')))  # probably not wanted
    print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
    #
    # ('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
    # ABC DEF Gxx

-- What to do? One could easily write 20 different functions. So more thought is needed before adding anything. -1 on the idea as is.

...

For the doc, I think it would be helpful here and in most module subchapters if there were a subchapter table of contents at the top (under 9.1 in this case). Even though just 2 lines here (currently, but see below), it would let people know that there *is* a recipes section. After the appropriate tables, mention that there are example uses in the recipe section. Possibly add similar tables in the recipe section.

Unfortunately, it turned out that grouper() is not chunks(). Given a string as input, it delivers a list of lists of chars instead of a list of chunks.

...

Another addition could be a new subsection on grouping (chunking) that would discuss post-processing of grouper (as discussed above), as well as other recipes, including ones specific to strings and sequences. It would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking, or Chunking Sequences and Iterables". The synonyms will help external searching. A toc would let people who have found this doc know to look for this at the bottom.

This makes matters pretty ugly. In an ideal language there should be fewer docs, not more.

Steven D'Aprano

4:09 p.m.

New subject: [Python-Dev] itertools.chunks(iterable, size, fill=None)

anatoly techtonik wrote:

...

Stephen J. Turnbull

July 2012

5:26 a.m.

Annoying cross-post trimmed. anatoly techtonik writes:

...

Before anything else I must apologize for significant lags in my replies.

...

Mike Meyer

6 a.m.

"Stephen J. Turnbull" <stephen@xemacs.org> wrote:

...

Stephen J. Turnbull

9:22 a.m.

Mike Meyer writes:

...

Python-dev I'm not familiar enough with to formulate an opinion, but I think it could work well for python-ideas.

...

This worked very well for the FreeBSD FAQ, which is a fairly large document.


Stephen J. Turnbull

6:53 p.m.

Mike Meyer writes:

...

That I didn't, and it's fair enough. But the onus to fix this *still* falls on the people who are answering the same question over and over. They need to start submitting the answers to the FAQ (there is a history & design FAQ that would seem to be an appropriate place for them) rather than just to the list.

...