
Before anything else, I must apologize for the significant lag in my replies. I cannot read all of them and hold them in my head, so I reply one by one as I go, trying not to miss a single point. It would be much easier to do this in a unified interface for threaded discussions, but for now neither Mailman nor GMail has that capability. And when the amount of text turns out to be too big, I spend a lot of time trying to squeeze it down, and then it becomes pointless to send at all.
Now back to the topic:
On Sun, Jul 1, 2012 at 12:09 AM, Terry Reedy tjreedy@udel.edu wrote:
On 6/29/2012 4:32 PM, Georg Brandl wrote:
On 26.06.2012 10:03, anatoly techtonik wrote:
Now that Python 3 is all about iterators (which is a killer feature for Python users, according to StackOverflow - http://stackoverflow.com/questions/tagged/python), wouldn't it be nice to introduce more first-class functions to work with them? One function, to be exact: one that splits a string into chunks.
Nothing special about strings.
It seemed so, but it turned out that the grouper recipe didn't work for me.
itertools.chunks(iterable, size, fill=None)
This is a renaming of the grouper() recipe in 9.1.2. Itertools Recipes. You should have mentioned this. I think of 'blocks' rather than 'chunks', but I notice several SO questions with 'chunk(s)' in the title.
I guess `block` gives too low a signal/noise ratio in search results. That's probably also why it is called chunks in other languages, where `block` stands for something else (I mean Ruby blocks).
Which is the 33rd most voted Python question on SO -
http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...
I am curious how you got that number. I do note that there are about 15 other Python SO questions that seem to be variations on the theme. There might be more if 'blocks' and 'groups' were searched for.
It's easy:
 1. Go to http://stackoverflow.com/
 2. Search for [python]
 3. Click the `votes` tab
 4. Choose `30 per page` at the bottom
 5. Jump to the second page; there it is, 4th from the top: http://stackoverflow.com/questions/tagged/python?page=2&sort=votes&p...
As for duplicates - feel free to mark them as such. SO allows everybody to do this (unlike Roundup).
Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch?
That is because Raymond H. is not reading either list right now ;-) Hence the Cc:. Also because I had not yet responded to a vague, very incomplete idea.
From Raymond's first message on http://bugs.python.org/issue6021 (add grouper):
"This has been rejected before.
I see such arguments quite often, and I can't stop repeating that they are not arguments. It is good to know, but when people use it as a reason to close tickets, that's just disgusting. To Raymond's credit, he cares to explain.
- It is not a fundamental itertool primitive. The recipes section in the docs shows a clean, fast implementation derived from zip_longest().
What is the definition of 'fundamental primitive'? To me, the fact that the top answer for chunking strings on SO has more than twice as many votes as the itertools versions is a clear 5-sigma indicator that something is wrong with this Standard Model without a chunks boson.
- There is some debate on a correct API for odd lengths. Some people want an exception, some want fill-in values, some want truncation, and some want a partially filled-in tuple. That alone is reason enough not to set one behavior in stone.
use case 3.1: exception on odd lengths (CHOOSE ONE)
 1. I see that no itertools function raises exceptions, so check manually: len(iterable) % size == 0
 2. Explicitly: extend itertools.chunks(iterable, size, fill=None) to itertools.chunks(iterable, size, fill=None, exception=False)
use case 3.2: fill-in value. It is already there (SOLVED)
use case 3.3: truncation. No itertools function supports truncation, so do it manually: chunks(seq, size)[:len(seq) // size]
use case 3.4: partially filled-in tuple. What should be there?
chunks('ABCDEFG', 3, 'x')
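To make the odd-length choices concrete, here is a minimal sketch for sequences only (anything supporting len() and slicing), not for arbitrary iterators. The name `seq_chunks` and its `odd` parameter are hypothetical, chosen for illustration; this is neither an existing itertools function nor the proposed signature. Fill-in (3.2) is left to the grouper() recipe.

```python
def seq_chunks(seq, size, odd='partial'):
    """Yield consecutive slices of `seq`, each of length `size` (hypothetical helper).

    `odd` selects the behaviour when len(seq) % size != 0:
      'partial'  -> yield the short last slice as-is
      'truncate' -> drop the short last slice
      'error'    -> raise ValueError
    Because this is a generator, all checks run on the first next() call.
    """
    if size < 1:
        raise ValueError('size must be at least 1')
    if odd not in ('partial', 'truncate', 'error'):
        raise ValueError('unknown odd-length mode: {!r}'.format(odd))
    n = len(seq)
    if n % size and odd == 'error':
        raise ValueError('length {} is not a multiple of {}'.format(n, size))
    # For 'truncate', stop before the incomplete tail.
    stop = n - n % size if odd == 'truncate' else n
    for start in range(0, stop, size):
        yield seq[start:start + size]
```

A string comes back as strings and a list as lists, because slicing preserves the input type. Since the function is a generator, even the 'error' check fires on the first next() call rather than at call time; wrapping it in a plain function, as Terry's exactgrouper() does for grouper(), would make it eager.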
More replies and workarounds to some of the raised points are below.
- There is an issue with having too many itertools. The module taken as a whole becomes more difficult to use as new tools are added."
There can be only two reasons for that:
 * the chosen basis is bad (many functions that are rarely used or easily emulated)
 * the basis is good but insufficient, because the universe of iterators is more complicated than we think
This is not to say that the question should not be re-considered. Given the StackOverflow experience in addition to that of the tracker and python-list (and maybe python-ideas), a special exception might be made in relation to points 1 and 3.
--[offtopic about feedback on Python enhancement proposals]--
Yes, without SO I probably wouldn't have raised this at all, because the tracker doesn't help with raising importance: there are no votes, no feature proposals, no "stars". And what I "like" the most is that very "nice" resolution status, "committed/rejected", which doesn't say anything at all. Python-list? I try not to disrupt the frequency there. Python-ideas? The participation level is too low for gathering signals. There are many people who read and support an idea but don't want to reply (they don't want to stand out, or are just lazy). There are many outside who don't want to be subscribed at all. There are 2000+ people spending time at Python conferences all over the world each year, yet we see only a couple of reactions to every Python idea here. Quite often there are mistakes and omissions that would be nice to correct, and you can't. So StackOverflow really helps here; it is a Q&A tool, but it is still much better than mailing lists, which are solely for chatting, brainstorming and all the crazy reading / writing stuff. They don't help to develop ideas collaboratively. Quite often I am just lost in the amount of text to handle.
--[/offtopic]--
In regard to point 2: many 'proposals', including Anatoly's, neglect this detail. But the function has to do *something* when seqlen % grouplen != 0. So an 'idea' is not really a concrete programmable proposal until 'something' is specified.
Exception -- not possible for an itertool until the end of the iteration (see below). To raise immediately for sequences, one could wrap grouper.
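def exactgrouper(sequence, k):
    # grouper() is the itertools recipe (9.1.2), not an itertools function.
    # Check the length up front so the error is raised immediately.
    if len(sequence) % k:
        raise ValueError('Sequence length {} must be a multiple of group length {}'
                         .format(len(sequence), k))
    return grouper(k, sequence)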
Right. An iterator is not a sequence, because it doesn't know its length. Then the method should not belong to itertools at all.
Python 3 has definitely become more complicated. I'd prefer to keep away from the iterator stuff, but it seems to get harder with every iteration.
Of course, sequences can also be directly sequentially sliced (but should the result be an iterable or sequence of blocks?). But we do not have a seqtools module and I do not think there should be another method added to the seq protocol.
I'd expect strings chunked into strings and lists into lists. I don't want to know anything about protocols.
Fill -- grouper always does this, with a default of None.
Truncate, Remainder -- grouper (zip_longest) cannot directly do this and no recipes are given in the itertools docs. (More could be, see below.)
Discussions on python-list give various implementations, either for sequences or for iterables. For the latter, one approach is "it = iter(iterable)" followed by repeated islice of the first n items. Another is to use a sentinel for the 'fill' to detect a final incomplete block (tuple for grouper).
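The "it = iter(iterable)" plus repeated islice() approach might look like the sketch below. It works on any iterable, including infinite ones, and keeps a short final block (the "remainder" behaviour). The `join` parameter is a hypothetical addition, not something from the discussion: it shows that rebuilding each block as a string needs a type-specific join such as ''.join.

```python
import itertools


def islice_chunks(iterable, n, join=tuple):
    """Yield blocks of up to `n` items from any iterable (hypothetical helper).

    Each block is built with `join` (tuple by default; pass ''.join for
    strings). The final block is shorter than `n` if the input runs out.
    """
    if n < 1:
        raise ValueError('n must be at least 1')
    it = iter(iterable)
    while True:
        block = list(itertools.islice(it, n))
        if not block:  # the input is exhausted
            return
        yield join(block)
```

For example, list(islice_chunks('ABCDEFG', 3, ''.join)) gives ['ABC', 'DEF', 'G'], and list(itertools.islice(islice_chunks(itertools.count(), 2), 3)) gives [(0, 1), (2, 3), (4, 5)] without ever reaching an end of the input.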
def grouper_x(n, iterable):
    # grouper() is the itertools recipe; a unique sentinel marks the padding.
    sentinel = object()
    for g in grouper(n, iterable, sentinel):
        if g[-1] is not sentinel:
            yield g
        else:
            pass  # to truncate; alternatives:
            # yield g[:g.index(sentinel)]  # for remainder
            # raise ValueError             # for delayed exception
We need a simple function to split a sequence into chunks. Now we are faced with the problem of applying that technique to a sequence of infinite length, when the last element of an infinite sequence is encountered. You might be thinking that this is a reductio ad absurdum, but I'd say it is an exit from the trap: mathematically, this problem can't be solved. I am not ignoring your solution; I think it's quite feasible, but isn't it overcomplicated?
I mean, 160 people (the question itself has 149 upvotes) are pretty happy with an answer that just outputs the last chunk as-is: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...
chunks('ABCDEFG', 3) --> 'ABC' 'DEF' 'G'
And it is quite a nice solution to me, because you're free to do anything you like if you expect your data to have an odd length:
for chunk in chunks('ABCDEFG', size):
    if len(chunk) < size:
        raise Tail  # Tail: whatever exception suits your code
You can make a helper iterator out of it too.
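A minimal sketch of such a helper iterator, assuming sequence input and plain slicing; `strict_chunks` and the `Tail` exception class are hypothetical names (Tail follows the placeholder in the loop above).

```python
class Tail(ValueError):
    """Raised when the last chunk is shorter than requested (hypothetical)."""


def strict_chunks(seq, size):
    """Yield slices of `seq` of length `size`; raise Tail on a short last slice."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(seq), size):
        chunk = seq[start:start + size]
        if len(chunk) < size:
            raise Tail('incomplete last chunk: {!r}'.format(chunk))
        yield chunk
```

Full chunks are yielded before the error, so iterating over strict_chunks('ABCDEFG', 3) produces 'ABC' and 'DEF' and only then raises Tail.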
The above discussion of point 2 touches on point 4, which Raymond neglected in the particular message above but which has come up before: What are the allowed input and output types? An idea is not a programmable proposal until the domain, range, and mapping are specified.
Domain? Mapping? I am not ignoring existing knowledge and experience. I just don't want to complicate things, and I don't see an appropriate `import usecase` in the current context, so I won't try to guess what this means.
in string -> out list of strings
in list -> out list of lists
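A one-line sketch of that mapping for sequences, assuming slicing is all that is needed; `chunks` here is a hypothetical helper that returns a list, not an existing library function.

```python
def chunks(seq, size):
    """Return a list of consecutive slices of `seq`, each of length `size`.

    Slicing preserves the input type, so a str yields a list of str and a
    list yields a list of lists. The last slice may be shorter than `size`.
    """
    if size < 1:
        raise ValueError('size must be at least 1')
    return [seq[i:i + size] for i in range(0, len(seq), size)]
```

Tuples come back as a list of tuples too; anything without len() and slicing (a generator, for instance) raises TypeError, which is exactly the sequence/iterator split discussed here.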
Possible inputs are a specific sequence (string, for instance), any sequence, or any iterable. Possible outputs are a sequence or iterator of sequences or iterators. The various python-list and StackOverflow questions ask for various combinations. zip_longest, and hence grouper, takes any iterable and returns an iterator of tuples. (An iterator of maps might be more useful as a building block.) This is not what one usually wants with string input, for instance, nor with range input. To illustrate:
Alright. Got it. Sequences have a length and can be sliced with [i:j]; an iterator can't be sliced (and hence no chunks can be made). So this function doesn't belong to itertools; it is a missing string or sequence method. We can't get a chunk from an iterator, because iterating over a string decomposes it into pieces with no reverse function. We can make a group and then join the group into something, but this requires knowing the appropriate join() function for the iterator, and is probably not efficient. As there is no such function (that must be the Mapping you referred to above), recomposition into chunks is impossible.
import itertools as it

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)

print(*(grouper(3, 'ABCDEFG', 'x')))  # probably not wanted
print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
# Output:
# ('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
# ABC DEF Gxx
-- What to do? One could easily write 20 different functions. So more thought is needed before adding anything. -1 on the idea as is.
I've learned a new English term for a type of argument: "straw man" (I used to call this "hijacking"). This -1 doesn't apply to the original idea. It applies to a proposal of itertools.chunks() with the long list of points above and completely different user stories (i.e. not "split a string into chunks"). I hope you are still +1, along with the 160 people on SO who think Python needs an easy way to chunk sequences.
For the doc, I think it would be helpful here and in most module subchapters if there were a subchapter table of contents at the top (under 9.1 in this case). Even though just 2 lines here (currently, but see below), it would let people know that there *is* a recipes section. After the appropriate tables, mention that there are example uses in the recipe section. Possibly add similar tables in the recipe section.
Unfortunately, it turned out that grouper() is not chunks(). Given a string as input, it delivers tuples of characters instead of a list of string chunks.
Another addition could be a new subsection on grouping (chunking) that would discuss post-processing of grouper (as discussed above), as well as other recipes, including ones specific to strings and sequences. It would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking, or Chunking Sequences and Iterables". The synonyms will help external searching. A toc would let people who have found this doc know to look for this at the bottom.
This makes matters pretty ugly. In an ideal language there should be fewer docs, not more.