
On 6/29/2012 4:32 PM, Georg Brandl wrote:
> On 26.06.2012 10:03, anatoly techtonik wrote:
>> Now that Python 3 is all about iterators (which is a user killer feature for Python according to StackOverflow - http://stackoverflow.com/questions/tagged/python) would it be nice to introduce more first class functions to work with them? One function to be exact to split string into chunks.

Nothing special about strings.

>> itertools.chunks(iterable, size, fill=None)

This is a renaming of grouper from 9.1.2. Itertools Recipes. You should have mentioned this. I think of 'blocks' rather than 'chunks', but I notice several SO questions with 'chunk(s)' in the title.

>> Which is the 33rd most voted Python question on SO - http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenl...

I am curious how you get that number. I do note that there are about 15 other Python SO questions that seem to be variations on the theme. There might be more if 'blocks' and 'groups' were searched for.

> Anatoly, so far there were no negative votes -- would you care to go another step and propose a patch?

That is because Raymond H. is not reading either list right now ;-) Hence the Cc:. Also because I did not yet respond to a vague, very incomplete idea.

From Raymond's first message on http://bugs.python.org/issue6021 , add grouper:

"This has been rejected before.

* It is not a fundamental itertool primitive. The recipes section in the docs shows a clean, fast implementation derived from zip_longest().

* There is some debate on a correct API for odd lengths. Some people want an exception, some want fill-in values, some want truncation, and some want a partially filled-in tuple. That alone is reason enough not to set one behavior in stone.

* There is an issue with having too many itertools. The module taken as a whole becomes more difficult to use as new tools are added."

---
This is not to say that the question should not be re-considered. Given the StackOverflow experience in addition to that of the tracker and python-list (and maybe python-ideas), a special exception might be made in relation to points 1 and 3.

---
In regard to point 2: many 'proposals', including Anatoly's, neglect this detail. But the function has to do *something* when seqlen % grouplen != 0. So an 'idea' is not really a concrete programmable proposal until 'something' is specified.

Exception -- not possible for an itertool until the end of the iteration (see below). To raise immediately for sequences, one could wrap grouper.

    def exactgrouper(sequence, k):  # untested
        # grouper is the recipe from the itertools docs, not an itertools function
        if len(sequence) % k:
            raise ValueError('Sequence length {} must be a multiple '
                             'of group length {}'.format(len(sequence), k))
        else:
            return grouper(k, sequence)

Of course, sequences can also be directly sequentially sliced (but should the result be an iterable or sequence of blocks?). But we do not have a seqtools module and I do not think there should be another method added to the seq protocol.

Fill -- grouper always does this, with a default of None.

Truncate, Remainder -- grouper (zip_longest) cannot directly do this and no recipes are given in the itertools docs. (More could be, see below.) Discussions on python-list give various implementations either for sequences or iterables. For the latter, one approach is "it = iter(iterable)" followed by repeated islice of the first n items. Another is to use a sentinel for the 'fill' to detect a final incomplete block (tuple for grouper).

    def grouper_x(n, iterable):  # untested
        sentinel = object()
        for g in grouper(n, iterable, sentinel):
            if g[-1] is not sentinel:
                yield g
            else:
                pass  # to truncate
                # yield g[:g.index(sentinel)] for remainder
                # raise ValueError for delayed exception

---
The above discussion of point 2 touches on point 4, which Raymond neglected in the particular message above but which has come up before: What are the allowed input and output types? An idea is not a programmable proposal until the domain, range, and mapping are specified.

Possible inputs are a specific sequence (string, for instance), any sequence, any iterable. Possible outputs are a sequence or iterator of sequence or iterator. The various python-list and StackOverflow posts ask for various combinations. zip_longest and hence grouper takes any iterable and returns an iterator of tuples. (An iterator of maps might be more useful as a building block.) This is not what one usually wants with string input, for instance, nor with range input. To illustrate:

    import itertools as it

    def grouper(n, iterable, fillvalue=None):
        "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
        args = [iter(iterable)] * n
        return it.zip_longest(*args, fillvalue=fillvalue)

    print(*(grouper(3, 'ABCDEFG', 'x')))  # probably not wanted
    print(*(''.join(g) for g in grouper(3, 'ABCDEFG', 'x')))
    # prints
    ('A', 'B', 'C') ('D', 'E', 'F') ('G', 'x', 'x')
    ABC DEF Gxx

--
What to do? One could easily write 20 different functions. So more thought is needed before adding anything. -1 on the idea as is.
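
To make the choices concrete, here is a minimal sketch of two of the approaches mentioned above: repeated islice for arbitrary iterables, and direct slicing for sequences. The names chunks and sliced and the incomplete parameter are hypothetical, invented only for this illustration; nothing like them exists in itertools, and this is one possible reading of the options rather than a proposed API.

```python
from itertools import islice

def chunks(iterable, n, incomplete='keep'):
    """Yield tuples of n items taken from any iterable.

    incomplete is a hypothetical parameter naming what happens to a
    short final block: 'keep' yields it as is, 'drop' discards it,
    and 'raise' raises ValueError when the short block is reached.
    """
    if n < 1:
        raise ValueError('n must be at least 1')
    if incomplete not in ('keep', 'drop', 'raise'):
        raise ValueError('unknown policy {!r}'.format(incomplete))
    it = iter(iterable)
    while True:
        block = tuple(islice(it, n))  # takes at most n items
        if not block:
            return  # input exhausted on a block boundary
        if len(block) < n:
            if incomplete == 'raise':
                raise ValueError('final block has {} items, expected {}'
                                 .format(len(block), n))
            if incomplete == 'drop':
                return
        yield block

def sliced(seq, n):
    """Yield successive slices of a sequence.

    Each block has whatever type slicing seq produces, so a string
    gives strings and a range gives ranges. A short final block is
    kept as is.
    """
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

print(*chunks('ABCDEFG', 3))
print(*chunks('ABCDEFG', 3, 'drop'))
print(*sliced('ABCDEFG', 3))
```

Note that the exception in chunks is necessarily delayed: the complete blocks have already been yielded by the time the short one is seen, which is the limitation described under 'Exception' above. The two functions also return different block types for the same string input, which is exactly the domain and range question of point 4.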

For the doc, I think it would be helpful here and in most module subchapters if there were a subchapter table of contents at the top (under 9.1 in this case). Even though just 2 lines here (currently, but see below), it would let people know that there *is* a recipes section. After the appropriate tables, mention that there are example uses in the recipe section. Possibly add similar tables in the recipe section.

Another addition could be a new subsection on grouping (chunking) that would discuss post-processing of grouper (as discussed above), as well as other recipes, including ones specific to strings and sequences. It would essentially be a short how-to. Call it 9.1.3 "Grouping, Blocking, or Chunking Sequences and Iterables". The synonyms will help external searching. A toc would let people who have found this doc know to look for this at the bottom.

--
Terry Jan Reedy