How to iterate the input over a particular size?

John Posner jjposner at optimum.net
Tue Dec 29 12:54:37 EST 2009


On Tue, 29 Dec 2009 01:35:41 -0500, Steven D'Aprano
<steven at remove.this.cybersource.com.au> wrote:

> On Tue, 29 Dec 2009 00:49:50 -0500, John Posner wrote:
>
>> On Sun, 27 Dec 2009 09:44:17 -0500, joy99 <subhakolkata1234 at gmail.com>
>> wrote:
>>
>>> Dear Group,
>>>
>>> I have a small question.
>>>
>>> Suppose I write the following code:
>>>
>>> input_string=raw_input("PRINT A STRING:")
>>> string_to_word=input_string.split()
>>> len_word_list=len(string_to_word)
>>> if len_word_list>9:
>>>     rest_words=string_to_word[9:]
>>>     len_rest_word=len(rest_words)
>>>     if len_rest_word>9:
>>>         remaining_words=rest_words[9:]
>>>
>>>
>> Here's an issue that has not, I think, been addressed in this thread.
>> The OP's problem is:
>>
>> 1. Start with an indefinitely long string.
>>
>> 2. Convert the string to a list, splitting on whitespace.
>>
>> 3. Repeatedly return subslices of the list, until the list is exhausted.
>>
>> This thread has presented one-chunk-at-a-time (e.g. generator/itertools)
>> approaches to Step #3, but what about Step #2? I've looked in the Python
>> documentation, and I've done some Googling, but I haven't found a
>> generator version of the string function split(). Am I missing
>> something?
>
> "Indefinitely long" doesn't mean you can't use split.
>
> But if you want a lazy splitter, here's a version which should do what
> you want:
>
>
> import string
>
> def lazy_split(text):
>     accumulator = []
>     for c in text:
>         if c in string.whitespace:
>             if accumulator:
>                 yield ''.join(accumulator)
>                 accumulator = []
>         else:
>             accumulator.append(c)
>     if accumulator:
>         yield ''.join(accumulator)
>

Thanks -- I coded a similar generator, too. Yours handles the corner cases
and end-of-string more elegantly, so I won't share mine. :-)
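Putting steps 2 and 3 together: the sketch below (Python 3; the names `lazy_split` and `chunked` are illustrative, not from any library) feeds a lazy splitter into a chunker built on `itertools.islice`, so only one chunk of words is held at a time. The splitter restates the generator quoted above so the block runs on its own. The default chunk size of 9 is an assumption that mirrors the OP's `[9:]` slices.

```python
import itertools
import string


def lazy_split(text):
    """Yield whitespace-separated words one at a time (same logic as above)."""
    accumulator = []
    for c in text:
        if c in string.whitespace:
            if accumulator:
                yield ''.join(accumulator)
                accumulator = []
        else:
            accumulator.append(c)
    if accumulator:
        yield ''.join(accumulator)


def chunked(iterable, size=9):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        # islice consumes at most `size` items; an empty list means exhaustion.
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


for chunk in chunked(lazy_split("one two  three\tfour\nfive six seven"), size=3):
    print(chunk)
# ['one', 'two', 'three']
# ['four', 'five', 'six']
# ['seven']
```

Because `chunked` accepts any iterable, the same loop also works with `text.split()` when the whole word list fits comfortably in memory.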

>
> Other alternatives are to use a regex to find runs of whitespace
> characters, then yield everything else; or to use the itertools.groupby
> function.

Yup, that approach occurred to me as I was falling asleep last night (OMG,
get a life, buddy!):

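import itertools
import string

def groupby_split(text):
    # Group consecutive characters by whether they are non-whitespace;
    # is_word is True for runs of word characters, False for whitespace runs.
    whitespace_grouper = itertools.groupby(
        text, lambda c: c not in string.whitespace)
    for is_word, group_iter in whitespace_grouper:
        if is_word:
            yield "".join(group_iter)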
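For comparison, here is a sketch of the regex alternative Steven mentions. Rather than finding the whitespace runs and yielding what lies between them, it matches the runs of non-whitespace directly with `re.finditer`, which returns an iterator of match objects, so words are produced one at a time. The function name `regex_split` is illustrative.

```python
import re

# \S+ matches one maximal run of non-whitespace characters, i.e. one "word".
_WORD = re.compile(r"\S+")


def regex_split(text):
    """Lazily yield whitespace-separated words from a string."""
    for match in _WORD.finditer(text):
        yield match.group()


print(list(regex_split("  split me\tlazily\n please ")))
# ['split', 'me', 'lazily', 'please']
```

One design difference: `finditer` needs the whole string (or buffer) up front, so it only avoids building the full word list, whereas the character-by-character `lazy_split` and `groupby_split` generators accept any iterable of characters.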

Tx,
John



More information about the Python-list mailing list