Looping [was Re: Python and the need for speed]

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Tue Apr 18 05:02:57 EDT 2017


Christian Gollwitzer writes:

> On 18.04.17 at 08:21, Chris Angelico wrote:
>> On Tue, Apr 18, 2017 at 4:06 PM, Christian Gollwitzer <auriocus at gmx.de> wrote:
>>> On 18.04.17 at 02:18, Ben Bacarisse wrote:
>>>
>>>> Thanks (and to Grant).  IO seems to be the canonical example.  Where
>>>> some languages would force one to write
>>>>
>>>>   c = sys.stdin.read(1)
>>>>   while c == ' ':
>>>>       c = sys.stdin.read(1)
>>>
>>> repeat
>>>         c  = sys.stdin.read(1)
>>> until c != ' '
>>
>> Except that there's processing code after it.
>>
>
> Sorry, I misread it then - Ben's code did NOT have it, it looks like a
> "skip the whitespace" loop.

It also reads the first character that is not whitespace, so it's not
usable to *merely* skip the whitespace.

>> while True:
>>     c = sys.stdin.read(1)
>>     if not c: break
>>     if c.isprintable(): text += c
>>     elif c == "\x08": text = text[:-1]
>>     # etc
>>
>> Can you write _that_ as a do-while?
>
> No. This case OTOH looks like an iteration to me and it would be most
> logical to write
>
> for c in sys.stdin:
>      if c.isprintable(): text += c
>      elif c == "\x08": text = text[:-1]
>      # etc
>
> except that this iterates over lines. Is there an analogous iterator
> for chars? For "lines" terminated by something other than "\n"?
> "for c in get_chars(sys.stdin)" and
> "for c in get_string(sys.stdin, terminate=':')" would be nicely
> readable IMHO. Or AWK-like processing:
>
> for fields in get_fields(open('/etc/passwd'), RS='\n', FS=':'):
>     if fields[2] == '0':
>         print('Super-User found:', fields[0])

I don't know if those exist in some standard library, but they are easy
to write, and I do it all the time. I don't need the chars one, but I do
tab-separated fields and line-separated groups of tab-separated fields,
and variations.
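
A minimal sketch of how such helpers might be written as generators. The
names get_chars, get_string and get_fields and the RS/FS parameters are
taken from the quoted message; they are hypothetical helpers, not
standard-library functions, and the in-memory sample data is made up for
the demonstration.

```python
import io
from functools import partial


def get_chars(source):
    """Yield one character at a time from a text stream."""
    # iter(callable, sentinel) keeps calling source.read(1) until it
    # returns the empty string, which signals end of input.
    return iter(partial(source.read, 1), '')


def get_string(source, terminate='\n'):
    """Yield chunks of text ended by `terminate` (terminator dropped)."""
    chunk = []
    for c in get_chars(source):
        if c == terminate:
            yield ''.join(chunk)
            chunk = []
        else:
            chunk.append(c)
    if chunk:
        # Final chunk that was not followed by a terminator.
        yield ''.join(chunk)


def get_fields(source, RS='\n', FS=':'):
    """AWK-like records: split on RS, then split each record on FS."""
    for record in get_string(source, terminate=RS):
        yield record.split(FS)


# Demonstration on in-memory sample data instead of a real file.
passwd = io.StringIO("root:x:0:0:root:/root:/bin/sh\n"
                     "guest:x:1000:1000::/home/guest:/bin/sh\n")
for fields in get_fields(passwd):
    if fields[2] == '0':
        print('Super-User found:', fields[0])
```

Reading one character at a time is simple but slow on large inputs; a
version that reads larger blocks and splits them would be faster, at the
cost of more bookkeeping.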

for s, sentence in enumerate(sentences(sys.stdin)):
    for k, token in enumerate(sentence):
        ...
        token[LEMMA] or warn('empty LEMMA', s, k, sentence)
        ...

The wrapper function around sys.stdin or other text source is different
depending on the data format. Sometimes it's messy, sometimes not. Any
messy details are hidden in the wrapper.
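
A sketch of what such a wrapper could look like, under stated
assumptions: sentences are groups of lines separated by empty lines,
each line is one token with tab-separated fields, and LEMMA is a column
index. The actual data format, the value of LEMMA and the warn helper
are not given in this message, so all three are illustrative guesses.

```python
import io
import sys

LEMMA = 2  # assumed column index of the lemma field


def sentences(source):
    """Yield each sentence as a list of tokens (lists of fields)."""
    sentence = []
    for line in source:
        line = line.rstrip('\n')
        if line:
            # One token per line, fields separated by tabs.
            sentence.append(line.split('\t'))
        elif sentence:
            # An empty line closes the current group.
            yield sentence
            sentence = []
    if sentence:
        # Last group when the input does not end with an empty line.
        yield sentence


def warn(message, *context):
    """Hypothetical helper: report a problem on stderr."""
    print('warning:', message, *context, file=sys.stderr)


# Demonstration on in-memory sample data; the second token has no lemma.
sample = io.StringIO("1\tDogs\tdog\n2\tbark\t\n\n1\tHi\thi\n")
for s, sentence in enumerate(sentences(sample)):
    for k, token in enumerate(sentence):
        token[LEMMA] or warn('empty LEMMA', s, k, sentence)
```

The loop body never sees separators or line endings; only the wrapper
would need to change for a different format.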

