Scanning a file character by character

MRAB google at
Wed Feb 11 00:20:05 CET 2009

Steven D'Aprano wrote:
> On Tue, 10 Feb 2009 16:46:30 -0600, Tim Chase wrote:
>>>> Or for a slightly less simple minded splitting you could try re.split:
>>>>>>> re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
>>>> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
>>> Perhaps I'm missing something, but the above regex does the exact same
>>> thing as line.split() except it is significantly slower and harder to
>>> read.
> ...
>> Note the difference in "jumps" vs. "jumps,"  (extra comma in the
>> string.split() version) and likewise the period after "over". Thus not
>> quite "the exact same thing as line.split()".
> Um... yes. I'll just slink away quietly now... nothing to see here...
You could've used str.translate to strip out the unwanted characters.
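
A minimal sketch of that idea, assuming Python 3 and assuming that "unwanted
characters" means ASCII punctuation (string.punctuation). The helper name
split_words is made up for illustration; it is not from the thread.

```python
import string

# Assumption: the characters to strip are exactly string.punctuation.
# With two empty strings and a third argument, str.maketrans builds a
# table that deletes every character listed in the third argument.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def split_words(line):
    """Delete punctuation from line, then split on whitespace."""
    return line.translate(_STRIP_PUNCT).split()

print(split_words("The quick brown fox jumps, and falls over."))
# ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
```

This is not identical to the re.split version in every case: translate
deletes an apostrophe, so "don't" comes out as "dont", whereas splitting on
(\w+) would yield "don" and "t".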
