Scanning a file character by character

Tue Feb 10 18:20:05 EST 2009

Steven D'Aprano wrote:
> On Tue, 10 Feb 2009 16:46:30 -0600, Tim Chase wrote:
> 
>>>> Or for a slightly less simple minded splitting you could try re.split:
>>>>
>>>> >>> re.split("(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
>>>> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
>>>
>>> Perhaps I'm missing something, but the above regex does the exact same
>>> thing as line.split() except it is significantly slower and harder to
>>> read.
> 
> ...
> 
>> Note the difference in "jumps" vs. "jumps,"  (extra comma in the
>> string.split() version) and likewise the period after "over". Thus not
>> quite "the exact same thing as line.split()".
> 
> Um... yes. I'll just slink away quietly now... nothing to see here...
> 
You could've used str.translate to strip out the unwanted characters.
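
A rough sketch of what that might look like, assuming Python 3 (where the deletion table is built with str.maketrans; the 2.x str.translate takes its arguments differently) and assuming "unwanted characters" means ASCII punctuation as listed in string.punctuation. The names strip_punct, words and regex_words are only for illustration:

```python
import re
import string

line = "The quick brown fox jumps, and falls over."

# Translation table that deletes every ASCII punctuation character.
# Assumption: string.punctuation covers the characters you want removed.
strip_punct = str.maketrans("", "", string.punctuation)

# Remove the punctuation first, then split on whitespace as usual.
words = line.translate(strip_punct).split()

# The regex approach quoted above, for comparison.
regex_words = re.split(r"(\w+)", line)[1::2]

print(words)
print(regex_words)
print(line.split())  # plain split keeps "jumps," and "over."
```

For this sentence the two approaches give the same list. They can differ on other input: deleting punctuation turns "don't" into "dont", while the \w+ split would give "don" and "t".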