Scanning a file character by character
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Tue Feb 10 17:02:57 EST 2009
On Tue, 10 Feb 2009 12:06:06 +0000, Duncan Booth wrote:
> Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:
>
>> On Mon, 09 Feb 2009 19:10:28 -0800, Spacebar265 wrote:
>>
>>> How would I separate lines into words without scanning one
>>> character at a time?
>>
>> Scan a line at a time, then split each line into words.
>>
>>
>> for line in open('myfile.txt'):
>>     words = line.split()
>>
>>
>> should work for a particularly simple-minded idea of words.
>>
> Or for a slightly less simple-minded splitting you could try re.split:
>
>>>> re.split(r"(\w+)", "The quick brown fox jumps, and falls over.")[1::2]
> ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']

Perhaps I'm missing something, but the above regex does the exact same
thing as line.split() except it is significantly slower and harder to
read.
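To try the two quoted approaches side by side, here is a minimal sketch. The names `words_by_whitespace`, `words_by_regex` and `WORD_RE` are hypothetical, and `io.StringIO` stands in for `open('myfile.txt')` so it runs as-is. Both functions read one line at a time as suggested earlier in the thread; the regex version uses `re.findall(r"\w+", line)`, which returns the same list as the quoted `re.split(r"(\w+)", ...)[1::2]` idiom.

```python
import io
import re

# Hypothetical helper pattern: runs of "word" characters (letters, digits, underscore).
WORD_RE = re.compile(r"\w+")

def words_by_whitespace(f):
    """Yield whitespace-separated tokens, reading the file one line at a time."""
    for line in f:
        yield from line.split()

def words_by_regex(f):
    """Yield runs of word characters, reading the file one line at a time."""
    for line in f:
        yield from WORD_RE.findall(line)

# io.StringIO stands in for open('myfile.txt') so the sketch is self-contained.
sample = "The quick brown fox jumps, and falls over.\n"
print(list(words_by_whitespace(io.StringIO(sample))))
# → ['The', 'quick', 'brown', 'fox', 'jumps,', 'and', 'falls', 'over.']
print(list(words_by_regex(io.StringIO(sample))))
# → ['The', 'quick', 'brown', 'fox', 'jumps', 'and', 'falls', 'over']
```

On this sample, the whitespace version leaves punctuation attached (`'jumps,'`, `'over.'`), while the regex version drops it.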

Neither deals with quoted text, apostrophes, hyphens, punctuation or any
other details of real-world text. That's what I mean by "simple-minded".
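As one small step beyond "simple-minded", a tokenizer can at least keep apostrophes and hyphens that sit inside words. This is a sketch under stated assumptions, not a complete solution: the name `tokenize` and the pattern are hypothetical, only the ASCII apostrophe `'` and hyphen `-` are treated as joiners (and only between two letters), numbers are dropped entirely, and quoted text, typographic (curly) apostrophes and other real-world details are still ignored.

```python
import re

# Hypothetical pattern: a run of letters, optionally joined to further runs of
# letters by a single ASCII apostrophe or hyphen ("don't", "well-known").
# [^\W\d_] means "a word character that is not a digit or underscore",
# i.e. a letter, including non-ASCII letters such as "é".
WORD_WITH_JOINERS_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")

def tokenize(text):
    """Return letter-based words, keeping internal apostrophes and hyphens."""
    return WORD_WITH_JOINERS_RE.findall(text)

print(tokenize("Don't re-read the well-known text, 'quoted' or not."))
# → ["Don't", 're-read', 'the', 'well-known', 'text', 'quoted', 'or', 'not']
```

Note that the quote marks around `'quoted'` are simply discarded, so this still does nothing useful with quoted text.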
--
Steven
More information about the Python-list mailing list