Extracting words from a string : *fast*
Thomas Weholt
thomas at gatsoft.no
Tue Jun 19 05:25:16 EDT 2001
Hi,
I need to extract words from a string. This method will be used extensively
in an indexer, so it needs to be as fast as possible.
It needs to split words on case changes, numbers, spaces and characters like
,.-_/\*' etc. It should return either a list of the words found, in lower
case, or a dictionary where the words are keys and the numbers of
occurrences are values.
Example:
s = 'This is a.test for ThomasWeholt - magic42'
print(getWords(s))
-----------------------------------------------------
['this','is','a','test','for','thomas','weholt','magic','magic42']
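Here is a minimal sketch of getWords using the standard re module. It assumes ASCII input. The digit handling is inferred from the example output rather than stated. Camel-case pieces such as Thomas/Weholt become separate words. A token containing digits (magic42) contributes its alphabetic pieces and is also kept whole. Both patterns are compiled once at module level, so no compilation happens per call.

```python
import re

# Runs of ASCII letters/digits; anything else (spaces, ,.-_/\*' etc.) separates tokens.
_TOKEN = re.compile(r'[A-Za-z0-9]+')
# Pieces inside a token: acronyms (HTML), capitalised or lower-case words, digit runs.
_PIECE = re.compile(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+')


def getWords(s):
    """Return the lower-cased words found in s, in order of appearance."""
    words = []
    for token in _TOKEN.findall(s):
        for piece in _PIECE.findall(token):
            if piece.isalpha():
                # Alphabetic pieces from camel-case splits become separate words.
                words.append(piece.lower())
        if not token.isalpha():
            # The token contains digits (it only holds letters and digits).
            # Assumption from the example: such tokens are also kept whole.
            words.append(token.lower())
    return words


s = 'This is a.test for ThomasWeholt - magic42'
print(getWords(s))
# ['this', 'is', 'a', 'test', 'for', 'thomas', 'weholt', 'magic', 'magic42']
```

The acronym branch of _PIECE is an addition beyond the example. It makes "HTMLParser" split into "html" and "parser" instead of single letters.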
The texts to be processed are mostly small, but some can be large,
e.g. 1-10 MB.
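For the dictionary variant on large inputs, one option is to stream the text line by line. Counts accumulate in a collections.Counter, which is a dict subclass, so the whole file never has to become one big word list. The tokenizing rules are the ones implied by the example: ASCII text, camel-case pieces split, and tokens with digits also kept whole. The names iter_words and count_words are hypothetical.

```python
import re
from collections import Counter

# Runs of ASCII letters/digits; everything else separates tokens.
_TOKEN = re.compile(r'[A-Za-z0-9]+')
# Pieces inside a token: acronyms, capitalised or lower-case words, digit runs.
_PIECE = re.compile(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+')


def iter_words(text):
    """Yield lower-cased words from text without building a list."""
    for token in _TOKEN.findall(text):
        for piece in _PIECE.findall(token):
            if piece.isalpha():
                yield piece.lower()
        if not token.isalpha():
            # Token contains digits: also keep it whole (assumed from the example).
            yield token.lower()


def count_words(lines):
    """Count word occurrences over an iterable of lines, e.g. an open file."""
    counts = Counter()
    for line in lines:
        counts.update(iter_words(line))
    return counts  # a Counter, usable anywhere a dict is expected
```

Usage, with a hypothetical file name: `with open('corpus.txt', encoding='utf-8') as f: counts = count_words(f)`. Memory use is bounded by the longest line plus the number of distinct words, not by the file size. This assumes no word is split across a line break.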
Thomas