Extracting words from a string : *fast*

Thomas Weholt thomas at gatsoft.no
Tue Jun 19 05:25:16 EDT 2001


I need to extract words from a string. This method will be used extensively
in an indexer, so it needs to be as fast as possible.

It needs to split words by case, numbers, spaces and chars like ,.-_/\*'
etc. It should return either a list of the words found, in lower case, or a
dictionary where the words are keys and the number of occurrences are values.


s = 'This is a.test for ThomasWeholt - magic42'
print getWords(s)
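
A minimal sketch of what such a function could look like, assuming a single
precompiled regular expression is acceptable. It is not benchmarked. The
name getWords comes from the example above; countWords and _WORD_RE are
made-up names for illustration. Two further assumptions: only ASCII letters
and digits count as word characters, and a run of digits becomes its own
entry rather than being dropped.

```python
import re
from collections import Counter

# Three kinds of token, tried in this order at each position:
#   1. a run of capitals not followed by a lower-case letter ("HTTP" in "HTTPServer")
#   2. an optional capital followed by lower-case letters ("Thomas", "test")
#   3. a run of digits ("42")
# Everything else (spaces, punctuation, underscores) acts as a separator.
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def getWords(text):
    """Return the words found in text as a list of lower-case strings."""
    return [word.lower() for word in _WORD_RE.findall(text)]


def countWords(text):
    """Return a dict-like Counter mapping each lower-case word to its count."""
    return Counter(getWords(text))
```

With the string above, getWords(s) gives ['this', 'is', 'a', 'test', 'for',
'thomas', 'weholt', 'magic', '42']. For very large inputs, iterating over
_WORD_RE.finditer(text) instead of calling findall would avoid building the
intermediate list of matches.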

The texts to be processed are mostly small, but some can be huge.

