help make it faster please

Lonnie Princehouse finite.automaton at gmail.com
Thu Nov 10 14:36:58 EST 2005


The word_finder regular expression defines what will be considered a
word.

"[a-z0-9_]" means "match a single character from the set {a through z,
0 through 9, underscore}".
The + means "match one or more of the preceding set, as many as you can".

To match @ as well, add it to the set of characters to match:

   word_finder = re.compile('[a-z0-9_@]+', re.I)

The re.I flag (short for re.IGNORECASE) makes the expression
case-insensitive, so [a-z] also matches A through Z.
See the documentation for the re module for more information.
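
Here is a small sketch of the difference, in case it helps.  The sample
string and the names without_at / with_at are made up for illustration;
only the two patterns come from the discussion above.

```python
import re

# Sample input invented for this illustration.
text = "Mail Bob_2 at bob@example.com NOW"

# Original character set: @ is not in the set, so it ends a word.
without_at = re.compile('[a-z0-9_]+', re.I)
# Extended set: @ now counts as a word character.
with_at = re.compile('[a-z0-9_@]+', re.I)

print(without_at.findall(text))
print(with_at.findall(text))
```

Note that "." is still not in the set, so "bob@example" and "com" come
out as separate words.  Add \. to the set if you want whole addresses.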


Also, it looks like I forgot to lowercase matched words.  The line
   word = match.group(0)
should read:
   word = match.group(0).lower()
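
To show where that line would sit, here is a minimal sketch of a
matching loop.  The helper name lowercase_words and the finditer loop
are my assumptions for illustration, not necessarily how the earlier
code was laid out.

```python
import re

word_finder = re.compile('[a-z0-9_@]+', re.I)

def lowercase_words(text):
    """Hypothetical helper: return every matched word, lowercased, in order."""
    words = []
    for match in word_finder.finditer(text):
        # Lowercase here so "The" and "THE" become the same word.
        word = match.group(0).lower()
        words.append(word)
    return words

print(lowercase_words("The the THE cat"))
```

Without the .lower() call, the three spellings of "the" would be
treated as three different words.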



