re.compile for names
brad
byte8bits at gmail.com
Mon May 21 09:46:33 EDT 2007
I am developing a list of three-character strings like this:
and
bra
cam
dom
emi
mar
smi
...
The goal of the list is to have enough strings to identify files that
may contain the names of people. Missing a name in a file is unacceptable.
For example, the string 'mar' would get marc, mark, mary, maria... 'smi'
would get smith, smiley, smit, etc. False positives are OK (getting
common words instead of people's names is OK).
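
To make the matching idea concrete, here is a minimal sketch. It assumes the prefixes may occur anywhere in the text (not only at the start of a word) and that case does not matter; the function name may_contain_name is just an illustrative choice, and the list is only the sample shown above.

```python
import re

# Sample prefixes from the list above; the real list would be much longer.
prefixes = ["and", "bra", "cam", "dom", "emi", "mar", "smi"]

# re.escape is a precaution in case a prefix ever contains a regex metacharacter.
pattern = re.compile("|".join(re.escape(p) for p in prefixes), re.IGNORECASE)

def may_contain_name(text):
    """Return True if any prefix occurs anywhere in text, ignoring case."""
    return pattern.search(text) is not None
```

With this sketch, text containing "Mary" or "Smith" is flagged, and so is a common word such as "summary", which is the kind of false positive I am willing to accept.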
I may end up with a thousand or so of these three-character strings. Is that
too many alternatives for a single re.compile pattern to handle? Also, is this
a bad way to approach the problem? Any ideas for improvement are welcome!
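
One rough way to check the size question empirically is sketched below. The 1000 prefixes are generated placeholders, not my real list, and the names has_prefix_regex and has_prefix_set are hypothetical. The second function is a possible alternative that avoids the regex entirely by sliding a three-character window over the text and looking each slice up in a set; I have not measured which is faster.

```python
import itertools
import re
import string

# Placeholder data: the first 1000 three-letter combinations ("aaa", "aab", ...).
prefixes = ["".join(letters) for letters in itertools.islice(
    itertools.product(string.ascii_lowercase, repeat=3), 1000)]

# One pattern with 1000 alternatives, matched case-insensitively.
big_pattern = re.compile("|".join(prefixes), re.IGNORECASE)

# Alternative: a set for constant-time membership checks.
prefix_set = frozenset(prefixes)

def has_prefix_regex(text):
    """Return True if the big alternation matches anywhere in text."""
    return big_pattern.search(text) is not None

def has_prefix_set(text):
    """Return True if any three-character slice of text is in the set."""
    lowered = text.lower()
    return any(lowered[i:i + 3] in prefix_set
               for i in range(len(lowered) - 2))
```

Both functions should give the same answer for plain ASCII text, so timing them against a few real files would show whether the single large pattern is a problem in practice.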
I can provide more info off-list for those who would like it.
Thank you for your time,
Brad