Finding Peoples' Names in Files
John J. Lee
jjl at pobox.com
Thu Oct 11 21:25:16 CEST 2007
brad <byte8bits at gmail.com> writes:
> Crazy question, but has anyone attempted this or seen Python code that
> does? For example, if a text file contained 'Guido' and or 'Robert'
> and or 'Susan', then we should return True, otherwise return False.
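For the simple case in the question, where you already have a list of names, a minimal sketch might look like this. The function name and the whole-word matching rule are my assumptions, not anything from the original post:

```python
import re

def contains_any_name(text, names):
    """Return True if any of the given names occurs in text as a whole word."""
    # Split into alphabetic words so that 'Susan' does not match 'Susanna'.
    # Matching is case-sensitive here; that is an assumption of this sketch.
    words = set(re.findall(r"[A-Za-z]+", text))
    return any(name in words for name in names)

def file_contains_any_name(path, names):
    """Same check, applied to the contents of a text file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return contains_any_name(f.read(), names)
```

Calling contains_any_name("I met Guido yesterday.", ["Guido", "Robert", "Susan"]) returns True.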
A few ideas:
1. If you don't have a list of names, find a word list that doesn't
contain proper nouns (there are a few word lists out there; I'm not
sure whether any exclude people's names, though). Look for short runs
of two or three "words" (punctuation-separated tokens) in the text
that aren't in the dictionary. Some of them will be people's names.
2. Send the text through Google Translate and look for runs of words
that are unchanged. Some of them will be people's names.
3. Search the literature and look for fancy algorithms. Here are some
papers (the last mentions some commercial software to do this):
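Idea 1 above can be sketched roughly as follows. This is only an illustration: the function name, the tokenizer, and the tiny example dictionary are assumptions, and a real word list would have to be loaded from a file of your choosing.

```python
import re

def candidate_names(text, dictionary, max_run=3):
    """Return runs of 2..max_run consecutive tokens that are not in dictionary.

    dictionary is a set of lowercase ordinary words (no proper nouns).
    """
    # Simplification: tokens are maximal alphabetic runs, so whitespace and
    # punctuation both act as separators and do not break a run.
    tokens = re.findall(r"[A-Za-z]+", text)
    runs, current = [], []
    for tok in tokens:
        if tok.lower() not in dictionary:
            current.append(tok)  # unknown word: extend the current run
        else:
            if 2 <= len(current) <= max_run:
                runs.append(" ".join(current))
            current = []  # known word: the run ends here
    if 2 <= len(current) <= max_run:
        runs.append(" ".join(current))
    return runs

# Hypothetical miniature dictionary, for illustration only.
words = {"i", "met", "yesterday", "in", "the", "office"}
print(candidate_names("I met Guido van Rossum yesterday in the office", words))
```

With that toy dictionary this prints ['Guido van Rossum']. As the idea itself says, only some of the runs found this way will be people's names; rare words, product names and typos will show up too.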