Building a word list from multiple files
steven.bethard at gmail.com
Thu Nov 18 20:56:55 CET 2004
Larry Bates wrote:
> 2) Are the words in the file separated with some consistent
> character (e.g. space, tab, csv, etc.)?
> If not, you will probably need to use regular expressions
> to handle all different punctuations that might separate
> the words. Things like quotes, commas, periods, colons,
> semi-colons, etc. Simple string split won't handle these.
If you go this way, you probably ought to read this thread:
which suggests finding words with a regexp like r'[^\W\d_]+', i.e. word characters minus digits and underscores, which leaves just letters.
(If you're not concerned about internationalization, it could be simpler.)
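To make the difference concrete, here is a minimal sketch that contrasts str.split() with the suggested regexp and then counts words across several files. It assumes Python 3, where str patterns are Unicode-aware by default (in the Python 2 of this thread you would have needed unicode strings and the re.UNICODE flag for r'[^\W\d_]+' to match non-ASCII letters). The helper names find_words and build_word_list, the UTF-8 encoding, and the lowercasing are illustrative assumptions, not part of the original suggestion.

```python
import re
from collections import Counter
from pathlib import Path

# Letters only: word characters (\w) minus digits (\d) and underscore (_).
# With Python 3 str patterns this also matches accented and non-Latin letters.
WORD_RE = re.compile(r"[^\W\d_]+")

# Simpler ASCII-only pattern, if internationalization is not a concern.
ASCII_WORD_RE = re.compile(r"[A-Za-z]+")


def find_words(text, pattern=WORD_RE):
    """Return the runs of letters in text (hypothetical helper)."""
    return pattern.findall(text)


def build_word_list(paths, pattern=WORD_RE):
    """Count words across several files (hypothetical helper).

    Assumes the files are UTF-8 encoded; words are lowercased so that
    "Hello" and "hello" are counted together.
    """
    counts = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        counts.update(word.lower() for word in find_words(text, pattern))
    return counts


if __name__ == "__main__":
    sample = '"Hello," she said; it\'s 2004: café_au_lait!'
    print(sample.split())
    # ['"Hello,"', 'she', 'said;', "it's", '2004:', 'café_au_lait!']
    print(find_words(sample))
    # ['Hello', 'she', 'said', 'it', 's', 'café', 'au', 'lait']
    print(find_words(sample, ASCII_WORD_RE))
    # ['Hello', 'she', 'said', 'it', 's', 'caf', 'au', 'lait']
```

Note that both patterns split contractions like "it's" into "it" and "s", and that the ASCII-only pattern truncates "café" to "caf"; if either matters for your word list, the pattern needs adjusting.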