Extracting "true" words

candide candide at free.invalid
Sat Apr 2 15:18:44 CEST 2011

On 02/04/2011 01:10, Chris Rebert wrote:

> "Word" presumably/intuitively; hence the non-standard "[:word:]"
> POSIX-like character class alias for \w in some environments.


> Are you intentionally excluding CJK ideographs (as not "letters"/alphabetic)?

Yes, CJK ideographs don't belong to the locale I'm working with ;)

> And what of hyphenated terms (e.g. "re-lock")?

I'm interested only in ASCII letters and ASCII letters with diacritics.
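
To make that criterion concrete, here is a minimal sketch of one way it could be expressed in Python 3. It rests on an assumption of mine: a character counts as a "letter" when its canonical decomposition (NFD) is an ASCII letter followed only by combining marks. The names is_ascii_letter_with_marks and extract_words are hypothetical, chosen for illustration. Under this assumption, digits, the underscore, hyphens and CJK ideographs all act as separators, and letters with no canonical decomposition (such as "ø" or "œ") are not matched.

```python
import string
import unicodedata
from itertools import groupby


def is_ascii_letter_with_marks(ch):
    """Return True if ch is an ASCII letter, optionally carrying diacritics.

    Assumption: "ASCII letter with diacritics" means the NFD form of the
    character is one ASCII letter followed only by combining marks.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    return (base in string.ascii_letters
            and all(unicodedata.combining(m) for m in marks))


def extract_words(text):
    """Return the maximal runs of qualifying letters found in text."""
    # Compose first so that "e" + U+0301 is seen as the single character "é".
    text = unicodedata.normalize("NFC", text)
    return ["".join(run)
            for is_letter, run in groupby(text, key=is_ascii_letter_with_marks)
            if is_letter]


words = extract_words("Le café re-lock 漢字 naïve_2 über")
print(words)
```

With this definition "re-lock" comes out as two words, "re" and "lock". Treating the hyphen as part of a word would need an extra rule.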

Thanks for your response.
