Extracting "true" words

candide candide at free.invalid
Sat Apr 2 15:18:44 CEST 2011


On 02/04/2011 01:10, Chris Rebert wrote:

> "Word" presumably/intuitively; hence the non-standard "[:word:]"
> POSIX-like character class alias for \w in some environments.

OK


> Are you intentionally excluding CJK ideographs (as not "letters"/alphabetic)?

Yes, CJK ideographs don't belong to the locale I'm working with ;)


> And what of hyphenated terms (e.g. "re-lock")?


I'm interested only in ASCII letters and ASCII letters with diacritics.
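
To make that criterion concrete, here is a minimal sketch, assuming Python 3 and only the standard library. The helper names (`_is_wanted_letter`, `extract_words`) are hypothetical, and this is just one possible reading of "ASCII letters with diacritics": a character counts if its canonical (NFD) decomposition is an ASCII letter followed only by combining marks.

```python
import string
import unicodedata
from itertools import groupby


def _is_wanted_letter(ch):
    """True for an ASCII letter, optionally carrying diacritics.

    Assumption: "ASCII letter with diacritics" means the NFD form of the
    character is an ASCII letter followed only by combining marks.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    return base in string.ascii_letters and all(
        unicodedata.combining(m) for m in marks
    )


def extract_words(text):
    """Return maximal runs of wanted letters found in text."""
    # Compose first so that "e" + combining accent is seen as one character.
    text = unicodedata.normalize("NFC", text)
    return [
        "".join(group)
        for wanted, group in groupby(text, key=_is_wanted_letter)
        if wanted
    ]


print(extract_words("Le café re-lock 漢字 naïve x2 foo_bar"))
```

With this reading, digits, underscores, hyphens and CJK ideographs all act as separators, so "re-lock" comes out as two words. Letters that have no canonical decomposition (for instance "ø" or "œ") are not matched, and neither is a combining mark that cannot be composed with its base letter.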


Thanks for your response.



