[Tutor] Unicode and regexes
Michael Broe
mbroe at columbus.rr.com
Sun Mar 12 02:46:54 CET 2006
Thanks Kent, for breaking the bad news. I'm not angry, just terribly,
terribly disappointed. :)
"From http://www.unicode.org/unicode/reports/tr18/ I see that \p{L} is
intended to select Unicode letters, and it is part of a large number of
selectors based on Unicode character properties."
Yeah, that's the main cite, and yeah, a large, large number. The only
sane way to use regexes with Unicode. Also see Friedl's 'Mastering
Regular Expressions', Chapter 3. Or actually, if you are a Python-only
person, don't: it will make you weep.
"Python doesn't support this syntax. It has limited support for
Unicode character properties [...]".
Umm, Earth to Python guys: you *have heard* of Unicode, right? Call me
crazy, but in this day and age, I assume a scripting language with
regex support will implement standard Unicode conventions, unless
there is a compelling reason not to. Very odd.
Back to Perl. Right now. Just kidding. Not. Sheesh. This is a big
flaw in Python, IMHO. I never saw it coming.
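
For anyone stuck on the same problem, here is a rough sketch of a
workaround using only the standard re module. It assumes Python 3, where
str patterns are Unicode-aware by default (older versions would need
unicode strings plus the re.UNICODE flag). The names LETTERS and
unicode_words are made up for illustration. The trick is that \w matches
Unicode word characters, so subtracting digits and the underscore leaves
something close to \p{L}. It is an approximation, not an exact
equivalent: a few numeric characters that \d does not cover can still
slip through.

```python
import re

# Assumption: Python 3, so \w and \d are Unicode-aware for str patterns.
# [^\W\d_] reads as "a word character that is neither a digit nor an
# underscore", which approximates the Unicode letter class \p{L}.
LETTERS = re.compile(r"[^\W\d_]+")


def unicode_words(text):
    """Return the runs of (approximately) Unicode letters found in text."""
    return LETTERS.findall(text)
```

Calling unicode_words("na\u00efve caf\u00e9 123 foo_bar") splits on the
digits, the underscore, and the spaces, keeping the accented words
intact. If exact \p{L} semantics matter, the third-party regex package
supports that syntax, but that is a separate install, not part of the
standard library.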