[Tutor] Unicode and regexes

Michael Broe mbroe at columbus.rr.com
Sun Mar 12 02:46:54 CET 2006


Thanks Kent, for breaking the bad news. I'm not angry, just terribly,  
terribly disappointed. :)

"From http://www.unicode.org/unicode/reports/tr18/ I see that \p{L} is
intended to select Unicode letters, and it is part of a large number of
selectors based on Unicode character properties."

Yeah, that's the main cite, and yeah, a large, large number. It's the only
sane way to use regexes with Unicode. Also see Chapter 3 of Friedl's
'Mastering Regular Expressions'. Or actually, if you are a Python-only
person, don't: it will make you weep.

"Python doesn't support this syntax. It has limited support for  
Unicode character properties [...]".

Umm, Earth to Python guys: you *have heard* of Unicode, right? Call me
crazy, but in this day and age, I assume a scripting language with
regex support will implement the standard Unicode conventions unless
there is a compelling reason not to. Very odd.

Back to Perl. Right now. Just kidding. Not. Sheesh. This is a big  
flaw in Python, IMHO. I never saw it coming.
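
For anyone stuck on the same problem, here is a minimal sketch of two
ways to get at "letters" without \p{L}. It assumes Python 3 and only the
standard library. The names is_letter, letter_runs and LETTERISH are made
up for illustration. The first approach asks unicodedata for each
character's general category, which is the property \p{L} is defined
on. The second is a regex-only approximation and is not an exact
equivalent.

```python
import re
import unicodedata
from itertools import groupby


def is_letter(ch):
    # General categories Lu, Ll, Lt, Lm and Lo all start with "L";
    # that is the set \p{L} is meant to select.
    return unicodedata.category(ch).startswith("L")


def letter_runs(text):
    # Hypothetical helper: maximal runs of consecutive letters, roughly
    # what findall(r"\p{L}+") returns in an engine that supports it.
    return ["".join(chars) for letter, chars in groupby(text, key=is_letter) if letter]


# Regex-only approximation: "word character, but not a digit or underscore".
# Not exact: \w follows str.isalnum(), so some non-letter numeric
# characters may slip through.
LETTERISH = re.compile(r"[^\W\d_]+")

sample = "na\u00efve caf\u00e9 42 \u03a9mega_x"  # naïve café 42 Ωmega_x
print(letter_runs(sample))
print(LETTERISH.findall(sample))
```

The category check is slower than a compiled regex, but it keys off the
same property that \p{L} does, so it is the safer of the two when
exactness matters.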



