[issue1693050] \w not helpful for non-Roman scripts
Martin v. Löwis
report at bugs.python.org
Fri Nov 28 22:33:41 CET 2008
Martin v. Löwis <martin at v.loewis.de> added the comment:
Unicode TR#18 defines \w as a shorthand for the union of
\p{alpha}
\p{gc=Mark}
\p{digit}
\p{gc=Connector_Punctuation}
which would include all marks. We should recursively check whether we
follow the recommendation (e.g. \p{alpha} refers to all characters having
the Alphabetic derived core property, which is Lu+Ll+Lt+Lm+Lo+Nl +
Other_Alphabetic, where Other_Alphabetic is a selected list of
additional characters, all from Mn/Mc).
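A minimal sketch of how one might start that check, assuming Python 3 str patterns. It compares the re module's \w against a general-category approximation of the TR#18 definition. The names TR18_WORD_CATEGORIES, is_tr18_word_char, re_word_char and differences are hypothetical. unicodedata does not expose the Other_Alphabetic property, so \p{alpha} is approximated here by Lu/Ll/Lt/Lm/Lo/Nl. This relies on the statement above that Other_Alphabetic characters come from Mn/Mc, which \p{gc=Mark} already covers.

```python
import re
import unicodedata

# Approximation of TR#18 \w by general category (hypothetical helper set).
# Other_Alphabetic is not available from unicodedata; we assume it lies in
# Mn/Mc, which the gc=Mark entries below already include.
TR18_WORD_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Nl",  # \p{alpha}, minus Other_Alphabetic
    "Mn", "Mc", "Me",                    # \p{gc=Mark}
    "Nd",                                # \p{digit} (gc=Decimal_Number)
    "Pc",                                # \p{gc=Connector_Punctuation}
}


def is_tr18_word_char(ch: str) -> bool:
    """Return True if ch's general category is in the approximation above."""
    return unicodedata.category(ch) in TR18_WORD_CATEGORIES


def re_word_char(ch: str) -> bool:
    """Return True if Python's re module lets ch match \\w."""
    return re.fullmatch(r"\w", ch) is not None


def differences(text: str):
    """List (char, category, tr18, re) for each char where the tests disagree."""
    return [
        (ch, unicodedata.category(ch), is_tr18_word_char(ch), re_word_char(ch))
        for ch in text
        if is_tr18_word_char(ch) != re_word_char(ch)
    ]


if __name__ == "__main__":
    # A Devanagari word: consonants (Lo) plus vowel signs and a virama.
    for row in differences("\u0939\u093f\u0928\u094d\u0926\u0940"):
        print(row)
```

For a Devanagari word such as this one, the disagreements should be the vowel signs (Mc) and the virama (Mn), which is the gap this issue is about.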
----------
nosy: +loewis
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1693050>
_______________________________________
More information about the Python-bugs-list
mailing list