Unicode regular expressions -- buggy?

Fredrik Lundh fredrik at pythonware.com
Thu Aug 11 04:08:24 EDT 2005


Christopher Subich wrote:

> I don't think the python regular expression module correctly handles
> combining marks; it gives inconsistent results between equivalent forms
> of some regular expressions:

> Is this a limitation-by-design, or a bug?

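limitation by design.  the re module compares code points one by one
and does not treat canonically equivalent sequences as equal.  if you
want correct results, make sure to use early normalization everywhere.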

cf. http://www.w3.org/TR/charmod-norm/
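
a minimal sketch of what early normalization could look like, assuming
Python 3 and the standard unicodedata module.  the sample strings, the
pattern and the nfc() helper are made up for illustration; they are not
from the original report.  the idea is to bring both the pattern and the
text to the same normalization form (NFC here) before the regex engine
sees them.

```python
import re
import unicodedata


def nfc(s):
    """Return s in Normalization Form C (hypothetical helper name)."""
    return unicodedata.normalize("NFC", s)


# Two canonically equivalent spellings of the same word (illustrative data).
composed = "caf\u00e9"       # e-acute as one precomposed code point
decomposed = "cafe\u0301"    # plain "e" followed by a combining acute accent

# Pattern written with the precomposed character.
pattern = "caf\u00e9$"

# Without normalization the engine compares code points, so the
# decomposed text does not match the precomposed pattern.
raw_match = re.search(pattern, decomposed) is not None

# Normalizing both sides first makes the two spellings compare equal.
norm_match = re.search(nfc(pattern), nfc(decomposed)) is not None

print(raw_match, norm_match)
```

normalizing a pattern string this way is only safe for simple literal
patterns; a pattern that puts combining marks inside a character class
or next to a quantifier may change meaning, so in practice it is
probably easier to normalize all input once, at the point where it
enters the program.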

</F> 





