Unicode classes of characters in Python's REs, like in Perl?

Roman Suzi rnd at onego.ru
Tue Jul 30 08:10:33 EDT 2002


On 30 Jul 2002, Martin v. Löwis wrote:

> Roman Suzi <rnd at onego.ru> writes:

> So far, nobody has proposed to support Unicode categories in SRE. You
> can easily implement this yourself using
> unicodedata.category, e.g.
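The example Martin refers to is not preserved here. What follows is only a minimal sketch of the idea in modern Python 3 (not Python 2.2, and not Martin's code): walk the whole code space, ask `unicodedata.category` for each character, and collapse matching runs into an ordinary `[...]` class that SRE can compile. The helper name `category_class` is hypothetical.

```python
import re
import sys
import unicodedata


def category_class(prefix):
    """Return a regex character class (as a string) covering every code
    point whose general category starts with `prefix`: "Lu" selects
    uppercase letters, "L" selects all letters."""
    parts = []
    start = None
    # Walk the code space once (plus one sentinel step past the end),
    # collapsing runs of matching code points into lo-hi ranges.
    for cp in range(sys.maxunicode + 2):
        inside = (cp <= sys.maxunicode
                  and unicodedata.category(chr(cp)).startswith(prefix))
        if inside and start is None:
            start = cp
        elif not inside and start is not None:
            lo, hi = re.escape(chr(start)), re.escape(chr(cp - 1))
            parts.append(lo if start == cp - 1 else lo + "-" + hi)
            start = None
    if not parts:
        raise ValueError("no code points in category %r" % prefix)
    return "[" + "".join(parts) + "]"


UPPER = re.compile(category_class("Lu"))
print(UPPER.findall("Hello Ωμέγα World"))  # ['H', 'Ω', 'W']
```

Building the class takes a full scan of the code space, so it is worth doing once per category and keeping the compiled pattern around.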

OK. Thanks.

Perhaps there should be precompiled categories somewhere in
the standard library, say, in the re module.
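No such table exists in the standard library; the following is only a hypothetical sketch of what "precompiled categories" could look like in modern Python 3. It assumes a lazily built module-level table: one scan maps every general category to its code point runs, and `functools.lru_cache` keeps each compiled pattern. The names `_category_ranges` and `category_regex` are made up for illustration.

```python
import functools
import re
import sys
import unicodedata


@functools.lru_cache(maxsize=None)
def _category_ranges():
    """Scan the code space once and map every general category
    ("Lu", "Nd", ...) to its list of (first, last) code point runs."""
    table = {}
    run_cat, run_start = unicodedata.category(chr(0)), 0
    for cp in range(1, sys.maxunicode + 1):
        cat = unicodedata.category(chr(cp))
        if cat != run_cat:
            table.setdefault(run_cat, []).append((run_start, cp - 1))
            run_cat, run_start = cat, cp
    table.setdefault(run_cat, []).append((run_start, sys.maxunicode))
    return table


@functools.lru_cache(maxsize=None)
def category_regex(*categories):
    """Return a compiled pattern matching one character from any of the
    given categories; each distinct argument tuple is compiled only once."""
    table = _category_ranges()
    parts = [re.escape(chr(lo)) + "-" + re.escape(chr(hi))
             for cat in categories for lo, hi in table.get(cat, [])]
    if not parts:
        raise ValueError("unknown or empty categories: %r" % (categories,))
    return re.compile("[" + "".join(parts) + "]")


letters = category_regex("Lu", "Ll")
print(letters.findall("abc 123 ΣΤ"))          # ['a', 'b', 'c', 'Σ', 'Τ']
print(category_regex("Lu", "Ll") is letters)  # True (served from the cache)
```

The first call pays for the full scan; later calls for other categories reuse the same table and only build and compile their own class.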

> It turns out that those categories are useless for XML, since the XML
> character classes (in XML 1.0) have been defined using a different
> Unicode version (XML uses the Unicode 2.0 database). The same appears
> to be the case for XML Schema: They use the Unicode 3.1 database;
> Python 2.2 has the Unicode 3.0 database.
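To see the version mismatch concretely, here is a small sketch for modern Python 3 (Python 2.2 had none of this). It assumes `unicodedata.unidata_version`, which reports the bundled database version, and `unicodedata.ucd_3_2_0`, a frozen Unicode 3.2 snapshot that later Pythons keep for IDNA. That snapshot is the only old version the module exposes, so it cannot stand in for Unicode 2.0 or 3.1; the helper `changed_categories` is hypothetical.

```python
import sys
import unicodedata

# Version compiled into this interpreter (much newer than Python 2.2's 3.0).
print("bundled:", unicodedata.unidata_version)

# Frozen Unicode 3.2 snapshot; it offers the same lookup methods as the module.
old = unicodedata.ucd_3_2_0
print("frozen:", old.unidata_version)  # frozen: 3.2.0


def changed_categories(old_db, new_db=unicodedata):
    """Return code points whose general category differs between two
    database objects that both expose a .category() method."""
    return [cp for cp in range(sys.maxunicode + 1)
            if old_db.category(chr(cp)) != new_db.category(chr(cp))]


diff = changed_categories(old)
print(len(diff), "code points changed category since Unicode 3.2")
```

Any character class built from one of these databases silently inherits its version, which is exactly why a class built from Python's table need not agree with one a specification defines against another version.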

Isn't Python one of the best choices for XML processing? ;-)
 
> So to implement XML Schema, you probably have to parse the specific
> version of the Unicode database yourself, and construct the re class
> from that.
 
> Regards,
> Martin
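Martin's suggestion, parsing a specific database version yourself, might look roughly like the sketch below (modern Python 3, hypothetical helper names). It assumes the UnicodeData.txt layout of the Unicode Character Database: ';'-separated fields with the hex code point first, the name second and the general category third, with large blocks written as `<..., First>`/`<..., Last>` pairs. The embedded `SAMPLE` is a handful of lines in that layout, not a complete file.

```python
import re


def parse_unicode_data(lines):
    """Map each general category to a list of (first, last) code point runs,
    reading lines in the UnicodeData.txt layout: ';'-separated fields,
    field 0 = code point (hex), field 1 = name, field 2 = general category."""
    table = {}
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp, name, cat = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            # Big blocks (CJK ideographs, Hangul, ...) appear as First/Last pairs.
            range_start = cp
            continue
        first = range_start if name.endswith(", Last>") else cp
        range_start = None
        runs = table.setdefault(cat, [])
        if runs and runs[-1][1] == first - 1:
            runs[-1] = (runs[-1][0], cp)  # extend an adjacent run
        else:
            runs.append((first, cp))
    return table


def class_for(table, *categories):
    """Build a regex character class string from the parsed table."""
    parts = []
    for cat in categories:
        for lo, hi in table.get(cat, []):
            parts.append(re.escape(chr(lo)) if lo == hi
                         else re.escape(chr(lo)) + "-" + re.escape(chr(hi)))
    if not parts:
        raise ValueError("no code points in %r" % (categories,))
    return "[" + "".join(parts) + "]"


SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FA5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
"""

table = parse_unicode_data(SAMPLE.splitlines())
upper = re.compile(class_for(table, "Lu"))
print(upper.findall("ABC abc"))  # ['A', 'B']
```

Against a real file you would pass an open file object instead, e.g. `parse_unicode_data(open(path, encoding="utf-8"))`, where the hypothetical `path` points at the UnicodeData file of the version the specification names (3.1 for XML Schema).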

Sincerely yours, Roman A.Suzi
-- 
 - Petrozavodsk - Karelia - Russia - mailto:rnd at onego.ru -