[Python-Dev] Regular expressions, Unicode etc.

"Martin v. Löwis" martin at v.loewis.de
Wed Aug 8 20:41:33 CEST 2007


> My second one is about Unicode.  I really, but REALLY regard it as
> a serious defect that there is no escape for printing characters.
> Any code that checks arbitrary text is likely to need them - yes,
> I know why Perl and hence PCRE doesn't have that, but let's skip
> that.  That is easy to add, though choosing a letter is tricky.
> Currently \c and \C, for 'character' (I would prefer 'text' or
> 'printable', but \t is obviously insane and \P is asking for
> incompatibility with Perl and Java).

Before discussing the escape, I'd like to see a specification of
it: which characters, precisely, would classify as "printing"?
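
To make the question concrete, here is a minimal sketch of one
candidate definition, based on Unicode general categories. The
helper name is_printing and the choice of excluded categories are
assumptions made purely for illustration; nothing in this thread
settles them, and a real specification might well draw the line
elsewhere (for example, on whether space separators count).

```python
import unicodedata

# Assumption: "printing" means "not a control, format, surrogate,
# private-use, or unassigned character, and not a line or paragraph
# separator". This is one possible reading, not an agreed spec.
NONPRINTING_CATEGORIES = frozenset(
    ["Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp"]
)


def is_printing(ch):
    """Return True if the single character ch counts as printing
    under the illustrative definition above (hypothetical helper)."""
    return unicodedata.category(ch) not in NONPRINTING_CATEGORIES


if __name__ == "__main__":
    # Show the category and verdict for a few sample characters.
    for ch in ["a", " ", "\n", "\u00e9", "\u2028", "\ue000"]:
        print(repr(ch), unicodedata.category(ch), is_printing(ch))
```

Even this small sketch forces decisions (space separators are in,
format characters are out) that a specification would have to make
explicitly.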

> But attempting to rebuild the Unicode database hasn't worked.
> Tools/unicode is, er, a trifle incomplete and out of date.  The
> only file I need to change is Objects/unicodetype_db.h, but the
> initial attempts to run Tools/unicode/makeunicodedata.py have not
> been successful.
> 
> I may be able to reverse engineer the mechanism enough to get
> the files off the Unicode site and run it, but I don't want to
> spend forever on it.  Any clues?

I see that you managed to do something here, so I'm not sure
what kind of help you still need.

Regards,
Martin

