[Python-Dev] Regular expressions, Unicode etc.

"Martin v. Löwis" martin at v.loewis.de
Wed Aug 8 22:38:03 CEST 2007


>> Before discussing the escape, I'd like to see a specification of
>> it first - what characters precisely would classify as "printing"?
> 
> For basic ASCII and locale-based testing, whatever isprint() says.
> Just as for isalpha().

In the medium term, locale-based testing will go away or become
unimplementable (in particular, Py3k won't have a byte-oriented
character string type, so we can't use isprint). In general,
isprint is unsuitable since it doesn't support multi-byte
character sets.

> For Unicode, whatever people agree!  I use the criterion that it
> has a defined category that doesn't start with 'C' - which is what
> I think that most people will accept.

-1. There must be a better specification than that.

Can you please explain the concept of "printing character"? If
you have a Unicode code point, how do you determine whether it
is printing? Is it that rendering it would generate black pixels on a
white background?
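
To make the quoted criterion concrete (a defined category that doesn't
start with 'C'), here is a minimal sketch using the standard unicodedata
module. The helper name is_printing is hypothetical and is not something
proposed in this thread; the sketch also assumes that unassigned code
points, which unicodedata reports as 'Cn', count as "not defined".

```python
import unicodedata


def is_printing(ch):
    """Hypothetical helper for the quoted criterion.

    Returns True when the general category of the single code point
    `ch` does not start with 'C' (Cc, Cf, Cs, Co, Cn).
    """
    return not unicodedata.category(ch).startswith("C")


if __name__ == "__main__":
    # A few code points and how the criterion classifies them.
    for ch in ["a", " ", "\n", "\u200b", "\u2028", "\ue000"]:
        print("U+%04X" % ord(ch), unicodedata.category(ch), is_printing(ch))
```

Under this rule SPACE (Zs) and LINE SEPARATOR U+2028 (Zl) count as
printing while ZERO WIDTH SPACE U+200B (Cf) does not, which is the kind
of edge case a more precise specification would have to settle.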

Regards,
Martin

