[Python-ideas] Py3k invalid unicode idea

Terry Reedy tjreedy at udel.edu
Thu Oct 9 20:42:48 CEST 2008


Dillon Collins wrote:
> On Thursday 09 October 2008, Stephen J. Turnbull wrote:

> With my proposal, unicode strings would have a valid flag, and one could 
> easily modify PyUnicode_AS_UNICODE to return NULL (and a UnicodeError) if the 
> string is invalid, and make a PyUnicode_AS_RAWUNICODE that wouldn't.  Or you 
> could simply document that libraries need to call a PyUnicode_ISVALID to 
> determine whether or not the string contains invalid codes.

Would it make sense to have a Filename subclass, a BadFilename 
subclass, or more generally a PUAcode subclass for any unicode that the 
core generates using the Private Use Area (PUA)?  In either of the 
latter two cases, an app that itself uses the PUA would know not to mix 
PUAcode instances into its own unicode.  And leakage without 
re-encoding into bytes could be inhibited more easily.
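
A rough pure-Python sketch of the subclass idea, just to make it 
concrete.  Every name here (PUAcode, contains_pua, from_core, 
safe_concat) is made up for illustration; nothing like this exists in 
the core, and a real version would presumably live at the C level.

```python
# Hypothetical sketch: none of these names exist in CPython.

def contains_pua(text):
    """Return True if text has any code point in a Unicode Private Use Area."""
    return any(
        0xE000 <= ord(ch) <= 0xF8FF          # BMP Private Use Area
        or 0xF0000 <= ord(ch) <= 0xFFFFD     # Supplementary PUA-A (plane 15)
        or 0x100000 <= ord(ch) <= 0x10FFFD   # Supplementary PUA-B (plane 16)
        for ch in text
    )


class PUAcode(str):
    """Marker subclass for strings in which the core used PUA code points."""


def from_core(text):
    """Tag text as PUAcode only when it actually contains PUA code points."""
    return PUAcode(text) if contains_pua(text) else text


def safe_concat(a, b):
    """Application-side join that refuses to absorb tagged strings."""
    if isinstance(a, PUAcode) or isinstance(b, PUAcode):
        raise TypeError("refusing to mix PUAcode into ordinary text")
    return a + b
```

Note that plain `+` on a str subclass returns an ordinary str, so the 
tag is lost unless the app checks with isinstance before combining, as 
safe_concat does here.  That is one reason the check might be better 
enforced by the core than left to each app.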

tjr



