[Python-Dev] bytes / unicode

Thu Jun 24 19:07:01 CEST 2010

At 05:12 PM 6/24/2010 +0900, Stephen J. Turnbull wrote:
>Guido van Rossum writes:
>
>  > For example: how we can make the suite of functions used for URL
>  > processing more polymorphic, so that each developer can choose for
>  > herself how URLs need to be treated in her application.
>
>While you have come down on the side of polymorphism (as opposed to
>separate functions), I'm a little nervous about it.  Specifically,
>Philip Eby expressed a desire for earlier type errors, while
>polymorphism seems to ensure that you'll need to Look Before You Leap
>to get early error detection.

Early error detection doesn't have to live in the functions; it can
live in the *types*.  Mixed-type string operations already have to do
type checking and upcasting, so if that protocol were open, you could
make an encoded-bytes type that handles the error checking itself.
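
A minimal sketch of what an encoded-bytes type could look like, assuming Python 3 semantics. The name `EncodedBytes` and its `encoding` attribute are hypothetical and not an existing or proposed API, and rejecting plain `bytes` operands is an assumption made here for illustration. The point is only that the check happens inside the type, at the moment of the mixed-type operation:

```python
class EncodedBytes(bytes):
    """Hypothetical bytes subclass that remembers its declared encoding."""

    def __new__(cls, data, encoding):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def __add__(self, other):
        if isinstance(other, EncodedBytes):
            # Two encoded values must agree on their encoding.
            if other.encoding != self.encoding:
                raise TypeError("cannot mix %s and %s bytes"
                                % (self.encoding, other.encoding))
            tail = bytes(other)
        elif isinstance(other, str):
            # Raises UnicodeEncodeError right here if the text does not
            # fit the declared encoding.
            tail = other.encode(self.encoding)
        else:
            # Assumption: operands with no declared encoding are rejected.
            raise TypeError("cannot add %s to EncodedBytes"
                            % type(other).__name__)
        return EncodedBytes(bytes(self) + tail, self.encoding)


path = EncodedBytes(b"/docs/", "ascii")
ok = path + "index.html"          # text fits ASCII, so this succeeds

try:
    path + "caf\u00e9"            # U+00E9 cannot be encoded as ASCII
except UnicodeEncodeError as exc:
    error = exc                   # the failure surfaces at the operation
```

With a type like this, a polymorphic function needs no explicit look-before-you-leap test; the operand itself refuses the bad combination.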

(Btw, in some earlier emails, Stephen, you implied that this could be 
fixed with codecs -- but it can't, because the problem isn't with the 
bytes containing invalid Unicode, it's with the Unicode containing 
invalid bytes -- i.e., characters that can't be encoded to the 
ultimate codec target.)
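
To illustrate the direction of the failure, here is a small sketch in Python 3 syntax. The byte string and the choice of ASCII as the "ultimate codec target" are made up for the example. Decoding succeeds because the bytes are valid, yet the resulting text still cannot be written back out in the target encoding:

```python
raw = b"caf\xc3\xa9"              # well-formed UTF-8 input
text = raw.decode("utf-8")        # decoding succeeds: the bytes are fine

try:
    text.encode("ascii")          # the target codec cannot represent U+00E9
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True          # the error only appears on the way out
```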