[Python-Dev] bytes / unicode

P.J. Eby pje at telecommunity.com
Sat Jun 26 20:17:44 CEST 2010


At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:
>What I'm saying here is that if bytes are the signal of validity, and
>the stdlib functions preserve validity, then it's better to have the
>stdlib functions object to unicode data as an argument.  Compare the
>alternative: it returns a unicode object which might get passed around
>for a while before one of your functions receives it and identifies it
>as unvalidated data.

I still don't follow, since passing in bytes should return 
bytes.  Returning unicode would be an error, in the case of a 
"polymorphic" function (per Guido).


>But you agree that there are better mechanisms for validation
>(although not available in Python yet), so I don't see this as a
>potential obstacle to polymorphism now.

Nope.  I'm just saying that, given two bytestrings to url-join or 
path join or whatever, a polymorph should hand back a 
bytestring.  This seems pretty uncontroversial.
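
To make "bytes in, bytes out" concrete, here is a minimal sketch of such a polymorph. The name join_path and its trailing-slash handling are hypothetical, chosen only for illustration; this is not an actual stdlib function.

```python
def join_path(base, name):
    """Hypothetical polymorphic join: the result type follows the inputs."""
    if isinstance(base, bytes) and isinstance(name, bytes):
        sep = b"/"          # bytes inputs get a bytes separator
    elif isinstance(base, str) and isinstance(name, str):
        sep = "/"           # str inputs get a str separator
    else:
        # Mixing the two is rejected instead of guessing an encoding.
        raise TypeError("cannot mix bytes and str arguments")
    return base.rstrip(sep) + sep + name


bytes_result = join_path(b"/usr/", b"lib")
text_result = join_path("/usr", "lib")
```

Two bytestrings produce a bytestring, two str objects produce a str, and a mixed call raises TypeError, so unicode never comes back from a bytes call.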


>  > What I want is for the stdlib to create stringlike objects of a
>  > type determined by the types of the inputs --
>
>In general this is a hard problem, though.  Polymorphism, OK, one-way
>tainting OK, but in general combining related types is pretty
>arbitrary, and as in the encoded-bytes case, the result type often
>varies depending on expectations of callers, not the types of the
>data.

But the caller can enforce those expectations by passing in arguments 
whose types do what they want in such cases, as long as the string 
literals used by the function don't get to override the relevant 
parts of the string protocol(s).

The idea that I'm proposing is that the basic string and byte types 
should defer to "user-defined" string types for mixed-type 
operations, so that polymorphism of string-manipulation functions is 
the *default* case, rather than a *special* case.  This makes 
tainting easier to implement, as well as optimizations and other 
special cases (like my "source string w/file and line info", or a 
string with font/formatting attributes).
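
A rough sketch of the deferral idea, assuming a hypothetical Tainted subclass of str (invented here for illustration, not an existing or proposed class). Concatenation already defers to the subclass through __radd__, so a function that only uses "+" with a literal is polymorphic without any special effort; a method called on the literal, such as join, is not.

```python
class Tainted(str):
    """Hypothetical str subclass marking data as unvalidated."""

    def __add__(self, other):
        # Tainted + anything stringlike stays Tainted.
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Plain str + Tainted: the literal on the left defers to us.
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self))


def join_with_literal(a, b):
    # A function written with no knowledge of Tainted.
    return a + "/" + b


deferred = join_with_literal("safe", Tainted("user-input"))
overridden = "/".join([Tainted("a"), Tainted("b")])
```

Here deferred is a Tainted instance, because the "+" operator lets the right-hand operand take over. By contrast, overridden is a plain str: the "/" literal's join method decides the result type, which is exactly the kind of override by a function's string literals that gets in the way.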






