[Python-Dev] thoughts on the bytes/string discussion

Stephen J. Turnbull stephen at xemacs.org
Fri Jun 25 12:06:33 CEST 2010


Ian Bicking writes:

 > We've set up a system where we think of text as natively unicode, with
 > encodings to put that unicode into a byte form.  This is certainly
 > appropriate in a lot of cases.  But there's a significant class of problems
 > where bytes are the native structure.  Network protocols are what we've been
 > discussing, and are a notable case of that.  That is, b'/' is the most
 > native sense of a path separator in a URL, or b':' is the most native sense
 > of what separates a header name from a header value in HTTP.

IMHO, URIs don't have a native language in this sense.  Network
programmers do, however, and it is bytes.  Text-handling programmers
also do, and it is str.

 > So with this idea in mind it makes more sense to me that *specific pieces of
 > text* can be reasonably treated as both bytes and text.  All the string
 > literals in urllib.parse.urlunsplit() for example.
 > 
 > The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not
 > become special('/x')) and special('/')+'x'=='/x' (again it becomes str).  This
 > avoids some of the cases of unicode or str infecting a system as they did in
 > Python 2 (where you might pass in unicode and everything works fine until
 > some non-ASCII is introduced).

I think you need to give explicit examples where this actually helps
in terms of "type contagion".  I expect that it doesn't help at all,
especially not for the people whose native language for URIs is bytes.
These specials are still going to flip to unicode as soon as it comes
in, and that will be incompatible with the bytes they'll need later.
So they're still going to need to filter out unicode on input.
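
To make the contagion point concrete, here is a rough sketch of how such
a "special" literal might behave.  Nothing like this exists in the
stdlib; the class name Special, its ASCII-only restriction, and the
helper _coerce are all assumptions made up for illustration, not part of
the proposal as stated.

```python
class Special:
    """Hypothetical literal that takes on the type of its partner operand."""

    def __init__(self, text):
        # Assumption: only ASCII is allowed, so the bytes form is unambiguous.
        text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
        self._text = text

    def _coerce(self, other):
        # Return this literal in the same type as `other`, or None.
        if isinstance(other, bytes):
            return self._text.encode("ascii")
        if isinstance(other, str):
            return self._text
        return None

    def __add__(self, other):
        mine = self._coerce(other)
        return NotImplemented if mine is None else mine + other

    def __radd__(self, other):
        mine = self._coerce(other)
        return NotImplemented if mine is None else other + mine


sep = Special("/")
as_bytes = sep + b"x"   # stays bytes: b'/x'
as_text = sep + "x"     # flips to str: '/x'

# Once a str has come in, the result is an ordinary str, and it can no
# longer be combined with the bytes the network code needs later.
try:
    as_text + b"y"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

In this sketch the special value itself never leaks out, but the str it
produces does, so code that wants bytes throughout still has to check
its inputs.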

It looks like it would be useful for programmers of polymorphic
functions, though.
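
For comparison, this is roughly the boilerplate a polymorphic function
needs today without such a type.  The function name join_segments is
hypothetical and only meant to show the per-literal type switch that a
"special" literal would remove.

```python
def join_segments(first, *rest):
    """Join URL path segments, returning the same type as the inputs."""
    # Every literal has to be chosen to match the argument type by hand.
    sep = b"/" if isinstance(first, bytes) else "/"
    # join() raises TypeError if bytes and str segments are mixed.
    return sep.join((first,) + rest)


bytes_path = join_segments(b"a", b"b")   # b'a/b'
text_path = join_segments("a", "b")      # 'a/b'
```

A function with many such literals repeats that isinstance() switch for
each one, which is where a type-adapting literal would save work.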

