
Ian Bicking writes:
We've set up a system where we think of text as natively unicode, with encodings to put that unicode into a byte form. This is certainly appropriate in a lot of cases. But there's a significant class of problems where bytes are the native structure. Network protocols are what we've been discussing, and they are a notable case of that. That is, b'/' is the most native sense of a path separator in a URL, and b':' is the most native sense of what separates a header name from a header value in HTTP.
IMHO, URIs don't have a native language in this sense. Network programmers do, however, and it is bytes. Text-handling programmers also do, and it is str.
So with this idea in mind, it makes more sense to me that *specific pieces of text* can reasonably be treated as both bytes and text. All the string literals in urllib.parse.urlunsplit(), for example.
The semantics I imagine are that special('/') + b'x' == b'/x' (i.e., the result is plain bytes, not special('/x')) and special('/') + 'x' == '/x' (likewise plain str, not special('/x')). This avoids some of the cases of unicode or str infecting a system as they did in Python 2 (where you might pass in unicode and everything worked fine until some non-ASCII was introduced).
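
For concreteness, here is a minimal sketch of how such a type might behave. The name special, the str-subclass approach, and the ASCII-only restriction are all my illustrative assumptions, not code from any actual proposal:

    class special(str):
        """ASCII-only literal that decays to plain bytes or plain str
        on concatenation, instead of propagating its own type."""

        def __add__(self, other):
            if isinstance(other, bytes):
                # special('/') + b'x' == b'/x': result is plain bytes
                return self.encode('ascii') + other
            # special('/') + 'x' == '/x': result is plain str
            return str(self) + other

        def __radd__(self, other):
            if isinstance(other, bytes):
                return other + self.encode('ascii')
            return other + str(self)

    assert special('/') + b'x' == b'/x'       # decays to bytes
    assert b'x' + special('/') == b'x/'       # bytes on the left works too
    assert special('/') + 'x' == '/x'         # decays to str
    assert type(special('/') + 'x') is str    # not special('/x')
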
I think you need to give explicit examples where this actually helps in terms of "type contagion". I expect that it doesn't help at all, especially not for the people whose native language for URIs is bytes. These specials are still going to flip to unicode as soon as unicode comes in, and that result will be incompatible with the bytes they'll need later. So they're still going to need to filter out unicode on input. It looks like it would be useful for programmers of polymorphic functions, though.
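
To spell out the contagion worry, under the same hypothetical special type sketched above: once ordinary str input reaches one of these literals, the result is plain str, and the bytes-oriented code downstream breaks anyway:

    scheme = 'http'                   # str that "came in" from a caller
    url = scheme + special('://')     # -> 'http://', already plain str
    try:
        url + b'example.com'          # the network layer needs bytes
    except TypeError as err:
        print(err)                    # can only concatenate str (not "bytes") to str
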