
On Fri, Jun 25, 2010 at 5:06 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
>> So with this idea in mind it makes more sense to me that *specific pieces of text* can be reasonably treated as both bytes and text: all the string literals in urllib.parse.urlunsplit(), for example.
>> The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not become special('/x')) and special('/')+'x'=='/x' (again, it becomes str). This avoids some of the cases of unicode or str infecting a system as they did in Python 2 (where you might pass in unicode and everything works fine until some non-ASCII is introduced).
> I think you need to give explicit examples where this actually helps in terms of "type contagion". I expect that it doesn't help at all, especially not for the people whose native language for URIs is bytes. These specials are still going to flip to unicode as soon as unicode comes in, and that will be incompatible with the bytes they'll need later. So they're still going to need to filter out unicode on input.
> It looks like it would be useful for programmers of polymorphic functions, though.
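As a rough illustration of the semantics in the quoted proposal, here is a minimal Python 3 sketch. The class name `Special`, the helper `prefix_netloc()`, and the ASCII-only restriction are assumptions made for this example; they are not an existing API in urllib.parse or anywhere else. The literal converts itself to bytes or str to match the other operand, so concatenation never yields a new special object.

```python
class Special:
    """Hypothetical ASCII-only literal that takes on the type (str or bytes)
    of whatever it is concatenated with."""

    __slots__ = ("_text",)

    def __init__(self, text: str) -> None:
        text.encode("ascii")  # reject non-ASCII so the bytes form is unambiguous
        self._text = text

    def _as_type_of(self, other):
        # Return this literal as bytes or str to match `other`, or
        # NotImplemented for any other type so Python raises TypeError.
        if isinstance(other, bytes):
            return self._text.encode("ascii")
        if isinstance(other, str):
            return self._text
        return NotImplemented

    def __add__(self, other):   # special + other
        mine = self._as_type_of(other)
        return mine if mine is NotImplemented else mine + other

    def __radd__(self, other):  # other + special
        mine = self._as_type_of(other)
        return mine if mine is NotImplemented else other + mine

    def __repr__(self) -> str:
        return f"Special({self._text!r})"


# Hypothetical polymorphic helper, loosely modeled on the way urlunsplit()
# prefixes a netloc with a '//' literal.
SLASHES = Special("//")


def prefix_netloc(netloc, path):
    # The result is str for str inputs and bytes for bytes inputs.
    return SLASHES + netloc + path


assert Special("/") + b"x" == b"/x"   # stays bytes, not Special('/x')
assert Special("/") + "x" == "/x"     # becomes str
assert prefix_netloc("example.com", "/a") == "//example.com/a"
assert prefix_netloc(b"example.com", b"/a") == b"//example.com/a"
```

Because the sketch defines only concatenation, any other operation (e.g. `SLASHES.upper()`) raises AttributeError, and mixing inputs, as in `prefix_netloc("example.com", b"/a")`, still raises TypeError instead of silently producing unicode.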
I'm proposing these specials would be used in polymorphic functions, like the functions in urllib.parse. I would not personally use them in my own code (unless of course I was writing my own polymorphic functions). This also makes it less important that the objects be a full stand-in for text, as their use should be isolated to specific functions; they aren't objects that should be passed around much. So you can quickly detect any use of unsupported operations on those text-like objects. (This is all a very different use case from bytes+encoding, I think.)

-- 
Ian Bicking | http://blog.ianbicking.org