[Python-Dev] bytes / unicode

James Y Knight foom at fuhm.net
Tue Jun 22 20:07:18 CEST 2010


On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
> Similarly I'd expect (from experience) that a programmer using  
> Python to want to take the same approach, sticking with unencoded  
> data in nearly all situations.

Yeah. This is a real issue I have with the direction Python3 went: it  
pushes you into decoding everything to unicode early, even when you  
don't care -- all you really wanted to do is pass it from one API to  
another, with some well-defined transformations, which don't actually  
depend on it having been decoded properly. (For example, extracting
the path from the URL and attempting to open it as a file on the  
filesystem.)
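
As a rough sketch of that URL-to-file case, here is how it can stay in bytes from end to end with the present-day Python 3 standard library. The URL, the /srv/data root, and the helper name path_bytes_from_url are made up for illustration, and the sketch assumes a POSIX-style filesystem where file names are arbitrary byte strings:

```python
import posixpath
from urllib.parse import urlsplit, unquote_to_bytes

def path_bytes_from_url(url: bytes) -> bytes:
    """Hypothetical helper: return the percent-decoded path of a URL as bytes."""
    # urlsplit accepts ASCII bytes and then returns bytes components.
    raw_path = urlsplit(url).path
    # unquote_to_bytes turns %XX escapes into raw bytes without decoding to str.
    return unquote_to_bytes(raw_path)

# %FF is not valid UTF-8, but nothing here ever needs it to be.
url = b"http://example.com/files/report%FF.txt?x=1"
path = path_bytes_from_url(url)

# Join onto an assumed document root, still as bytes.
full_path = posixpath.join(b"/srv/data", path.lstrip(b"/"))
```

On POSIX the resulting bytes object could be handed straight to open(); this sketch does no validation (for example against ".." segments), which real code would need.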

This means that Python3 programs can become *more* fragile in the face  
of random data you encounter out in the real world, rather than less  
fragile, which was the goal of the whole exercise.

The surrogateescape method is a nice workaround for this, but I can't  
help thinking that it might've been better to just treat stuff as  
possibly-invalid-but-probably-utf8 byte-strings from input, through  
processing, to output. It seems kinda too late for that, though: next  
time someone designs a language, they can try that. :)
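
For reference, a minimal sketch of what the surrogateescape error handler does with mostly-UTF-8 data that contains a stray byte. The sample path is invented; the point is only that the undecodable byte survives a decode/encode round trip:

```python
# Mostly UTF-8 ("caf\xc3\xa9" is "café"), plus one byte (0xff) that is not valid UTF-8.
raw = b"/tmp/caf\xc3\xa9-\xff.txt"

# Strict decoding rejects the whole string because of that one byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate (0xff -> U+DCFF).
text = raw.decode("utf-8", errors="surrogateescape")

# Ordinary str operations still work on the result.
basename = text.rsplit("/", 1)[1]

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", errors="surrogateescape")
```

The catch is that the intermediate str contains lone surrogates, so encoding it with the default strict handler (for instance when printing or writing it out as UTF-8) raises UnicodeEncodeError.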

James