[Python-3000] Unicode and OS strings

Greg Ewing greg.ewing at canterbury.ac.nz
Fri Sep 14 07:08:04 CEST 2007


Stephen J. Turnbull wrote:
> You can't win that, because Unicode is the only encoding that attempts
> to guarantee even the possibility of round-tripping.

Rubbish -- I can do

    print [ord(c) for c in my_unicode_string]

and get perfect round-trippability if I want.
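
A minimal sketch of that round trip, written in Python 3 syntax rather than the Python 2 print statement above. The helper names to_code_points and from_code_points are hypothetical, made up purely for illustration:

```python
def to_code_points(s):
    """Serialise a string as a list of integer code points."""
    return [ord(c) for c in s]


def from_code_points(points):
    """Rebuild the string from a list of integer code points."""
    return "".join(chr(p) for p in points)


# Hypothetical sample containing non-ASCII characters.
sample = "na\u00efve \u20ac"

# The list of integers carries everything needed to get the string back.
assert from_code_points(to_code_points(sample)) == sample
```

The list of integers is a home-made serialisation, not an officially sanctioned encoding, yet it loses nothing.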

You can ask people to use pre-existing officially-sanctioned
encodings for their unicode data, but you can't force them to.

> The main problem with this scheme that I know of is that if you have a
> Python string that contains such a code point, you'll need to somehow
> include the information about the original encoding when pickling and
> the like.

That's exactly the sort of thing I'm talking about. It
would be surprising if pickling worked reliably for all
strings *except* ones that happened to come in as a
command line argument.
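
The property at stake can be checked with a short sketch like the one below. It assumes a present-day Python 3 and uses the "surrogateescape" error handler to carry undecodable bytes inside a string; that handler is an assumption chosen for illustration and is not the scheme under discussion in this thread. The byte string raw_arg is a made-up stand-in for an awkward command line argument.

```python
import pickle

# Hypothetical bytes that are not valid UTF-8, standing in for an
# awkward command line argument.
raw_arg = b"caf\xe9"

# Assumption: undecodable bytes are carried in the string via the
# "surrogateescape" error handler.
text = raw_arg.decode("utf-8", "surrogateescape")

# Pickle and unpickle the string like any other.
restored = pickle.loads(pickle.dumps(text))

# The string survives, and the original bytes can still be recovered.
assert restored == text
assert restored.encode("utf-8", "surrogateescape") == raw_arg
```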

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz     +--------------------------------------+

