string to unicode

Tim Roberts timr at probo.com
Tue Aug 16 20:32:12 EDT 2011


Artie Ziff <artie.ziff at gmail.com> wrote:
>
>if I am using the standard csv library to read contents of a csv file 
>which contains Unicode strings (short example: 
>'\xe8\x9f\x92\xe8\x9b\x87'),

You need to be rather precise when talking about this.  That's not a
"Unicode string" in Python terms.  It's an 8-bit string.  It might be
UTF-8 encoded.  If so, it decodes to two Unicode code points, U+87D2 and
U+86C7, which are both CJK ideographs.  Is that what you expected?

  C:\Dev\videology\sw\viewer>python
  Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> x = '\xe8\x9f\x92\xe8\x9b\x87'
  >>> x.decode('utf8')
  u'\u87d2\u86c7'
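
The same distinction can be spelled out as a short script.  This is only a
sketch, assuming the bytes really are UTF-8.  The name decode_cells is a
hypothetical helper made up for illustration; it is not part of the csv
module.

```python
# Assumption: the data is UTF-8.  decode() raises UnicodeDecodeError
# on bytes that are not valid UTF-8.
raw = b'\xe8\x9f\x92\xe8\x9b\x87'    # 8-bit string: six bytes
text = raw.decode('utf-8')            # Unicode string: two code points

def decode_cells(row, encoding='utf-8'):
    """Hypothetical helper: decode every 8-bit cell in one row."""
    return [cell.decode(encoding) for cell in row]

print(len(raw))                            # 6
print(len(text))                           # 2
print([hex(ord(ch)) for ch in text])       # ['0x87d2', '0x86c7']
cells = decode_cells([raw, b'abc'])
print(cells == [text, u'abc'])             # True
```

In Python 2.7 the csv module hands back 8-bit strings, so each row from
csv.reader could be passed through a helper like this, provided the file
is known to be UTF-8.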
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.


