[Tutor] Decode and Encode
kwpolska at gmail.com
Wed Jan 28 11:26:43 CET 2015
On Wed, Jan 28, 2015 at 10:35 AM, Sunil Tech <sunil.techspk at gmail.com> wrote:
> Hi All,
> When I copied some text from the web and pasted it into the Python terminal, it
> was automatically converted into Unicode (I suppose).
> Can anyone tell me how it does that?
> >>> p = "你好"
> >>> o = 'ªîV'
No, it didn’t. You created a bytestring that contains some bytes.
Python does NOT think of `p` as a unicode string of 2 characters; it’s
a bytestring of 6 bytes. You cannot use that bytestring to reliably
get only the first character, for example: `p[0]` will get you
garbage ('\xe4', which will render as a question mark on a UTF-8
terminal).
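To make the byte/character difference concrete, here is a small sketch. It assumes Python 3, where a `bytes` object plays the role of the Python 2 bytestring; the variable names `text` and `raw` are only illustrative.

```python
# Python 3 sketch: `raw` stands in for the Python 2 bytestring `p`.
text = "你好"                  # a real Unicode string, 2 characters
raw = text.encode("utf-8")     # the same text as UTF-8 bytes

print(len(text))               # 2 characters
print(len(raw))                # 6 bytes
print(raw[:1])                 # b'\xe4', one byte, not a whole character
print(raw.decode("utf-8")[0])  # 你, the first character after decoding
```

Slicing or indexing the bytes cuts through the middle of a character; only the decoded string can be indexed character by character.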
In order to get a real unicode string, you must do one of the following:
(a) Prefix the string literal with u (as in u''). This works only if
your locale is set correctly and Python knows you use UTF-8. For example:
>>> p = u"你好"
(b) Use decode on the bytestring, which is safer and does not depend
on a properly configured system.
>>> p = "你好".decode('utf-8')
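If you end up doing (b) in many places, it may be convenient to wrap it in a small function. The sketch below is only an illustration: `to_text` is a made-up helper name, not something from the standard library, and it assumes the input is UTF-8 unless told otherwise. The bytes are written as escapes so the example does not depend on the terminal.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytestrings, pass Unicode strings through."""
    if isinstance(value, bytes):
        # Explicit decode, independent of the system locale.
        return value.decode(encoding)
    return value


p = to_text(b"\xe4\xbd\xa0\xe5\xa5\xbd")  # the six UTF-8 bytes of 你好
print(len(p))  # 2
```

Since `bytes` is an alias for `str` in Python 2.6 and later, the same check should behave the same way on Python 2 and Python 3.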
However, this does not apply in Python 3. Python 3 defaults to
Unicode strings, so you can do:
>>> p = "你好"
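and have proper Unicode handling, assuming your system locale is set
correctly. If it isn’t, you can decode the raw UTF-8 bytes explicitly
(a Python 3 bytes literal may contain only ASCII characters, so the
bytes are written as escapes here):
>>> p = b"\xe4\xbd\xa0\xe5\xa5\xbd".decode('utf-8')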
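If you are not sure whether your locale is set correctly, you can ask Python which encodings it picked up. This is a rough sketch for Python 3; `describe_encodings` is just an illustrative name, and the values you see depend entirely on your system.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python has picked up from the environment."""
    return {
        "default": sys.getdefaultencoding(),
        "locale": locale.getpreferredencoding(False),
        # stdout may have been replaced by an object without an encoding
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in describe_encodings().items():
    print(name, "->", value)
```

On a correctly configured UTF-8 system, the locale and stdout entries would normally both report UTF-8.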
Chris Warrick <https://chriswarrick.com/>