byte count unicode string
Gabriel Genellina
gagsl-py at yahoo.com.ar
Wed Sep 20 19:59:38 EDT 2006
At Wednesday 20/9/2006 19:53, willie wrote:
>What is the proper way to describe "ustr" below?
>
> >>> ustr = buf.decode('UTF-8')
> >>> type(ustr)
><type 'unicode'>
>
>
>Is it a "unicode object that contains a UTF-8 encoded
>string object?"
ustr is a unicode object. Period. A unicode object contains
characters (not bytes).
buf, apparently, is a string: a string of bytes. Those bytes
apparently represent some unicode characters encoded with the UTF-8
encoding, so you can decode them (using the decode() method) to get
the unicode object.
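A minimal sketch of the bytes/characters distinction, assuming Python 3, where the 2006-era str/unicode pair became bytes/str. The sample text "café" is an arbitrary illustration; the point is that the byte count and the character count differ.

```python
# Python 3 spelling: bytes plays the role of the old str,
# and str plays the role of the old unicode type.
buf = "caf\u00e9".encode("utf-8")   # 5 bytes: b'caf\xc3\xa9'
ustr = buf.decode("utf-8")          # 4 characters: 'café'

print(type(buf).__name__, len(buf))    # bytes 5
print(type(ustr).__name__, len(ustr))  # str 4
```

The accented letter takes two bytes in UTF-8 but is a single character once decoded.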
Very roughly, the difference is like that of an integer and its
representations:
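import struct
w = 1
x = 0x0001
y = 0o1    # octal literal; Python 2.5 and earlier spell it 001
z = struct.unpack('>h', b'\x00\x01')[0]   # unpack returns a tuple, (1,)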
All four names refer to the *same* integer, 1.
There is no way of knowing *how* the integer was spelled, i.e., which
representation it came from. Like the unicode object, it has no
"encoding" by itself.
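The same point for text, as a minimal Python 3 sketch: two different byte strings (UTF-8 and Latin-1 spellings of an arbitrary sample word) decode to equal text objects, and nothing in the result records which encoding it came from.

```python
text = "caf\u00e9"
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

a = utf8_bytes.decode("utf-8")
b = latin1_bytes.decode("latin-1")

# Different byte strings, one and the same text afterwards:
print(utf8_bytes == latin1_bytes)  # False
print(a == b)                      # True
```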
You can go back and forth between an integer and its decimal
representation (with int() and str()), just as you go back and forth
between bytes and characters with astring.decode() and ustring.encode().
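Side by side, the two round trips look like this in a minimal Python 3 sketch (where bytes.decode() and str.encode() take the place of the old astring.decode() and ustring.encode()); the values are arbitrary examples.

```python
# Integer <-> decimal text
n = int("42")   # text -> integer
s = str(n)      # integer -> text

# Bytes <-> unicode text
u = b"caf\xc3\xa9".decode("utf-8")  # bytes -> characters
raw = u.encode("utf-8")             # characters -> bytes
```

In both cases the conversion needs a rule (base 10, or UTF-8) that the resulting integer or text object does not remember.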
Gabriel Genellina
Softlab SRL
More information about the Python-list mailing list