Can I get the 8bit-string representation of any unicode string
Kent Johnson
kent at kentsjohnson.com
Sun Feb 12 10:36:58 EST 2006
wanghz at gmail.com wrote:
> Hello, everyone.
>
> I have a problem when I'm processing unicode strings. Is it possible
> to get the 8bit-string representation of any unicode string?
Yes, if you can be more precise about what you mean by '8bit-string
representation'. Likely candidates, each of which can encode any
unicode string, are:
  b.encode('utf-8')
  b.encode('utf_16_be')
  b.encode('utf_16_le')
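
As a rough sketch of how those candidates differ (not part of the
original exchange), the snippet below encodes both strings from the
question with each codec. It assumes a current Python 3 interpreter,
where encode() returns a bytes object rather than the 8-bit str of
Python 2. The helper name to_bytes is hypothetical and exists only
for illustration.

```python
# Both example strings from the question.
a = u'\xc8\xce\xcf\xcd\xc6\xeb'
b = u'\u4efb\u8d24\u9f50'

def to_bytes(s, codec='utf-8'):
    """Hypothetical helper: encode any unicode string with one fixed codec."""
    return s.encode(codec)

# UTF-8 and UTF-16 cover every Unicode character (lone surrogates aside),
# so the same call works for both a and b.
for s in (a, b):
    for codec in ('utf-8', 'utf_16_be', 'utf_16_le'):
        data = to_bytes(s, codec)
        assert data.decode(codec) == s  # the encoding round-trips
        print('%-9s %r' % (codec, data))

# latin-1 only covers U+0000..U+00FF, so it works for a but fails for b.
try:
    to_bytes(b, 'latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print('latin-1 can encode b: %s' % latin1_ok)
```

Whichever codec is picked, whoever reads the bytes back has to decode
with the same one; utf-8 is a common default when nothing else
dictates the choice.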
Kent
>
> Suppose I get a unicode string:
> a = u'\xc8\xce\xcf\xcd\xc6\xeb';
> then, by
> a.encode('latin-1');
> I can get the 8bit-string representation of it, that is, the physical
> storage format of this string.
>
> But for another kind of unicode string, say:
> b = u'\u4efb\u8d24\u9f50';
> I have to:
> b.encode('utf-8')
> to get the 8bit-string format of it.
>
> Since these unicode strings are given by an external library function,
> I don't know which kind a unicode string belongs to before I get it at
> runtime. So, I wonder if there is a unified way to get the 8bit-string
> representation, say, byte-by-byte, of any unicode string?
>
> Thank you very much.
>