byte count unicode string

John Machin sjmachin at lexicon.net
Wed Sep 20 02:35:29 EDT 2006


willie wrote:
> # What's the correct way to get the
> # byte count of a unicode (UTF-8) string?
> # I couldn't find a builtin method
> # and the following is memory inefficient.
>
> ustr = "example\xC2\x9D".decode('UTF-8')
>
> num_chars = len(ustr)    # 8
>
> buf = ustr.encode('UTF-8')
>
> num_bytes = len(buf)     # 9

num_bytes = len("example\xC2\x9D")

This produces 9; isn't that what you want?
If not, please explain, with examples, what you mean by "the
byte count of a unicode (UTF-8) string".
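
If the goal is to know the UTF-8 size of a unicode string without building the encoded buffer, the size can be computed from the code points alone. A UTF-8 sequence is 1, 2, 3 or 4 bytes depending on the code point range. The sketch below is written in Python 3 terms, which is an assumption since the thread itself uses Python 2. `utf8_byte_count` is a hypothetical helper, not a builtin. It also assumes the string holds no lone surrogates, which strict UTF-8 encoding would reject.

```python
def utf8_byte_count(s: str) -> int:
    """Return how many bytes s would occupy in UTF-8, without encoding it.

    Hypothetical helper; assumes s contains no lone surrogates
    (U+D800..U+DFFF), which a strict UTF-8 encode would refuse.
    """
    total = 0
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:        # U+0000..U+007F: 1 byte
            total += 1
        elif cp < 0x800:     # U+0080..U+07FF: 2 bytes
            total += 2
        elif cp < 0x10000:   # U+0800..U+FFFF: 3 bytes
            total += 3
        else:                # U+10000..U+10FFFF: 4 bytes
            total += 4
    return total


# Same data as the question, in Python 3 spelling (b"..." literal).
ustr = b"example\xC2\x9D".decode("utf-8")
print(len(ustr))              # 8 code points
print(utf8_byte_count(ustr))  # 9 bytes
```

This avoids the temporary byte buffer, but it replaces one C-level `encode()` call with a Python-level loop. For most strings, `len(ustr.encode('utf-8'))` is therefore likely to be faster, even though it allocates.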

HTH,
John
