how to get size of unicode string/string in bytes ?

Stefan Behnel stefan.behnel-n05pAM at web.de
Tue Aug 1 04:53:14 EDT 2006


pattreeya at gmail.com wrote:
>   how can I get the number of byte of the string in python?
> with "len(string)", it doesn't work to get the size of the string in
> bytes if I have the unicode string but just the length. (it only works
> fine for ascii/latin1) In data structure, I have to store unicode
> string for many languages and must know exactly how big of my string
> which is stored so I can read back later.

I do not quite know what you could possibly need that for, but AFAICT Python
only uses one of two internal unicode encodings, depending on the platform.

If 'sys.maxunicode' is bigger than 65535, you're on a 32 bit unicode platform
(UCS4), otherwise you're on UCS2. For UCS4, you can multiply the length of the
unicode string by 4 to get the size of the internal memory buffer in bytes,
otherwise multiply it by 2.
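
As a rough sketch of that rule (the helper names 'code_unit_width' and
'internal_buffer_bytes' are made up for illustration, and this assumes an
interpreter with a fixed-width internal representation; from Python 3.3 on the
width is chosen per string, so there the result is only an upper estimate):

```python
import sys

def code_unit_width(maxunicode=sys.maxunicode):
    """Bytes per code unit under the rule above: 4 for UCS4, 2 for UCS2."""
    return 4 if maxunicode > 0xFFFF else 2

def internal_buffer_bytes(text, maxunicode=sys.maxunicode):
    """Estimate the size of the character buffer, without object overhead."""
    return len(text) * code_unit_width(maxunicode)

# 'maxunicode' is a parameter only so both cases can be tried on one machine.
print(internal_buffer_bytes(u"abc", 0xFFFF))    # UCS2 case
print(internal_buffer_bytes(u"abc", 0x10FFFF))  # UCS4 case
```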

Normally, however, you should not need to deal with this kind of detail. Since
you say "read back later", maybe what you actually want is a serialisation of
the unicode string in, say, UTF-8, that you can actually write to a file and
read back. The size in bytes is then simply the length of the encoded string.
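
Something along these lines might do, assuming UTF-8 and a 4-byte
little-endian length prefix in front of each string (the prefix layout and the
names 'write_text' and 'read_text' are just my choice for the example, not
anything your data structure requires):

```python
import io
import struct

def write_text(stream, text):
    """Write a unicode string as <4-byte length><UTF-8 bytes>."""
    data = text.encode("utf-8")
    stream.write(struct.pack("<I", len(data)))  # size in bytes, not characters
    stream.write(data)

def read_text(stream):
    """Read back one string written by write_text()."""
    (size,) = struct.unpack("<I", stream.read(4))
    return stream.read(size).decode("utf-8")

original = u"gr\u00fc\u00df dich"  # 9 characters, 11 bytes in UTF-8
buf = io.BytesIO()                 # stands in for a file opened in binary mode
write_text(buf, original)
buf.seek(0)
restored = read_text(buf)
```

Here len(original) counts characters, while len(original.encode("utf-8"))
counts the bytes that actually end up in the file.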

Stefan


