How to get the size of a unicode string/string in bytes?

Tue Aug 1 06:12:20 EDT 2006

Diez B. Roggisch wrote:
> Stefan Behnel wrote:
> 
>> pattreeya at gmail.com wrote:
>>> How can I get the number of bytes of a string in Python?
>>> With "len(string)" I only get the length, not the size of the string in
>>> bytes, when the string is unicode (it only works fine for ascii/latin1).
>>> In my data structure I have to store unicode strings for many languages
>>> and must know exactly how big each stored string is so I can read it
>>> back later.
>> I do not quite know what you could possibly need that for, but AFAICT
>> Python only uses two different internal unicode encodings, depending on
>> the platform (UCS-2 on "narrow" builds, UCS-4 on "wide" builds).
> 
> It is very important for relational databases, as these usually constrain
> the number of bytes per column - so you need the size in bytes, not the
> number of unicode characters.

So then the easiest thing to do is: take the maximum length of a unicode
string you could possibly want to store, multiply it by 4 (the most bytes a
single code point can take, both in UCS-4 and in UTF-8) and make that the
length of the DB field.
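A minimal sketch of measuring the byte size, written in Python 3 syntax (an assumption: in 2006 this would be Python 2's `unicode` type, whose role `str` plays in Python 3). The byte size depends on the encoding you choose, so you encode first and measure the result; the factor of 4 is only a safe upper bound for UTF-8. The helper names `utf8_size` and `max_utf8_field_size` are hypothetical.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def max_utf8_field_size(max_chars: int) -> int:
    """Worst-case byte size for `max_chars` code points in UTF-8.

    A single code point takes at most 4 bytes in UTF-8, so this
    bound is always safe, though usually generous.
    """
    return 4 * max_chars


samples = [
    "abc",                   # 3 code points, 3 bytes
    "caf\u00e9",             # 4 code points, 5 bytes
    "\u65e5\u672c\u8a9e",    # 3 code points, 9 bytes
    "\U0001F600",            # 1 code point, 4 bytes
]
for s in samples:
    # len() counts code points; utf8_size() counts encoded bytes.
    # ascii() keeps the output printable on any terminal.
    print(ascii(s), len(s), utf8_size(s))
```

If the column limit is known, comparing `utf8_size(value)` against it before inserting avoids relying on the generous 4x bound.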

However, I'm pretty convinced it is a bad idea to store the internal
representation of Python unicode strings directly in a DB, especially as it
is not portable between platforms. I assume that some DB connectors already
honour the local platform encoding, but I'd still say that UTF-8 is your best
friend here.
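For the original question's own data structure ("must know exactly how big each stored string is so I can read it back later"), one common approach is to store UTF-8 bytes behind a length prefix. This is a sketch under assumptions, not a prescribed format: the record layout (4-byte big-endian unsigned length, then the UTF-8 bytes) and the helpers `pack_text` and `unpack_text` are hypothetical; only the `struct` module and `str.encode`/`bytes.decode` are standard Python 3.

```python
import struct
from typing import Tuple

# Hypothetical record layout: a 4-byte big-endian unsigned length,
# followed by exactly that many bytes of UTF-8 text.
_LEN = struct.Struct(">I")


def pack_text(text: str) -> bytes:
    """Encode `text` as UTF-8 and prefix it with its size in bytes."""
    data = text.encode("utf-8")
    return _LEN.pack(len(data)) + data


def unpack_text(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Read one record from `buf` starting at `offset`.

    Returns the decoded text and the offset just past the record,
    so several records can be read back one after another.
    """
    (size,) = _LEN.unpack_from(buf, offset)  # raises struct.error if too short
    start = offset + _LEN.size
    end = start + size
    if end > len(buf):
        raise ValueError("truncated record")
    return buf[start:end].decode("utf-8"), end


# Usage: write two strings back to back, then read them again.
blob = pack_text("caf\u00e9") + pack_text("\u65e5\u672c\u8a9e")
first, pos = unpack_text(blob)
second, pos = unpack_text(blob, pos)
```

Because the prefix records bytes rather than characters, the reader never needs to know how many bytes each character took.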

Stefan