How to get the size of a unicode string/string in bytes?

Diez B. Roggisch deets at nospam.web.de
Tue Aug 1 09:34:59 EDT 2006


> So then the easiest thing to do is: take the maximum length of a unicode
> string you could possibly want to store, multiply it by 4 and make that
> the length of the DB field.
 
> However, I'm pretty convinced it is a bad idea to store Python unicode
> strings directly in a DB, especially as they are not portable. I assume
> that some DB connectors honour the local platform encoding already, but
> I'd still say that UTF-8 is your best friend here.

It was your assumption that the OP wanted to store the "real"
unicode strings. A moot point anyway, as it is afaik not possible to get
their contents in byte form (except from a C extension).

And assuming 4 bytes per character is a bit wasteful, I'd say, especially
when more than 80% of your text falls into the ASCII subset, as it does for
European and American languages.
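To put numbers on the "4 bytes per character" point, here is a small sketch in modern Python 3 comparing that upper bound with the real UTF-8 size. The helper `utf8_size_report` and the sample strings are made up for illustration. UTF-8 stores ASCII in 1 byte, most accented Latin letters in 2, and CJK characters in 3.

```python
def utf8_size_report(text):
    """Return (characters, actual UTF-8 bytes, 4-bytes-per-character bound)."""
    chars = len(text)                      # code points in the string
    actual = len(text.encode("utf-8"))     # bytes after UTF-8 encoding
    bound = 4 * chars                      # worst case: 4 bytes per code point
    return chars, actual, bound


# Hypothetical sample strings, chosen only for illustration.
for sample in ["Hello, world", "Grüße aus Köln", "日本語"]:
    chars, actual, bound = utf8_size_report(sample)
    print(f"{sample!r}: {chars} chars, {actual} bytes, bound {bound}")
```

For the ASCII string the bound is four times the actual size, and for the mostly-ASCII German sample it reserves 56 bytes where 17 suffice.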

The solution was given before: choose an encoding (UTF-8 is certainly the
most favorable one) and compute the length of the encoded byte string.
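A minimal sketch of that recipe, written for Python 3 where `str` is already unicode (in the Python 2 of this thread the equivalent is `len(u"...".encode("utf-8"))`). The helper name `byte_length` is hypothetical, not part of any library; it just encodes and counts.

```python
def byte_length(text, encoding="utf-8"):
    """Size in bytes of `text` once encoded with `encoding`."""
    # str.encode returns a bytes object; len() of that counts bytes,
    # whereas len(text) counts characters (code points).
    return len(text.encode(encoding))


print(byte_length("Köln"))               # → 5  (UTF-8: 'ö' takes 2 bytes)
print(byte_length("Köln", "latin-1"))    # → 4  (Latin-1: 1 byte per character)
print(byte_length("Köln", "utf-16-le"))  # → 8  (UTF-16, no BOM: 2 bytes each)
```

The result depends on the encoding, so compute it with the same encoding the data will actually be stored in. Note that plain "utf-16" prepends a 2-byte byte-order mark, so `byte_length("Köln", "utf-16")` gives 10.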

Diez



More information about the Python-list mailing list