Most pythonic way to truncate unicode?
John Machin
sjmachin at lexicon.net
Fri May 29 00:09:53 EDT 2009
John Machin <sjmachin <at> lexicon.net> writes:
> Andrew Fong <FongAndrew <at> gmail.com> writes:
> > Are there any built-in ways to do something like this already? Or do I
> > just have to iterate over the unicode string?
>
> Converting each character to utf8 and checking the
> total number of bytes so far?
> Ooooh, sloooowwwwww!
>
Somewhat faster:
u8len = 0
for u in unicode_string:
    if u <= u'\u007f':
        u8len += 1    # ASCII: 1 byte
    elif u <= u'\u07ff':
        u8len += 2    # 2-byte sequence
    elif u <= u'\uffff':
        u8len += 3    # rest of the BMP: 3 bytes
    else:
        u8len += 4    # beyond the BMP: 4 bytes
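
The counting loop above could be turned into an actual truncation along these lines. This is only a sketch: the names utf8_len and truncate_utf8 are made up for illustration, and it assumes Python 3 (or a wide build), where iterating over a string yields whole code points rather than surrogate halves.

```python
def utf8_len(ch):
    """Number of bytes needed to encode one code point in UTF-8."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4


def truncate_utf8(text, max_bytes):
    """Longest prefix of text whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper: stops at the first character that would
    push the running byte count over the limit.
    """
    total = 0
    for i, ch in enumerate(text):
        total += utf8_len(ch)
        if total > max_bytes:
            return text[:i]
    return text
```

For example, truncate_utf8(u"na\u00efve", 3) gives u"na", because the two-byte character would bring the total to four bytes. The string is never encoded, and the loop stops as soon as the limit is passed.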
Cheers,
John