unicode, bytes redux
Leif K-Brooks
eurleif at ecritters.biz
Mon Sep 25 04:11:52 EDT 2006
Paul Rubin wrote:
> Duncan Booth explains why that doesn't work. But I don't see any big
> problem with a byte count function that lets you specify an encoding:
>
> u = buf.decode('UTF-8')
> # ... later ...
> u.bytes('UTF-8') -> 3
> u.bytes('UCS-4') -> 4
>
> That avoids creating a new encoded string in memory, and for some
> encodings, avoids having to scan the unicode string to add up the
> lengths.
That would require a fairly large change to both the implementation and
the API for a relatively uncommon problem. How often do you need to know
how many bytes an encoded Unicode string takes up without needing the
encoded string itself?
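
For comparison, here is a minimal sketch (Python 3 syntax, not code from the thread) of getting those counts with what already exists. The helper names utf8_len and ucs4_len are hypothetical, and the sample character '€' is an assumption, chosen only because it yields the 3 and 4 from the quoted example. The simple route builds the encoded string and takes its length; the helpers merely avoid that temporary copy.

```python
def utf8_len(text):
    """Count the bytes `text` would occupy in UTF-8 without encoding it.

    Hypothetical helper. Assumes `text` contains no lone surrogates,
    which the real UTF-8 codec would refuse to encode.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII range
        elif cp < 0x800:
            total += 2
        elif cp < 0x10000:
            total += 3      # rest of the Basic Multilingual Plane
        else:
            total += 4      # supplementary planes
    return total


def ucs4_len(text):
    """Fixed-width encoding: four bytes per code point, no BOM counted."""
    return 4 * len(text)


u = b'\xe2\x82\xac'.decode('utf-8')   # assumed sample input: '€'

# Simple route: encode, then measure the temporary result.
simple_utf8 = len(u.encode('utf-8'))
simple_ucs4 = len(u.encode('utf-32-le'))   # '-le' variant emits no BOM

# Counting route: no encoded copy is created.
counted_utf8 = utf8_len(u)
counted_ucs4 = ucs4_len(u)

print(simple_utf8, counted_utf8, simple_ucs4, counted_ucs4)
```

The UTF-8 helper still has to scan the whole string, while the fixed-width one does not, which is the distinction the quoted proposal draws between encodings.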