unicode, bytes redux
eurleif at ecritters.biz
Mon Sep 25 10:11:52 CEST 2006
Paul Rubin wrote:
> Duncan Booth explains why that doesn't work. But I don't see any big
> problem with a byte count function that lets you specify an encoding:
>
>     u = buf.decode('UTF-8')
>     # ... later ...
>     u.bytes('UTF-8') -> 3
>     u.bytes('UCS-4') -> 4
>
> That avoids creating a new encoded string in memory, and for some
> encodings, avoids having to scan the unicode string to add up the
It requires a fairly large change to code and API for a relatively
uncommon problem. How often do you need to know how many bytes an
encoded Unicode string takes up without needing the encoded string itself?
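
For the occasional case where only the count matters, here is a rough sketch of both options. It is written for present-day Python 3, not the interpreter current when this thread was written. The helper names `utf8_len` and `ucs4_len` are made up for illustration and are not part of any Python API. `buf` is assumed to hold the three-byte UTF-8 sequence for one character, which is consistent with the 3 and 4 in the quoted example.

```python
def utf8_len(text: str) -> int:
    """Count the bytes `text` would occupy in UTF-8 without encoding it.

    Hypothetical helper. Lone surrogates (U+D800..U+DFFF) are counted as
    3 bytes here, although str.encode('utf-8') would reject them.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII
        elif cp < 0x800:
            total += 2
        elif cp < 0x10000:
            total += 3      # rest of the Basic Multilingual Plane
        else:
            total += 4      # supplementary planes
    return total


def ucs4_len(text: str) -> int:
    """Fixed-width encoding: 4 bytes per code point, no BOM, no scan needed."""
    return 4 * len(text)


# Assumed input: the UTF-8 bytes for the euro sign, U+20AC.
buf = b"\xe2\x82\xac"
u = buf.decode("UTF-8")

# The straightforward way: build the encoded string, then measure it.
simple_utf8 = len(u.encode("UTF-8"))
simple_ucs4 = len(u.encode("UTF-32-BE"))  # the -BE codec writes no BOM

# The counting way: no temporary encoded string is created.
counted_utf8 = utf8_len(u)
counted_ucs4 = ucs4_len(u)
```

The `len(u.encode(...))` form already works with no API change. The counting helpers avoid the temporary string, but the UTF-8 one still scans every character in a Python-level loop, so it is likely to be slower in practice unless the string is large enough for the extra allocation to matter.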