unicode, bytes redux

Duncan Booth duncan.booth at invalid.invalid
Mon Sep 25 09:33:39 CEST 2006


willie <willie at jamots.com> wrote:

> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.

So what sort of output do you expect from this:

>>> a = '\xc9'.decode('latin1')
>>> b = '\xc3\x89'.decode('utf8')
>>> print (a+b).bytes()
???
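The session above uses Python 2 syntax, and `.bytes()` is the method being proposed, not one that exists. The following sketch restates the point in Python 3, where `bytes.decode()` takes the place of Python 2's `str.decode()`. Both inputs decode to the same single code point, so once the two strings are concatenated no single remembered encoding could give back the original byte count:

```python
# Python 3 restatement of the session above.
a = b'\xc9'.decode('latin-1')      # 1 byte in Latin-1
b = b'\xc3\x89'.decode('utf-8')    # 2 bytes in UTF-8

# Both are U+00C9 (É); the decoded text keeps no record of its source encoding.
print(a == b)                      # True
print(hex(ord(a)), hex(ord(b)))    # 0xc9 0xc9

# The concatenation came from 1 + 2 = 3 source bytes, but any one
# encoding applied to the whole string gives a different answer.
s = a + b
print(len(s))                      # 2 code points
print(len(s.encode('latin-1')))    # 2 bytes
print(len(s.encode('utf-8')))      # 4 bytes
```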

And if you say that's an unfair question because you expected all the byte 
strings to be using the same encoding, then there's no point storing it on 
every unicode object; you might as well store it once globally.
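A minimal Python 3 sketch of the "store it once globally" alternative follows. `DEFAULT_ENCODING` and `byte_length` are hypothetical names used only for illustration. The byte count is worked out when it is needed, by encoding with one program-wide encoding or one given explicitly:

```python
# Hypothetical program-wide setting: one encoding for everything,
# instead of an encoding remembered on each string.
DEFAULT_ENCODING = 'utf-8'

def byte_length(text, encoding=DEFAULT_ENCODING):
    """Return how many bytes `text` occupies when encoded with `encoding`."""
    return len(text.encode(encoding))

s = '\u00c9' * 2                   # 'ÉÉ', two code points
print(byte_length(s))              # 4 with the UTF-8 default
print(byte_length(s, 'latin-1'))   # 2
```

Because the text itself carries no encoding, the byte count is a property of the pair (text, encoding), so the encoding has to come from the caller or from a single setting.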
