unicode, bytes redux
Duncan Booth
duncan.booth at invalid.invalid
Mon Sep 25 03:33:39 EDT 2006
willie <willie at jamots.com> wrote:
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
So what sort of output do you expect from this:
>>> a = '\xc9'.decode('latin1')
>>> b = '\xc3\x89'.decode('utf8')
>>> print (a+b).bytes()
???
And if you say that's an unfair question because you expected all the byte
strings to use the same encoding, then there's no point storing the encoding
on every unicode object; you might as well store it once globally.
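
If what you actually want is a byte count, you can already get one by
naming the encoding at the point where you need it. A minimal sketch of
that: byte_length is a made-up helper name, not an existing method, and the
b'...' literals are my assumption so that the snippet also runs on newer
Python versions.

```python
# Both byte strings decode to the same code point, U+00C9.
a = b'\xc9'.decode('latin1')
b = b'\xc3\x89'.decode('utf8')
assert a == b

def byte_length(text, encoding):
    """Hypothetical helper: bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

combined = a + b

# The answer depends entirely on the target encoding,
# not on where the text originally came from.
latin1_size = byte_length(combined, 'latin1')
utf8_size = byte_length(combined, 'utf8')
```

Nothing in the decoded text tells you which of those two sizes is the
"right" one, which is why the encoding has to be supplied by the caller.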