Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)
Fredrik Lundh replied to himself in c.l.py:
as far as I can tell, it's supposed to be a feature.
if you mix 8-bit strings with unicode strings, python 1.6a2 attempts to interpret the 8-bit string as a utf-8 encoded unicode string.
but yes, I also think it's a bug. but so far, my attempts to get someone else to fix it have failed. might have to do it myself... ;-)
postscript: the powers-that-be have decided that this is not a bug. if you thought that strings were just sequences of characters, just as in Perl and Tcl, you're in for one big surprise in Python 1.6...
I just read the last few posts on the powers-that-be list on this subject (thanks to Christian for pointing out the archives in c.l.py ;-), and I must say I completely agree with Fredrik. The current situation sucks. A string should always be a sequence of characters. A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". Because of that distinction, an 8-bit string should never be assumed to be utf-8. (The default encoding for the builtin unicode() function may be another story.)

Just
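To illustrate the distinction drawn above, here is a minimal sketch in modern Python 3. That choice is an assumption, since the thread concerns Python 1.6a2: bytes stands in for the 8-bit string and str for the Unicode string. The helper implicit_utf8_equal is hypothetical and only mimics the implicit UTF-8 interpretation described in the quoted message; it is not the actual 1.6a2 code path.

```python
# Sketch only: Python 3 stand-ins for the 1.6a2 types discussed in the thread.
# bytes ~ the old 8-bit string, str ~ the unicode string.

E_ACUTE = "\u00e9"  # the character e-acute

# One byte, 0xE9: a perfectly good Latin-1 "character" in an 8-bit string.
raw = E_ACUTE.encode("latin-1")


def implicit_utf8_equal(byte_string, text):
    """Hypothetical helper: compare after assuming the bytes are UTF-8."""
    return byte_string.decode("utf-8") == text


# Read as Latin-1, the byte is the character it was meant to be.
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, the same single byte is not a valid sequence at all.
try:
    implicit_utf8_equal(raw, E_ACUTE)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# A UTF-8 encoded value behaves like a byte array:
# its length counts bytes, not characters.
encoded = E_ACUTE.encode("utf-8")
byte_count = len(encoded)
char_count = len(E_ACUTE)
```

Under these assumptions, the same one-character value is one byte in Latin-1 and two bytes in UTF-8, and the Latin-1 byte cannot be decoded as UTF-8, which is the surprise the posts above complain about.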
participants (1): Just van Rossum