Binary strings, unicode and encodings
peter at engcorp.com
Fri Jan 16 15:21:46 CET 2004
Laurent Therond wrote:
> So, your test revealed that StringIO converts to byte strings.
> Does that mean:
> - If the input string contains characters that cannot be encoded
> in ASCII, bencode_rec will fail?
Yes, if your default encoding is ASCII.
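To make that failure concrete, here is a minimal sketch. It is written for Python 3, not the Python 2 this thread is about, so the ASCII encoding is requested explicitly instead of happening implicitly inside StringIO. The helper name `encodes_as_ascii` is made up for illustration.

```python
def encodes_as_ascii(text):
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised when text contains a character outside the 7-bit ASCII range.
        return False


plain = "spam"
accented = "caf\u00e9"  # ends in U+00E9, which ASCII cannot represent

print(encodes_as_ascii(plain))
print(encodes_as_ascii(accented))
```

The same UnicodeEncodeError is what an implicit conversion to a byte string would raise when the codec in use is ASCII.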
> Yet, if your locale specifies UTF-8 as the default encoding, it should
> not fail, right?
True, provided you are actually creating UTF-8 strings. Just sticking
in a character that has the 8th bit set doesn't make the string UTF-8.
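A small sketch of the 8th-bit point, again assuming Python 3 (where byte strings and text are separate types). The helper `is_valid_utf8` is a hypothetical name, not something from the standard library.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same character, U+00E9, encoded two different ways.
latin1_bytes = "\u00e9".encode("latin-1")  # one byte with the high bit set
utf8_bytes = "\u00e9".encode("utf-8")      # two bytes

print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

The single Latin-1 byte has its 8th bit set but is not a complete UTF-8 sequence, so decoding it as UTF-8 fails.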
> Hence, I conclude your test was made on a system that uses ASCII/ISO
> 8859-1 as its default encoding. Is that right?
Correct: Windows 98, where sys.getdefaultencoding() returns 'ascii'.
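For anyone who wants to check their own system, something like the following reports the relevant settings. This is a sketch for Python 3, where sys.getdefaultencoding() reports 'utf-8' and not the 'ascii' seen above under Python 2. The variable names are only illustrative.

```python
import locale
import sys

# Codec Python uses for implicit str/bytes conversions.
default_codec = sys.getdefaultencoding()

# Encoding suggested by the locale settings; passing False avoids
# changing the process locale as a side effect of the query.
preferred = locale.getpreferredencoding(False)

print("default encoding:", default_codec)
print("locale preferred encoding:", preferred)
```

The two values are queried separately because the locale's preferred encoding and the interpreter's default encoding are distinct settings and need not agree.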
> > > c) It depends on the system locale/it depends on what the site module
> > > specifies using setdefaultencoding(name)
> > Yes, as it always does if you are using Unicode but converting to byte strings
> > as it appears StringIO does.
> Umm...not sure here...I think StringIO must behave differently
> depending on your locale and depending on how you assigned the string.
It's always possible that StringIO takes locale into account in some
special way, but I suspect it does not. As for "how you assigned the string",
I'm not sure I understand what that might mean. How many ways do you know
to assign a string in Python?