kyrian (List) wrote:
The gist of the problem seems to be that you need to treat the strings as utf-8 or iso-8859-1 encoded 'objects' rather than standard ASCII string types within the code, and I don't know for sure how to do that.
And you have to know which because there are iso-8859-1 encoded characters which aren't valid utf-8 codes and there are utf-8 encoded characters which get garbled if decoded as iso-8859-1.
Thus, code like
    try:
        unicode(value, "ascii")
    except UnicodeError:
        value = unicode(value, "utf-8")
    else:
        # value was valid ASCII data
        pass

which I think is no different from simply

    value = unicode(value, "utf-8")

since if value is ASCII to begin with, decoding it as utf-8 is OK (ASCII is a subset of utf-8),
doesn't work if value is actually iso-8859-1 encoded and contains bytes which aren't valid utf-8, or which are valid utf-8 but decode to different characters than intended.
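
To make the two failure modes concrete, here is a minimal sketch. It uses bytes.decode() rather than unicode() so it runs on a current Python as well. The helper name decode_guess and the sample strings are hypothetical, made up for illustration only; they are not from Mailman. The fallback to iso-8859-1 is just a guess, not a fix, since nothing in the bytes themselves says which encoding the sender used.

```python
# -*- coding: utf-8 -*-

def decode_guess(raw, fallback="iso-8859-1"):
    """Hypothetical helper: try utf-8 first, then fall back to iso-8859-1.

    The fallback never raises, because every byte value maps to some
    iso-8859-1 character, but the result is only a guess.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


latin1_bytes = b"caf\xe9"      # "cafe" with e-acute, encoded as iso-8859-1
utf8_bytes = b"caf\xc3\xa9"    # the same text, encoded as utf-8

# Failure mode 1: iso-8859-1 data that is not valid utf-8 raises.
try:
    latin1_bytes.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True

# Failure mode 2: utf-8 data decoded as iso-8859-1 succeeds but is garbled
# (the two bytes of the e-acute become two separate characters).
garbled = utf8_bytes.decode("iso-8859-1")

# The guess recovers both samples above...
from_latin1 = decode_guess(latin1_bytes)
from_utf8 = decode_guess(utf8_bytes)

# ...but iso-8859-1 data that happens to be valid utf-8 is silently
# decoded as utf-8, which may not be what the sender meant.
ambiguous = decode_guess(b"\xc3\xa9")
```

The last case is the one that can't be detected from the bytes alone, which is why you have to know the encoding from somewhere else (for example a charset declared alongside the data).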
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan