UTF-8 Encoding Error
subhabangalore at gmail.com
Fri Dec 30 01:11:56 EST 2016
On Friday, December 30, 2016 at 7:16:25 AM UTC+5:30, Steve D'Aprano wrote:
> On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote:
>
> > On 2016-12-22 at 22:38, wrote:
> >>I am getting the error:
> >>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> >>invalid start byte
> >
> > The following is a reflex of mine, whenever I encounter Python 2 Unicode
> > errors:
> >
> > import sys
> > reload(sys)
> > sys.setdefaultencoding('utf8')
>
>
> This is a BAD idea, and doing it by "reflex" without very careful thought is
> just cargo-cult programming. You should not thoughtlessly change the
> default encoding without knowing what you are doing -- and if you know what
> you are doing, you won't change it at all.
>
> The Python interpreter *intentionally* removes setdefaultencoding at startup
> for a reason. Changing the default encoding can break the interpreter, and
> it is NEVER what you actually need. If you think you want it because it
> fixes "Unicode errors", all you are doing is covering up bugs in your code.
>
> Here is some background on why setdefaultencoding exists, and why it is
> dangerous:
>
> https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/
>
> If you have set the Python 2 default encoding to anything but ASCII, you are
> now running a broken system with subtle bugs, including in data structures
> as fundamental as dicts.
>
> The standard behaviour:
>
> py> d = {u'café': 1}
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> False
>
>
> As we should expect: the key in the dict, u'café', is *not* the same as the
> byte-string 'caf\xc3\xa9'. But watch how we can break dictionaries by
> changing the default encoding:
>
> py> import sys
> py> reload(sys)
> <module 'sys' (built-in)>
> py> sys.setdefaultencoding('utf-8') # don't do this
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> True
>
>
> So Python now thinks that 'caf\xc3\xa9' is a key. Or does it?
>
> py> d['caf\xc3\xa9']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'caf\xc3\xa9'
>
> By changing the default encoding, we now have something which is both a key
> and not a key of the dict at the same time.
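
As far as I understand it, the contradiction comes from two code paths disagreeing: == coerces the byte string through the default encoding, while a dict lookup compares hashes first, and the two hashes differ. The linked article has the details. Below is a rough Python 3 sketch that imitates the effect with a hypothetical CoercingText class. It is not how Python 2 is implemented, only an illustration of equality and hashing falling out of step.

```python
class CoercingText(str):
    """Hypothetical stand-in for a Python 2 unicode string under a UTF-8
    default encoding: equality decodes bytes, hashing does not."""

    def __eq__(self, other):
        if isinstance(other, bytes):
            try:
                other = other.decode("utf-8")  # the implicit coercion
            except UnicodeDecodeError:
                return False
        return str.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ clears the inherited hash, so restore str's hash.
    __hash__ = str.__hash__


d = {CoercingText("café"): 1}
byte_key = b"caf\xc3\xa9"

# Comparing against each key uses __eq__, so a match is reported.
found_by_comparison = any(key == byte_key for key in d)

# Lookup goes through hash(byte_key) first, which does not match.
found_by_lookup = byte_key in d

print(found_by_comparison, found_by_lookup)
```

The same value is "found" by iteration and "missing" by lookup, which is the inconsistency shown in the session above.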
>
>
>
> > A relevant Stack Exchange thread awaits you here:
> >
> > http://stackoverflow.com/a/21190382/2230956
>
> And that's why I don't trust StackOverflow. It's not bad for answering
> simple questions, but once the question becomes more complex the quality of
> accepted answers goes down the toilet. The highest voted answer is *wrong*
> and *dangerous*.
>
> And then there's this comment:
>
> Until this moment I was forced to include "# -- coding: utf-8 --" at
> the begining of each document. This is way much easier and works as
> charm
>
> I have no words for how wrong that is. And this comment:
>
> ty, this worked for my problem with python throwing UnicodeDecodeError
> on var = u"""vary large string"""
>
> No it did not. There is no possible way that Python will throw that
> exception on assignment to a Unicode string literal.
>
> It is posts like this that demonstrate how untrustworthy StackOverflow can
> be.
>
>
>
> --
> Steve
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.
Thanks for your detailed comment. The code sometimes runs fine and sometimes raises errors. Could anyone look at how I am approaching the problem?
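
For reference, byte 0x96 is never valid as the first byte of a UTF-8 sequence, but in Windows-1252 it is an en dash, so one guess is that the data is really cp1252 and not UTF-8. That is only an assumption about the input. A minimal sketch of decoding explicitly at the point where the bytes are read, without touching the default encoding (the sample bytes and the decode_text helper are made up for illustration):

```python
# Made-up sample: 15 ASCII bytes, then 0x96, mimicking the reported error.
raw = b"Python 2 vs 3: \x96 text"


def decode_text(data, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


try:
    raw.decode("utf-8")
    utf8_error_position = None
except UnicodeDecodeError as exc:
    utf8_error_position = exc.start  # offset of the offending byte

text, used = decode_text(raw)
print(utf8_error_position, used, repr(text))
```

Trying encodings in order is only a fallback heuristic. If the source of the data is known, naming its encoding directly is more reliable.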