UTF-8 Encoding Error

subhabangalore at gmail.com subhabangalore at gmail.com
Fri Dec 30 01:11:56 EST 2016


On Friday, December 30, 2016 at 7:16:25 AM UTC+5:30, Steve D'Aprano wrote:
> On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote:
> 
> > On 2016-12-22 22:38, wrote:
> >>I am getting the error:
> >>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> >>invalid start byte
> > 
> > The following is a reflex of mine, whenever I encounter Python 2 Unicode
> > errors:
> > 
> > import sys
> > reload(sys)
> > sys.setdefaultencoding('utf8')
> 
> 
> This is a BAD idea, and doing it by "reflex" without very careful thought is
> just cargo-cult programming. You should not thoughtlessly change the
> default encoding without knowing what you are doing -- and if you know what
> you are doing, you won't change it at all.
> 
> The Python interpreter *intentionally* removes setdefaultencoding at startup
> for a reason. Changing the default encoding can break the interpreter, and
> it is NEVER what you actually need. If you think you want it because it
> fixes "Unicode errors", all you are doing is covering up bugs in your code.
> 
> Here is some background on why setdefaultencoding exists, and why it is
> dangerous:
> 
> https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/
> 
> If you have set the Python 2 default encoding to anything but ASCII, you are
> now running a broken system with subtle bugs, including in data structures
> as fundamental as dicts.
> 
> The standard behaviour:
> 
> py> d = {u'café': 1}
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> False
> 
> 
> As we should expect: the key in the dict, u'café', is *not* the same as the
> byte-string 'caf\xc3\xa9'. But watch how we can break dictionaries by
> changing the default encoding:
> 
> py> import sys
> py> reload(sys)
> <module 'sys' (built-in)>
> py> sys.setdefaultencoding('utf-8')  # don't do this
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> True
> 
> 
> So Python now thinks that 'caf\xc3\xa9' is a key. Or does it?
> 
> py> d['caf\xc3\xa9']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'caf\xc3\xa9'
> 
> By changing the default encoding, we now have something which is both a key
> and not a key of the dict at the same time.
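
As far as I understand it (this is my reading, not something spelled out in the post above), the contradiction arises because a dict lookup checks the hash before it checks equality. With the changed default encoding the two strings compare equal, but they still hash differently, so iterating and comparing says "yes" while indexing says "no". A minimal sketch of the same effect, using a hypothetical LooseKey class whose __eq__ deliberately disagrees with its __hash__ (it needs no setdefaultencoding and is only an analogy):

```python
class LooseKey(object):
    """Hypothetical key type: equality ignores case, hashing does not."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return self.text.lower() == other.text.lower()

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Deliberately inconsistent with __eq__: case changes the hash.
        return sum(ord(ch) for ch in self.text)


d = {LooseKey("cafe"): 1}
probe = LooseKey("CAFE")

# Iteration compares with == only, so the probe appears to be a key.
found_by_iteration = any(key == probe for key in d)

# Lookup compares hashes first, so the same probe is not found.
found_by_lookup = probe in d

print(found_by_iteration)
print(found_by_lookup)
```

The usual rule is that objects which compare equal must hash equal. Changing the default encoding appears to break that rule for non-ASCII str/unicode pairs.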
> 
> 
> 
> > A relevant Stack Exchange thread awaits you here:
> > 
> > http://stackoverflow.com/a/21190382/2230956
> 
> And that's why I don't trust StackOverflow. It's not bad for answering
> simple questions, but once the question becomes more complex the quality of
> accepted answers goes down the toilet. The highest voted answer is *wrong*
> and *dangerous*.
> 
> And then there's this comment:
> 
>     Until this moment I was forced to include "# -- coding: utf-8 --" at 
>     the begining of each document. This is way much easier and works as
>     charm
> 
> I have no words for how wrong that is. And this comment:
> 
>     ty, this worked for my problem with python throwing UnicodeDecodeError
>     on var = u"""vary large string"""
> 
> No it did not. There is no possible way that Python will throw that
> exception on assignment to a Unicode string literal.
> 
> It is posts like this that demonstrate how untrustworthy StackOverflow can
> be.
> 
> 
> 
> -- 
> Steve
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.

Thanks for your detailed comment. The code sometimes runs fine and sometimes raises errors. I would be grateful if anyone could look at how I am approaching the problem.
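
One observation on the original error: byte 0x96 is never a valid UTF-8 start byte, but in Windows-1252 it is an en dash. So the input may simply not be UTF-8. Below is a minimal sketch of decoding explicitly at the point where the bytes are read, instead of changing the default encoding. It assumes cp1252 is a plausible fallback, which is only a guess and not confirmed anywhere in this thread. The helper name decode_bytes and the sample data are made up for illustration.

```python
def decode_bytes(raw, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order; return (text, encoding_used).

    The candidate list is an assumption; replace it with the encodings
    the data can actually be in.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up sample: 0x96 is invalid as UTF-8 but is an en dash in cp1252.
raw = b"pages 10 \x96 20"
text, used = decode_bytes(raw)

print(used)
print(repr(text))
```

The idea is to decode once, as soon as the bytes come in, work with Unicode text everywhere after that, and encode again only on output. Guessing encodings by trial can silently pick the wrong one, so knowing the real encoding of the source is better whenever that is possible.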

