UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

Sun Jun 10 19:55:56 EDT 2018

On Sunday, 10 June 2018 17:29:59 UTC-4, Cameron Simpson  wrote:
> On 10Jun2018 13:04, bellcanadardp at gmail.com <bellcanadardp at gmail.com> wrote:
> >here is the full error once again
> >to summarize, my script works fine in python2
> >i get this error trying to run it in python3
> >plz see below after the error, my settings for python 2 and python 3
> >for me it seems i need to change some settings to 'utf-8'..either just in python 3, since thats where i am having issues or change the settings to 'utf-8' both in python 2 and 3....i would appreciate feedback b4 i do some trial and error
> >thanks for the consideration
> >tommy
> >
> >***********************************************
> >Traceback (most recent call last):
> >File "createIndex.py", line 132, in <module>
> >c.createindex()
> >File "creatIndex.py", line 102, in createIndex
> >pagedict=self.parseCollection()
> >File "createIndex.py", line 47, in parseCollection
> >for line in self.collFile:
> >File 
> >"C:\Users\Robert\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", 
> >line 23, in decode
> >return codecs.charmap_decode(input,self.errors,decoding_table[0]
> >UnicodeDecodeError: 'charmap'codec can't decode byte 0x9d in position 7414: character maps to <undefined>
> 
> Ok, this is more helpful. It says that the decoding error, which occurred in 
> ...\cp1252.py, was decoding lines from the file self.collFile.
> 
> What is that file? And how was it opened?
> 
> Also, your settings below may indeed be important.
> 
> >***************************************************
> >python 3 settings
> >import sys
> > import locale
> >locale.getpreferredencoding()
> >'cp1252'
> 
> The setting above is the default encoding used when you open a file in text 
> mode in Python 3, but you can override it.
> 
> In Python 3 this matters a lot, because Python 3 strings are Unicode. In Python 
> 2, strings are just bytes, and are not "decoded" (there is a whole separate 
> "unicode" type for that when it matters).
> 
> So in Python 3 the text file reader is decoding the text in the file according 
> to what it expects the encoding to be.
> 
> Find the place where self.collFile is opened. You can specify the decoding 
> method there by adding the "encoding=" parameter to the open() call. It is 
> defaulting to "cp1252" because that is what locale.getpreferredencoding() 
> returns, but presumably the actual file data are not encoded that way.
> 
> You can (a) find out what encoding _is_ used in the file and specify that or 
> (b) tell Python to be less picky. Choice (a) is better if it is feasible.
> 
> If you have to guess because you don't know the encoding, one possibility is 
> that collFile contains utf-8 or utf-16; of these 2, utf-8 seems more likely 
> given the 0x9d byte causing the trouble.  Try adding:
> 
>   encoding='utf-8'
> 
> to the open() call, eg:
> 
>   self.collFile = open('path-to-the-coll-file', encoding='utf-8')
> 
> at the appropriate place.
> 
> If that just produces a different decoding error, you have 2 choices: pick an 
> encoding where every byte is "valid", such as 'iso8859-1', or to tell the 
> decode to just cope with th errors by adding the errors="replace" or 
> "errors="ignore" or errors="namereplace" parameter to the open() call.
> 
> Both these choices have downsides.
> 
> There are several ISO8859 encodings, and they might all be wrong for your file, 
> leading to _incorrect_ text lines.
> 
> The errors="..." parameter also has downsides: you will also end up with 
> missing (errors="ignore") or incorrect (errors="replace" or 
> errors="namereplace") text, because the decoder has to do something with the 
> data: drop it or replace it with something wrong. The former loses data while 
> the latter puts in bad data, but at least it is visible if you inspect the data 
> later.
> 
> The full documentation for Python 3's open() call is here:
> 
>   https://docs.python.org/3/library/functions.html#open
> 
> where the various encoding= and errors= choices are described.
> 
> Cheers,
> Cameron Simpson <cs at cskk.id.au>

thank you for the reply
let me try these tips and suggestions and i will update here
thanxz alot
and thnxz also to all who post ..i appreciate it..
regards
tommy