UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>
bellcanadardp at gmail.com
Sun Jun 10 19:55:56 EDT 2018
On Sunday, 10 June 2018 17:29:59 UTC-4, Cameron Simpson wrote:
> On 10Jun2018 13:04, bellcanadardp at gmail.com <bellcanadardp at gmail.com> wrote:
> >Here is the full error once again.
> >To summarize, my script works fine in Python 2.
> >I get this error when I try to run it in Python 3.
> >Please see my settings for Python 2 and Python 3 below, after the error.
> >It seems to me that I need to change some settings to 'utf-8', either just in
> >Python 3, since that is where I am having issues, or in both Python 2 and 3.
> >I would appreciate feedback before I do some trial and error.
> >Thanks for the consideration,
> >tommy
> >
> >***********************************************
> >Traceback (most recent call last):
> >File "createIndex.py", line 132, in <module>
> >c.createIndex()
> >File "createIndex.py", line 102, in createIndex
> >pagedict=self.parseCollection()
> >File "createIndex.py", line 47, in parseCollection
> >for line in self.collFile:
> >File
> >"C:\Users\Robert\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py",
> >line 23, in decode
> >return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> >UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7414: character maps to <undefined>
>
> Ok, this is more helpful. It says that the decoding error, which occurred in
> ...\cp1252.py, was decoding lines from the file self.collFile.
>
> What is that file? And how was it opened?
>
> Also, your settings below may indeed be important.
>
> >***************************************************
> >python 3 settings
> >import sys
> >import locale
> >locale.getpreferredencoding()
> >'cp1252'
>
> The setting above is the default encoding used when you open a file in text
> mode in Python 3, but you can override it.
>
> In Python 3 this matters a lot, because Python 3 strings are Unicode. In Python
> 2, strings are just bytes, and are not "decoded" (there is a whole separate
> "unicode" type for that when it matters).
>
> So in Python 3 the text file reader is decoding the text in the file according
> to what it expects the encoding to be.
>
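To make the decoding step concrete, here is a minimal sketch. The sample bytes are my own assumption (the UTF-8 form of a right curly quote, which happens to end in 0x9d); they are not taken from the actual collection file.

```python
# Assumed sample: the UTF-8 encoding of U+201D (right double quotation mark).
raw = b"\xe2\x80\x9d"

try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    # cp1252 has no character assigned to byte 0x9d, so decoding stops here.
    cp1252_failed = True
    print("cp1252:", exc.reason)

# The same bytes decode cleanly when treated as UTF-8.
text = raw.decode("utf-8")
print("utf-8:", ascii(text))
```

This is the same kind of failure as in the traceback, only on three bytes instead of a whole file.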
> Find the place where self.collFile is opened. You can specify the decoding
> method there by adding the "encoding=" parameter to the open() call. It is
> defaulting to "cp1252" because that is what locale.getpreferredencoding()
> returns, but presumably the actual file data are not encoded that way.
>
> You can (a) find out what encoding _is_ used in the file and specify that or
> (b) tell Python to be less picky. Choice (a) is better if it is feasible.
>
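For choice (a), one rough way to narrow things down is to read the raw bytes and see which candidate encodings decode them without error. This is only a sketch: first_decodable is a hypothetical helper name and the candidate list is a guess. A clean decode does not prove the encoding is the right one, it only rules out the ones that fail.

```python
def first_decodable(data, candidates=("utf-8", "utf-16", "cp1252")):
    """Return the first candidate encoding that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot be right; try the next one
        return name
    return None


# Hypothetical usage: the bytes would normally come from open(path, "rb").read().
sample = "na\u00efve \u201cquoted\u201d text".encode("utf-8")
print(first_decodable(sample))
```

Put the strictest encodings first. An encoding such as 'iso8859-1' accepts every byte, so it would always "succeed" and tell you nothing.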
> If you have to guess because you don't know the encoding, one possibility is
> that collFile contains utf-8 or utf-16; of these 2, utf-8 seems more likely
> given the 0x9d byte causing the trouble. Try adding:
>
> encoding='utf-8'
>
> to the open() call, eg:
>
> self.collFile = open('path-to-the-coll-file', encoding='utf-8')
>
> at the appropriate place.
>
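A self-contained sketch of the same idea, assuming the data really is UTF-8. The temporary file and the names sample and coll_file are stand-ins for illustration, not the real script.

```python
import os
import tempfile

# Assumed sample text containing curly quotes, written out as UTF-8 bytes.
sample = "He said \u201chello\u201d\n"

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

try:
    # Passing encoding= explicitly means the result no longer depends on
    # what locale.getpreferredencoding() returns on this machine.
    with open(path, encoding="utf-8") as coll_file:
        lines = [line for line in coll_file]
finally:
    os.remove(path)

print(ascii(lines))
```

On a machine where the preferred encoding is cp1252, dropping the encoding= argument from this open() call should reproduce the original UnicodeDecodeError.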
> If that just produces a different decoding error, you have 2 choices: pick an
> encoding where every byte is "valid", such as 'iso8859-1', or tell the
> decoder to just cope with the errors by adding the errors="replace",
> errors="ignore" or errors="backslashreplace" parameter to the open() call.
>
> Both these choices have downsides.
>
> There are several ISO8859 encodings, and they might all be wrong for your file,
> leading to _incorrect_ text lines.
>
> The errors="..." parameter also has downsides: you will end up with
> missing (errors="ignore") or substituted (errors="replace" or
> errors="backslashreplace") text, because the decoder has to do something with
> the data: drop it or replace it with something else. The former loses data
> while the latter puts in placeholder data, but at least it is visible if you
> inspect the data later.
>
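A small sketch of what these fallbacks do to one undecodable byte. The sample bytes are made up. I have used the "replace", "ignore" and "backslashreplace" handlers here because those are ones I am sure work when decoding.

```python
# Assumed sample: plain ASCII with one byte (0x9d) that cp1252 cannot decode.
raw = b"ab\x9dcd"

replaced = raw.decode("cp1252", errors="replace")          # U+FFFD marks the bad byte
ignored = raw.decode("cp1252", errors="ignore")            # bad byte silently dropped
escaped = raw.decode("cp1252", errors="backslashreplace")  # bad byte kept as a \x9d escape
latin1 = raw.decode("iso8859-1")                           # never fails, may be wrong

for label, value in [("replace", replaced), ("ignore", ignored),
                     ("backslashreplace", escaped), ("iso8859-1", latin1)]:
    print(label, ascii(value))
```

The same errors= values can be passed to open(). Note that 'iso8859-1' turns 0x9d into an invisible control character, so the wrong text is easy to miss.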
> The full documentation for Python 3's open() call is here:
>
> https://docs.python.org/3/library/functions.html#open
>
> where the various encoding= and errors= choices are described.
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>
Thank you for the reply.
Let me try these tips and suggestions and I will update here.
Thanks a lot, and thanks also to all who posted. I appreciate it.
Regards,
tommy