UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>
pjmclenon at gmail.com
Sun Oct 21 14:33:03 EDT 2018
On Saturday, October 20, 2018 at 1:23:50 PM UTC-4, Terry Reedy wrote:
> On 10/20/2018 8:24 AM, pjmclenon at gmail.com wrote:
> > On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote:
>
> > i have a sort of decode error
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 83064: invalid start byte
> > *****************
> > and it seems to refer to my code line:
> > ***********
> > data = f.read()
> > ***************
> > which is part of this block of code
> > ********************
> > # Read content of files
> > for path in files:
> >     with open(join("docs", path), encoding="utf-8") as f:
> >     #with open(join("docs", path)) as f:
> >         data = f.read()
> >         process_data(data)
> > ***********************************************
> >
> > would the fix be this?
> > **********************
> > data = f.read(), decoding = "utf-8" #OR
> > data = f.read(), decoding = "ascii" # is this the right fix or previous or both wrong??
>
> Both statements are invalid syntax. The encoding is set in the open()
> call.
>
> What you need to find out: is '0xb0' a one-byte error or is 'utf-8' the
> wrong encoding? Things I might do:
>
> 1. Change the encoding in open() to 'ascii' and see if the exception
> message still refers to position 83064 or if there is a non-ascii
> character earlier in the file. The latter would mean that there is at
> least one earlier non-ascii sequence that was decoded as utf-8. This
> would suggest that 'utf-8' might be correct and that the '0xb0' byte is
> an error.
>
> 2. In the latter case, add "errors='handler'", where 'handler' is
> something other than the default 'strict'. Look in the doc or see
> help(open) for alternatives.
>
> 3. In open(), replace "encoding='utf-8'" with "mode='rb'" so that
> f.read() creates data as bytes instead of a text string. Then print,
> say, data[83000:83200] to see the context of the non-ascii byte.
>
> 4. Change the encoding in open() to 'latin-1'. The file will then be
> read as text without error, even if latin-1 is the wrong encoding.
>
>
>
> --
> Terry Jan Reedy
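
A rough sketch of checks 1, 3 and 4 from the list above, done on bytes in memory instead of by editing open() each time. The helper names (first_non_ascii, try_decode, context) and the sample bytes are made up for illustration; for a real file the bytes would come from open(path, "rb").read().

```python
def first_non_ascii(data: bytes) -> int:
    """Return the index of the first byte >= 0x80, or -1 if all bytes are ASCII."""
    for i, b in enumerate(data):
        if b >= 0x80:
            return i
    return -1


def try_decode(data: bytes, encoding: str):
    """Return None if data decodes cleanly, else (position, offending byte)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


def context(data: bytes, pos: int, width: int = 20) -> bytes:
    """Return the raw bytes around pos so the bad byte can be seen in context."""
    return data[max(0, pos - width): pos + width]


# Hypothetical sample: valid UTF-8 text (the e-acute is the two bytes C3 A9)
# followed by a stray 0xB0 byte that is not valid UTF-8 on its own.
sample = "caf\u00e9 at 20".encode("utf-8") + b"\xb0" + b"C"

print(first_non_ascii(sample))          # earliest non-ASCII byte
print(try_decode(sample, "ascii"))      # where ascii gives up
print(try_decode(sample, "utf-8"))      # where utf-8 gives up
print(try_decode(sample, "latin-1"))    # latin-1 never fails
print(context(sample, 11, width=3))     # raw bytes around the failure
```

If ascii fails earlier than utf-8 does, as in this sample, then some earlier non-ascii bytes were accepted as utf-8, which is the situation described in point 1.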
Hello Terry,

Just want to add that I also tried changing the encoding in Notepad++ from ANSI to UTF-8, and then I set the encoding to utf-8 in my file, but I got the same error.
Then I tried encoding ascii in my file and it worked.
So encoding ascii and latin-1 work.
Not sure why utf-8 gives an error when that's the widest encoding, the one that includes all characters, right?

Thanks,
Jessica
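
On the last question, a small sketch of the difference, using the 0xb0 byte from the error message quoted above. utf-8 can represent every character, but that is not the same as accepting every byte sequence; latin-1 maps each of the 256 byte values to a character, so it cannot fail even when it is the wrong guess. The decodes_as helper is hypothetical, and tying the 'charmap' error in the subject line to cp1252 is an assumption (it is a common Windows default).

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


raw = b"\xb0"  # the byte named in the utf-8 error message

# latin-1 has a character for every byte value, so this always succeeds.
as_latin1 = raw.decode("latin-1")          # U+00B0, the degree sign

# In utf-8 a lone 0xB0 is a continuation byte with no start byte before it.
utf8_ok = decodes_as(raw, "utf-8")

# The same character stored as real utf-8 takes two bytes.
as_utf8_bytes = "\u00b0".encode("utf-8")   # C2 B0

# A non-strict error handler substitutes U+FFFD instead of raising.
replaced = raw.decode("utf-8", errors="replace")

# Assumption: the 'charmap' error in the subject comes from cp1252,
# which leaves byte 0x9D undefined.
cp1252_ok = decodes_as(b"\x9d", "cp1252")

print(repr(as_latin1), utf8_ok, as_utf8_bytes, repr(replaced), cp1252_ok)
```

So a file saved as ANSI/latin-1 that contains a degree sign holds the single byte 0xb0, and reading it as utf-8 is rejected even though utf-8 itself has a (two-byte) way to write that character.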