UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

Sun Oct 21 14:33:03 EDT 2018

On Saturday, October 20, 2018 at 1:23:50 PM UTC-4, Terry Reedy wrote:
> On 10/20/2018 8:24 AM, pjmclenon at gmail.com wrote:
> > On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote:
> 
> > i have a sort of decode error
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 83064: invalid start byte
> > *****************
> > and it seems to refer to my code line:
> > ***********
> > data = f.read()
> > ***************
> > which is part of this block of code
> > ********************
> > # Read content of files
> > for path in files:
> >     with open(join("docs", path), encoding="utf-8") as f:
> >     #with open(join("docs", path)) as f:
> >         data = f.read()
> >         process_data(data)
> > ***********************************************
> > 
> > would the solution fix be this?
> > **********************
> > data = f.read(), decoding = "utf-8"  #OR
> > data = f.read(), decoding = "ascii" # is this the right fix or previous or both wrong??
> 
> Both statements are invalid syntax.  The encoding is set in the open 
> statement.
> 
> What you need to find out: is '0xb0' a one-byte error or is 'utf-8' the 
> wrong encoding?  Things I might do:
> 
> 1. Change the encoding in open() to 'ascii' and see if the exception 
> message still refers to position 83064 or if there is a non-ascii 
> character earlier in the file.  The latter would mean that there is at 
> least one earlier non-ascii sequence that was decoded as utf-8.  This 
> would suggest that 'utf-8' might be correct and that the '0xb0' byte is 
> an error.
> 
> 2. In the latter case, add "errors='handler'", where 'handler' is 
> something other than the default 'strict'.  Look in the doc or see 
> help(open) for alternatives.
> 
> 3. In open(), replace "encoding='utf-8'" with "mode='rb'" so that 
> f.read() creates data as bytes instead of a text string.  Then print, 
> say, data[83000:83200] to see the context of the non-ascii byte.
> 
> 4. Change the encoding in open() to 'latin-1'.  The file will then be 
> read as text without error, even if latin-1 is the wrong encoding.
> 
> 
> 
> -- 
> Terry Jan Reedy
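
The four checks quoted above can be tried without touching the real files. This is only a rough sketch: the sample bytes and the helper names find_bad_byte and context are made up for illustration and are not from the original script, and the offsets are tiny compared with position 83064 in the real file.

```python
# Hypothetical sample standing in for the real file contents:
# ASCII text with one stray 0xb0 byte (the degree sign in latin-1).
raw = "temperature: 20".encode("utf-8") + b"\xb0" + b"C\n"


def find_bad_byte(data, encoding="utf-8"):
    """Return the offset of the first undecodable byte, or None if data decodes."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def context(data, position, radius=10):
    """Return the bytes around position, like data[83000:83200] in step 3."""
    return data[max(0, position - radius):position + radius]


# Step 1: compare where 'utf-8' and 'ascii' first fail.
utf8_pos = find_bad_byte(raw, "utf-8")
ascii_pos = find_bad_byte(raw, "ascii")

# Step 2: a non-strict error handler; 'replace' substitutes U+FFFD.
replaced = raw.decode("utf-8", errors="replace")

# Step 3: look at the raw bytes around the failure.
nearby = context(raw, utf8_pos, radius=2)

# Step 4: latin-1 never fails, whether or not it is the right encoding.
as_latin1 = raw.decode("latin-1")

print(utf8_pos, ascii_pos, nearby, repr(replaced), repr(as_latin1))
```

With a real file, the same bytes would come from open(path, mode="rb").read(). If both positions match, as they do here, there is no earlier non-ascii sequence in the data.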

Hello Terry,
I just want to add
that I also tried changing the encoding in Notepad++ from ANSI to UTF-8, and then set the encoding to utf-8 in my file, but I got the same error.

Then I tried encoding ascii in my file and it worked,
so encoding ascii and latin-1 both work.
I'm not sure why utf-8 gives an error when that's the widest one, inclusive of all characters, right?

Thanks,
jessica
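
On the question above of why utf-8 can fail even though it covers every character: utf-8 can encode any character, but not every sequence of bytes is valid utf-8. A lone 0xb0 byte is a continuation byte with no lead byte before it, so the decoder rejects it, while latin-1 assigns a character to each of the 256 byte values and so never raises. A small sketch, assuming the byte in question is a degree sign saved in a one-byte encoding (that is a guess, the real file may differ); the helper name decodes is made up:

```python
degree = "\u00b0"  # the degree sign, code point U+00B0

as_utf8 = degree.encode("utf-8")      # two bytes in utf-8
as_latin1 = degree.encode("latin-1")  # one byte, 0xb0, in latin-1


def decodes(data, encoding):
    """Return True if data decodes under encoding without an error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Try the single latin-1 byte under several codecs.
results = {enc: decodes(as_latin1, enc)
           for enc in ("utf-8", "ascii", "latin-1", "cp1252")}

# The byte from the subject line has no mapping in cp1252 ('charmap').
charmap_ok = decodes(b"\x9d", "cp1252")

print(as_utf8, as_latin1, results, charmap_ok)
```

In this sketch ascii rejects the byte as well, since ascii only covers values below 0x80.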