UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>
bellcanadardp at gmail.com
Sun Jun 3 19:36:12 EDT 2018
On Tuesday, 22 May 2018 17:23:55 UTC-4, Peter J. Holzer wrote:
> On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote:
> > On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanadardp at gmail.com wrote:
> >
> > > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro wrote:
> > > > As Chris indicated, you'll have to figure out the correct encoding. You
> > > > might want to check out the chardet module (available on PyPI, I believe)
> > > > and see if it can come up with a better guess. I imagine there are other
> > > > encoding guessers out there. That's just one I'm familiar with.
> > >
> > > Thank you for the reply, but how exactly am I supposed to find out what the correct encoding is?
> >
> > One CAN NOT.
> >
> > The best you can do is to go ask the canonical source of the
> > file what encoding the file is _supposed_ to be in.
>
> I disagree on both counts.
>
> 1) For any given file it is almost always possible to find the correct
> encoding (or *a* correct encoding, as there may be more than one).
>
> This may require domain-specific knowledge (e.g. it may be necessary
> to recognize the human language and know at least some distinctive
> words, or to know some special symbols likely to be used in a data
> file), and it almost always takes a bit of detective work and trial
> and error. But I don't think I ever encountered a file where I
> couldn't figure out the encoding.
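Below is a minimal Python 3 sketch of the trial-and-error approach described above. It assumes a hand-picked list of candidate encodings; the list, the helper name try_encodings and the sample text are illustrative and do not come from this thread. Several candidates can decode the same bytes without raising, so a human still has to read the output and judge which result makes sense.

```python
# Sketch of trial-and-error encoding detection. CANDIDATES is an assumption:
# extend it with encodings that are plausible for your data.
CANDIDATES = ["utf-8", "cp1252", "iso-8859-2", "latin-1"]


def try_encodings(data: bytes, candidates=CANDIDATES):
    """Return {encoding: decoded_text} for every candidate that decodes without error."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return results


# Hypothetical sample: German text with typographic quotes, stored as UTF-8.
sample = "Gr\u00fc\u00dfe \u201cquoted\u201d".encode("utf-8")
for enc, text in try_encodings(sample).items():
    # cp1252 is missing from the results (byte 0x9d is unassigned there);
    # latin-1 and iso-8859-2 "succeed" but produce mojibake.
    print(f"{enc:12} {text!r}")
```

Note that latin-1 maps every byte to a character, so it never fails. A clean decode only proves the bytes are *allowed* in that encoding, not that the encoding is correct.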
>
> (If you have several files in the same encoding, it may not be
> possible to figure out the encoding from a subset of them. For
> example, the files may all be in ISO-8859-2, but the subset you have
> contains only characters <= 0x7F. But if you have several files, they
> may not all be the same encoding, either).
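The point about ASCII-only subsets can be illustrated with a short Python 3 sketch. Bytes at or below 0x7F decode to the same text in ISO-8859-2, cp1252 and UTF-8, so they carry no evidence about the encoding, whereas a single byte above 0x7F already reads differently. The sample byte strings are made up for illustration.

```python
# Bytes <= 0x7F: every ASCII-compatible encoding agrees, so there is no evidence.
ascii_only = b"Hello, world"
same = {enc: ascii_only.decode(enc) for enc in ("iso-8859-2", "cp1252", "utf-8")}
print(ascii_only.isascii())     # True
print(len(set(same.values())))  # 1 (all three decodings are identical)

# One byte above 0x7F is enough to tell the two 8-bit encodings apart.
high = b"\xe8"
print(high.decode("cp1252"), high.decode("iso-8859-2"))  # è č
```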
>
> 2) The canonical source of the file may not know. This is quite frequent
> when the source is some non-technical person. Then you get answers
> like "it's ASCII" (although the file contains umlauts, which aren't
> in ASCII) or "it's ANSI" (which isn't an encoding, although Windows
> pretends it is). Or they may not be aware that the file is converted
> somewhere in the pipeline, so that the file they generated isn't
> actually the file you received. So ask (or check the docs), but
> verify!
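What Windows calls "ANSI" is the system's active code page, which in US and Western European locales is usually cp1252. The following Python 3 sketch assumes such a system. It shows the default encoding that open() falls back to when no encoding= is given, and how byte 0x9D (unassigned in cp1252) produces exactly the error in the subject line. The helper name cp1252_error is hypothetical.

```python
import locale

# The encoding open() falls back to when no encoding= is given. On a Western
# Windows install this is typically "cp1252" (the "ANSI" code page); on most
# Linux/macOS systems it is "UTF-8".
print(locale.getpreferredencoding(False))


def cp1252_error(data: bytes) -> str:
    """Return the decode error message for data under cp1252, or "" if it decodes."""
    try:
        data.decode("cp1252")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


print(cp1252_error(b"\x9d"))
# 'charmap' codec can't decode byte 0x9d in position 0: character maps to <undefined>
```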
>
> hp
>
> --
> _ | Peter J. Holzer | we build much bigger, better disasters now
> |_|_) | | because we have much more sophisticated
> | | | hjp at hjp.at | management tools.
> __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

Hello Peter, how exactly would I solve this issue? I have a script that works in Python 2 but not in Python 3. I ran 2to3, but I still get the error "UnicodeDecodeError: can't decode byte 1x09 in line 7414 ... character maps to <undefined>" from cp1252. Would you have a straight solution? I can't get a straight answer anywhere. The file was ported from ANSI to Python, so it's UTF-8 as far as I can see.
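The following is a likely fix, assuming the input file really is UTF-8 as stated above. In Python 2, open() returned undecoded bytes. In Python 3, open() in text mode decodes using the locale's default encoding (cp1252 on many Windows machines) unless you pass encoding=, and 2to3 does not add that argument. A UTF-8 file containing a right double quotation mark (U+201D, encoded as E2 80 9D) then fails on byte 0x9d, just as in the subject line. The sketch below writes such a file to a temporary path, which stands in for the real file, and reads it back both ways.

```python
import os
import tempfile

# Hypothetical stand-in for the real input: UTF-8 text with curly quotes.
text = "He said \u201chello\u201d\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(text.encode("utf-8"))

# What Python 3 effectively does on a cp1252 Windows system without encoding=.
cp1252_failed = False
try:
    with open(path, encoding="cp1252") as f:
        f.read()
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252 failed:", exc)

# The fix: state the file's encoding explicitly instead of relying on the locale.
with open(path, encoding="utf-8") as f:
    content = f.read()
print(content == text)  # True

os.remove(path)
```

If some bytes are genuinely corrupt, open(path, encoding="utf-8", errors="replace") substitutes U+FFFD instead of raising. Because this hides data problems, treat it as a last resort.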
More information about the Python-list mailing list