UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>
Peter J. Holzer
hjp-python at hjp.at
Tue May 22 17:23:37 EDT 2018
On 2018-05-20 15:43:54 +0200, Karsten Hilbert wrote:
> On Sun, May 20, 2018 at 04:59:12AM -0700, bellcanadardp at gmail.com wrote:
>
> > On Saturday, 19 May 2018 19:48:20 UTC-4, Skip Montanaro wrote:
> > > As Chris indicated, you'll have to figure out the correct encoding. You
> > > might want to check out the chardet module (available on PyPI, I believe)
> > > and see if it can come up with a better guess. I imagine there are other
> > > encoding guessers out there. That's just one I'm familiar with.
> >
> > thank you for the reply, but how exactly am i supposed to find out what is the correct encoding??
>
> One CAN NOT.
>
> The best you can do is to go ask the canonical source of the
> file what encoding the file is _supposed_ to be in.
I disagree on both counts.
1) For any given file it is almost always possible to find the correct
encoding (or *a* correct encoding, as there may be more than one).
This may require domain-specific knowledge (e.g. it may be necessary
to recognize the human language and know at least some distinctive
words, or to know some special symbols likely to be used in a data
file), and it almost always takes a bit of detective work and trial
and error. But I don't think I ever encountered a file where I
couldn't figure out the encoding.
(If you have several files in the same encoding, it may not be
possible to figure out the encoding from a subset of them. For
example, the files may all be in ISO-8859-2, but the subset you have
contains only characters <= 0x7F. But if you have several files, they
may not all be the same encoding, either).
2) The canonical source of the file may not know. This is quite frequent
when the source is some non-technical person. Then you get answers
like "it's ASCII" (although the file contains umlauts, which aren't
in ASCII) or "it's ANSI" (which isn't an encoding, although Windows
pretends it is). Or they may not be aware that the file is converted
somewhere in the pipeline, so that the file they generated isn't
actually the file you received. So ask (or check the docs), but
verify!
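
The detective work from point 1 can be sketched roughly as follows. This is
only an illustration, not a general-purpose detector: the helper name
rank_encodings, the candidate list, and the sample text are all made up for
this example, and it assumes you already know a few words that ought to
appear in the file.

```python
# Hypothetical helper: try a handful of candidate encodings and rank them
# by how many expected words show up in the decoded text.
CANDIDATES = ["utf-8", "cp1252", "iso-8859-1", "iso-8859-2", "cp437"]

def rank_encodings(data, expected_words, candidates=CANDIDATES):
    """Return [(hits, encoding), ...] for every candidate that decodes
    without error, best match first."""
    results = []
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        hits = sum(text.count(word) for word in expected_words)
        results.append((hits, enc))
    # Highest hit count first; sorted() is stable, so ties keep list order.
    return sorted(results, key=lambda r: -r[0])

# Made-up sample: German text with curly quotes, stored as UTF-8.
# The closing quote is the byte sequence E2 80 9D, and 0x9d is undefined
# in Python's cp1252 codec, so that candidate is rejected outright.
sample = "Die Größe der Tür beträgt “ungefähr” fünf Fuß".encode("utf-8")
ranking = rank_encodings(sample, ["Größe", "Tür", "fünf"])

# A file containing only bytes <= 0x7F decodes identically under every
# candidate here, so it tells you nothing about the encoding.
ascii_only = b"nothing above 0x7F in this one"
ambiguous = len({ascii_only.decode(enc) for enc in CANDIDATES}) == 1
```

Note that iso-8859-1 and cp437 accept any byte sequence at all, so "decodes
without error" proves very little by itself. It is the check against words
you expect to see that does the real work.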
hp
--
_ | Peter J. Holzer | we build much bigger, better disasters now
|_|_) | | because we have much more sophisticated
| | | hjp at hjp.at | management tools.
__/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>