UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>
Chris Angelico
rosuav at gmail.com
Tue May 29 05:47:37 EDT 2018
On Tue, May 29, 2018 at 6:34 PM, Peter J. Holzer <hjp-python at hjp.at> wrote:
> On 2018-05-23 06:03:38 +0000, Steven D'Aprano wrote:
>> Mojibake is especially difficult to deal with when you are dealing with
>> short text snippets like file names or user names which can contain
>> arbitrary characters, where there is rarely any way to recognise the
>> "correct" string.
>
> For single file names or user names, sure. But if you have a list of
> them, there is still a high probability that many of them will contain
> recognizable words which can be used to deduce the (or a) correct
> encoding. (Unless it's from the Ministry of Silly Names).

Ohh... are you assuming that, in a list of file names, all of them use
the same encoding? Ah, yes, well, that WOULD make it easier, wouldn't
it. Sadly, not the case.
ChrisA
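
To illustrate the point about mixed encodings, here is a minimal sketch that decodes each file name on its own instead of picking one encoding for the whole list. The candidate list, the helper name decode_name, and the sample names are assumptions made up for this example, not anything from the thread. Note that "decodes without error" is not the same as "decodes correctly": this is exactly the guessing game described above, and latin-1 in particular accepts any byte string.

```python
# Hypothetical candidate encodings, tried in order; adjust for your data.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_name(raw, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes raw.

    A successful decode is only a guess, not proof of the real encoding.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep the undecodable bytes as lone surrogates so
    # the original bytes can be recovered later.
    return raw.decode("ascii", errors="surrogateescape"), None


# Made-up sample: the same name in two encodings, plus a name containing
# byte 0x9d, which cp1252 leaves undefined (the error in the subject line).
names = [
    "caf\u00e9.txt".encode("utf-8"),
    "caf\u00e9.txt".encode("cp1252"),
    b"bad\x9dname",
]

decoded = [decode_name(n) for n in names]
for text, enc in decoded:
    print(repr(text), enc)
```

The first two names come out as the same text via different encodings, which a single list-wide encoding could not manage. The third only decodes because latin-1 never fails, so its result should be treated with suspicion rather than trusted.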