Getting Unicode decode error using lxml.iterparse
Peter Otten
__peter__ at web.de
Wed May 23 17:05:37 EDT 2018
digitig at gmail.com wrote:
> I'm trying to read my iTunes library in Python using iterparse. My current
> stub is:
> parser.add_argument('infile', nargs='?',
>                     type=argparse.FileType('r'), default=sys.stdin)
> I'm getting an error on one part of the XML:
>
>   File "C:\Users\digit\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
>     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 202: character maps to <undefined>
> I suspect the issue is that it's using cp1252.py, which I don't think is
> UTF-8 as specified in the XML prolog. Is this an iterparse problem, or am
> I using it wrongly?
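The suspicion is right, and it can be confirmed at the byte level. The sketch below uses "Í" (U+00CD) purely as a hypothetical sample. We don't know which character in the library produced byte 0x8d, but UTF-8 encodes "Í" as C3 8D, and cp1252 has no character at 0x8d. Decoding those bytes with cp1252 therefore raises the same kind of 'charmap' error as the traceback. `try_decode` is a hypothetical helper, not part of any library.

```python
def try_decode(raw, encoding):
    """Hypothetical helper: return the decoded text, or the UnicodeDecodeError raised."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Hypothetical sample character: U+00CD ("Í") is the two bytes C3 8D in UTF-8.
raw = "\u00cd".encode("utf-8")

print(try_decode(raw, "utf-8"))   # Í  (decoded with the encoding the XML prolog declares)
print(try_decode(raw, "cp1252"))  # a 'charmap' error at byte 0x8d, like the traceback
```

In UTF-8, 0x8d is a continuation byte, so it can turn up inside many accented letters. cp1252 is a single-byte code page that leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so any of them stops the decoder.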
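The wrong encoding is specified implicitly in argparse.FileType("r"). Without
an explicit encoding, the file is opened in text mode using the locale's
preferred encoding, which on your Windows machine is cp1252 (hence cp1252.py
in the traceback). Try FileType("rb"), which hands raw bytes to the parser so
it can honour the encoding in the XML declaration, or
FileType("r", encoding="utf-8") instead (my personal approach is to avoid
FileType completely).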
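The binary-mode fix could be put together as follows. This is a minimal sketch and not necessarily how Peter avoids FileType. It takes a plain path argument, opens it with "rb", and passes the bytes to iterparse so the parser applies the encoding declared in the XML prolog. Two parts are assumptions added for illustration. The first is the fallback to the standard library's xml.etree.ElementTree, whose iterparse accepts the same source and events arguments; it lets the sketch run without lxml. The second is `count_elements`, which counts `<dict>` elements (the iTunes library is a plist) and is a hypothetical stand-in for real processing. If you keep FileType("rb") instead, note that argparse only runs *string* defaults through `type`. With default=sys.stdin you would still get a text stream; sys.stdin.buffer is the binary stream underneath it.

```python
import argparse
import sys

try:
    from lxml import etree  # the library used in the question
except ImportError:  # assumption: fall back to the stdlib parser, which has a compatible iterparse()
    import xml.etree.ElementTree as etree


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Scan an iTunes library XML file")
    # A plain path instead of argparse.FileType, so the script chooses the file mode itself.
    parser.add_argument("infile", nargs="?", default=None,
                        help="XML file to read (default: standard input)")
    return parser.parse_args(argv)


def count_elements(source, tag):
    """Hypothetical example processing: count <tag> elements.

    `source` must yield bytes, so the parser decodes them using the
    encoding from the XML declaration rather than the locale default.
    """
    count = 0
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # drop children and text already processed; keeps memory low on big files
    return count


def main(argv=None):
    args = parse_args(argv)
    if args.infile is None:
        # sys.stdin is a text stream; .buffer is the binary stream beneath it.
        print(count_elements(sys.stdin.buffer, "dict"))
    else:
        with open(args.infile, "rb") as f:  # "rb": no implicit cp1252 decoding
            print(count_elements(f, "dict"))
```

Call `main()` from the script's `if __name__ == "__main__":` block, for example `python scan.py Library.xml`. Opening in binary mode also means a file declared as, say, ISO-8859-1 is read correctly without changing the code.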
More information about the Python-list mailing list