BZip2 decompression and parsing XML
stefan_ml at behnel.de
Fri Jun 6 15:10:19 CEST 2008
> xml.parsers.expat.ExpatError: not well-formed (invalid token): line
> 538676, column 17
Looks like your XML data is broken at line 538676.
> handler = open(args, "r")
This should read
handler = open(args, "rb")
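Maybe that's your problem: the file holds compressed binary data, and text
mode can alter the bytes on some platforms (newline translation on Windows,
for example), so the decompressed XML may come out corrupted.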
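Here is a minimal sketch of reading the file in binary mode, assuming it is a bzip2-compressed XML document small enough to decompress in memory. The names parse_bz2_xml and demo_minidom and the sample data are made up for illustration; they are not from your script.

```python
import bz2
import os
import tempfile
from xml.dom import minidom


def parse_bz2_xml(path):
    """Read a bzip2-compressed XML file in binary mode and parse it with minidom."""
    # "rb" hands the compressed bytes to bz2 unchanged
    with open(path, "rb") as handler:
        compressed = handler.read()
    xml_bytes = bz2.decompress(compressed)
    return minidom.parseString(xml_bytes)


def demo_minidom():
    """Write a tiny compressed sample to a temp file and parse it back."""
    sample = b"<root><item>1</item><item>2</item></root>"
    fd, path = tempfile.mkstemp(suffix=".xml.bz2")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(bz2.compress(sample))
        doc = parse_bz2_xml(path)
        # Collect the text content of every <item> element
        return [node.firstChild.data for node in doc.getElementsByTagName("item")]
    finally:
        os.remove(path)
```

This loads the whole decompressed document into memory, which may not be what you want for a file with more than half a million lines.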
BTW, since you seem to be parsing a pretty big chunk of XML there, you should
consider using lxml. It's faster, more memory-friendly, more feature-rich and
easier to use than minidom. It can also parse directly from a gzipped file or
from a file-like object such as the one provided by the bz2 module.
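A rough sketch of parsing from a bz2 file-like object, assuming lxml is installed. The fallback import to the standard library's ElementTree is my addition so the snippet still runs without lxml, and count_elements and demo_lxml are hypothetical names used only for illustration.

```python
import bz2
import os
import tempfile

try:
    from lxml import etree  # third-party package, installed separately
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API


def count_elements(path, tag):
    """Parse a bzip2-compressed XML file and count elements with the given tag."""
    # BZ2File is a file-like object that decompresses the data as it is read
    source = bz2.BZ2File(path)
    try:
        tree = etree.parse(source)
    finally:
        source.close()
    return sum(1 for _ in tree.getroot().iter(tag))


def demo_lxml():
    """Write a tiny compressed sample to a temp file and count its <item> elements."""
    sample = b"<root><item>1</item><item>2</item></root>"
    fd, path = tempfile.mkstemp(suffix=".xml.bz2")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(bz2.compress(sample))
        return count_elements(path, "item")
    finally:
        os.remove(path)
```

The parser reads from the file object itself, so there is no need to open the file in a particular mode or to hold the compressed bytes in a separate string first.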