How to force SAX parser to ignore encoding problems

Stefan Behnel stefan_ml at
Fri Aug 7 08:40:35 CEST 2009

Łukasz wrote:
> I have a problem with my XML parser (created with libraries from
> the xml.sax package). When the parser finds an invalid character (in a
> CDATA section), for example �, it throws a SAXParseException.
> Is there any way to just ignore this kind of problem? Maybe there is a
> way to set up the parser in a less strict mode?
> I know that I can catch this exception, check whether it is this
> kind of problem and then ignore it, but I am asking about a global
> setting.

The parser from libxml2 that lxml provides has a recovery option, i.e. it
can keep parsing regardless of errors and will drop the broken content.
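
As a rough illustration of that recovery option, here is a minimal sketch
using lxml.etree.XMLParser(recover=True). The sample document and the helper
names parse_strict and parse_lenient are made up for this example, and what
exactly survives recovery depends on the libxml2 version, so the sketch prints
the result rather than promising a particular tree.

```python
from lxml import etree

# Hypothetical input: a control character (0x01) inside a CDATA section,
# which XML 1.0 does not allow.
BROKEN = b"<root><item><![CDATA[bad \x01 char]]></item><item>ok</item></root>"


def parse_strict(data):
    """Parse with lxml's default (strict) parser; raises on broken input."""
    return etree.fromstring(data)


def parse_lenient(data):
    """Parse in libxml2's recovery mode and collect the reported errors."""
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(data, parser)
    # error_log holds the problems the parser ran into during this parse.
    errors = [(entry.line, entry.message) for entry in parser.error_log]
    return root, errors


if __name__ == "__main__":
    try:
        parse_strict(BROKEN)
    except etree.XMLSyntaxError as exc:
        print("strict parser failed:", exc)

    root, errors = parse_lenient(BROKEN)
    print(etree.tostring(root))
    for line, message in errors:
        print("error on line", line, ":", message)
```

Checking parser.error_log after a lenient parse at least tells you that
something was wrong with the input, even though the returned tree looks
like a normal one.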

However, it is *always* better to fix the input, if you have any access to it.
Broken XML is *not* XML at all. If you can't fix the source, you can never
be sure that the data you received is in any way complete or even usable.


