[XML-SIG] parsing chinese characters
fabian at syameses.com
Tue Oct 23 18:42:26 CEST 2007
The problem was in the print statement, so I decided to catch and ignore
that exception, because the Chinese characters are of no interest to me.
The document wasn't broken, and the parser worked well.
Thanks for your support!
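
For the archive, a minimal sketch of the workaround in Python 3 syntax. The helper name console_safe is hypothetical, and replacing unencodable characters with "?" is an assumption about what "ignore" should mean here; the original code simply caught the exception.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by '?'. Hypothetical helper, not part of lxml or the stdlib."""
    # Fall back to ASCII if stdout has no usable encoding attribute.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    # U+4E2D U+6587 cannot be encoded by a 'charmap' codec such as cp1252.
    print(console_safe("title: \u4e2d\u6587"))
```

This keeps the parsed data untouched and only degrades what is written to the console.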
2007/10/23, Stefan Behnel <stefan_ml at behnel.de>:
> Fabian López wrote:
> > I am parsing an XML file that includes Chinese characters, like ^
> > ÔuÔuà¢à¢²ÅÊÇ±w.¼ìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that I get an error like:
> > UnicodeEncodeError: 'charmap' codec can't encode characters in
> > The thing is that I would like to ignore it and parse all the characters
> > except these ones. So, could anyone help me? I suppose that I can catch
> > the exception and ignore it, or maybe use a function that detects these
> > Chinese characters and then ignores them.
> If the parser can't handle the characters here, it's because the document
> is broken and does not declare the correct encoding.
> From your last post, I assume you're using lxml to do this (it's always
> helpful to state what software you use when you describe a problem with
> it).
> Since 2.0alpha3(?), you can override the encoding of the parsed file with
> the "encoding" keyword that you can pass to the XMLParser class. So, for
> example, you can try parsing the document as usual and, if that fails,
> try parsing it with a different parser that is configured for a specific
> encoding.
> Or you can determine the encoding based on some external source (like what
> the HTTP protocol tells you), and then use an override parser right away,
> or use that information as the first fallback.
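
A rough sketch of the two strategies described above, assuming lxml 2.0 or later is installed. The function names parse_xml and charset_from_content_type, the parameter names, and the sample header value are hypothetical and chosen only for illustration; the lxml calls used are etree.XMLParser(encoding=...) and etree.fromstring().

```python
from email.message import Message

from lxml import etree


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value,
    lower-cased, or None if the header does not name one."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def parse_xml(data, override=None, fallbacks=()):
    """Parse XML bytes and return the root element.

    override:  encoding known from an external source; used right away.
    fallbacks: encodings to try, in order, if the normal parse fails.
    """
    if override:
        # Ignore whatever the document declares and force this encoding.
        return etree.fromstring(data, etree.XMLParser(encoding=override))
    try:
        # Normal case: trust the document's own encoding declaration.
        return etree.fromstring(data)
    except etree.XMLSyntaxError:
        for encoding in fallbacks:
            try:
                parser = etree.XMLParser(encoding=encoding)
                return etree.fromstring(data, parser)
            except etree.XMLSyntaxError:
                continue
        # No fallback worked: re-raise the original parse error.
        raise


if __name__ == "__main__":
    # GB2312 bytes with no encoding declaration (hypothetical sample data).
    data = "<root>\u4e2d\u6587</root>".encode("gb2312")
    charset = charset_from_content_type("text/xml; charset=GB2312")
    root = parse_xml(data, override=charset)
    print(root.tag)
```

Passing the externally known encoding in fallbacks instead of override gives the "first fallback" variant: the document's own declaration is tried first and the hint is only used when that parse fails.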