[XML-SIG] parsing chinese characters

Fabian López fabian at syameses.com
Tue Oct 23 18:42:26 CEST 2007

Thanks Stefan,
the problem was in the print statement, so I decided to catch and ignore
that exception, because the Chinese characters don't interest me. The
document wasn't broken, and the parser itself worked well all along.
Thanks for your support!
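
For readers hitting the same error: a minimal sketch of an alternative to
swallowing the exception, assuming Python 3 (the original code used the
Python 2 print statement). The helper name safe_text and the sample string
are made up for illustration; they are not from the original code. Instead
of dropping the whole line, it replaces only the characters that the console
encoding cannot represent.

```python
import sys


def safe_text(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'.

    safe_text is a hypothetical helper, not part of any library.
    """
    # Fall back to ASCII if the output stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


# Invented sample: one Latin-1 character followed by two Chinese characters.
sample = "caf\u00e9 \u4e2d\u6587"

# For a 'charmap'-style codec such as cp1252, only the Chinese characters
# are replaced; the accented letter survives.
printable = safe_text(sample, "cp1252")

# With the default argument the result is always printable on this console.
print(safe_text(sample))
```

Whether replacing or skipping the characters is preferable depends on what
the output is used for; for a quick debugging print, either should do.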

2007/10/23, Stefan Behnel <stefan_ml at behnel.de>:
> Fabian López wrote:
> > I am parsing an XML file that includes Chinese characters, like ^
> > ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that I get an error like:
> > UnicodeEncodeError: 'charmap' codec can't encode characters in position....
> > The thing is that I would like to ignore them and parse all the characters
> > except these ones. So, could anyone help me? I suppose that I can catch the
> > exception and ignore it, or maybe use some function that detects these
> > Chinese characters and then skips them.
> If the parser can't handle the characters here, it's because the document is
> broken and does not declare the correct encoding.
> From your last post, I assume you're using lxml to do this (it's always
> helpful to state what software you use when you describe a problem with it).
> Since 2.0alpha3(?), you can override the encoding of the parsed file with the
> "encoding" keyword that you can pass to the XMLParser class. So, for example,
> you can try parsing the document as usual and if that fails, try parsing it
> with a different parser that is configured for a specific encoding override.
> Or you can determine the encoding based on some external source (like what
> the HTTP protocol tells you), and then use an override parser right away, or
> use that information as the first fallback.
> Stefan
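
The "parse as usual, then retry with an encoding override" idea from the
quoted message might be sketched roughly as follows, assuming Python 3. The
function name parse_with_fallback and the sample document are invented for
illustration. The sketch uses lxml when it is installed and otherwise falls
back to the standard library's ElementTree, whose XMLParser accepts the same
"encoding" keyword (though it may support fewer encodings than lxml).

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Stand-in so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree


def parse_with_fallback(data, fallback_encoding):
    """Parse bytes normally; on a parse error, retry with an encoding override.

    parse_with_fallback is a hypothetical helper, not an lxml API.
    """
    try:
        return etree.fromstring(data)
    except etree.ParseError:
        # A fresh parser that ignores the declared (or assumed) encoding.
        parser = etree.XMLParser(encoding=fallback_encoding)
        return etree.fromstring(data, parser)


# Invented sample: no encoding declaration, so the parser assumes UTF-8,
# but the bytes are actually Latin-1, so the first attempt fails.
doc = b"<name>L\xf3pez</name>"
root = parse_with_fallback(doc, "iso-8859-1")
```

If the encoding is known from an external source such as an HTTP header, the
override parser could be used on the first attempt instead of as a fallback.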