[XML-SIG] parsing chinese characters

Fabian López fabian at syameses.com
Tue Oct 23 18:42:26 CEST 2007


Thanks Stefan,
the problem was in the print statement, so I decided to catch and ignore that
exception, because the Chinese characters don't interest me. The
document wasn't broken and the parser worked well. It all works now.
Thanks for your support!
Fabian
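
For anyone who finds this thread later, here is a minimal sketch of the kind of
workaround described above. It is not the exact code from my script: the helper
names printable and safe_print are made up for illustration, and it assumes
that characters the console encoding cannot represent may simply be replaced
by '?'.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding can't represent replaced by '?'."""
    # Hypothetical helper: fall back to ASCII if stdout reports no encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text):
    """Print text, degrading gracefully instead of raising UnicodeEncodeError."""
    try:
        print(text)
    except UnicodeEncodeError:
        # The console codec (for example 'charmap' on Windows) rejected some characters.
        print(printable(text))
```

The parsed data stays intact this way; only the console output loses the
characters that cannot be displayed.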

2007/10/23, Stefan Behnel <stefan_ml at behnel.de>:
>
>
> Fabian López wrote:
> > I am parsing an XML file that includes Chinese characters, like ^
> > ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that I get an error like:
> > UnicodeEncodeError: 'charmap' codec can't encode characters in position....
> > The thing is that I would like to ignore them and parse all the characters
> > except these ones. So, could anyone help me? I suppose that I can catch the
> > exception and ignore it, or maybe use a function that detects these
> > Chinese characters and then skips them.
>
> If the parser can't handle the characters here, it's because the document is
> broken and does not declare the correct encoding.
>
> From your last post, I assume you're using lxml to do this (it's always
> helpful to state what software you use when you describe a problem with it).
> Since 2.0alpha3(?), you can override the encoding of the parsed file with the
> "encoding" keyword that you can pass to the XMLParser class. So, for example,
> you can try parsing the document as usual and if that fails, try parsing it
> with a different parser that is configured for a specific encoding override.
> Or you can determine the encoding based on some external source (like what the
> HTTP protocol tells you), and then use an override parser right away, or use
> that information as the first fallback.
>
> Stefan
>
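
The fallback strategy Stefan describes could look roughly like the sketch
below. The function name parse_with_fallback and the candidate encoding list
are assumptions for illustration, not something from his mail. The import falls
back to the standard library's ElementTree, whose XMLParser also accepts an
"encoding" keyword, in case lxml is not installed; note that the standard
library parser is more limited for multi-byte encodings.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API


def parse_with_fallback(data, fallback_encodings=("iso-8859-1",)):
    """Parse XML bytes; if that fails, retry with encoding-override parsers."""
    try:
        # First attempt: trust the document's own declaration (UTF-8 by default).
        return etree.fromstring(data)
    except etree.ParseError:
        pass
    for encoding in fallback_encodings:
        # Each override parser ignores the declared encoding and uses ours.
        parser = etree.XMLParser(encoding=encoding)
        try:
            return etree.fromstring(data, parser=parser)
        except etree.ParseError:
            continue
    raise ValueError("could not parse document with any candidate encoding")


# Latin-1 bytes without an encoding declaration: the default parse fails,
# and the override parser recovers the text.
root = parse_with_fallback(b"<name>L\xf3pez</name>")
```

If an external source such as an HTTP header already names the encoding, it
would make sense to put that value first in fallback_encodings, or to build the
override parser right away instead of trying the default parse first.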