[XML-SIG] parsing chinese characters
Luis Miguel Morillas
morillas at gmail.com
Mon Oct 22 22:52:16 CEST 2007
You must declare the correct encoding in the XML declaration of the source file.
For example, using Amara:
chinese.xml
<?xml version="1.0" encoding="utf-8"?>
<test>ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ</test>
>>> import amara
>>> doc = amara.parse('chinese.xml')
>>> print unicode(doc.test)
ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ
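
The same idea can be sketched with the standard library, in case Amara is not installed. This is only an illustration under stated assumptions: it uses Python 3 and xml.etree.ElementTree rather than Amara, and the sample document (the two characters U+4E2D U+6587) is made up for the example, not taken from chinese.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, built from escapes so this file stays ASCII.
# The XML declaration states the encoding that the bytes really use.
xml_bytes = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<test>\u4e2d\u6587</test>"
).encode("utf-8")

# The parser reads the encoding from the declaration and hands back
# decoded text (str), not raw bytes.
root = ET.fromstring(xml_bytes)
text = root.text

# Print only the length so that a limited console codec cannot fail here.
print(len(text))
```

If the declaration named an encoding that the bytes do not actually use, the parser would either fail or return the wrong characters, which is why the declaration has to match the file.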
Big5 works without problems too:
>>> doc = amara.parse('http://xml.ascc.net/test/wfall/big5/test13.xml')
>>>
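
The 'charmap' error in the quoted message below is typically raised when the text is written out, not when it is parsed: the parsed text is fine, but the output codec cannot represent the characters. If you really do want to drop them, here is a minimal Python 3 sketch. The helper name encode_lossy is hypothetical, and cp1252 is only an assumed stand-in for whatever codec your console uses.

```python
def encode_lossy(text, encoding="cp1252", errors="ignore"):
    """Encode text for a limited codec (hypothetical helper).

    errors="ignore" drops characters the codec cannot represent,
    errors="replace" substitutes "?", and errors="xmlcharrefreplace"
    writes numeric character references instead.
    """
    return text.encode(encoding, errors=errors)


# Made-up sample: ASCII text around two Chinese characters.
sample = "abc \u4e2d\u6587 def"

# Strict encoding reproduces the kind of error from the question.
try:
    sample.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

dropped = encode_lossy(sample)                           # characters removed
replaced = encode_lossy(sample, errors="replace")        # "?" placeholders
refs = encode_lossy(sample, errors="xmlcharrefreplace")  # &#NNNN; references

print(strict_failed, dropped, replaced, refs)
```

Dropping characters loses data, so it is usually better to write the output in an encoding that can hold them, such as UTF-8.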
2007/10/22, Fabian López <fabian at syameses.com>:
> Hi,
> I am parsing an XML file that includes Chinese characters, like
> ^ÔuÔuà¢à¢²ÅÊDZw.¼ššìéLï³²ÅÊÇÛ or ¥Ø¥¢¥¢¥¤¥í¥ó... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in position....
> I would like to ignore these characters and parse everything else.
> Could anyone help me? I suppose I could catch the exception and ignore
> it, or use some function that detects these Chinese characters so that
> I can skip them.
>
> Thanks!!
> Fabian
--
Regards,
Luis Miguel