[4suite] Re: [XML-SIG] [URGENT] Problem with accent char
Uche Ogbuji
uche.ogbuji@fourthought.com
Wed, 10 Jan 2001 13:23:47 -0700
Lars Marius Garshol wrote:
> | What can I do, not to have this conversion made ? I don't want the
> | parser to modify my content !!!!
>
> You can use xmlproc, you can convert back to latin1 yourself, or you
> can use Python 2.0, where you'd get Unicode strings.
Bah. Just to illustrate, I prepped the following:
----------------------------------%------------------------------------
# Build a DOM reader on top of the SAX2 xmlproc driver.
from xml.dom.ext.reader import Sax2
from xml.sax.sax2exts import make_parser
p = make_parser("xml.sax.drivers2.drv_xmlproc")
reader = Sax2.Reader(parser=p)
src = """<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
<Head>
<Name>GB-OTAN-santé</Name>
<DateReleased>20010110T105314Z</DateReleased>
<Source>AFP</Source>
</Head>
<NewsLines>
<HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
<DateLine>LONDRES</DateLine>
</NewsLines>
</Xafp>
"""
doc = reader.fromString(src)
nodes = doc.getElementsByTagName('HeadLine')
print repr(nodes[0].firstChild.nodeValue)
----------------------------------%------------------------------------
But on the fromString call I get:
>>> doc = reader.fromString(src)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.0/site-packages/Ft/Lib/ReaderBase.py", line 49, in fromString
    rt = self.fromStream(stream, ownerDoc)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 270, in fromStream
    self.parser.parse(stream)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py", line 88, in parse
    parser.parse_resource(source.getSystemId()) # FIXME: rest!
AttributeError: getSystemId
Looks as if drv_xmlproc is broken for Sax2.
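For what it's worth, the traceback suggests that Sax2.Reader hands the driver a bare stream, while drv_xmlproc's parse() expects a SAX2 InputSource (the object that has getSystemId()). Here is a minimal sketch of that mismatch, assuming a current Python 3 standard library rather than the Python 2.0 / PyXML install used here. The sample document and variable names are made up for illustration:

```python
import io
from xml.sax.xmlreader import InputSource

# Hypothetical stand-in for the stream that Sax2.Reader passes to parse().
stream = io.BytesIO(b'<?xml version="1.0" encoding="iso-8859-1"?><a>\xe9</a>')

# A plain file-like object has no getSystemId(), hence the AttributeError.
stream_has_system_id = hasattr(stream, "getSystemId")

# Wrapping the stream in an InputSource provides the interface the driver asks for.
source = InputSource()
source.setByteStream(stream)
system_id = source.getSystemId()  # None here, since no system id was set

print(stream_has_system_id, system_id)
```

Whether the fix belongs in the driver or in the reader is for Lars to say. This only shows the shape of the two objects.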
However, Oliver should be OK since the following works.
----------------------------------%------------------------------------
# Same test, but with the SAX1 reader and the SAX1 xmlproc driver.
from xml.dom.ext.reader import Sax
from xml.sax.saxexts import make_parser
p = make_parser("xml.sax.drivers.drv_xmlproc")
reader = Sax.Reader(parser=p)
src = """<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
<Head>
<Name>GB-OTAN-santé</Name>
<DateReleased>20010110T105314Z</DateReleased>
<Source>AFP</Source>
</Head>
<NewsLines>
<HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
<DateLine>LONDRES</DateLine>
</NewsLines>
</Xafp>
"""
doc = reader.fromString(src)
nodes = doc.getElementsByTagName('HeadLine')
print repr(nodes[0].firstChild.nodeValue)
----------------------------------%------------------------------------
I get
>>> print repr(nodes[0].firstChild.nodeValue)
"La pol\351mique loin d'\352tre apais\351e par l'annonce de tests \340\012Londres"
Those are the original ISO-8859-1 bytes, untouched, which is what I think Oliver wants.
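To double-check what those octal escapes mean, here is a small sketch, assuming a current Python 3 interpreter (not the Python 2.0 used above); the variable names are mine. It decodes the bytes as ISO-8859-1 and also shows the "convert back to latin1 yourself" route Lars mentioned, for the case where a parser hands back UTF-8:

```python
# The byte string from the repr() output; octal \351 is 0xE9, i.e. "é" in ISO-8859-1.
raw = b"La pol\351mique loin d'\352tre apais\351e par l'annonce de tests \340\012Londres"

# Decoding as ISO-8859-1 recovers the headline, including the line break.
text = raw.decode("iso-8859-1")

# What a UTF-8 producing parser would return for the same text.
utf8 = text.encode("utf-8")

# Converting back to Latin-1 by hand gives the original bytes again.
back_to_latin1 = utf8.decode("utf-8").encode("iso-8859-1")

print(text)
print(back_to_latin1 == raw)
```

The round trip is lossless here only because every character in the headline exists in ISO-8859-1; text containing characters outside that set would fail on the final encode.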
Lars, is the Sax2 problem something you've fixed in your CVS tree? Any
chance of a quick fix? (I know you're still swamped).
Thanks.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python