[4suite] Re: [XML-SIG] [URGENT] Problem with accent char

Uche Ogbuji uche.ogbuji@fourthought.com
Wed, 10 Jan 2001 13:23:47 -0700


Lars Marius Garshol wrote:

> | What can I do, not to have this conversion made ? I don't want the
> | parser to modify my content !!!!
> 
> You can use xmlproc, you can convert back to latin1 yourself, or you
> can use Python 2.0, where you'd get Unicode strings.

Bah.  Just to illustrate, I prepped the following:

----------------------------------%------------------------------------

from xml.dom.ext.reader import Sax2
from xml.sax.sax2exts import make_parser
# Ask for the SAX2 xmlproc driver explicitly and build the DOM reader on it
p = make_parser("xml.sax.drivers2.drv_xmlproc")
reader = Sax2.Reader(parser=p)

src = """<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
  <Head>
    <Name>GB-OTAN-santé</Name>
    <DateReleased>20010110T105314Z</DateReleased>
    <Source>AFP</Source>
  </Head>
  <NewsLines>
    <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
    <DateLine>LONDRES</DateLine>
  </NewsLines>
</Xafp>
"""

doc = reader.fromString(src)
nodes = doc.getElementsByTagName('HeadLine')
print repr(nodes[0].firstChild.nodeValue)

----------------------------------%------------------------------------

But on the fromString call I get:

>>> doc = reader.fromString(src)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.0/site-packages/Ft/Lib/ReaderBase.py", line 49, in fromString
    rt = self.fromStream(stream, ownerDoc)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 270, in fromStream
    self.parser.parse(stream)
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/drivers2/drv_xmlproc.py", line 88, in parse
    parser.parse_resource(source.getSystemId()) # FIXME: rest!
AttributeError: getSystemId


Looks as if drv_xmlproc is broken for Sax2.
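
My reading of the traceback, offered as a sketch rather than a diagnosis:
the reader hands the driver a plain file-like object, while the driver
calls getSystemId(), which is a method of the SAX2 InputSource class.  The
snippet below uses the standard library's xml.sax.xmlreader.InputSource
(not the _xmlplus copy named in the traceback) and present-day Python
syntax; both are assumptions made so that it runs on its own.  It only
shows why the attribute lookup fails.

```python
import io
from xml.sax.xmlreader import InputSource

# A plain in-memory stream, standing in for what fromString() passes along
# (assumption: the real reader wraps the string in a similar file-like object).
stream = io.BytesIO(b"<Xafp><Source>AFP</Source></Xafp>")

# File-like objects have read(), but nothing called getSystemId().
print(hasattr(stream, "getSystemId"))

# An InputSource carries the stream and also answers getSystemId().
source = InputSource()
source.setByteStream(stream)
print(source.getSystemId())            # no system ID was set on this source
print(source.getByteStream() is stream)
```

Even with such a wrapper, the driver line marked FIXME passes only the
system ID to parse_resource, so a source that carries just a byte stream
would presumably still not parse.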

However, Oliver should be OK, since the following (using the SAX1 driver
and reader) works.

----------------------------------%------------------------------------

from xml.dom.ext.reader import Sax
from xml.sax.saxexts import make_parser
# Same setup as before, but with the SAX1 xmlproc driver and the SAX1 reader
p = make_parser("xml.sax.drivers.drv_xmlproc")
reader = Sax.Reader(parser=p)

src = """<?xml version="1.0" encoding="iso-8859-1"?>
<Xafp type="multimedia" uno="afp_wbs_doc_010110105314.g5kw25ak">
  <Head>
    <Name>GB-OTAN-santé</Name>
    <DateReleased>20010110T105314Z</DateReleased>
    <Source>AFP</Source>
  </Head>
  <NewsLines>
    <HeadLine>La polémique loin d'être apaisée par l'annonce de tests à
Londres</HeadLine>
    <DateLine>LONDRES</DateLine>
  </NewsLines>
</Xafp>
"""

doc = reader.fromString(src)
nodes = doc.getElementsByTagName('HeadLine')
print repr(nodes[0].firstChild.nodeValue)

----------------------------------%------------------------------------

I get

>>> print repr(nodes[0].firstChild.nodeValue)
"La pol\351mique loin d'\352tre apais\351e par l'annonce de tests \340\012Londres"

Which is what I think Oliver wants.
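
For anyone reading the escapes above: \351, \352 and \340 are the octal
codes of the ISO-8859-1 bytes for e-acute, e-circumflex and a-grave, so the
text came through in its original encoding.  The "convert back to latin1
yourself" route that Lars mentions can be sketched as below.  This assumes
the unwanted conversion was to UTF-8, and it is written in present-day
Python syntax so that it runs as-is; the variable names are mine.

```python
# Bytes as they would sit in the ISO-8859-1 source document.
latin1_bytes = b"La pol\xe9mique loin d'\xeatre apais\xe9e"

# Decoding gives a Unicode string, independent of any byte encoding.
text = latin1_bytes.decode("iso-8859-1")

# Assumption: this is what a parser reporting UTF-8 would hand back instead.
utf8_bytes = text.encode("utf-8")

# Converting back by hand is a decode followed by an encode.
round_tripped = utf8_bytes.decode("utf-8").encode("iso-8859-1")

print(repr(utf8_bytes))
print(repr(round_tripped))
print(round_tripped == latin1_bytes)
```

The encode step raises UnicodeEncodeError if the text contains a character
that ISO-8859-1 cannot represent, so this only suits content known to be
Latin-1 to begin with.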

Lars,  is the Sax2 problem something you've fixed in your CVS tree?  Any
chance of a quick fix?  (I know you're still swamped).

Thanks.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python