From chrish at cryptocard.com Mon Dec 1 13:51:07 2003
From: chrish at cryptocard.com (Chris Herborth)
Date: Mon Dec 1 13:49:38 2003
Subject: [XML-SIG] Provide your own SAX parser to the DOM?
Message-ID: <3FCB8D9B.9080900@cryptocard.com>
I've got PyXML 0.8.3 installed here, and I'm generating the DOM for some
documents thusly:
    reader = xml.dom.ext.reader.Sax2.Reader()
    # snipped: setting up an external entity resolver and error handler
    dom = reader.fromStream( file( an_xml_filename ) )
Is it possible to use a different SAX parser and still get the advantages of
using the PyXML DOM goodness? I'm thinking ahead to when I want to use a
validating parser, although the xml.dom.ext.reader.Sax2.Reader() appears to
already dig through my DTD...
The reason why I'm asking is because I'm using the resulting DOM to generate
HTML 3.2 for JavaHelp. My DTD uses XHTML 1.0 entities and, for the most
part, I'd like to _not_ have the Sax2.Reader() translating the entities into
their Unicode characters (I've referenced the XHTML 1.0 entities from my DTD)...
I want to be able to leave the entities in place and/or translate them into
something myself. For example, JavaHelp 2.0 implements (most of) the
Latin-1 accented character entities, but almost none of the others, so I'll
have to handle &trade; (for example) "by hand".
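One way to handle such characters "by hand" is to map them back to entity references, or to a replacement string, at the point where the HTML is written. The sketch below is only an illustration under assumptions: `encode_for_javahelp` and the `overrides` mapping are hypothetical names, nothing here comes from PyXML or JavaHelp, and the Latin-1 cutoff merely approximates the set of entities JavaHelp is said to understand.

```python
from html.entities import codepoint2name


def encode_for_javahelp(text, overrides):
    """Re-encode non-ASCII characters in already-escaped text content.

    `overrides` (hypothetical) maps single characters to hand-picked
    replacement strings. Markup characters such as < and & are NOT
    escaped here; that is assumed to happen elsewhere.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in overrides:
            # Hand-written replacement takes priority.
            out.append(overrides[ch])
        elif cp < 128:
            # Plain ASCII passes through untouched.
            out.append(ch)
        elif cp <= 255 and cp in codepoint2name:
            # Latin-1 character with a named HTML entity.
            out.append("&%s;" % codepoint2name[cp])
        else:
            # Everything else falls back to a numeric reference.
            out.append("&#%d;" % cp)
    return "".join(out)


sample = encode_for_javahelp("caf\u00e9 \u2122", {"\u2122": "(TM)"})
```

Doing the mapping at output time leaves the DOM holding ordinary Unicode text, so the parser does not need to be told to keep entity references unexpanded.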
--
Chris Herborth chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.
From dieter at handshake.de Tue Dec 2 13:41:22 2003
From: dieter at handshake.de (Dieter Maurer)
Date: Tue Dec 2 14:45:45 2003
Subject: [XML-SIG] Provide your own SAX parser to the DOM?
In-Reply-To: <3FCB8D9B.9080900@cryptocard.com>
References: <3FCB8D9B.9080900@cryptocard.com>
Message-ID: <16332.56530.567033.265903@gargle.gargle.HOWL>
Chris Herborth wrote at 2003-12-1 13:51 -0500:
> I've got PyXML 0.8.3 installed here, and I'm generating the DOM for some
> documents thusly:
>
> reader = xml.dom.ext.reader.Sax2.Reader()
>
> # snipped: setting up an external entity resolver and error handler
>
> dom = reader.fromStream( file( an_xml_filename ) )
>
> Is it possible to use a different SAX parser and still get the advantages of
> using the PyXML DOM goodness?
The "Reader" class has an optional "parser" argument.
Look at its source...
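PyXML's Reader is not shown here, so the sketch below uses the closest standard-library analogue instead: `xml.dom.minidom.parse()` also takes an optional `parser` argument, which must be a SAX2 reader. The sample document and the choice of feature are assumptions made for illustration only.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import feature_external_ges

# Build and configure the SAX parser yourself...
parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, False)  # do not load external entities

# ...then hand it to the DOM builder instead of letting it pick its own.
source = io.BytesIO(b"<doc><p>hello</p></doc>")
dom = minidom.parse(source, parser=parser)
root_name = dom.documentElement.tagName
```

Any object implementing the SAX2 XMLReader interface should be usable in place of the default reader returned by `make_parser()`.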
--
Dieter
From juhtolv at cc.jyu.fi Mon Dec 8 10:38:45 2003
From: juhtolv at cc.jyu.fi (Juhapekka Tolvanen)
Date: Mon Dec 8 10:38:49 2003
Subject: [XML-SIG] Any XBEL to OPML converters out there?
Message-ID: <20031208153844.GA11878@heresy.ainola.jyu.fi>
A universal format for outline editors has been developed. It is called
OPML:
http://www.opml.org/
I'd like to find a way to convert my XBEL bookmarks to OPML, too. Do you
know of any software for that purpose? Or could you write it right now? It
should preferably be free (in the sense of freedom) software.
If I could convert my bookmarks to OPML format, I could participate in
this:
http://www.superopendirectory.com/
But hey, how about creating a system that is just like SuperOpenDirectory
but uses the XBEL format?
Here is some information about outline editors:
http://www.troubleshooters.com/tpromag/199911/199911.htm
http://www.outliners.com/
P.S: I don't subscribe to this list. I can read the archives
on the WWW, but please Cc: me.
--
Juhapekka "naula" Tolvanen * http colon slash slash iki dot fi slash juhtolv
"Rakkaudesta ruikuttajat, halusta ulvojat kiertää kaupungin sydäntä vaanien
verta. Omiin synkkiin linnoihinsa vallitusten taa pelokkaammat piilee
hautomaan haamujaan." CMX
From walter at livinglogic.de Mon Dec 8 15:47:27 2003
From: walter at livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=)
Date: Mon Dec 8 15:47:32 2003
Subject: [XML-SIG] ANN: XIST 2.3
Message-ID: <3FD4E35F.5020403@livinglogic.de>
XIST 2.3 has been released!
What is it?
===========
XIST is an XML-based extensible HTML generator written in Python.
XIST is also a DOM parser (built on top of SAX2) with a very simple
and Pythonesque tree API. Every XML element type corresponds to a
Python class, and these Python classes provide a conversion method
to transform the XML tree (e.g., into HTML). XIST can be considered
"object oriented XSL".
What's new in version 2.3?
==========================
* Namespace handling has been rewritten to be more standards
  compliant (no more namespace prefixes for entity references
  or processing instructions).
* Global attributes will now always generate the appropriate
  xmlns attributes.
* Support for uTidylib has been added and arguments
  can be passed to tidy now.
* The HTMLParser can handle global attributes now.
* When parsing from a URL the base URL will be correct now
  even if the request gets redirected
  (thanks to ll-url 0.11.6).
* Various other small bugfixes and enhancements.
For changes in older versions see:
http://www.livinglogic.de/Python/xist/History.html
Where can I get it?
===================
XIST can be downloaded from http://ftp.livinglogic.de/xist/
or ftp://ftp.livinglogic.de/pub/livinglogic/xist/
Web pages are at
http://www.livinglogic.de/Python/xist/
ViewCVS access is available at
http://www.livinglogic.de/viewcvs/
Bye,
Walter Dörwald
From tpassin at comcast.net Tue Dec 9 22:29:28 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Tue Dec 9 22:28:27 2003
Subject: [XML-SIG] Any XBEL to OPML converters out there?
In-Reply-To: <20031208153844.GA11878@heresy.ainola.jyu.fi>
References: <20031208153844.GA11878@heresy.ainola.jyu.fi>
Message-ID: <3FD69318.3050000@comcast.net>
Juhapekka Tolvanen wrote:
> A universal format for outline editors has been developed. It is called
> OPML:
>
> http://www.opml.org/
>
> I'd like to find a way to convert my XBEL bookmarks to OPML, too. Do you
> know of any software for that purpose? Or could you write it right now? It
> should preferably be free (in the sense of freedom) software.
>
That should be fairly easy to do with an XSLT stylesheet. I do
not know of one, but that is the way I would do it. This has actually
been the subject of a homework assignment - see
http://cscisl.dce.harvard.edu/assignments/2
OPML is not a particularly well-designed format, so I would not
recommend it unless you plan to use it with some system that requires it
(which it seems you do).
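For readers who would rather stay in Python than write a stylesheet, the same mapping can be sketched with ElementTree. This is a rough illustration, not an existing converter: it assumes XBEL `folder` and `bookmark` elements with `title` children and an `href` attribute, and it assumes the OPML side wants `outline` elements carrying `text`, `type="link"` and `url` attributes. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def xbel_to_opml(xbel_text):
    """Convert an XBEL document (as a string) to an OPML string."""
    src = ET.fromstring(xbel_text)
    opml = ET.Element("opml", {"version": "1.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = src.findtext("title", default="Bookmarks")
    body = ET.SubElement(opml, "body")
    _copy_children(src, body)
    return ET.tostring(opml, encoding="unicode")


def _copy_children(src, dest):
    """Recursively map folders and bookmarks to nested outline elements."""
    for child in src:
        title = child.findtext("title", default="")
        if child.tag == "folder":
            outline = ET.SubElement(dest, "outline", {"text": title})
            _copy_children(child, outline)
        elif child.tag == "bookmark":
            ET.SubElement(dest, "outline", {
                "text": title,
                "type": "link",                 # assumed OPML convention
                "url": child.get("href", ""),
            })
        # Other XBEL elements (desc, info, separator, alias) are skipped.


xbel_sample = (
    '<xbel><title>Mine</title><folder><title>Py</title>'
    '<bookmark href="http://www.python.org/"><title>Python</title></bookmark>'
    '</folder></xbel>'
)
opml_text = xbel_to_opml(xbel_sample)
```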
Cheers,
Tom P
From lalleman at mfps.com Wed Dec 10 16:02:29 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Wed Dec 10 16:03:27 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Hi,
I'm working with an application that is very picky about the XML it accepts
(basically it's non-compliant). The company's support team isn't giving me
many options. This utility cares about certain things that the XML spec
says a parser shouldn't care about. Things like the order of attributes
and whether an empty element is written as "<tag/>" or "<tag></tag>" need
to be presented in a specific way.
Any ideas on how to work around some of these issues? Python XML tools
would be preferred, but at this point all ideas and/or tools are welcome.
All I need is to be able to dictate the order in which the attributes appear
and whether or not empty elements should be written using the shortcut
('<tag/>') form.
The changes I am making to the XML document are rather trivial. I've
considered simply using a slew of string.replace() calls and a few regular
expressions to get the job done, but there may be a few cases where the DOM
approach would be preferable to the raw text manipulation approach.
FYI: So far I have tried using minidom and 4DOM (the one from PyXML 0.8.2).
I haven't seen the flexibility that I require so far, but I'm not very
familiar with either parser. minidom would be my preference, since it is
installed as part of the standard library.
Thanks in advance,
- Lowell Alleman
From rsalz at datapower.com Wed Dec 10 16:15:59 2003
From: rsalz at datapower.com (Rich Salz)
Date: Wed Dec 10 16:10:12 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <3FD78D0F.9010304@datapower.com>
> Any ideas on how to work around some of these issues
You might take a look at the c14n code in dom/ext/c14n.py; it does more
than what you want, but it shows how to walk a dom, sort attributes, etc.
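The PyXML c14n module is not reproduced here. As a stand-in, the standard library of Python 3.8 and later offers `xml.etree.ElementTree.canonicalize()`, which shows the two effects canonicalisation has on this problem: attributes come out sorted by name, and every empty element is written as a start-tag/end-tag pair. The sample document is an assumption.

```python
import xml.etree.ElementTree as ET

# Canonical XML sorts attributes and never uses the empty-element shortcut.
canonical = ET.canonicalize('<doc><item z="1" a="2"/></doc>')
```

As with the PyXML code, this does more than is wanted here (the attribute order is fixed, not chosen), so it is mainly useful as a reference point for a custom writer.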
/r$
--
Rich Salz, Chief Security Architect
DataPower Technology http://www.datapower.com
XS40 XML Security Gateway http://www.datapower.com/products/xs40.html
XML Security Overview http://www.datapower.com/xmldev/xmlsecurity.html
From fredrik at pythonware.com Thu Dec 11 02:31:52 2003
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu Dec 11 02:40:22 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID:
Lowell Alleman wrote:
> I'm working with an application that is very picky about the XML it accepts
> (basically it's non-compliant). The company's support team isn't giving me
> many options. This utility cares about certain things that the XML spec
> says a parser shouldn't care about. Things like the order of attributes
> and whether an empty element is written as "<tag/>" or "<tag></tag>" need
> to be presented in a specific way.
>
> Any ideas on how to work around some of these issues? Python XML tools
> would be preferred, but at this point all ideas and/or tools are welcome.
> All I need is to be able to dictate the order in which the attributes appear
> and whether or not empty elements should be written using the shortcut
> ('<tag/>') form.
sounds like you need a custom XML writer.
a quick solution is to take a copy of the writexml() method from the
minidom's Element class and make it into a function (i.e. operate on
element nodes instead of self, change the recursive writexml method
call to a recursive function call, and use the _write_data from the
minidom module).
    from xml.dom import minidom
    from xml.dom import Node

    def writexml(node, writer, indent="", addindent="", newl=""):
        if node.nodeType != Node.ELEMENT_NODE:
            # use standard serializer for everything but elements
            node.writexml(writer, indent, addindent, newl)
            return
        writer.write(indent + "<" + node.tagName)
        attrs = node.attributes
        # sorted() works whether keys() returns a list or a view
        a_names = sorted(attrs.keys())
        for a_name in a_names:
            writer.write(" %s=\"" % a_name)
            # _write_data is a private minidom helper; its signature
            # may differ between Python versions
            minidom._write_data(writer, attrs[a_name].value)
            writer.write("\"")
        if node.childNodes:
            writer.write(">%s" % newl)
            # separate loop variable, so the parent node is not shadowed
            for child in node.childNodes:
                writexml(child, writer, indent + addindent, addindent, newl)
            writer.write("%s</%s>%s" % (indent, node.tagName, newl))
        else:
            writer.write("/>%s" % newl)
usage example:
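    import sys
    # any well-formed document will do here; pass the root element,
    # since only element nodes go through the custom writer
    node = minidom.parseString("<doc>hello</doc>").documentElement
    writexml(node, sys.stdout)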
when this works, tweak the code (it's trivial) until it does exactly
what you want.
hope this helps!
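As an illustration of the kind of tweaking meant here, the sketch below extends the idea with a per-element attribute order and a set of elements that must never use the short form. It is a standalone variant that does its own escaping via `xml.sax.saxutils` rather than minidom's private helper; the names `write_element`, `attr_order` and `long_form` are hypothetical, and indentation handling is left out for brevity.

```python
import io
from xml.dom import minidom, Node
from xml.sax.saxutils import escape, quoteattr


def write_element(node, out, attr_order=None, long_form=()):
    """Serialize `node`, controlling attribute order and empty-element form."""
    if node.nodeType == Node.TEXT_NODE:
        out.write(escape(node.data))
        return
    if node.nodeType != Node.ELEMENT_NODE:
        node.writexml(out)  # comments, PIs etc. use the stock serializer
        return
    preferred = (attr_order or {}).get(node.tagName, [])

    def rank(name):
        # Listed attributes first, in the listed order; the rest by name.
        return (preferred.index(name) if name in preferred else len(preferred), name)

    out.write("<" + node.tagName)
    for name in sorted(node.attributes.keys(), key=rank):
        out.write(" %s=%s" % (name, quoteattr(node.getAttribute(name))))
    if node.childNodes or node.tagName in long_form:
        out.write(">")
        for child in node.childNodes:
            write_element(child, out, attr_order, long_form)
        out.write("</%s>" % node.tagName)
    else:
        out.write("/>")


root = minidom.parseString('<r><a y="2" x="1"/><b/></r>').documentElement
buf = io.StringIO()
write_element(root, buf, attr_order={"a": ["y", "x"]}, long_form={"b"})
serialized = buf.getvalue()
```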
From and-xml at doxdesk.com Thu Dec 11 12:46:05 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Thu Dec 11 13:04:46 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <20031211174605.GA4930@doxdesk.com>
Lowell Alleman wrote:
> Certain things that the XML spec say the parser shouldn't
> care about, this utility cares about. Things like the order of attributes
Urgh. Nasty.
Well, you could try pxdom:
http://www.doxdesk.com/software/py/pxdom.html
A special feature of this DOM implementation is that it will maintain a
fixed order of attributes, so you can rely on the output being in the order
you want.
> and whether an empty element is written as "<tag/>" or "<tag></tag>" need
> to be presented in a specific way.
Is it always one way or always the other, or a mix?
pxdom will use the short form where possible, unless you ask it to do
canonicalisation (using the DOM Level 3 'canonical-form' parameter).
Unfortunately if you did canonicalisation, the attribute order would be
changed. I might add a separate option as a non-standard extension to turn
off short-forms in 1.0 if anyone else would find it useful - alternatively,
hack line 4193 in version 0.9.
If you need to output short forms in some cases but not in others, that's a
bit more work. What you could do to fool the serialiser is put a Text node
of an empty string inside every element that you want to be output in the
longer form, eg.:
element.appendChild(element.ownerDocument.createTextNode(''))
Just don't normalise it before you serialise or the empty text nodes will
disappear!
Actually, it looks like this trick works in minidom, too.
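With minidom the trick can be tried out as follows. This is a small sketch on an assumed two-element sample document; only element 'b' is given the empty text node.

```python
from xml.dom import minidom

doc = minidom.parseString("<r><a/><b/></r>")
b = doc.getElementsByTagName("b")[0]

# An empty Text child makes the serializer treat 'b' as having content,
# so it writes a start-tag and an end-tag instead of the short form.
b.appendChild(doc.createTextNode(""))

result = doc.documentElement.toxml()
```

Element 'a' is left alone and keeps the short form, so the choice can be made per element.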
--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
From lalleman at mfps.com Thu Dec 11 14:04:25 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Thu Dec 11 14:05:22 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
> -----Original Message-----
> From: Andrew Clover [mailto:and-xml@doxdesk.com]
> Sent: Thursday, December 11, 2003 12:46 PM
> To: xml-sig@python.org
> Subject: Re: [XML-SIG] Working with non-compliant XML utilities
>
>
> > and whether an empty element is written as "<tag/>" or "<tag></tag>"
> > need to be presented in a specific way.
>
> Is it always one way or always the other, or a mix?
It is per-element. For example element 'a' would always be <a/>, but 'b'
would have to be shown as '<b></b>'. If 'a' was written as '<a></a>' or 'b' as
<b/>, the application chokes. It's pretty annoying.
The good news is that when it comes down to actuality, only a few elements
need to be tweaked. It's always in the form of forcing "<tag/>" to be written
as "<tag></tag>", but never the other way around.
Thanks for your suggestions.
- Lowell
From Alexandre.Fayolle at logilab.fr Thu Dec 11 15:21:22 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Thu Dec 11 15:21:27 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
Message-ID: <20031211202122.GE30399@calvin>
On Thu, Dec 11, 2003 at 02:04:25PM -0500, Alleman, Lowell wrote:
> It is per-element. For example element 'a' would always be <a/>, but 'b'
> would have to be shown as '<b></b>'. If 'a' was written as '<a></a>' or 'b' as
> <b/>, the application chokes. It's pretty annoying.
>
> The good news is that when it comes down to actuality, only a few elements
> need to be tweaked. It's always in the form of forcing "<tag/>" to be written
> as "<tag></tag>", but never the other way around.
This reminds me of DTD validation of EMPTY elements:
if an element is declared EMPTY in a DTD, then it has to use the
shortcut notation, otherwise the document is not valid.
Now I agree that mandating some elements to use the <tag></tag> notation
denotes a severely broken parser.
--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations
From lalleman at mfps.com Thu Dec 11 15:54:58 2003
From: lalleman at mfps.com (Alleman, Lowell)
Date: Thu Dec 11 15:55:57 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
Message-ID: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
Unfortunately, it looks like I have to do the exact opposite. Most XML
writers automatically condense <tag></tag> to the <tag/> form. I need to
tell the writer not to do so for certain elements.
The sad part about all of this really is that the tool that I'm having these
issues with is a data translation tool (sometimes called data mapping).
Its primary job is converting and processing data in various formats.
Speaking of DTDs.... I have some new questions:
The order that the attributes should appear in happens to be the same order
that they are listed in the <!ATTLIST> in the DTD. I've tried to pull out
the DTD info using 4DOM and minidom, but haven't had much success. (I
confess that I didn't spend too much time trying to find the appropriate
documentation.) If I can pull out the information in the <!ATTLIST>, I can
quickly build a dictionary of elements which contain a list of ordered
attributes. (I've tested this idea building a small dictionary manually,
but it would be nice to do this using the DTD.)
FYI: I tried pulling in the DTD info using an external reference as well as
placing it inline. (I tried the inline DTD when using minidom. I
assumed that minidom wouldn't pick it up automatically, as it is not a
validating parser. But I wasn't sure if it would simply ignore the DTD.)
I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
when the DTD was inline. Can anyone confirm that?
Feel free to send URLs.
Thanks again,
- Lowell
From martin at v.loewis.de Thu Dec 11 15:59:37 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Thu Dec 11 16:00:00 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <20031211202122.GE30399@calvin>
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
<20031211202122.GE30399@calvin>
Message-ID:
Alexandre Fayolle writes:
> This reminds me of DTD validation of EMPTY elements:
> if an element is declared EMPTY in a DTD, then it has to use the
> shortcut notation, otherwise the document is not valid.
That is not the case. In XML 1.0 (second edition), after clause 43, we
find the definitions
[Definition: An element with no content is said to be empty.] The
representation of an empty element is either a start-tag immediately
followed by an end-tag, or an empty-element tag.
So an <a></a> is also an empty element. After clause 44, we find
For interoperability, the empty-element tag should be used, and
should only be used, for elements which are declared EMPTY.
where "For interoperability" is defined as
for interoperability
[Definition: Marks a sentence describing a non-binding
recommendation included to increase the chances that XML documents
can be processed by the existing installed base of SGML processors
which predate the WebSGML Adaptations Annex to ISO 8879.]
So this is really "should", not "must".
Regards,
Martin
From martin at v.loewis.de Thu Dec 11 16:07:25 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Thu Dec 11 16:07:51 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
Message-ID:
"Alleman, Lowell" writes:
> The order that the attributes should appear happens to be the same order
> that they are listed in the <!ATTLIST> in the DTD. I've tried to pull out
> the DTD info using 4DOM and minidom, but haven't had much success.
You should explicitly use xmlproc, and install a DTDListener. The
add_attribute callbacks will come in the order of attribute declaration.
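xmlproc itself is not shown here. The same idea can be sketched with the expat binding in the standard library, whose `AttlistDeclHandler` is likewise called once per declared attribute. Treat this as an analogue of the DTDListener approach rather than the xmlproc API; the function name and sample document are assumptions, and only an internal DTD subset is handled.

```python
import xml.parsers.expat


def attlist_order(xml_text):
    """Return {element name: [attribute names in declaration order]}."""
    order = {}

    def on_attlist(elname, attname, type_, default, required):
        # Called once for every attribute declared in an <!ATTLIST>.
        order.setdefault(elname, []).append(attname)

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return order


sample_doc = (
    "<!DOCTYPE r ["
    "<!ATTLIST r zeta CDATA #IMPLIED alpha CDATA #IMPLIED>"
    "]><r/>"
)
declared = attlist_order(sample_doc)
```

The resulting dictionary has the shape described earlier in the thread: element names mapped to ordered lists of attribute names.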
> I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
> when the DTD was inline. Can anyone confirm that?
No. 4DOM only uses some underlying parser, so it will never choke
itself - if something chokes, it is the underlying parser.
Regards,
Martin
From Alexandre.Fayolle at logilab.fr Fri Dec 12 03:05:35 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Fri Dec 12 03:05:41 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To:
References: <2F7747120C62D211AD4100805FA78E1AE697B7@mail2.mfps.com>
<20031211202122.GE30399@calvin>
Message-ID: <20031212080535.GA3080@calvin>
On Thu, Dec 11, 2003 at 09:59:37PM +0100, Martin v. Löwis wrote:
> Alexandre Fayolle writes:
>
> > This reminds me of DTD validation of EMPTY elements:
> > if an element is declared EMPTY in a DTD, then it has to use the
> > shortcut notation, otherwise the document is not valid.
>
> That is not the case. In XML 1.0 (second edition), after clause 43, we
> find the definitions
> So this is really "should", not "must".
Thanks a lot for the clarification, Martin. I don't remember where I had got
the feeling of a 'must', here. I guess I should read XML 1.0 again
-- this is also really a 'should' ;-)
--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations
From and-xml at doxdesk.com Fri Dec 12 04:56:13 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Fri Dec 12 05:14:52 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697B8@mail2.mfps.com>
Message-ID: <20031212095613.GA26268@doxdesk.com>
Lowell Alleman wrote:
> I need to tell the writer not to do so for certain elements.
(Speaking of which: the empty-text-node trick seems to work with 4DOM too.
Yay!)
> The sad part about all of this really is that the tool that I'm having these
> issues with is a data translation tool
Aye, that's a pretty poor data translation tool.
> The order that the attributes should appear happens to be the same order
> that they are listed in the <!ATTLIST> in the DTD. I've tried to pull out
> the DTD info using 4DOM and minidom, but haven't had much success.
No, they don't make this available; as Martin says, you'll need to fiddle
with a processor to get at this info.
Alternatively, in another tiresome plug for my own imp, pxdom does give one
access to the ATTLIST declarations, and guarantees the declarations will be
in document order. To get a list of attr names, you could say:
    decls= document.doctype.pxdomAttlists.getNamedItem('tagName').declarations
    attrNames= [decl.nodeName for decl in decls]
Or to sort an element's attributes in one go:
    def sortAttributesByAttlistOrder(element):
        doctype= element.ownerDocument.doctype
        if doctype is not None:
            attlist= doctype.pxdomAttlists.getNamedItem(element.tagName)
            if attlist is not None:
                for attdecl in attlist.declarations:
                    attr= element.getAttributeNode(attdecl.nodeName)
                    if attr is not None:
                        # re-adding moves the attribute to the end, so the
                        # declared attributes end up in ATTLIST order
                        element.removeAttributeNode(attr)
                        element.setAttributeNode(attr)
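The remove-and-re-add idea can also be tried with minidom, given an order list obtained some other way. This is a sketch under assumptions: `reorder_attributes` and `declared_order` are hypothetical names, and it relies on minidom keeping attributes in insertion order, which is how recent Python versions behave (older ones sorted attributes when serializing).

```python
from xml.dom import minidom


def reorder_attributes(element, declared_order):
    """Move the listed attributes, in the given order, behind all others."""
    for name in declared_order:
        if element.hasAttribute(name):
            value = element.getAttribute(name)
            element.removeAttribute(name)
            element.setAttribute(name, value)  # re-added at the end


doc = minidom.parseString('<r b="2" a="1" c="3"/>')
root = doc.documentElement
reorder_attributes(root, ["c", "a"])
names = list(root.attributes.keys())
```

Attributes that are not in the list ('b' here) stay in front, which matches the behaviour of the loop above for undeclared attributes.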
The drawback is that pxdom doesn't (currently) use external entities,
including the DTD external subset, so you'd have to cram the <!ATTLIST>s
into the internal subset for it to work.
> (I tried the inline DTD when using for minidom. I assumed that minidom
> wouldn't pick it up automatically, as it is not a validating parser.
Yes, minidom also does not use external entities.
> I did notice that 4DOM seemed to choke on ENTITY references ( %entity_ref; )
> when the DTD was inline.
Hmm. Using expat it (and minidom) seem to ignore parameter entities, but I
can't get it to choke as such. If you are getting an 'Illegal parameter
entity reference', that'll be because XML is stricter about where it allows
parameter entities in the internal subset than in an external DTD.
--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
From Alexandre.Fayolle at logilab.fr Fri Dec 12 07:27:14 2003
From: Alexandre.Fayolle at logilab.fr (Alexandre Fayolle)
Date: Fri Dec 12 07:27:18 2003
Subject: [XML-SIG] Working with non-compliant XML utilities
In-Reply-To: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
References: <2F7747120C62D211AD4100805FA78E1AE697AC@mail2.mfps.com>
Message-ID: <20031212122713.GF3080@calvin>
On Wed, Dec 10, 2003 at 04:02:29PM -0500, Alleman, Lowell wrote:
> FYI: So far I have tried using minidom and 4DOM (the one from PyXML 0.8.2).
> I haven't seen the flexibility that I require so far, but I'm not very
> familiar with either parser. minidom would be my preference, since it is
> installed as part of the standard library.
One way to get what you need would probably be to use SAX to
translate the document you have into what your application will
understand. Getting the content handler to produce the text representation
of the contents read by the parser seems feasible.
Some code to start from can be found in xml.sax.writer. The startElement
and endElement should be customized to produce attributes in the right
order, and to close elements correctly.
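The xml.sax.writer module from PyXML is not shown here; the sketch below writes a small content handler from scratch with the standard library instead. The class name `OrderedWriter` and the `attr_order` mapping are hypothetical, every element is closed with an explicit end tag, and comments, processing instructions and namespaces are ignored.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape, quoteattr


class OrderedWriter(ContentHandler):
    """Re-emit a document with a chosen attribute order and no short forms."""

    def __init__(self, out, attr_order=None):
        super().__init__()
        self.out = out
        self.attr_order = attr_order or {}

    def startElement(self, name, attrs):
        preferred = self.attr_order.get(name, [])

        def rank(attr_name):
            # Listed attributes first, in listed order; the rest by name.
            if attr_name in preferred:
                return (preferred.index(attr_name), attr_name)
            return (len(preferred), attr_name)

        self.out.write("<" + name)
        for attr_name in sorted(attrs.getNames(), key=rank):
            self.out.write(" %s=%s" % (attr_name, quoteattr(attrs.getValue(attr_name))))
        self.out.write(">")

    def endElement(self, name):
        # Always an explicit end tag, even for empty elements.
        self.out.write("</%s>" % name)

    def characters(self, content):
        self.out.write(escape(content))


out = io.StringIO()
xml.sax.parseString(b'<r b="2" a="1"><e/>text</r>',
                    OrderedWriter(out, {"r": ["b", "a"]}))
rewritten = out.getvalue()
```

Choosing per element between the short and long form would need the handler to delay writing ">" until it knows whether content follows; that is left out to keep the sketch short.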
The complexity of the task will depend on the genericity you want to
achieve, of course.
--
Alexandre Fayolle
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Développement logiciel avancé - Intelligence Artificielle - Formations
From nhs at llnl.gov Fri Dec 12 12:51:42 2003
From: nhs at llnl.gov (Norman Samuelson)
Date: Fri Dec 12 12:51:51 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
In-Reply-To:
References:
Message-ID: <6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>
One way you may be able to do what you want with minimal effort would be to
write the XML as usual, with whatever tool you care about, then process it
with XSL to produce the strange results you need.
- Norm -
From tpassin at comcast.net Fri Dec 12 18:26:04 2003
From: tpassin at comcast.net (Thomas B. Passin)
Date: Fri Dec 12 18:24:59 2003
Subject: [XML-SIG] Re: Working with non-compliant XML utilities
In-Reply-To: <6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>
References:
<6.0.0.22.2.20031212094847.048637f8@popeye.llnl.gov>
Message-ID: <3FDA4E8C.3010604@comcast.net>
Norman Samuelson wrote:
> One way you may be able to do what you want with minimal effort would be
> to write the XML as usual, with whatever tool you care about, then
> process it with XSL to produce the strange results you need.
>
He can't do that - xslt will only produce normal xml, not the "strange
results" - no control over attribute order or empty element form unless
he writes his own serializer.
Cheers,
Tom P
From zhaoxinzhi at hotmail.com Sat Dec 13 05:22:26 2003
From: zhaoxinzhi at hotmail.com (Xinzhi Zhao)
Date: Sat Dec 13 05:22:32 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
Message-ID:
Hi,
My XML files have to use an encoding other than the default one, namely
'gb2312'. When I parsed my XML files with DOM or SAX, some errors
occurred. Can the Python xml packages not handle this yet? Is there any way
to get my job done? How should I do it? Please help me.
Thanks,
Xinzhi Zhao
zhaoxinzhi@hotmail.com
-------------------------------------------------------------------------------
-- My xml file is shown as below,
----------------------------------------------
简单的 XML
December 12, 2003
Xinzhi Zhao
Parsing XML
This XML displays fine in IE6. However, parsing it in Python with DOM
or SAX fails. How should I do it?
From mike at skew.org Sat Dec 13 08:14:13 2003
From: mike at skew.org (Mike Brown)
Date: Sat Dec 13 08:14:17 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
In-Reply-To: "from Xinzhi Zhao at Dec
13, 2003 10:22:26 am"
Message-ID: <200312131314.hBDDEDmi021838@chilled.skew.org>
Xinzhi Zhao wrote:
> Hi,
> My XML files have to use an encoding other than the default one, namely
> 'gb2312'. When I parsed my XML files with DOM or SAX, some errors
> occurred. Can the Python xml packages not handle this yet? Is there any way
> to get my job done? How should I do it? Please help me.
Limitations of the underlying parser, Expat, prevent certain encodings from
being supported without an additional layer of code. GB2312 is among them.
I think you will have to transcode your document to one of the encodings that
is supported by Expat (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or
US-ASCII; you probably want UTF-8 or UTF-16), and then either rewrite the
encoding declaration in the XML, or find a way to make the declaration
externally. Expat does support external declaration of encoding, but I don't
know offhand how to do it from Python.
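On the Python side, the `encoding` argument of `xml.parsers.expat.ParserCreate()` appears to be the hook for this: it overrides whatever encoding the document declares. The sketch below combines it with transcoding to utf-8. The function name and the sample document are assumptions, and it has not been checked against the Expat and PyXML versions discussed in this thread.

```python
import xml.parsers.expat


def text_content(raw, source_encoding):
    """Parse bytes in an encoding Expat lacks and return the character data."""
    chunks = []
    # The external declaration: tells Expat the input is utf-8, whatever
    # the encoding declaration inside the document says.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    parser.CharacterDataHandler = chunks.append
    utf8_bytes = raw.decode(source_encoding).encode("utf-8")
    parser.Parse(utf8_bytes, True)
    return "".join(chunks)


raw_doc = '<?xml version="1.0" encoding="gb2312"?><doc>简单的 XML</doc>'.encode("gb2312")
content = text_content(raw_doc, "gb2312")
```

With the override in place the declaration in the document can stay as it is, because the parser no longer consults it.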
From martin at v.loewis.de Sat Dec 13 08:45:12 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Sat Dec 13 08:45:34 2003
Subject: [XML-SIG] Parsing the XML file which has encoding 'gb2312' .
In-Reply-To: <200312131314.hBDDEDmi021838@chilled.skew.org>
References: <200312131314.hBDDEDmi021838@chilled.skew.org>
Message-ID:
Mike Brown writes:
> I think you will have to transcode your document to one of the encodings that
> is supported by Expat (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or
> US-ASCII
Alternatively, you can use xmlproc, which supports any encoding for
which you have a Python codec.
Regards,
Martin
From zhaoxz at founder.com Thu Dec 11 09:07:37 2003
From: zhaoxz at founder.com (=?ISO-8859-1?Q?=D5=D4=D0=C2=D6=BE?=)
Date: Sat Dec 13 09:41:54 2003
Subject: [XML-SIG] Parsing XML
Message-ID:
Skipped content of type multipart/alternative
From fredrik at pythonware.com Sat Dec 13 09:56:17 2003
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat Dec 13 09:56:21 2003
Subject: [XML-SIG] Re: Parsing XML
References:
Message-ID:
zhaoxz@founder.com wrote:
> My XML files have to use encoding 'iso-8859-1', which is different
> from the default encoding 'utf-8'.
>
> When I was using the package from 4DOM (pyxml.sourceforge.net)
> to parse my XML files, errors occurred. The package for parsing xml
> only supports encoding 'utf-8', right?
if your XML files use ISO-8859-1 encoding, they should contain
an encoding directive in the ?xml header; see
http://www.w3.org/TR/2000/REC-xml-20001006#NT-EncodingDecl
From mike at skew.org Sat Dec 13 10:11:32 2003
From: mike at skew.org (Mike Brown)
Date: Sat Dec 13 10:11:39 2003
Subject: [XML-SIG] Parsing XML
In-Reply-To:
"from =?ISO-8859-1?Q?=D5=D4=D0=C2=D6=BE?= at Dec 11, 2003 10:07:37 pm"
Message-ID: <200312131511.hBDFBW88022355@chilled.skew.org>
> My XML files have to use encoding 'iso-8859-1', which is different
> from the default encoding 'utf-8'.
Technically, there is no default, but conforming parsers first look for a
byte-order mark (BOM) that signals utf-16; if there is none, they assume
utf-8 unless something else is declared in the prolog.
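The detection order can be written out as a small function. This is a simplified sketch, not what Expat does internally: `sniff_xml_encoding` is a hypothetical name, and the function ignores UTF-32 and BOM-less utf-16 input, both of which a real parser has to consider.

```python
import codecs
import re

_DECL = re.compile(rb'<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(raw):
    """Guess the encoding of an XML byte string: BOM, then declaration, then utf-8."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    match = _DECL.match(raw)
    if match:
        # The declaration itself is ASCII-compatible in the cases handled here.
        return match.group(1).decode("ascii").lower()
    return "utf-8"


guess_declared = sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')
guess_default = sniff_xml_encoding(b"<a/>")
guess_bom = sniff_xml_encoding(codecs.BOM_UTF16_LE + "<a/>".encode("utf-16-le"))
```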
> When I was using the package from 4DOM (pyxml.sourceforge.net)
> to parse my XML files, errors occurred.
What errors, specifically?
Are you sure your XML files are actually iso-8859-1 encoded?
Note: it is the XML author's responsibility to ensure that the encoding
declaration in the prolog accurately reflects the actual encoding of the
document. If you had a gb2312 file and just changed the declaration to say
iso-8859-1, you didn't change the actual encoding of the document, you just
made the declaration be wrong, which an XML parser is required to treat as a
fatal error.
> The package for parsing XML
> only supports encoding 'utf-8', right?
No, the parser that 4DOM uses (Expat) supports other encodings, as I mentioned
in my other message today. iso-8859-1 should work just fine.
If you are still trying to parse gb2312-encoded XML, you need to do more than
just replace 'gb2312' with 'iso-8859-1' in the encoding declaration. Use
Python's codecs module to wrap your gb2312 stream, decoding from gb2312 to
Unicode, at which point you can safely rewrite the declaration in the prolog
if necessary, and then wrap again, encoding from Unicode to utf-8 (or utf-16).
This is what I meant by 'transcode'. You won't need to rewrite the declaration
if you can figure out how to make Expat accept the external encoding
declaration from Python. I was hoping a PyExpat expert would suggest the
answer.
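
The transcoding step described above could look roughly like this. It is a sketch only, assuming a current Python 3 and a document small enough to read into memory. The names transcode_to_utf8, parse_transcoded and SAMPLE are hypothetical, and the regular expression only handles a declaration at the very start of the text:

```python
import codecs
import io
import re
import xml.dom.minidom

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_ENCODING_DECL = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')

# Hypothetical gb2312-encoded input with a truthful declaration.
SAMPLE = '<?xml version="1.0" encoding="gb2312"?><doc>\u4e2d\u6587</doc>'.encode("gb2312")


def transcode_to_utf8(stream, source_encoding):
    """Read bytes from `stream` in `source_encoding` and return UTF-8 bytes."""
    reader = codecs.getreader(source_encoding)(stream)  # bytes -> Unicode
    text = reader.read()
    # Rewrite the declaration so that it matches the new encoding.
    text = _ENCODING_DECL.sub(r"\g<1>\g<2>utf-8\g<2>", text, count=1)
    return text.encode("utf-8")


def parse_transcoded(raw_bytes, source_encoding):
    """Transcode `raw_bytes` to UTF-8, then hand the result to minidom."""
    utf8_bytes = transcode_to_utf8(io.BytesIO(raw_bytes), source_encoding)
    return xml.dom.minidom.parseString(utf8_bytes)
```

For example, parse_transcoded(SAMPLE, "gb2312") yields a DOM whose text nodes are ordinary Unicode strings. The source encoding has to be the file's real encoding, not whatever the declaration happens to claim.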
-Mike
From KSBeattie at lbl.gov Mon Dec 22 22:00:02 2003
From: KSBeattie at lbl.gov (Keith Beattie)
Date: Mon Dec 22 22:00:17 2003
Subject: [XML-SIG] binding an unbound namespace prefix
Message-ID: <3FE7AFB2.50407@lbl.gov>
Hi all,
I'm trying to parse a string which is a segment of XML (in order to
canonicalize it) which doesn't have all its namespaces bound in the segment
I'm trying to parse. How do I pass the namespaces into minidom.parseString(),
or Domlette.NonvalidatingReader.parseString(), such that they'll be happy with
the 'unbound prefix'? I hoped to see an nsdict kw arg or some such, but no
luck. Is building the DOM myself the only way to do this?
Thanks,
ksb
From walter at livinglogic.de Tue Dec 23 04:18:43 2003
From: walter at livinglogic.de (Walter Dörwald)
Date: Tue Dec 23 04:19:02 2003
Subject: [XML-SIG] binding an unbound namespace prefix
In-Reply-To: <3FE7AFB2.50407@lbl.gov>
References: <3FE7AFB2.50407@lbl.gov>
Message-ID: <3FE80873.2020102@livinglogic.de>
Keith Beattie wrote:
> Hi all,
>
> I'm trying to parse a string which is a segment of XML (in order to
> canonicalize it) which doesn't have all its namespaces bound in the
> segment I'm trying to parse. How do I pass the namespaces into
> minidom.parseString(), or
> Domlette.NonvalidatingReader.parseString(), such that they'll be happy
> with the 'unbound prefix'? I hoped to see an nsdict kw arg or some
> such, but no luck. Is building the DOM myself the only way to do this?
You could try XIST (http://www.livinglogic.de/Python/xist/), which
supports passing a prefix mapping to the parser:
    from ll.xist import xsc, parsers
    from ll.xist.ns import html, svg, fo

    e = parsers.parseString(
        "",
        prefixes=xsc.Prefixes(fo, s=svg, h=html)
    )
Unfortunately this doesn't return a standard DOM, but of course
you could convert it into one.
Bye,
Walter Dörwald
From fdrake at acm.org Tue Dec 23 08:35:05 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue Dec 23 08:35:26 2003
Subject: [XML-SIG] binding an unbound namespace prefix
In-Reply-To: <3FE7AFB2.50407@lbl.gov>
References: <3FE7AFB2.50407@lbl.gov>
Message-ID: <16360.17545.295816.495961@sftp.fdrake.net>
Keith Beattie writes:
> I'm trying to parse a string which is a segment of XML (in order to
> canonicalize it) which doesn't have all its namespaces bound in
> the segment I'm trying to parse. How do I pass the namespaces into
> minidom.parseString(), or
> Domlette.NonvalidatingReader.parseString(), such that they'll be
> happy with the 'unbound prefix'? I hoped to see an nsdict kw arg
> or some such, but no luck. Is building the DOM myself the only way
> to do this?
No, but working around the current API to do this is pretty painful at
the moment. Please file a feature request for better fragment
support; you can assign it to me if you like.
There is some code in xml.dom.expatbuilder that shows how to do this;
it may be a bit difficult to decipher. The code is mine, so feel free
to ask questions about it here on the XML-SIG mailing list.
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation
From csad7 at t-online.de Tue Dec 23 12:27:02 2003
From: csad7 at t-online.de (c.)
Date: Tue Dec 23 12:28:49 2003
Subject: [XML-SIG] empty EntityResolver for SAX
Message-ID: <3FE87AE6.6070501@cdot.de>
hi,
(the following description is a bit convoluted, sorry about that. i hope
you understand it anyway...)
i thought of providing an empty EntityResolver to my parse function so
that if i encounter xml files with DTDs in them, these will not be processed.
class EmptyEntityResolver(xml.sax.handler.EntityResolver):
    def resolveEntity(self, publicId, systemId):
        return "http://localhost/empty.txt"
p = xml.sax.make_parser()
p.setContentHandler(handler)
p.setEntityResolver(EmptyEntityResolver())
i could use
p.setFeature('http://xml.org/sax/features/external-general-entities',False)
of course but i thought something like the above might be better for my
purpose.
my problem now is that something like
return None
does not work. only the version above works, and it needs the dummy
empty.txt file to be present.
is there a simpler way of returning an empty InputSource?
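
One possible way to return an empty InputSource without the dummy file, as a sketch only. It assumes a current Python 3 and the standard-library expat-based SAX driver; the helper name make_parser_without_dtds is hypothetical:

```python
import io
import xml.sax
from xml.sax.handler import EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class EmptyEntityResolver(EntityResolver):
    """Resolve every external entity to an empty in-memory document."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        # With a byte stream already set, the parser has no need to open systemId.
        source.setByteStream(io.BytesIO(b""))
        return source


def make_parser_without_dtds(handler):
    """Build a SAX parser that asks EmptyEntityResolver for external entities."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setEntityResolver(EmptyEntityResolver())
    # Assumption: recent Python versions leave this feature off by default,
    # in which case the resolver is never consulted at all.
    parser.setFeature(feature_external_ges, True)
    return parser
```

If external general entities are switched off through setFeature, as mentioned above, the resolver is not needed in the first place; the resolver route is mainly useful when some entities should still be resolved and others suppressed.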
thanks a lot
chris
From shunting at etopicality.com Tue Dec 23 16:32:42 2003
From: shunting at etopicality.com (Sam Hunting)
Date: Tue Dec 23 16:32:58 2003
Subject: [XML-SIG] Which version of PyXML do I install?
Message-ID:
Here are the first few lines from dmesg:
Linux version 2.4.23-xfs-031204 (...@...) (gcc version 2.95.4
20011002 (Debian prerelease)) #1 SMP Thu Dec 4 17:08:50 CET 2003
I'd prefer to use an rpm if possible.
Sam Hunting
eTopicality, Inc.
---------------------------------------------------------------------------
Co-editor: ISO Reference Model for Topic Maps
Topic map consulting and training: www.etopicality.com
Free open source topic map tools: www.gooseworks.org
XML Topic Maps: Creating and Using Topic Maps for the Web.
Addison-Wesley, ISBN 0-201-74960-2.
---------------------------------------------------------------------------
From and-xml at doxdesk.com Wed Dec 24 05:22:03 2003
From: and-xml at doxdesk.com (Andrew Clover)
Date: Wed Dec 24 05:41:18 2003
Subject: [XML-SIG] binding an unbound namespace prefix
In-Reply-To: <3FE7AFB2.50407@lbl.gov>
References: <3FE7AFB2.50407@lbl.gov>
Message-ID: <20031224102203.GA29545@doxdesk.com>
Keith Beattie wrote:
> How do I pass the namespaces into minidom.parseString(), or
> Domlette.NonvalidatingReader.parseString(), such that they'll be happy
> with the 'unbound prefix'?
I know of no convenient way of doing this with either minidom or domlette.
Probably the quickest solution is to hack the input content so it's
surrounded with an element declaring all the known namespaces, then ignore
the root element of the result.
Alternatively, the DOM Level 3 method parseWithContext would let you insert
directly into the relevant part of the document (with namespaces declared
above). pxdom supports this method and the domConfig parameter
'canonical-form', so that might be a possibility too.
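
A rough sketch of the wrapper-element approach with minidom, assuming a current Python 3. The function name parse_fragment and the element name "wrapper" are made up for illustration:

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def parse_fragment(fragment, namespaces):
    """Parse `fragment` inside a throwaway root that binds `namespaces`.

    `namespaces` maps prefix -> namespace URI. Returns the fragment's
    top-level nodes; the wrapper element itself is discarded.
    """
    decls = " ".join(
        "xmlns:%s=%s" % (prefix, quoteattr(uri))
        for prefix, uri in namespaces.items()
    )
    wrapped = "<wrapper %s>%s</wrapper>" % (decls, fragment)
    doc = minidom.parseString(wrapped)
    return list(doc.documentElement.childNodes)
```

The returned nodes carry the right namespaceURI values, but the xmlns declarations themselves stay on the discarded wrapper, so a canonicalizer working on the nodes may still need them supplied separately.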
--
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
From list-matt at reprocessed.org Wed Dec 24 06:52:19 2003
From: list-matt at reprocessed.org (Matt Patterson)
Date: Wed Dec 24 06:52:25 2003
Subject: [XML-SIG] testing a document for validity against a schema,
not a DTD
Message-ID:
Hello all,
I'm looking for a way to validate an XML document against a schema:
nothing fancy, just a simple yes/no response from the parser would
probably do.
I can do it several ways with DTDs, but I'm unsure about XML Schema
support in Python.
Can anyone enlighten me?
Many thanks,
Matt Patterson
From chrish at cryptocard.com Wed Dec 24 11:06:28 2003
From: chrish at cryptocard.com (Chris Herborth)
Date: Wed Dec 24 11:03:28 2003
Subject: [XML-SIG] Validating parser
Message-ID: <3FE9B984.9040600@cryptocard.com>
I'm upgrading my XML application to use the validating parser; I've been
fixing previously-hidden bugs in my DTD and my document instances as I go...
but now I've gotten to one that is baffling me... must be the seasonal
distraction. ;-)
Here's the error:
Invalid XML, unable to continue.
book.xml, line 11, column 3: Not a valid name
And here are the first 11 lines of book.xml:
%book.entities;
]>
If I remove the book.ent bit, it still complains at the end of the DOCTYPE
declaration, so I'm guessing there's an invalid name somewhere in my DTD.
Although I'm not sure why this error wouldn't be reported until the end of
the declaration, instead of during DTD parsing like my other DTD-related
errors...
Any help is greatly appreciated, thanks!
--
Chris Herborth chrish@cryptocard.com
Documentation Overlord, CRYPTOCard Corp. http://www.cryptocard.com/
Never send a monster to do the work of an evil scientist.
From xml-sig at thewrittenword.com Mon Dec 29 18:04:20 2003
From: xml-sig at thewrittenword.com (Albert Chin)
Date: Mon Dec 29 18:04:28 2003
Subject: [XML-SIG] 4suite 1.0a3/PyXML 1.0a3 on HP-UX with Python 2.3.2
Message-ID: <20031229230420.GA56939@spuckler.il.thewrittenword.com>
I've installed PyXML 0.8.3 and 4Suite 1.0a3 on HP-UX 11.x and Solaris
2.x with GCC 3.3.2. The following program causes a failure on HP-UX
but works on Solaris:
$ cat a.xml
$ cat a.py
#!/opt/TWWfsw/python232/bin/python
from xml.dom.ext.reader import PyExpat
from Ft.Xml.XPath import Evaluate
fd = open('a.xml', 'r')
reader = PyExpat.Reader()
dom = reader.fromStream(fd)
$ python a.py
Traceback (most recent call last):
  File "./a.py", line 8, in ?
    dom = reader.fromStream(fd)
  File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 65, in fromStream
    success = self.parser.ParseFile(stream)
  File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 120, in startElement
    self._completeTextNode()
  File "/opt/TWWfsw/python232p/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/PyExpat.py", line 104, in _completeTextNode
    if self._currText and len(self._nodeStack) and
    self._nodeStack[-1].nodeType != Node.DOCUMENT_NODE:
AttributeError: 'NoneType' object has no attribute 'nodeType'
I posted to the 4Suite-dev mailing list but the problem appears to be
a PyXML one. Any ideas?
--
albert chin (china@thewrittenword.com)
From zhaoxinzhi at hotmail.com Mon Dec 29 21:36:44 2003
From: zhaoxinzhi at hotmail.com (Xinzhi Zhao)
Date: Mon Dec 29 23:03:09 2003
Subject: [XML-SIG] Does Python support XQuery?
Message-ID:
Does Python support XQuery? If it does, would you please show me an example?
Many thanks.
--Xinzhi Zhao