From uche.ogbuji@fourthought.com Tue May 1 18:13:29 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 01 May 2001 11:13:29 -0600
Subject: [XML-SIG] Re: [4suite] PyChecker could help
References: <3aeee9f93d2396cb@amyris.wanadoo.fr> (added by amyris.wanadoo.fr)
Message-ID: <3AEEEEB9.29880F20@fourthought.com>
Sebastien Pierre wrote:
> Here are some errors that PyChecker has found with 4Suite 0.11:
>
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:124 No attribute (documentElement) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:180 No attribute (documentElement) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:211 No attribute (implementation) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:242 No attribute (documentElement) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:251 No attribute (childNodes) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:299 No attribute (childNodes) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:299 No attribute (doctype) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> Document.py:299 No attribute (documentElement) found
>
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> FtNode.py:135 No global (XML_NAMESPACE) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> FtNode.py:271 No attribute (firstChild) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> FtNode.py:345 self is not first method argument
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> FtNode.py:346 No global (self) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> FtNode.py:362 No attribute (ownerDocument) found
> /boot/home/config/lib/python2.0/site-packages/_xmlplus/dom/
> FtNode.py:372 No attribute (ownerDocument) found
>
> Using this tool could help you find out some bugs in the 4Suite.
> PyChecker is available at .
> Cheers!
Thanks, but note that 4DOM is no longer part of 4Suite. I'll try to
look into this before the PyXML 0.6.6 release.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
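The two FtNode.py warnings at lines 345 and 346 ("self is not first method
argument" followed by "No global (self) found") usually describe one mistake:
a method defined without `self` as its first parameter, whose body then refers
to `self` as if it were a global. The sketch below is a hypothetical
illustration of that pattern in modern Python, not the actual FtNode.py code;
the class and method names are made up.

```python
class BrokenNode:
    # Hypothetical method that forgets `self`; the instance is bound
    # to `other`, and `self` is looked up as a global at call time.
    def isSameNode(other):
        return self is other


class FixedNode:
    # Same method with `self` declared as the first parameter.
    def isSameNode(self, other):
        return self is other


if __name__ == "__main__":
    node = FixedNode()
    print(node.isSameNode(node))
    try:
        BrokenNode().isSameNode()
    except NameError as exc:
        print("broken version fails:", exc)
```

The broken version imports cleanly and fails only when the method is called,
which is why a static checker is useful for this class of bug.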
From rsalz@zolera.com Wed May 2 18:16:51 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 13:16:51 -0400
Subject: [XML-SIG] Proposing a web services SIG
Message-ID: <3AF04103.A7FA3F01@zolera.com>
I'd like to propose a new SIG, Web Services. Web services uses XML and
related standards (schema, wsdl, soap, uddi) to provide a distributed
computing infrastructure.
There is a great deal of Python activity starting up here -- several
SOAP implementations, interop work, WSDL parsing, etc. Much of the
information exchange has been late-night point-to-point email, and it's
time to provide a visible focal point for this activity.
Our feeling (a few of us have chatted about this) is that the web
services community generally takes Sax, DOM, etc., "for granted" and
that it makes more sense to create a new SIG rather than be part of
XML-SIG. XML Schema is a likely area of overlap, and we'll work
together to handle that.
In terms of code, web pages, etc., we'd follow the (high) standards of
the XML Sig.
Comments, next steps?
/r$
From Nicolas.Chauvat@logilab.fr Wed May 2 18:36:06 2001
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Wed, 2 May 2001 19:36:06 +0200 (CEST)
Subject: [XML-SIG] Proposing a web services SIG
In-Reply-To: <3AF04103.A7FA3F01@zolera.com>
Message-ID:
> Comments, next steps?
+1 for web-services-sig (and RDF tools in PyXML ;-)
--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
From Mike.Olson@fourthought.com Wed May 2 19:22:12 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Wed, 02 May 2001 12:22:12 -0600
Subject: [XML-SIG] Proposing a web services SIG
References:
Message-ID: <3AF05054.E803903D@FourThought.com>
Nicolas Chauvat wrote:
>
> > Comments, next steps?
>
> +1 for web-services-sig (and RDF tools in PyXML ;-)
+1 for me as well
Mike
>
> --
> Nicolas Chauvat
>
> http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Cayce@actzero.com Wed May 2 19:26:40 2001
From: Cayce@actzero.com (Cayce Ullman)
Date: Wed, 2 May 2001 11:26:40 -0700
Subject: [XML-SIG] Proposing a web services SIG
Message-ID:
>> Comments, next steps?
>+1 for web-services-sig (and RDF tools in PyXML ;-)
I would like to second this motion as well. I'm aware of 5 implementations
of SOAP in Python (2 of which were created in the month of April, one of
which was mine), so there is clearly some interest in Python+WS. Plus I
think some open collaboration could go a long way towards making Python a
language of choice for web services work.
Cayce
From uche.ogbuji@fourthought.com Wed May 2 19:40:06 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 12:40:06 -0600
Subject: [XML-SIG] Proposing a web services SIG
In-Reply-To: Message from Cayce Ullman
of "Wed, 02 May 2001 11:26:40 PDT."
Message-ID: <200105021840.f42Ie6D21877@localhost.local>
> >> Comments, next steps?
> >+1 for web-services-sig (and RDF tools in PyXML ;-)
>
> I would like to second this motion as well. I'm aware of 5 implementations
> of SOAP in Python (2 of which were created in the month of April, one of
> which was mine), so there is clearly some interest in Python+WS. Plus I
> think some open collaboration could go a long way towards making Python a
> language of choice for web services work.
Well, all very well, and I can go either way on new SIG vs. just use XML-SIG,
but does anyone know how to expeditiously go about creating a Python SIG? I
suppose it involves some magic incantations on the meta-SIG, but I don't know
the current state-of-the-SIGS.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From guido@digicool.com Wed May 2 20:41:48 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:41:48 -0500
Subject: [XML-SIG] Re: [meta-sig] Proposing a web services SIG
In-Reply-To: Your message of "Wed, 02 May 2001 13:16:51 -0400."
<3AF04103.A7FA3F01@zolera.com>
References: <3AF04103.A7FA3F01@zolera.com>
Message-ID: <200105021941.OAA03587@cj20424-a.reston1.va.home.com>
> I'd like to propose a new SIG, Web Services. Web services uses XML and
> related standards (schema, wsdl, soap, uddi) to provide a distributed
> computing infrastructure.
>
> There is a great deal of Python activity starting up here -- several
> SOAP implementations, interop work, WSDL parsing, etc. Much of the
> information exchange has been late-night point-to-point email, and it's
> time to provide a visible focal point for this activity.
>
> Our feeling (a few of us have chatted about this) is that the web
> services community generally takes Sax, DOM, etc., "for granted" and
> that it makes more sense to create a new SIG rather than be part of
> XML-SIG. XML Schema is a likely area of overlap, and we'll work
> together to handle that.
>
> In terms of code, web pages, etc., we'd follow the (high) standards of
> the XML Sig.
>
> Comments, next steps?
Read http://www.python.org/sigs/guidelines.html (all of it!).
Basically, you need to appoint a volunteer, write a mission statement,
and circulate the draft mission statement on the meta-sig.
--Guido van Rossum (home page: http://www.python.org/~guido/)
From rsalz@zolera.com Wed May 2 20:23:15 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 15:23:15 -0400
Subject: [XML-SIG] Re: [meta-sig] Proposing a web services SIG
References: <3AF04103.A7FA3F01@zolera.com> <200105021941.OAA03587@cj20424-a.reston1.va.home.com>
Message-ID: <3AF05EA3.71F8B4F1@zolera.com>
> Read http://www.python.org/sigs/guidelines.html (all of it!).
I did. The instructions at the end were fairly casual, and I thought my
note was good enough, sorry. Let me try again...
> Basically, you need to appoint a volunteer, write a mission statement,
> and circulate the draft mission statement on the meta-sig.
I'm volunteering to coordinate webservices-sig.
Short blurb: make it easy for python programmers to provide and use web
services.
Longer blurb: Web services uses SOAP, WSDL, UDDI, other standards to
provide a distributed component infrastructure. The webservices-sig is
focused on providing implementations of these standards so that Python
programmers can easily write and use web services (i.e., both clients
and servers -- the latter includes HTTPServer, but also other servers
such as Apache, Zope, etc.)
The initial goal of the SIG will be to develop freely-usable
implementations of SOAP, WSDL, and probably UDDI. Some coordination with
XML Sig will be necessary, for example, because WSDL uses XML Schema. We
will develop a framework for supporting multiple implementations.
Thanks.
/r$
From Juergen Hermann"
Message-ID:
On Wed, 2 May 2001 11:26:40 -0700, Cayce Ullman wrote:
>I would like to second this motion as well. I'm aware of 5 implementations
>of SOAP in Python (2 of which were created in the month of April, one of
>which was mine),
Could you list those, together with a homepage URL? ;)
Ciao, Jürgen
From noreply@sourceforge.net Wed May 2 23:31:23 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 02 May 2001 15:31:23 -0700
Subject: [XML-SIG] [ pyxml-Bugs-420882 ] no xpath, xslt install from CVS checkout
Message-ID:
Bugs item #420882, was updated on 2001-05-02 15:31
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=420882&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Karl Anderson (karlanderson)
Assigned to: Nobody/Anonymous (nobody)
Summary: no xpath, xslt install from CVS checkout
Initial Comment:
I installed a CVS checkout from an hour or so ago
into a test directory with setup.py:
python setup.py build
python setup.py install --prefix=[dir]
This didn't copy the xpath or xslt dirs into the
/lib/python1.5/site-packages/xml subdirectory of my
install dir. Once I copied them manually xpath
worked.
I expected setup.py to use everything that was built;
am I doing something weird?
From Cayce@actzero.com Thu May 3 01:17:02 2001
From: Cayce@actzero.com (Cayce Ullman)
Date: Wed, 2 May 2001 17:17:02 -0700
Subject: [XML-SIG] RE: SOAP for Python
Message-ID:
> >I would like to second this motion as well. I'm aware of 5
> implementations
> >of SOAP in Python (2 of which were created in the month of
> April, one of
> >which was mine),
>
> Could you list those, together with a homepage URL? ;)
>
SOAP.py (mine) : http://www.actzero.com the leader in terms of
interoperability and features (as far as I know)
SOAP.py (part of Scarab) : http://www.casbah.org hasn't moved for over a
year, at a glance looks fairly unusable.
soaplib.py : http://www.pythonware.com by Fredrik Lundh, much in the style
of xmlrpclib
SOAPy : http://soapy.sourceforge.net by Adam Elman, new client
implementation supports WSDL
FT : http://www.fourthought.com It was my understanding that Fourthought
also is working on an impl, correct me if I'm wrong Mike.
From rsalz@zolera.com Thu May 3 01:39:34 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 20:39:34 -0400
Subject: [XML-SIG] RE: SOAP for Python
References:
Message-ID: <3AF0A8C6.BD3C239F@zolera.com>
> FT : http://www.fourthought.com It was my understanding that Fourthought
> also is working on an impl, correct me if I'm wrong Mike.
I think he's at the same stage as I am -- discussion.
From uche.ogbuji@fourthought.com Thu May 3 03:24:34 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 20:24:34 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz
of "Wed, 02 May 2001 20:39:34 EDT." <3AF0A8C6.BD3C239F@zolera.com>
Message-ID: <200105030224.f432OYX01370@localhost.local>
> > FT : http://www.fourthought.com It was my understanding that Fourthought
> > also is working on an impl, correct me if I'm wrong Mike.
>
> I think he's at the same stage as I am -- discussion.
Nope. Way past discussion. 4Suite Server 0.11 (alpha) features a SOAP server.
Examples here
http://www-106.ibm.com/developerworks/webservices/library/ws-pyth3/
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Mike.Olson@fourthought.com Thu May 3 03:21:08 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Wed, 02 May 2001 20:21:08 -0600
Subject: [XML-SIG] RE: SOAP for Python
References:
Message-ID: <3AF0C094.3946A1AF@FourThought.com>
Cayce Ullman wrote:
>
>
> FT : http://www.fourthought.com It was my understanding that Fourthought
> also is working on an impl, correct me if I'm wrong Mike.
We have parts of an implementation but are looking to expand on it a lot
in the next month or so.
Mike
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Thu May 3 04:11:25 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 21:11:25 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Mike Olson
of "Wed, 02 May 2001 20:21:08 MDT." <3AF0C094.3946A1AF@FourThought.com>
Message-ID: <200105030311.f433BPV01587@localhost.local>
> Cayce Ullman wrote:
> >
> >
> > FT : http://www.fourthought.com It was my understanding that Fourthought
> > also is working on an impl, correct me if I'm wrong Mike.
>
> We have parts of an implementation but are looking to expand on it a lot
> in the next month or so.
Ah. Mike's more cautious than I. I'll be explicit though: the only part
we're "missing" is the SOAP serialization. But as far as I'm concerned, we're
not missing anything in that case. The SOAP serialization, frankly, stinks.
I've already spat my venom at whomever didn't rip section 5 out of the SOAP
spec after a second reading, but we'll see how that works out.
Until then, I rely on the fact that section 5 is explicitly optional. There
is no requirement for a SOAP implementation to use the SOAP serialization.
I'm actually more interested in writing an RDF serialization, and with some
support, it's not inconceivable that such a thing would oust section 5 before
XML Protocol emerges.
So, I disagree that 4SS has just parts of an implementation. We have a SOAP
server according to the spec.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
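To make the "section 5 is optional" point concrete, here is a minimal sketch
of a SOAP 1.1 envelope built with and without the `encodingStyle` attribute
that signals the section 5 serialization. It uses Python's standard
`xml.etree.ElementTree` module rather than anything from 4Suite Server, and the
`urn:example:orders` payload namespace and `build_envelope` helper are
assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the SOAP 1.1 note.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"

ET.register_namespace("SOAP-ENV", SOAP_ENV)


def build_envelope(payload, use_section5=False):
    """Wrap `payload` in a SOAP 1.1 Envelope/Body (hypothetical helper)."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    if use_section5:
        # Declares that the body follows the section 5 encoding rules.
        envelope.set("{%s}encodingStyle" % SOAP_ENV, SOAP_ENC)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    body.append(payload)
    return envelope


if __name__ == "__main__":
    # Hypothetical application document carried as-is, with no SOAP encoding.
    order = ET.Element("{urn:example:orders}PurchaseOrder")
    ET.SubElement(order, "{urn:example:orders}Item").text = "widget"
    print(ET.tostring(build_envelope(order), encoding="unicode"))
```

Both variants are SOAP envelopes; a receiver that assumes the encoded form
will not necessarily understand the literal one, which is the interoperability
concern debated in this thread.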
From rsalz@zolera.com Thu May 3 04:56:51 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 02 May 2001 23:56:51 -0400
Subject: [XML-SIG] RE: SOAP for Python
References: <200105030311.f433BPV01587@localhost.local>
Message-ID: <3AF0D703.29636208@zolera.com>
> Until then, I rely on the fact that section 5 is explicitly optional. There
> is no requirement for a SOAP implementation to use the SOAP serialization.
Technically right, but it would be *very* surprising and upsetting to
folks who naively used the 4SS implementation to talk to other web
services. It might even cause them to spit venom at you.
> I'm actually more interested in writing an RDF serialization, and with some
> support, it's not inconceivable that such a thing would oust section 5 before
> XML Protocol emerges.
It's about as likely as someone accepting my DER encoding.
/r$
From tpassin@home.com Thu May 3 05:04:47 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 3 May 2001 00:04:47 -0400
Subject: [XML-SIG] Proposing a web services SIG
References: <3AF04103.A7FA3F01@zolera.com>
Message-ID: <003301c0d386$319fab80$7cac1218@reston1.va.home.com>
[Rich Salz]
> I'd like to propose a new SIG, Web Services. Web services uses XML and
> related standards (schema, wsdl, soap, uddi) to provide a distributed
> computing infrastructure.
>
I'd go +1 on this.
Cheers,
Tom P
From uche.ogbuji@fourthought.com Thu May 3 05:23:42 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 22:23:42 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz
of "Wed, 02 May 2001 23:56:51 EDT." <3AF0D703.29636208@zolera.com>
Message-ID: <200105030423.f434Nga02058@localhost.local>
> > Until then, I rely on the fact that section 5 is explicitly optional. There
> > is no requirement for a SOAP implementation to use the SOAP serialization.
>
> Technically right, but it would be *very* surprising and upsetting to
> folks who naively used the 4SS implementation to talk to other web
> services. It might even cause them to spit venom at you.
Usually that's when things become fun.
However, you'll have to explain yourself better. What is this naivete you're
talking about? If they're using a "conformant" SOAP client, there should be
little such "surprise". And they certainly should not be upset.
Even Dave Reed of Microsoft at XML DevCon was very careful to point out that
the success of SOAP interop would come with proper handling of SOAP's
flexibility. Check your assumptions at the door or prepare to crash and burn.
If the major champion of SOAP can say so, especially after cooking up five of
their own SOAP implementations and having to (admittedly) force-feed
themselves interop, I don't see how I can credit your idea that anyone should
be surprised or upset working with a system that doesn't implement section 5.
> > I'm actually more interested in writing an RDF serialization, and with some
> > support, it's not inconceivable that such a thing would oust section 5 before
> > XML Protocol emerges.
>
> It's about as likely as someone accepting my DER encoding.
If you think you know the shape of what will come from XP, I think you have
another thought coming. The politics that are massed within this group are
probably even more massed than those of XML Schema, and indeed the XP WG is
larger than the Schema WG.
I can lay a solid bet that you won't recognize a significant amount of XP from
what you see in SOAP.
But then again, anyone who followed XML-RPC -> SOAP should realize this isn't
much of a prediction.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Thu May 3 05:40:28 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 02 May 2001 22:40:28 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz
of "Wed, 02 May 2001 23:56:51 EDT." <3AF0D703.29636208@zolera.com>
Message-ID: <200105030440.f434eSb02098@localhost.local>
> > Until then, I rely on the fact that section 5 is explicitly optional. There
> > is no requirement for a SOAP implementation to use the SOAP serialization.
>
> Technically right, but it would be *very* surprising and upsetting to
> folks who naively used the 4SS implementation to talk to other web
> services. It might even cause them to spit venom at you.
After I sent my last message another thought struck me. You use the term "Web
services" above. Probably I have to understand what you mean by that before I
understand why you think it would be surprising and upsetting to have SOAP
systems that don't implement section 5.
The only reason everyone would want to "just stick to section 5" is for
"transparent" API-type calls. RPC all over again. Basically CORBA with
SOAP/HTTP over the wire rather than CDR/IIOP.
But what on earth is the use of such a thing? Why not just use CORBA or DCOM
or RMI, all of which are vastly more efficient than SOAP and can claim more
pedigree and interop?
The answer is simple: because such tightly-coupled systems do not survive the
boundary from one business technology and process to another. Crossing such a
boundary requires loosely-coupled systems, and that is the only reason there
is any relevance to the buzzword "Web services".
Successful Web services will be message-oriented, loosely coupled systems with
a great deal of flexibility that is handled through metadata management.
Whether you're in the ebXML camp or the UDDI camp, you had better be taking
those tModels, WSDL bindings and CPPs seriously, because if you just blindly
write code that assumes that, say everyone uses SOAP serialization, you will
be doing commerce with only a fraction of your brave new market.
This is why it was utter silliness for section 5 not to have been broken out
of SOAP transport into a separate spec. It encourages people to wrongly
assume that SOAP implies section 5, and thereby condemn themselves to
reinventing the RPC wheel all over again.
And I'll note that I'm not alone in this sentiment. In past SOAP debates on
XML-DEV, no lesser figures than Tim Bray and David Megginson have expressed
similar annoyance at the conflation of transport and content model that mars
SOAP.
So do I think it's realistic that section 5 will be put in its place before XP
emerges? Absolutely. And in the unlikely event that this doesn't happen, Web
services will pretty much drown in its own unfulfilled promises.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Wed May 2 23:53:07 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 3 May 2001 00:53:07 +0200
Subject: [XML-SIG] Proposing a web services SIG
In-Reply-To: <200105021840.f42Ie6D21877@localhost.local> (message from Uche
Ogbuji on Wed, 02 May 2001 12:40:06 -0600)
References: <200105021840.f42Ie6D21877@localhost.local>
Message-ID: <200105022253.f42Mr7B01762@mira.informatik.hu-berlin.de>
> Well, all very well, and I can go either way on new SIG vs. just use
> XML-SIG, but does anyone know how to expeditiously go about creating
> a Python SIG? I suppose it involves some magic incantations on the
> meta-SIG, but I don't know the current state-of-the-SIGS.
I just asked to close three of them, so it is probably time to fill
the empty space :-)
In any case, I think Rich's proposal is still missing an expiration/review
date for the SIG. Traditionally, SIGs used to expire after one
(?) year (after which they could be extended), but given how little
review they get after that time, reviewing them every two years is
probably just as good.
In any case, this is all meta-sig business.
Regards,
Martin
P.S. There is also the issue of the SIG web pages. I'm still looking
for comments on whether they ought to live in the Python CVS, or in a
separate SF project (with check-in permissions for all SIG
coordinators).
From noreply@sourceforge.net Thu May 3 09:58:41 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 03 May 2001 01:58:41 -0700
Subject: [XML-SIG] [ pyxml-Bugs-420977 ] 4XSLT traceback
Message-ID:
Bugs item #420977, was updated on 2001-05-03 01:58
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=420977&group_id=6473
Category: SAX
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: 4XSLT traceback
Initial Comment:
Hi,
I get a traceback when trying to process an XSLT
generated by schematron. The XSLT is attached to this
bug report. It could be a problem with the schematron
itself.
The document on which the XSLT is applied is '
The traceback is:
alf@lapinot:~/schematron$ 4xslt test.xml recipe.xsl
Traceback (innermost last):
File "/usr/bin/4xslt", line 5, in ?
_4xslt.Run(sys.argv)
File
"/usr/lib/python1.5/site-packages/xml/xslt/_4xslt.py",
line 113, in Run
topLevelParams=top_level_params)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 150, in runUri
writer, uri, outputStream)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 250, in execute
self.applyTemplates(context, None)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 267, in applyTemplates
found = sty.applyTemplates(context, mode, self, params)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py",
line 430, in applyTemplates
patternInfo[PatternInfo.TEMPLATE].instantiate(context,
processor, params)
File
"/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py",
line 114, in instantiate
context = child.instantiate(context, processor)[0]
File
"/usr/lib/python1.5/site-packages/xml/xslt/ApplyTemplatesElement.py",
line 93, in instantiate
processor.applyTemplates(context, mode, params)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 271, in applyTemplates
self.applyBuiltins(context, mode)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 284, in applyBuiltins
self.applyTemplates(context, mode)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 267, in applyTemplates
found = sty.applyTemplates(context, mode, self, params)
File
"/usr/lib/python1.5/site-packages/xml/xslt/Stylesheet.py",
line 430, in applyTemplates
patternInfo[PatternInfo.TEMPLATE].instantiate(context,
processor, params)
File
"/usr/lib/python1.5/site-packages/xml/xslt/TemplateElement.py",
line 112, in instantiate
new_level)
File
"/usr/lib/python1.5/site-packages/xml/xslt/ChooseElement.py",
line 61, in instantiate
context, chosen, rec_tpl_params =
child.instantiate(context, processor, new_level)
File
"/usr/lib/python1.5/site-packages/xml/xslt/WhenElement.py",
line 43, in instantiate
result = self._expr.evaluate(context)
File
"/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py",
line 369, in evaluate
rt = Conversions.BooleanEvaluate(self._right, context)
File
"/usr/lib/python1.5/site-packages/xml/xpath/Conversions.py",
line 33, in BooleanEvaluate
rt = exp.evaluate(context)
File
"/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py",
line 408, in evaluate
lrt = self._left.evaluate(context)
File
"/usr/lib/python1.5/site-packages/xml/xpath/ParsedExpr.py",
line 180, in evaluate
return self._func(context, arg0)
File
"/usr/lib/python1.5/site-packages/xml/xpath/CoreFunctions.py",
line 300, in Floor
if int(number) == number:
TypeError: object can't be converted to int
This is with 4Suite-0.11a2.
Cheers
Alexandre
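The last frame fails on `int(number)`, which suggests that `Floor` received a
value that had not been coerced to a number first. That is a guess from the
traceback, not a confirmed diagnosis. As a minimal sketch in modern Python, a
floor function that coerces its argument before rounding could look like the
following. `to_number` and `xpath_floor` are hypothetical names, not 4XSLT
functions, the coercion only approximates the XPath number() rules, and a
node-set is modelled here as a plain list of strings.

```python
import math


def to_number(value):
    """Rough stand-in for XPath number() coercion (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.strip())
        except ValueError:
            return float("nan")
    if isinstance(value, (list, tuple)):
        # Node-set: use the first item, or NaN when the set is empty.
        return to_number(value[0]) if value else float("nan")
    return float("nan")


def xpath_floor(value):
    """Floor that never calls int() on an unconverted object."""
    number = to_number(value)
    if math.isnan(number) or math.isinf(number):
        return number
    return float(math.floor(number))


if __name__ == "__main__":
    print(xpath_floor(3.7))
    print(xpath_floor(["2.5"]))
    print(xpath_floor("not a number"))
```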
From noreply@sourceforge.net Thu May 3 12:29:30 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 03 May 2001 04:29:30 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421001 ] Undefined symbol XML_SetEntityDeclHandle
Message-ID:
Bugs item #421001, was updated on 2001-05-03 04:29
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421001&group_id=6473
Category: expat
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Undefined symbol XML_SetEntityDeclHandle
Initial Comment:
On FreeBSD 4.2 i386, with Python 2.0, PyXML 0.6.5, 4Suite 0.11a2 and 4SS 0.11a2 I get the following error:
File "/usr/local/bin/4ss", line 3, in ?
from FtServer.Console import CommandLine
File "/usr/local/lib/python2.0/site-packages/FtServer/Console/CommandLine.py", line 3, in ?
from Commands import g_commands
File "/usr/local/lib/python2.0/site-packages/FtServer/Console/Commands/__init__.py", line 2, in ?
import Init
File "/usr/local/lib/python2.0/site-packages/FtServer/Console/Commands/Init.py", line 15, in ?
from FtServer.Core.Lib import ConfigFile
File "/usr/local/lib/python2.0/site-packages/FtServer/Core/Lib/ConfigFile.py", line 2, in ?
from Ft.Rdf.Serializers.Dom import Serializer
File "/usr/local/lib/python2.0/site-packages/Ft/Rdf/Serializers/Dom.py", line 27, in ?
from Ft.Lib import pDomlette
File "/usr/local/lib/python2.0/site-packages/Ft/Lib/pDomlette.py", line 668, in ?
from pDomletteReader import *
File "/usr/local/lib/python2.0/site-packages/Ft/Lib/pDomletteReader.py", line 27, in ?
from xml.parsers import expat
File "/usr/local/lib/python2.0/site-packages/_xmlplus/parsers/expat.py", line 4, in ?
from pyexpat import *
ImportError: /usr/local/lib/python2.0/site-packages/_xmlplus/parsers/pyexpat.so: Undefined symbol "XML_SetEntityDeclHandler"
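An undefined `XML_SetEntityDeclHandler` symbol at import time typically means
the `pyexpat` extension was built against headers from a newer Expat than the
library it is loaded with. That is an assumption about this report, not
something the submitter confirmed. A small diagnostic along the following
lines, written for modern Python, reports whether the module imports and which
Expat it was built with. `describe_expat` is a hypothetical helper, and
`EXPAT_VERSION` may not exist in releases as old as Python 2.0, hence the
`getattr` fallback.

```python
def describe_expat():
    """Report basic facts about the Expat binding (hypothetical helper)."""
    try:
        from xml.parsers import expat
    except ImportError as exc:
        # On a broken install this reproduces the error from the traceback.
        return {"importable": False, "error": str(exc)}
    parser = expat.ParserCreate()
    return {
        "importable": True,
        "expat_version": getattr(expat, "EXPAT_VERSION", "unknown"),
        "module_file": getattr(expat, "__file__", "unknown"),
        # The Python-level handler that XML_SetEntityDeclHandler backs.
        "has_entity_decl_handler": hasattr(parser, "EntityDeclHandler"),
    }


if __name__ == "__main__":
    for key, value in describe_expat().items():
        print(key, "=", value)
```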
From rsalz@zolera.com Thu May 3 14:58:45 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 03 May 2001 09:58:45 -0400
Subject: [XML-SIG] RE: SOAP for Python
References: <200105030423.f434Nga02058@localhost.local>
Message-ID: <3AF16415.A4BB8CD1@zolera.com>
> What is this naivete you're
> talking about?
If you asked 100 people who were building SOAP applications,
"Did you know we could both be compliant but use different data
transfers and therefore be unable to interoperate?"
I'll bet more than half would be surprised, and more than 80% would say
"yes, but doesn't everyone at least support the common scheme."
I agree WSDL is way important, which is one of the motivators for a
web-services SIG.
I disagree that Sec5's inefficiencies doom it to failure, and its
installed base will be enough to ensure its viability. But that's a
simple bet whose answer we'll know in a couple of years. Not worth
arguing over.
/r$
From uche.ogbuji@fourthought.com Thu May 3 15:27:02 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 03 May 2001 08:27:02 -0600
Subject: [XML-SIG] RE: SOAP for Python
In-Reply-To: Message from Rich Salz
of "Thu, 03 May 2001 09:58:45 EDT." <3AF16415.A4BB8CD1@zolera.com>
Message-ID: <200105031427.f43ER2004499@localhost.local>
> > What is this naivete you're
> > talking about?
>
> If you asked 100 people who were building SOAP applications
> Did you know we could both be compliant but use different data
> transfers and therefore be unable to interoperate?
> I'll bet more than half would be surprised, and more than 80% would say
> "yes, but doesn't everyone at least support the common scheme."
As you said, time will tell, but you were talking about Web services, not
applications. I thought this is what the entire thread was about. I can
assure you that I have spoken to/worked with quite a few in the nascent Web
services space, and that most have learned not to take anything for granted,
as long as it is conformant.
You'd be surprised how much SOAP work is proceeding without Section 5.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From rsalz@zolera.com Thu May 3 15:31:17 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 03 May 2001 10:31:17 -0400
Subject: [XML-SIG] RE: SOAP for Python
References: <200105031427.f43ER2004499@localhost.local>
Message-ID: <3AF16BB5.117B7D45@zolera.com>
> You'd be surprised how much SOAP work is proceeding without Section 5.
Life surprises me.
:)
/r$
From stuff4gary@hotmail.com Thu May 3 20:48:34 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Thu, 03 May 2001 19:48:34
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
Message-ID:
I want to add some text onto the end of an XML file just before the closing
tag but I don't want to read the whole file into memory as it is quite a
large file. I am trying to do the following:
1. delete 14 characters off the end of the file (the closing tag)
2. add some new data text from a cgi script onto this
ie - file.append(cgi_resxml)
3. - then add back on the closing tag (14 character '')
ie - file.append('')
I can manage (2.) and (3.) with no problems by opening the file handle with
append access ('a'), but I can't work out how to do (1.) as well... does this
append function have a reverse function that I can use, or should I be doing
this a different way?
Kind Regards
Gary
From uche.ogbuji@fourthought.com Thu May 3 21:14:46 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 3 May 2001 14:14:46 -0600
Subject: [XML-SIG] ANN: 4Suite and 4Suite Server 0.11
Message-ID: <200105032014.f43KEkv08659@localhost.local>
Fourthought, Inc. (http://Fourthought.com) announces the release of
4Suite 0.11 and 4Suite Server 0.11
----------------------------
Open source XML processing tools and an XML data server
http://4Suite.org
http://Fourthought.com/4SuiteServer
4Suite Server News
------------------
Basically rewritten from the ground up. CORBA is no longer required
and is now just another way to access the server (along with HTTP,
SOAP, WebDAV, Python API, etc).
Many usability, documentation, performance and architectural
improvements.
4Suite News
-----------
* Release 0.11.0 (Tag R20010501)
* pDomlette: XInclude implemented directly into parse for efficiency
* pDomlette: better modularized
* cDomlette: memory leaks squashed
* RDF: add command line
* RDF: major serialization and deserialization fixes
* RDF: Work access-control directly into RDF model
* RDF: API tweaks: use user flags for query flexibility
* XSLT: Many speedups
* XSLT: xsl:variable and xsl:param conformance fixes
* ODS: Many bug fixes in the DbmAdapter
* Lib: Many bug fixes in the DbmDriver
* Many misc optimizations and bug-fixes
4Suite is a collection of Python tools for XML processing and object
database management. It provides support for XML parsing, several
transient and persistent DOM implementations, XPath expressions,
XPointer, XSLT transforms, XLink, RDF and ODMG object databases.
4Suite Server is a platform for XML processing. It features an XML data
repository, metadata management, a rules-based engine, XSLT transforms,
XPath and RDF-based indexing and query, XLink resolution and many other
XML services. It also provides transactions and access control features.
Along with basic console and command-line management, it supports remote,
cross-platform and cross-language access through CORBA, WebDAV,
HTTP and other request protocols.
4Suite Server is not meant to be a full-blown application server. It
provides highly-specialized services for XML processing that can be used
with other application servers.
All the software is open-source and free to download. Priority support
and customization are available from Fourthought, Inc. For more
information on this, see http://FourThought.com, or contact
Fourthought at info@fourthought.com or +1 303 583 9900.
More info and Obtaining 4Suite and 4Suite Server
------------------------------------------------
Please see
http://4Suite.org
http://Fourthought.com/4SuiteServer
from where you can download source, Windows and Linux binaries.
4Suite is distributed under a license similar to that of the
Apache Web Server.
From akuchlin@mems-exchange.org Thu May 3 21:19:19 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Thu, 3 May 2001 16:19:19 -0400
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
In-Reply-To: ; from stuff4gary@hotmail.com on Thu, May 03, 2001 at 07:48:34PM +0000
References:
Message-ID: <20010503161919.A3785@ute.cnri.reston.va.us>
On Thu, May 03, 2001 at 07:48:34PM +0000, gary cor wrote:
>I want to add some text onto the end of an XML file just before the closing
>tag but I don't want to read the whole file into memory as it is quite a
>large file. I am trying to do the following:
>
>1. delete 14 characters off the end of the file (the closing tag)
...
This is fragile; what if there is trailing whitespace at the end of
the file? What if the closing tag is written strangely, as '< /
closing >' or something like that?
Now, what's the best way to do this? You could write a simple SAX
handler where startElement() and characters() printed their input to a
file or to standard output, and then have an endElement() that outputs
a closing tag, first checking if it's the root element and inserting
the extra content. Is there a better way?
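A minimal sketch of that SAX approach, assuming a modern Python 3 and the
standard xml.sax module. The names AppendBeforeRootEnd and append_to_root
are made up for illustration, and the handler is deliberately simplified:
it does not reproduce comments, processing instructions or the DOCTYPE.

```python
import io
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class AppendBeforeRootEnd(xml.sax.ContentHandler):
    """Copy the document to `out`, inserting extra_xml before the root's end tag."""

    def __init__(self, out, extra_xml):
        super().__init__()
        self.out = out
        self.extra_xml = extra_xml
        self.depth = 0  # number of currently open elements

    def startElement(self, name, attrs):
        self.depth += 1
        self.out.write("<" + name)
        for key, value in attrs.items():
            self.out.write(" %s=%s" % (key, quoteattr(value)))
        self.out.write(">")

    def characters(self, content):
        self.out.write(escape(content))

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 0:
            # We are about to close the root element: add the new content first.
            self.out.write(self.extra_xml)
        self.out.write("</%s>" % name)


def append_to_root(source_xml, extra_xml):
    """Hypothetical helper: return source_xml with extra_xml appended inside the root."""
    out = io.StringIO()
    xml.sax.parseString(source_xml.encode("utf-8"), AppendBeforeRootEnd(out, extra_xml))
    return out.getvalue()
```

For a large file, `out` would be an open output file rather than a StringIO,
so that only one event's worth of data is held in memory at a time.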
--amk
From Mike.Olson@fourthought.com Thu May 3 21:31:10 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Thu, 03 May 2001 14:31:10 -0600
Subject: [XML-SIG] Deleting and appending of a file, without reading into
memory
References: <20010503161919.A3785@ute.cnri.reston.va.us>
Message-ID: <3AF1C00E.83423203@FourThought.com>
Andrew Kuchling wrote:
>
> On Thu, May 03, 2001 at 07:48:34PM +0000, gary cor wrote:
> >I want to add some text onto the end of an XML file just before the closing
> >tag but I don't want to read the whole file into memory as it is quite a
> >large file. I am trying to do the following:
> >
> >1. delete 14 characters off the end of the file (the closing tag)
> ...
>
> This is fragile; what if there is trailing whitespace at the end of
> the file? What if the closing tag is written strangely, as '< /
> closing >' or something like that?
>
> Now, what's the best way to do this? You could write a simple SAX
> handler where startElement() and characters() printed their input to a
> file or to standard output, and then have an endElement() that outputs
> a closing tag, first checking if it's the root element and inserting
> the extra content. Is there a better way?
If the doc is that big, what about breaking it into smaller docs and
using XInclude?
Then to add a new section, load the "hub" document (which will be pretty
small now) and add a new include tag. Then write the new content to the
referenced file.
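A rough sketch of the "hub" update, assuming a modern Python 3 with
xml.etree.ElementTree rather than 4Suite's own API. The function name
add_include is hypothetical, and the href values are placeholders:

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"


def add_include(hub_xml, href):
    """Return the hub document with one more xi:include child on the root."""
    ET.register_namespace("xi", XI_NS)  # serialize with the usual xi: prefix
    root = ET.fromstring(hub_xml)
    ET.SubElement(root, "{%s}include" % XI_NS, {"href": href})
    return ET.tostring(root, encoding="unicode")
```

The new content itself then goes into the file that href points to, so the
large existing parts are never rewritten.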
Mike
>
> --amk
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Thu May 3 23:04:22 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 4 May 2001 00:04:22 +0200
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
In-Reply-To: <20010503161919.A3785@ute.cnri.reston.va.us> (message from Andrew
Kuchling on Thu, 3 May 2001 16:19:19 -0400)
References: <20010503161919.A3785@ute.cnri.reston.va.us>
Message-ID: <200105032204.f43M4M401839@mira.informatik.hu-berlin.de>
> This is fragile; what if there is trailing whitespace at the end of
> the file? What if the closing tag is written strangely, as '< /
> closing >' or something like that?
If this CGI script is the only application that ever modifies the
document, the approach seems fine to me, although it is certainly
questionable why XML is used here in the first place.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Thu May 3 23:03:22 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 4 May 2001 00:03:22 +0200
Subject: [XML-SIG] Deleting and appending of a file, without reading into memory
In-Reply-To:
(stuff4gary@hotmail.com)
References:
Message-ID: <200105032203.f43M3Mi01837@mira.informatik.hu-berlin.de>
> 1. delete 14 characters off the end of the file (the closing tag)
> 2. add some new data text from a cgi script onto this
> ie - file.append(cgi_resxml)
> 3. - then add back on the closing tag (14 character '')
> ie - file.append('')
>
> I can manage (2.) and (3.) with no problems by opening the file handle with
> append access ('a'), but I can't work out how to do (1.) as well... does this
> append function have a reverse function that I can use, or should I be doing
> this a different way?
What kind of file object do you have that has an append function?
I'd use f.seek to go 14 characters before the end, and start writing
there. Some operating systems don't even support truncation to a
certain size; they all support positioning to a given offset, though.
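A minimal sketch of the seek-based approach, assuming a modern Python 3 and
that the file ends with exactly the closing tag (no trailing whitespace).
The function name append_before_closing_tag is hypothetical, and the tag is
passed in by the caller rather than hard-coded:

```python
import os


def append_before_closing_tag(path, new_xml, closing_tag):
    """Overwrite the trailing closing tag with new_xml followed by the tag."""
    tag = closing_tag.encode("utf-8")
    with open(path, "r+b") as f:
        # Position just before the closing tag and check it is really there.
        f.seek(-len(tag), os.SEEK_END)
        if f.read(len(tag)) != tag:
            raise ValueError("file does not end with the expected closing tag")
        # Go back to the same offset and write over the old tag.
        f.seek(-len(tag), os.SEEK_END)
        f.write(new_xml.encode("utf-8") + tag)
```

Because the new data is always at least as long as the tag it overwrites, no
truncation is needed.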
Regards,
Martin
From uche.ogbuji@fourthought.com Fri May 4 02:42:31 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 03 May 2001 19:42:31 -0600
Subject: [XML-SIG] A bit o' challenge
Message-ID: <200105040142.f441gVD10270@localhost.local>
OK, so the conventional wisdom lately has been that the Java processors such
as Xalan and Saxon cream 4XSLT for performance across the board. Alexandre
Fayolle said that he thought they were "orders of magnitude" faster.
Well, I know that one always does better in his own benchmarking, but I have
been working with 4XSLT quite heavily in the time leading up to the 0.11
release, and I'm having trouble crediting this impression. 4XSLT is to my
observations (and measurements using the time command-line timer) a good 25%
faster than Saxon and faster by an even greater proportion than Xalan for most
small to medium tasks.
I have indeed noticed that on huge documents, such as the "Cemetary" benchmark
(3MB source), Saxon 6.0.2 is up to 4 times faster than 4XSLT (similar for
Xalan), but this is still not "orders of magnitude" faster, and this only
seems to be true for the size and type of document I'd only expect to process
in benchmarks.
Now one note: I *always* use cDomlette. It is much faster than pDomlette, and
that is why I've declared that I'll be working to make it the default in
4Suite as of 0.11.1. Once again, I encourage everyone to help shake out any
remnant bugs in cDomlette. See this posting for more info:
http://lists.fourthought.com/pipermail/4suite/2001-April/001780.html
So here's the bit o' challenge. I'm looking for regular-sized, real-world
transforms in which Saxon or Xalan smoke 4XSLT. If you have such test cases,
and can reliably reproduce 4XSLT's lassitude using cDomlette, please send it
my way so I can have a look (and maybe find the performance bugs that I'm too
close to see).
I'm also interested, of course, in hearing positive reports about 4XSLT's
performance.
So I say 4XSLT is competitive, and as far as I can tell, is usually faster
than the Java processors (though we can't touch MSXML yet).
P.S. What got me starting to ponder was DataPower's benchmark that showed
4XSLT some 20 times slower than the group of Java processors. The nonsense
behind this I was able to grasp with one glance at their tortured "driver" for
4XSLT. I've made my complaints about their incompetence, but here's your
chance to show I'm all wet.
Thanks, all.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From noreply@sourceforge.net Fri May 4 03:04:21 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 03 May 2001 19:04:21 -0700
Subject: [XML-SIG] [ pyxml-Patches-421217 ] ImportError shoudl be AttributeError
Message-ID:
Patches item #421217, was updated on 2001-05-03 19:04
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=421217&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Karl Anderson (karlanderson)
Assigned to: Nobody/Anonymous (nobody)
Summary: ImportError shoudl be AttributeError
Initial Comment:
I'm running a recent CVS checkout of PyXML, and have
4Suite 0.11a2 installed (I haven't installed 4Suite
0.11, but it doesn't look to be different in this
regard).
I'm running Python 1.5.2 under Redhat 6.2.
I get an attribute error when I try to import xslt.
StylesheetReader.py seems to be catching ImportError
when it should catch AttributeError for me.
The intended import seems to be
Ft.Lib.Error.XML_PARSE_ERROR, not
Ft.Lib.XML_PARSE_ERROR.
This happens before my patch:
>>> import sys
>>> sys.path.insert(0,
'/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages')
# where I installed from CVS
>>> import xml
>>> from xml.xslt import Processor
Traceback (innermost last):
File "", line 1, in ?
File
"/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 24, in ?
from xml.xslt import StylesheetReader, ReleaseNode
File
"/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/StylesheetReader.py",
line 67, in ?
XML_PARSE_ERROR = Ft.Lib.XML_PARSE_ERROR
AttributeError: XML_PARSE_ERROR
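The distinction can be sketched in isolation, assuming a modern Python 3.
The module objects below are stand-ins built with types.ModuleType, not the
real Ft.Lib package, and find_parse_error is a hypothetical helper:

```python
import types

# Stand-in for a package whose constant lives on a submodule (Lib.Error),
# not on the package itself.
Lib = types.ModuleType("Lib")
Lib.Error = types.ModuleType("Lib.Error")
Lib.Error.XML_PARSE_ERROR = 1


def find_parse_error(lib, fallback=None):
    """Look the constant up on lib, then on lib.Error, else return fallback."""
    try:
        # A missing name on an already-imported module raises AttributeError,
        # so "except ImportError" would not catch this.
        return lib.XML_PARSE_ERROR
    except AttributeError:
        error_module = getattr(lib, "Error", None)
        return getattr(error_module, "XML_PARSE_ERROR", fallback)
```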
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=421217&group_id=6473
From cperez@zulunet.net Fri May 4 18:21:51 2001
From: cperez@zulunet.net (Carlos Perez)
Date: Fri, 4 May 2001 13:21:51 -0400
Subject: [XML-SIG] Looking for XML to Python sequence code.
In-Reply-To:
Message-ID: <001b01c0d4be$b5c2a140$fd0aa8c0@CPEREZ>
I'm looking for some Python code that convert XML to a Python native
sequence object.
Does anyone know where to get it?
Thanks in advance...
From dieter@handshake.de Fri May 4 19:19:41 2001
From: dieter@handshake.de (Dieter Maurer)
Date: Fri, 4 May 2001 20:19:41 +0200 (CEST)
Subject: [XML-SIG] Re: [4suite] A bit o' challenge
In-Reply-To: <782485507@toto.iv>
Message-ID: <15090.62141.831609.704073@lindm.dm>
Uche Ogbuji writes:
> ...
> Well, I know that one always does better in his own benchmarking, but I have
> been working with 4XSLT quite heavily in the time leading up to the 0.11
> release, and I'm having trouble crediting this impression. 4XSLT is to my
> observations (and measurements using the time command-line timer) a good 25%
> faster than Saxon and faster by an even greater proportion than Xalan for most
> small to medium tasks.
When I used 4XSLT for the last time, it was version 0.9.
I transformed a 240 kb DocBook/XML file into HTML using Norman Walsh's
DocBook stylesheets.
4XSLT needed about 50 MB memory and about 30 min CPU time (slow
Pentium 100 MHZ with 64 MB main memory).
A colleague of mine used Saxon for his DocBook/XML documentation,
also with Norman Walsh's stylesheets. Runtime was in the order
of a minute. I should say, it was a very different machine (Sun E450
with 256MB memory).
But nevertheless, I expect that after normalization Saxon
was several times faster than 4XSLT.
I was especially horrified by the high memory requirements.
The mentioned document is one out of eight chapters of a book.
In the final production, the complete book must be processed
together (to get correct links, table of contents, indexes,...).
I fear, I would need 200 MB memory and several hours of processing
time ....
> ....
> So here's the bit o' challenge. I'm looking for regular-sized, real-world
> transforms in which Saxon or Xalan smoke 4XSLT. If you have such test cases,
> and can reliably reproduce 4XSLT's lassitude using cDomlette, please send it
> my way so I can have a look (and maybe find the performance bugs that I'm too
> close to see).
I will give it a try, when 0.11 is released and report back.
Dieter
From rsalz@zolera.com Fri May 4 20:36:27 2001
From: rsalz@zolera.com (Rich Salz)
Date: Fri, 04 May 2001 15:36:27 -0400
Subject: [XML-SIG] xmlproc bug?
Message-ID: <3AF304BB.D5ECB468@zolera.com>
If you feed() a unicode string into an xmlproc parser, Python barfs at
line 234
# ignore unusal byte orders 2143 and 3412
elif new_data[:2] == '\xfe\xff':
enc = "utf-16-be" # with BOM
because apparently it is trying to convert the string to unicode and
it's got 8bit characters.
Not sure what the right thing to do is. here's a three-line script that
shows the fault
from xml.parsers.xmlproc import xmlproc
z = xmlproc.XMLProcessor()
z.feed(u'')
/r$
From larsga@garshol.priv.no Fri May 4 21:22:07 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 04 May 2001 22:22:07 +0200
Subject: [XML-SIG] xmlproc bug?
In-Reply-To: <3AF304BB.D5ECB468@zolera.com>
References: <3AF304BB.D5ECB468@zolera.com>
Message-ID:
* Rich Salz
|
| If you feed() a unicode string into an xmlproc parser, Python barfs at
| line 234
| # ignore unusal byte orders 2143 and 3412
| elif new_data[:2] == '\xfe\xff':
| enc = "utf-16-be" # with BOM
|
| because apparently it is trying to convert the string to unicode and
| it's got 8bit characters.
The problem here is that we are trying to autodetect the encoding of a
Unicode string, but a Unicode string is already in Unicode and so
needs no decoding.
You can solve this by setting the decoded parameter to feed to 1, but
it would be better if you did not have to.
Fixed it by doing the following:
Index: xml/parsers/xmlproc/xmlutils.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/parsers/xmlproc/xmlutils.py,v
retrieving revision 1.16
diff -c -r1.16 xmlutils.py
***************
*** 285,290 ****
--- 285,295 ----
new_data = new_data+self.encoded_data
self.encoded_data = ""
+
+ if not decoded and using_unicode and \
+ type(new_data) == types.UnicodeType:
+ decoded = 1
+
if not decoded and not self.charset_converter:
self.autodetect_encoding(new_data)
# If this returns with no auto-detected encoding, i.e. if
I need to check it first before committing it, but this should solve
the problem. (Am waiting for glibc to download, so that I can compile
Python 2.1, so that I can actually test this. The download is going
slowly, so I am posting before the commit.)
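The idea behind the patch can be sketched on its own, assuming a modern
Python 3 where str is already Unicode and bytes is the undecoded form. The
names guess_encoding and decode_if_needed are hypothetical, not xmlproc
functions, and the detection is simplified: it looks only at byte order
marks and ignores XML encoding declarations and UTF-32.

```python
import codecs


def guess_encoding(data):
    """Return (codec name, BOM length) for a bytes object, based on its BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    return "utf-8", 0  # assumed default when no BOM is present


def decode_if_needed(data):
    """Skip autodetection entirely when the input is already text."""
    if isinstance(data, str):
        return data
    encoding, bom_length = guess_encoding(data)
    return data[bom_length:].decode(encoding)
```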
--Lars M.
From noreply@sourceforge.net Fri May 4 21:36:15 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 04 May 2001 13:36:15 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421488 ] xslt processor stylesheet reader error
Message-ID:
Bugs item #421488, was updated on 2001-05-04 13:36
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421488&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Karl Anderson (karlanderson)
Assigned to: Nobody/Anonymous (nobody)
Summary: xslt processor stylesheet reader error
Initial Comment:
Can't append stylesheet. Stylesheet reader wants to
call initParser().
I'm not giving the processor a reader, it's using the
default.
When I run without Ft installed, the reader is
MinidomReader, which doesn't define this.
When I run with Ft installed from 4Suite 0.11, the
reader is DomletteReader, which also gives this error.
initParser is defined on the pDomletteReader
ReaderMixin class, but not defined anywhere on a reader
that the processor gets by default, AFAICT.
>>> p = Processor.Processor()
>>> p.appendStylesheetString(sheet_4)
Traceback (innermost last):
File "", line 1, in ?
File
"/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/Processor.py",
line 106, in appendStylesheetString
sty = self._styReader.fromString(text, baseUri)
File
"/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/minisupport.py",
line 62, in fromString
return self.fromStream(st, baseUri, ownerDoc,
stripElements)
File
"/home/karl/zope/dist/xml/PyXML-cvs-install/lib/python1.5/site-packages/xml/xslt/StylesheetReader.py",
line 305, in fromStream
self.initParser()
AttributeError: initParser
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421488&group_id=6473
From uche.ogbuji@fourthought.com Fri May 4 21:52:25 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Fri, 04 May 2001 14:52:25 -0600
Subject: [XML-SIG] Re: [4suite] A bit o' challenge
References: <15090.62141.831609.704073@lindm.dm>
Message-ID: <3AF31689.A8895445@fourthought.com>
Dieter Maurer wrote:
>
> Uche Ogbuji writes:
> > ...
> > Well, I know that one always does better in his own benchmarking, but I have
> > been working with 4XSLT quite heavily in the time leading up to the 0.11
> > release, and I'm having trouble crediting this impression. 4XSLT is to my
> > observations (and measurements using the time command-line timer) a good 25%
> > faster than Saxon and faster by an even greater proportion than Xalan for most
> > small to medium tasks.
> When I used 4XSLT for the last time, it was version 0.9.
>
> I transformed a 240 kb DocBook/XML file into HTML using Norman Walsh's
> DocBook stylesheets.
>
> 4XSLT needed about 50 MB memory and about 30 min CPU time (slow
> Pentium 100 MHZ with 64 MB main memory).
I did specifically mention working with cDomlette. Is that what you
were using?
> A colleague of mine used Saxon for his DocBook/XML documentation,
> also with Norman Walsh's stylesheets. Runtime was in the order
> of a minute. I should say, it was a very different machine (Sun E450
> with 256MB memory).
> But nevertheless, I expect that after normalization Saxon
> was several times faster than 4XSLT.
>
> I was especially horrified by the high memory requirements.
> The mentioned document is one out of eight chapters of a book.
> In the final production, the complete book must be processed
> together (to get correct links, table of contents, indexes,...).
> I fear, I would need 200 MB memory and several hours of processing
> time ....
cDomlette takes up about half the memory of pDomlette. In some cases
(since it uses string pooling) the proportion might be higher or
lower.
When I checked with the 3MB cemetary demo, 4XSLT+cDom 0.11a2 took up
42MB and Saxon 6.0.2 took up 33MB of RAM.
> > ....
> > So here's the bit o' challenge. I'm looking for regular-sized, real-world
> > transforms in which Saxon or Xalan smoke 4XSLT. If you have such test cases,
> > and can reliably reproduce 4XSLT's lassitude using cDomlette, please send it
> > my way so I can have a look (and maybe find the performance bugs that I'm too
> > close to see).
> I will give it a try, when 0.11 is released and report back.
0.11 was released yesterday.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Fri May 4 22:39:33 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 4 May 2001 23:39:33 +0200
Subject: [XML-SIG] xmlproc bug?
In-Reply-To: <3AF304BB.D5ECB468@zolera.com> (message from Rich Salz on Fri, 04
May 2001 15:36:27 -0400)
References: <3AF304BB.D5ECB468@zolera.com>
Message-ID: <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de>
> If you feed() a unicode string into an xmlproc parser, Python barfs at
> line 234
> # ignore unusal byte orders 2143 and 3412
> elif new_data[:2] == '\xfe\xff':
> enc = "utf-16-be" # with BOM
>
> because apparently it is trying to convert the string to unicode and
> it's got 8bit characters.
>
> Not sure what the right thing to do is.
My intuition is that feeding Unicode objects is an error, but that may
be debatable.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Fri May 4 23:28:08 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 5 May 2001 00:28:08 +0200
Subject: [XML-SIG] Looking for XML to Python sequence code.
In-Reply-To: <001b01c0d4be$b5c2a140$fd0aa8c0@CPEREZ>
References: <001b01c0d4be$b5c2a140$fd0aa8c0@CPEREZ>
Message-ID: <200105042228.f44MS8m02386@mira.informatik.hu-berlin.de>
> I'm looking for some Python code that convert XML to a Python native
> sequence object.
> Does anyone know where to get it?
Are you looking for a specific structure of the sequence? If not,
try
seq = open(filename).read()
seq will be a Python native sequence object representing the XML
document :-)
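If a structured sequence is wanted instead, one possible shape is a nested
list of the form [tag, attributes, children...]. This layout is an
assumption made purely for illustration, and the sketch uses
xml.etree.ElementTree from a modern Python 3:

```python
import xml.etree.ElementTree as ET


def to_sequence(element):
    """Convert an element into [tag, attrs, text-or-child, ...] recursively."""
    seq = [element.tag, dict(element.attrib)]
    if element.text and element.text.strip():
        seq.append(element.text.strip())
    for child in element:
        seq.append(to_sequence(child))
        # Text that follows a child element belongs to the parent's content.
        if child.tail and child.tail.strip():
            seq.append(child.tail.strip())
    return seq
```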
Regards,
Martin
From noreply@sourceforge.net Sat May 5 02:06:10 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 04 May 2001 18:06:10 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
Message-ID:
Bugs item #421553, was updated on 2001-05-04 18:06
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Karl Anderson (karlanderson)
Assigned to: Nobody/Anonymous (nobody)
Summary: stylesheet node reader requires '' NSURI
Initial Comment:
I'm unable to use ParsedXML's DOM as a stylesheet node, and I think
it's because of a bug in StylesheetReader.py.
The problem is at StylesheetReader.py line 186:
if not sheet.getAttributeNS('', 'version'):
    raise XsltException(Error.STYLESHEET_MISSING_VERSION)
...where the NamespaceURI given to getAttributeNS is ''. This is
supposed to find the namespace-free version attribute of the
stylesheet documentElement, such as
"""
""".
ParsedXML's DOM builder gives this attribute a NamespaceURI of None
when we parse.
I don't think that you can use the DOM methods to create a node with a
NamespaceURI of "", since the NamespaceURI is supposed to be a URI
reference. Is the empty string a valid URI reference? Well, maybe -
the DOM level 2 rec says:
"""
Note that because the DOM does no lexical checking, the empty string
will be treated as a real namespace URI in DOM Level 2 methods.
Applications must use the value null as the namespaceURI parameter for
methods if they wish to have no namespace.
"""
But anyway, this indicates that when using DOM creation methods, a
None should be used as the NamespaceURI for namespaceless nodes such
as "version", and I think that the stylesheet reader should accept
that.
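The difference is easy to reproduce with the standard library's
xml.dom.minidom in a modern Python 3, which also stores None for
attributes that have no namespace. This is only a sketch of a tolerant
lookup; get_unqualified_attribute is a hypothetical helper, not part of
StylesheetReader.py:

```python
from xml.dom.minidom import parseString

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def get_unqualified_attribute(element, name):
    """Look up a namespace-free attribute, accepting None or '' as its NSURI."""
    return element.getAttributeNS(None, name) or element.getAttributeNS("", name)


doc = parseString('<xsl:stylesheet xmlns:xsl="%s" version="1.0"/>' % XSLT_NS)
sheet = doc.documentElement
# With minidom, only the None lookup finds the attribute; '' returns "".
version = get_unqualified_attribute(sheet, "version")
```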
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473
From larsga@garshol.priv.no Sat May 5 10:26:45 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 May 2001 11:26:45 +0200
Subject: [XML-SIG] xmlproc bug?
In-Reply-To: <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de>
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de>
Message-ID:
* Martin v. Loewis
|
| My intuition is that feeding Unicode objects is an error, but that may
| be debatable.
I see no reason why it should be. If the application is converting to
Unicode itself, or if it got the data from somewhere as Unicode, there
is no reason why it should not be allowed to parse those data.
--Lars M.
From martin@loewis.home.cs.tu-berlin.de Sat May 5 14:12:01 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 5 May 2001 15:12:01 +0200
Subject: [XML-SIG] xmlproc bug?
In-Reply-To: (message from Lars Marius
Garshol on 05 May 2001 11:26:45 +0200)
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de>
Message-ID: <200105051312.f45DC1401103@mira.informatik.hu-berlin.de>
> I see no reason why it should be. If the application is converting to
> Unicode itself, or if it got the data from somewhere as Unicode, there
> is no reason why it should not be allowed to parse those data.
I agree in principle. However, just allowing feed to be called with a
Unicode object is too permissive: What if you had previously called it
with a string?
So if this is allowed, care should be taken that a sensible thing
happens when somebody mixes byte and unicode strings (signalling a
fatal error might be sensible).
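One way such a check could look, sketched for a modern Python 3 where the
two kinds are bytes and str. FeedTypeGuard and MixedInputError are
hypothetical names and not part of xmlproc:

```python
class MixedInputError(Exception):
    """Raised when byte strings and Unicode strings are fed to the same parser."""


class FeedTypeGuard:
    """Remember the kind of the first chunk and reject chunks of the other kind."""

    def __init__(self):
        self._kind = None  # becomes str or bytes after the first chunk

    def check(self, data):
        if isinstance(data, str):
            kind = str
        elif isinstance(data, (bytes, bytearray)):
            kind = bytes
        else:
            raise TypeError("feed() expects text or bytes")
        if self._kind is None:
            self._kind = kind
        elif kind is not self._kind:
            raise MixedInputError("cannot mix byte and Unicode input")
        return data
```

A parser's feed() would call check() on every chunk before doing anything
else with it.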
Regards,
Martin
From tpassin@home.com Sat May 5 17:11:46 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sat, 5 May 2001 12:11:46 -0400
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de>
Message-ID: <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com>
I've been able to get 4SuiteServer working on Windows98/Me, but it doesn't
quite work right out of the box as downloaded. Here's what I needed to do
to get it working.
I did all the steps in the installation and quickstart guide. Once it's
installed using the Windows installer, you set up your environment
variables, then you are told to run
4ss init
1) 4ss.bat is in the python\scripts directory, so you have to add it to your
path or have to be running in that directory.
2) The init command fails because the file "core.odl" is not installed into
the "generated" directory (or anywhere else) by the installer. I downloaded
the source distribution, found the file, and copied it into the generated
directory.
Now init works.
3) init works, but when it asks if you want to wipe out the old data, it
wants you to answer "yes" or "no". Most Windows users are used to being
able to answer 'y' or 'n' to those questions. I did, and didn't even notice
that I hadn't literally done what the prompt said. Very Unix-like. Very
unforgiving. This code should be changed to allow "y" and "n" as well.
4) The quick start guide has you run the script populate.py in the
python\docs\4SuiteServer-0.11\demo directory. But it fails, looking for a
unix file, something like /etc/mime.types. The script has a test for this
file and an except branch to run in case it doesn't exist (which it doesn't
on a Windows machine). But the except branch incorrectly has a "raise"
statement which terminates the script.
Get rid of this line, which is line 66 of populate.py. Now the script runs.
5) At this point, populate installed its downloaded files but failed when it
tried to modify "docdefs". It turns out you have to be running as superuser
to change docdefs. The guide doesn't tell you, but implies that you should
have run as the new user it just had you create. Otherwise, why create that
user just before running populate.py?
I deleted the whole "gems" container and went through the steps again as
superuser.
6) Then I tried to install and run the guestbook. You have to run the
"bootstrap.py" script in the demo\GuestBook directory. This failed. It
turned out that you have to change to the GuestBook directory and run from
that, otherwise the script can't find the files it needs.
7) The Guestbook works until you try to submit the form for your first
guest. Then it fails, but in a strange way. With IE, I got an error
message saying it couldn't find the server or there was a DNS error. This
must be an incorrect message since the form uses a relative path, but anyway
something isn't working that I haven't tracked down.
8) The docs give examples of looking at various properties by their path, as
in
4ss show acl /localhost/index.html
None of these commands have worked for me. I had to remove the /localhost/
part. My server was running at the time.
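For step 4, one way a missing /etc/mime.types could be tolerated is
sketched below with the standard mimetypes module from a modern Python 3.
This is not populate.py's actual code, which is not shown in this thread;
guess_mime_type and its fallback value are assumptions for illustration.

```python
import mimetypes


def guess_mime_type(filename, mime_file="/etc/mime.types"):
    """Guess a MIME type, falling back to built-in defaults if mime_file is absent."""
    db = mimetypes.MimeTypes()  # starts out with the module's built-in table
    try:
        db.read(mime_file)  # add the system's mappings when the file exists
    except OSError:
        pass  # e.g. on Windows: keep going with the defaults instead of raising
    mime_type, _encoding = db.guess_type(filename)
    return mime_type or "application/octet-stream"
```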
I think there was one more change I made to get init to work - there is a
path with a unix-style "/" hardcoded somewhere - but unfortunately I forget
just where and can't find it right now. If this strikes, you should be able
to find it from the error message.
It runs now - I have it on port 8090 to avoid colliding with Zope in 8080.
Good luck!
Cheers,
Tom P
From Mike.Olson@fourthought.com Sat May 5 23:43:44 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sat, 05 May 2001 16:43:44 -0600
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com>
Message-ID: <3AF48220.B9E97B23@FourThought.com>
"Thomas B. Passin" wrote:
>
Thomas, thanks for all of the work. I'm working today on getting these
straightened out.
>
> 4ss init
>
> 1) 4ss.bat is in the python\scripts directory, so you have to add it to your
> path or have to be running in that directory.
This is in the Windows Installation guide. See
http://4suite.org/4Suite.org/documents/guides/4SuiteServer/Windows_Installation
towards the end of the Installing 4SuiteServer section.
>
> 2) The init command fails because the file "core.odl" is not installed into
> the "generated" directory (or anywhere else) by the installer. I downloaded
> the source distribution, found the file, and copied into the generated
> directory.
This was a packaging bug. We will be putting out new Windows packages
today.
>
> Now init works.
>
> 3) init works, but when it asks if you want to wipe out the old data, it
> wants you to answer "yes" or "no". Most Windows users are used to being
> able to answer 'y' or 'n' to those questions. I did, and didn't even notice
> that I hadn't literally done what the prompt said. Very Unix-like. Very
> unforgiving. This code should be changed to allow "y" and "n" as well.
I'll make this more forgiving, and more informative.
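A more forgiving prompt could look roughly like the sketch below. It assumes the check is a plain string comparison; the names `parse_yes_no` and `confirm` are hypothetical and are not taken from the 4SuiteServer source, and the code is written for current Python rather than the 2.0 of this thread.

```python
def parse_yes_no(answer):
    """Map a typed answer to True/False, or None if it is not recognized."""
    normalized = answer.strip().lower()
    if normalized in ("y", "yes"):
        return True
    if normalized in ("n", "no"):
        return False
    return None


def confirm(question, read_line=input):
    """Keep asking until the user gives a recognizable answer."""
    while True:
        result = parse_yes_no(read_line(question + " (yes/no): "))
        if result is not None:
            return result
        # Be informative instead of silently treating the answer as "no".
        print("Please answer 'yes' or 'no' ('y' and 'n' also work).")
```

Passing `read_line` as a parameter keeps the prompt testable without a real console.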
>
> 4) The quick start guide has you run the script populate.py in the
> python\docs\4SuiteServer-0.11\demo directory. But it fails, looking for a
> unix file, something like /etc/mime.types. The script has a test for this
> file and an except branch to run in case it doesn't exist (which it doesn't
> on a Windows machine). But the except branch incorrectly has a "raise"
> statement which terminates the script.
>
> Get rid of this line, which is line 66 of populate.py. Now the script runs.
Fixed in CVS, thanks.
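For readers hitting the same problem, here is a minimal sketch of the intended behaviour: try the Unix file, and fall back to built-in defaults instead of re-raising. This is not the actual populate.py code; `load_mime_types` and `DEFAULT_MIME_TYPES` are hypothetical names, and the default table is only an illustration.

```python
# Hypothetical fallback table; the real script may use different defaults.
DEFAULT_MIME_TYPES = {"html": "text/html", "xml": "text/xml", "txt": "text/plain"}


def load_mime_types(path="/etc/mime.types"):
    """Return an extension -> MIME type map, falling back to defaults."""
    try:
        with open(path) as handle:
            lines = handle.readlines()
    except (IOError, OSError):
        # The file is missing (for example on Windows): use the defaults
        # and carry on, rather than re-raising and killing the script.
        return dict(DEFAULT_MIME_TYPES)
    mapping = {}
    for line in lines:
        # Drop comments, then split "type ext1 ext2 ..." into fields.
        fields = line.split("#", 1)[0].split()
        for extension in fields[1:]:
            mapping[extension] = fields[0]
    return mapping
```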
>
> 5) At this point, populate installed its downloaded files but failed when it
> tried to modify "docdefs". It turns out you have to be running as superuser
> to change docdefs. The guide doesn't tell you, but implies that you should
> have run as the new user it just had you create. Otherwise, why create that
> user just before running populate.py?
I updated the docs to say that populate needs to be run as superuser.
I might change it so that any user can create a document definition,
though.
>
> 6) Then I tried to install and run the guestbook. You have to run the
> "bootstrap.py" script in the demo\GuestBook directory. This failed. It
> turned out that you have to change to the GuestBook directory and run from
> that, otherwise the script can't find the files it needs.
I updated the README.
>
> 7) The Guestbook works until you try to submit the form for your first
> guest. Then it fails, but in a strange way. With IE, I got an error
> message saying it couldn't find the server or there was a DNS error. This
> must be an incorrect message since the form uses a relative path, but anyway
> something isn't working that I haven't tracked down.
I'll have to look into this one a bit closer....
>
> 8) The docs give examples of looking at various properties by their path, as
> in
>
> 4ss show acl /localhost/index.html
Did you get an error of:
Uri /localhost/index.html, is unknown
>
> None of these commands have worked for me. I had to remove the /localhost/
> part. My server was running at the time.
Did it work when you removed the localhost part? Then you probably have
a document in your root called index.html, probably from the Guestbook
example. That shouldn't put things in your root.
Thanks again for your help. Hopefully it is getting easier to install.
Mike
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From tpassin@home.com Sun May 6 05:02:00 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sun, 6 May 2001 00:02:00 -0400
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com>
Message-ID: <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com>
[Tom]
> >
> > 8) The docs give examples of looking at various properties by their path, as
> > in
> >
> > 4ss show acl /localhost/index.html
>
[Mike Olson]
> Did you get an error of:
> Uri /localhost/index.html, is unknown
>
Yes
[Tom]
> >
> > None of these commands have worked for me. I had to remove the /localhost/
> > part. My server was running at the time.
[Mike]
> Did it work when you removed the localhost part? Then you probably have
> a document in your root called index.html. Probably from the Guestbook
> example. That shouldn't put things in your root.
>
No, same message with or without /localhost.
Here are two screen captures:
D:>4ss show acl /localhost/gems/
"d:\program files\python\python" -c "from FtServer.Console import CommandLine; CommandLine.Run()" show acl /localhost/gems/
4SS User Name: dba
Uri /localhost/gems, is unknown
D:>4ss show acl gems/
"d:\program files\python\python" -c "from FtServer.Console import CommandLine; CommandLine.Run()" show acl gems/
4SS User Name: dba
Resource: gems/
----------
Read ACL: ['dba']
Write ACL: ['admin']
You can read this object
You can modify this object
As for an index.html in the root:
D:>4ss fetch document /localhost/index.html
"d:\program files\python\python" -c "from FtServer.Console import CommandLine; CommandLine.Run()" fetch document /localhost/index.html
4SS User Name: dba
Uri /localhost/index.html, is unknown
I just noticed that, at http://localhost:8090/ (my 4ss site), there is a
"folder" called localhost/. Is that how it's supposed to be? If so, I'd
suggest changing the name because it could get confused (by a user - me for
example!) with the "localhost" alias for 127.0.0.1.
Thanks for your help.
Tom P
From Mike.Olson@fourthought.com Sun May 6 05:13:24 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sat, 05 May 2001 22:13:24 -0600
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com>
Message-ID: <3AF4CF64.A0B32045@FourThought.com>
"Thomas B. Passin" wrote:
>
>
>
> I just noticed that, at http://localhost:8090/ ( my 4ss site), there is a
> "folder" called localhost/. Is that how it's supposed to be? If so, I'd
> suggest changing the name because it could get confused (by a user - me for
> example!) with the "localhost" alias for 127.0.0.1.
Yes, we need to change the default name of the SystemHost directory. It
was less confusing when we put http in front of all of the URIs; now it
is just confusing. I was thinking of calling it "etc", but I think
Windows folks might not like that.
Thoughts?
Mike
>
> Thanks for your help.
>
> Tom P
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From tpassin@home.com Sun May 6 05:26:01 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sun, 6 May 2001 00:26:01 -0400
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com>
Message-ID: <001001c0d5e4$a896f8a0$7cac1218@reston1.va.home.com>
Another problem, the 4ss test_suite fails with this message:
D:>test.py
4SS User Name: dba
==== D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite ===
==== D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\Core ===
Traceback (innermost last):
  File "D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\test.py", line 29, in ?
    test(tester)
  File "D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\test.py", line 18, in test
    m.test(tester)
  File "D:\Program Files\Python\Doc\4SuiteServer-0.11\test_suite\test.py", line 14, in test
    os.chdir(dir)
OSError: [Errno 2] No such file or directory: 'Core'
I ran this script from the test_suite directory. Note that I added an extra
print statement to see what directory it couldn't find. It looks as if the
test.py script calls itself the second time rather than calling the test.py
located in the test_suite\Core directory. I'm sure this wasn't intended.
This would be a good time for me to put in a plug to make scripts that
depend on knowing where other files are relative to themselves, detect their
own location. You may have to make the script a module to do this reliably
(I'm not fully up on all the ins and outs, but if you do it right then
__file__ gives you the full path to the script).
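The idea can be sketched as follows. The helper names `script_dir` and `sibling_path` are made up for illustration, and "Core" is used only because it is the directory from the traceback above. In current Python versions `__file__` is also defined when a script is run directly, so the module workaround is usually unnecessary today.

```python
import os


def script_dir(module_file):
    """Return the absolute directory containing the given script or module."""
    return os.path.dirname(os.path.abspath(module_file))


def sibling_path(module_file, *parts):
    """Build a path relative to the script's own location, not the cwd."""
    return os.path.join(script_dir(module_file), *parts)


# Typical use inside a script such as test.py (directory name is illustrative):
#     core_dir = sibling_path(__file__, "Core")
#     os.chdir(core_dir)
```

Because the path is anchored at the script's own location, the script behaves the same no matter which directory it is started from.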
Cheers,
Tom P
From tpassin@home.com Sun May 6 05:29:28 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sun, 6 May 2001 00:29:28 -0400
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com> <3AF4CF64.A0B32045@FourThought.com>
Message-ID: <001701c0d5e5$23fa30c0$7cac1218@reston1.va.home.com>
[Mike Olson]
> "Thomas B. Passin" wrote:
> >
> >
> >
> > I just noticed that, at http://localhost:8090/ (my 4ss site), there is a
> > "folder" called localhost/. Is that how it's supposed to be? If so, I'd
> > suggest changing the name because it could get confused (by a user - me for
> > example!) with the "localhost" alias for 127.0.0.1.
>
> Yes we need to change the default name of the SystemHost directory. It
> was less confusing when we put http in front of all of the URIs, now it
> is just confusing. I was thinking of calling it "etc" but I think
> windows folks might not like that.
>
[Tom]
Depends on what you intend it to be for. It should have an evocative name.
I see the docdefs and acl stuff in mine. How about sscfg?
Cheers,
Tom P
From Mike.Olson@fourthought.com Sun May 6 09:11:06 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 06 May 2001 02:11:06 -0600
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de> <000b01c0d57e$15cfa820$7cac1218@reston1.va.home.com> <3AF48220.B9E97B23@FourThought.com> <000c01c0d5e1$4d89ac80$7cac1218@reston1.va.home.com> <3AF4CF64.A0B32045@FourThought.com> <001701c0d5e5$23fa30c0$7cac1218@reston1.va.home.com>
Message-ID: <3AF5071A.7AE1CA56@FourThought.com>
"Thomas B. Passin" wrote:
> >
> > Yes we need to change the default name of the SystemHost directory. It
> > was less confusing when we put http in front of all of the URIs, now it
> > is just confusing. I was thinking of calling it "etc" but I think
> > windows folks might not like that.
> >
>
> [Tom]
> Depends on what you intend it to be for. It should have an evocative name.
> I see the docdefs and acl stuff in mine. How about sscfg?
Maybe just system
Mike
>
> Cheers,
>
> Tom P
>
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Sun May 6 14:50:42 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 06 May 2001 07:50:42 -0600
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
In-Reply-To: Message from Mike Olson
of "Sun, 06 May 2001 02:11:06 MDT." <3AF5071A.7AE1CA56@FourThought.com>
Message-ID: <200105061350.f46Dog503966@localhost.local>
> "Thomas B. Passin" wrote:
> > >
> > > Yes we need to change the default name of the SystemHost directory. It
> > > was less confusing when we put http in front of all of the URIs, now it
> > > is just confusing. I was thinking of calling it "etc" but I think
> > > windows folks might not like that.
> > >
> >
> > [Tom]
> > Depends on what you intend it to be for. It should have an evocative name.
> > I see the docdefs and acl stuff in mine. How about sscfg?
>
> Maybe just system
Well, I still think it should be configurable (as it used to be, if only
through the host-name). Don't forget our non-English-speaking friends, and
others who might want a user folder called "system".
For a default, I favor "4sssystem", "sys4ss" or something like that which is
unlikely to clash with user need.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From tpassin@home.com Sun May 6 15:17:36 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sun, 6 May 2001 10:17:36 -0400
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <200105061350.f46Dog503966@localhost.local>
Message-ID: <000d01c0d637$4cfd7c00$7cac1218@reston1.va.home.com>
[Uche Ogbuji]
> Well, I still think it should be configurable (as it used to be, if only
> through the host-name). Don't forget our non-english speaking friends, and
> others who might want a user folder called "system"
>
> For a default, I favor "4sssystem", "sys4ss" or something like that which is
> unlikely to clash with user need.
>
Right on.
Tom P
From Mike.Olson@fourthought.com Sun May 6 19:24:22 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 06 May 2001 12:24:22 -0600
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <200105061350.f46Dog503966@localhost.local>
Message-ID: <3AF596D6.5AACFD17@FourThought.com>
Uche Ogbuji wrote:
>
> > "Thomas B. Passin" wrote:
> > > >
> > > > Yes we need to change the default name of the SystemHost directory. It
> > > > was less confusing when we put http in front of all of the URIs, now it
> > > > is just confusing. I was thinking of calling it "etc" but I think
> > > > windows folks might not like that.
> > > >
> > >
> > > [Tom]
> > > Depends on what you intend it to be for. It should have an evocative name.
> > > I see the docdefs and acl stuff in mine. How about sscfg?
> >
> > Maybe just system
>
> Well, I still think it should be configurable (as it used to be, if only
> through the host-name). Don't forget our non-english speaking friends, and
> others who might want a user folder called "system"
It would still be configurable through the SystemHost parameters. Maybe
this should be renamed to the SystemContainer parameter.
>
> For a default, I favor "4sssystem", "sys4ss" or something like that which is
> unlikely to clash with user need.
I like sys4ss then.
Mike
>
> --
> Uche Ogbuji Principal Consultant
> uche.ogbuji@fourthought.com +1 303 583 9900 x 101
> Fourthought, Inc. http://Fourthought.com
> 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
> Software-engineering, knowledge-management, XML, CORBA, Linux, Python
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From tpassin@home.com Sun May 6 20:05:01 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sun, 6 May 2001 15:05:01 -0400
Subject: [XML-SIG] Getting 4SuiteServer 0.11 Working on Windows
References: <200105061350.f46Dog503966@localhost.local> <3AF596D6.5AACFD17@FourThought.com>
Message-ID: <002901c0d65f$73fbe8a0$7cac1218@reston1.va.home.com>
[Mike Olson]
> > For a default, I favor "4sssystem", "sys4ss" or something like that which is
> > unlikely to clash with user need.
>
> i like sys4ss then.
>
Suits me (or is that "4suites me"???)
Tom P
From noreply@sourceforge.net Mon May 7 10:46:17 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 07 May 2001 02:46:17 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421978 ] pDomlette reader bug
Message-ID:
Bugs item #421978, was updated on 2001-05-07 02:46
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421978&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: pDomlette reader bug
Initial Comment:
Hi,
I'm trying to build a pDomlette with a custom Sax
parser, and it looks like the provided handler expects
the parser to implement a SetBase method. I could not
find it in the Sax documentation.
Providing an empty SetBase() method leads to errors
when parseFile() is accessed (instead of parse()), and
further errors in the except clause.
Did I miss something or is this a bug?
Alexandre Fayolle
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421978&group_id=6473
From larsga@garshol.priv.no Mon May 7 13:27:47 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 07 May 2001 14:27:47 +0200
Subject: [XML-SIG] xmlproc bug?
In-Reply-To: <200105051312.f45DC1401103@mira.informatik.hu-berlin.de>
References: <3AF304BB.D5ECB468@zolera.com> <200105042139.f44LdXN01714@mira.informatik.hu-berlin.de> <200105051312.f45DC1401103@mira.informatik.hu-berlin.de>
Message-ID:
* Lars Marius Garshol
|
| I see no reason why it should be. If the application is converting
| to Unicode itself, or if it got the data from somewhere as Unicode,
| there is no reason why it should not be allowed to parse those data.
* Martin v. Loewis
|
| I agree in principle. However, just allowing to call feed with a
| Unicode object is too permissive: What if you had previously called it
| with a string?
Good point. One should have to stick to either Unicode or byte strings
throughout a single parse.
Looking at the code I think it makes sense to require client code to
also be consistent in its use of the 'decoded' flag. That is, decoded
should always have the same value throughout an entire parse.
| So if this is allowed,
It is allowed now, since I've committed my change.
| care should be taken that a sensible thing happens when somebody
| mixes byte and unicode strings (signalling a fatal error might be
| sensible).
I agree.
I am working on the modification now and will commit it shortly.
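The check under discussion could be sketched roughly as below. This is not xmlproc's code: `ConsistentFeeder` and `MixedInputError` are hypothetical names, and the sketch uses current Python's `bytes` and `str` where the 2001 code would distinguish `str` and `unicode`.

```python
class MixedInputError(Exception):
    """Raised when byte strings and Unicode strings are mixed in one parse."""


class ConsistentFeeder:
    """Hypothetical wrapper that forwards chunks to a feed() callable."""

    def __init__(self, feed):
        self._feed = feed
        self._kind = None  # set to bytes or str by the first chunk

    def feed(self, chunk):
        if isinstance(chunk, (bytes, bytearray)):
            kind = bytes
        elif isinstance(chunk, str):
            kind = str
        else:
            raise TypeError("feed() expects bytes or str")
        if self._kind is None:
            # The first chunk decides the input type for the whole parse.
            self._kind = kind
        elif kind is not self._kind:
            # Treat a switch between types as a fatal error.
            raise MixedInputError("cannot mix byte and Unicode strings in one parse")
        self._feed(chunk)

    def reset(self):
        """Forget the recorded type so a new parse can start."""
        self._kind = None
```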
--Lars M.
From akrug@mps.de Mon May 7 17:31:53 2001
From: akrug@mps.de (Arne Krug)
Date: Mon, 7 May 2001 18:31:53 +0200
Subject: [XML-SIG] sample code - msxml
Message-ID: <3AF6EA19.6350.2FF8E8@localhost>
Does anyone have sample code for using the
SAXXMLReader of the Microsoft parser MSXML
in Python?
arne
--- Arne Krug: ---
--- ufcx@rz.uni-karlsruhe.de ---
--- akrug@mps.de ---
From uche.ogbuji@fourthought.com Tue May 8 00:16:34 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 07 May 2001 17:16:34 -0600
Subject: [XML-SIG] Curiouser and curiouser
Message-ID: <200105072316.f47NGYU31415@localhost.local>
Quote from anonymous source in http://xmlhack.com/read.php?item=1203
"The charter of the XML Protocols WG isn't to invent anything new."
I don't know how solid this particular source is, but this comment would seem to support Rich Salz's position in our debate from last week.
However, one of the secret weapons I had in that debate was that I'd happened to attend the W3C Web Services workshop last month, and I can certainly say that the above quote completely contradicts every sense I got from that meeting.
I think the politics of XML protocols and Web services will be white hot. It might even be a bloodier battlefield than the notorious XML Schemas. The camps appear to be roughly:
* Just use SOAP as-is and rubber-stamp WSDL and UDDI to boot (the camp that seems to be represented in the above quote)
* Take the good parts of SOAP, mix in a bit of "transactions" here, a dash of PKI there, a smidgen of EAI voodoo, and...
* This is EDI + Internet transport + XML payload + semantic Web, folks: quit reinventing wheels (the camp I occupy)
I think I can say from first hand that all camps have powerful adherents.
Don't ask me what the hell this means for Python efforts...
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From rsalz@zolera.com Tue May 8 04:59:00 2001
From: rsalz@zolera.com (Rich Salz)
Date: Mon, 07 May 2001 23:59:00 -0400
Subject: [XML-SIG] Curiouser and curiouser
References: <200105072316.f47NGYU31415@localhost.local>
Message-ID: <3AF76F04.E38CC24D@zolera.com>
> "The charter of the XML Protocols WG isn't to invent anything new."
According to the XP charter
(http://www.w3.org/2000/09/XML-Protocol-Charter), "The Working group
shall start by developing a requirements document, and then evaluate the
technical solutions proposed in the SOAP/1.1 submission against these
requirements. If in this process the Working Group finds solutions that
are agreed to be improvements over solutions suggested by SOAP 1.1,
those improved solutions should be used."
Now I find that phrase "agreed to be improvements" rife with all sorts
of potential. Certainly one could make a case that a new preferred
encoding that is non-interoperable with the deployed base of Sec 5
encodings is NOT an improvement, overall. :)
I knew you were at the WS workshop, and that I was basing my opinions
solely on the public record, but that's okay. I've served my time in
standards activities and consortia, and I can hazard a guess as to what
will happen. The same thing that always happens: folks want holes put
in so they can plug in their own "embrace and extend" or "optimized"
version of the current protocol. Well, since the encodings are
specified by namespace, the holes are already there. :) So XP will
tighten up the wording, remove ambiguity, and not break interop.
> I think the politics of XML protocols and Web services will be white hot.
I don't disagree.
> The camps appear to be roughly:
Interesting analysis, thanks!
> * Just use SOAP as-is and rubber-stamp WSDL and UDDI to boot ...
> * Take the good parts of SOAP, mix in a bit of "transactions" here, a dash of PKI there, a smidgen of EAI voodoo, and...
These aren't mutually exclusive, since #2 is presumably a subset of #1.
As a security expert, I question the need for signed soap, especially in
the presence of actors. I think applications will want to do their own
signing/encryption.
> * This is EDI + Internet transport + XML payload + semantic Web, folks: quit reinventing wheels (the camp I occupy)
I got a bit lost in your sentence syntax. Can you explain what you mean
here? Tnx.
> Don't ask me what the hell this means for Python efforts...
Quoting an old colleague "with freedom comes choices, and with choices
comes more lines of code." :)
/r$
From laurent_fontanel@globalcrossing.com Tue May 8 17:22:06 2001
From: laurent_fontanel@globalcrossing.com (Laurent Fontanel)
Date: Tue, 08 May 2001 12:22:06 -0400
Subject: [XML-SIG] Re: sample code - msxml
References:
Message-ID: <3AF81D2E.9EF58A50@globalcrossing.com>
This is a multi-part message in MIME format.
--------------5BD27586D3B9A79D098870E9
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Arne,
I've never used MSXML to process XML SAX-style, but I've used it to apply XSL stylesheets.
It's really easy with the win32com interface:
import win32com.client

def xml2htm(xmlFile):
    # Load the source document synchronously.
    source = win32com.client.Dispatch("Microsoft.xmldom")
    setattr(source, "async", 0)  # "async" is a reserved word in Python 3.7+
    source.load(xmlFile)
    # Load the stylesheet the same way.
    style = win32com.client.Dispatch("Microsoft.xmldom")
    setattr(style, "async", 0)
    style.load("mystylesheet.xsl")
    return source.transformNode(style)

if __name__ == '__main__':
    print(xml2htm("myfile.xml"))
Note also that after source.load(), you can manipulate the whole document tree using DOM calls,
which is pretty neat.
Hope this helps,
Laurent.
--------------5BD27586D3B9A79D098870E9
Content-Type: text/x-vcard; charset=us-ascii;
name="laurent_fontanel.vcf"
Content-Transfer-Encoding: 7bit
Content-Description: Card for Laurent Fontanel
Content-Disposition: attachment;
filename="laurent_fontanel.vcf"
begin:vcard
n:Fontanel;Laurent
tel;work:(716) 777-2752
x-mozilla-html:TRUE
org:;Systems and Product Development
adr:;;180 S. Clinton Ave.;Rochester;NY;14646;
version:2.1
email;internet:laurent_fontanel@globalcrossing.com
fn:Laurent Fontanel
end:vcard
--------------5BD27586D3B9A79D098870E9--
From karl@digicool.com Tue May 8 21:12:23 2001
From: karl@digicool.com (Karl Anderson)
Date: 08 May 2001 13:12:23 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
In-Reply-To: noreply@sourceforge.net's message of "Fri, 04 May 2001 18:06:10 -0700"
References:
Message-ID:
Can anyone shine some light on which DOM implementation is right here?
After parsing an attribute with no namespace prefix, what namespace
URIs should it be possible to retrieve that attribute with?
For example, after parsing "" in a namespace
aware way, which should return "1.0":
getAttributeNS(None, 'version')
getAttributeNS('', 'version')
Only the URI of '' works for Domlette. Only the URI of None works for
ParsedXML. I think that ParsedXML's restriction is morally better
because of this line from the DOM rec:
> Note that because the DOM does no lexical checking, the
> empty string
> will be treated as a real namespace URI in DOM Level 2
> methods.
> Applications must use the value null as the
> namespaceURI parameter for
> methods if they wish to have no namespace.
OTOH, I've lost arguments when it was pointed out that you don't have
to use DOM methods when you're parsing, and in fact can't parse
everything if you're restricted to them. OTOH again, using None would
make parsing consistent with setting namespaceless names using DOM
methods.
ParsedXML doesn't work for the XSLT modules in the current PyXML
checkout because they use '' as the NSURI to use to retrieve NSless
attributes.
Should ParsedXML allow names parsed without a NS to be retrieved
with a NSURI of '' as well as None? Should Domlette allow None?
Should None be used in getAttributeNS calls like these, regardless?
noreply@sourceforge.net writes:
> Bugs item #421553, was updated on 2001-05-04 18:06
> You can respond by visiting:
> http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473
>
> Category: 4Suite
> Group: None
> Status: Open
> Resolution: None
> Priority: 5
> Submitted By: Karl Anderson (karlanderson)
> Assigned to: Nobody/Anonymous (nobody)
> Summary: stylesheet node reader requires '' NSURI
>
> Initial Comment:
>
> I'm unable to use ParsedXML's DOM as a stylesheet node,
> and I think
> it's because of a bug in StylesheetReader.py.
>
> The problem is at StylesheetReader.py line 186:
>
> if not sheet.getAttributeNS('', 'version'):
> raise
> XsltException(Error.STYLESHEET_MISSING_VERSION)
>
> ...where the NamespaceURI given to getAttributeNS is
> ''. This is
> supposed to find the namespace-free version attribute
> of the
> stylesheet documentElement, such as
> """
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> version="1.0">
> """.
>
> ParsedXML's DOM builder gives this attribute a
> NamespaceURI of None
> when we parse.
>
> I don't think that you can use the DOM methods to
> create a node with a
> NamespaceURI of "", since the NamespaceURI is supposed
> to be a URI
> reference. Is the empty string a valid URI reference?
> Well, maybe -
> the DOM level 2 rec says:
> """
> Note that because the DOM does no lexical checking, the
> empty string
> will be treated as a real namespace URI in DOM Level 2
> methods.
> Applications must use the value null as the
> namespaceURI parameter for
> methods if they wish to have no namespace.
> """
> But anyway, this indicates that when using DOM creation
> methods, a
> None should be used as the NamespaceURI for
> namespaceless nodes such
> as "version", and I think that the stylesheet reader
> should accept
> that.
>
>
> ----------------------------------------------------------------------
>
> You can respond by visiting:
> http://sourceforge.net/tracker/?func=detail&atid=106473&aid=421553&group_id=6473
>
--
Karl Anderson karl@digicool.com
From fdrake@acm.org Tue May 8 21:15:29 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 8 May 2001 16:15:29 -0400 (EDT)
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
In-Reply-To:
References:
Message-ID: <15096.21473.817904.541038@cj42289-a.reston1.va.home.com>
Karl Anderson writes:
> Can anyone shine some light on which DOM implementation is right here?
> After parsing an attribute with no namespace prefix, what namespace
> URIs should it be possible to retrieve that attribute with?
>
> For example, after parsing "" in a namespace
> aware way, which should return "1.0":
>
> getAttributeNS(None, 'version')
> getAttributeNS('', 'version')
The former is correct according to past discussions in this mailing
list.
> Only the URI of '' works for Domlette. Only the URI of None works for
> ParsedXML. I think that ParsedXML's restriction is morally better
> because of this line from the DOM rec:
Domlette is broke!
> OTOH, I've lost arguments when it was pointed out that you don't have
> to use DOM methods when you're parsing, and in fact can't parse
> everything if you're restricted to them. OTOH again, using None would
> make parsing consistent with setting namespaceless names using DOM
> methods.
Using None would be the right thing because that's the Python DOM
binding.
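For comparison, `xml.dom.minidom` in the Python standard library follows the None convention, as the sketch below shows. It says nothing about how Domlette or ParsedXML behave. The sample stylesheet element is chosen for illustration, and `get_unqualified_attr` is a hypothetical helper for code that has to run against both kinds of DOM.

```python
from xml.dom import minidom

# Illustrative document: a stylesheet root with an unprefixed attribute.
SHEET = (
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
    'version="1.0"/>'
)


def get_unqualified_attr(element, local_name):
    """Hypothetical helper: read an attribute that is in no namespace.

    Tries None first (the Python DOM binding), then '' for DOM
    implementations that store the empty string instead.
    """
    for namespace_uri in (None, ""):
        value = element.getAttributeNS(namespace_uri, local_name)
        if value:
            return value
    return ""


root = minidom.parseString(SHEET).documentElement
print(repr(root.getAttributeNS(None, "version")))   # '1.0'
print(repr(root.getAttributeNS("", "version")))     # ''
print(repr(get_unqualified_attr(root, "version")))  # '1.0'
```

A helper like this only papers over the difference; the cleaner fix is for every implementation to use None for "no namespace".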
> ParsedXML doesn't work for the XSLT modules in the current PyXML
> checkout because they use '' as the NSURI to use to retrieve NSless
> attributes.
That stinks!
> Should ParsedXML allow names parsed without a NS to be retrieved
> with a NSURI of '' as well as None? Should Domlette allow None?
> Should None be used in getAttributeNS calls like these, regardless?
Only None needs to be supported as an indication of "no namespace";
"" is different. (And probably broken.)
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From uche.ogbuji@fourthought.com Tue May 8 21:53:02 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 08 May 2001 14:53:02 -0600
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires
'' NSURI
In-Reply-To: Message from "Fred L. Drake, Jr."
of "Tue, 08 May 2001 16:15:29 EDT." <15096.21473.817904.541038@cj42289-a.reston1.va.home.com>
Message-ID: <200105082053.f48Kr2K10012@localhost.local>
>
> Karl Anderson writes:
> > Can anyone shine some light on which DOM implementation is right here?
> > After parsing an attribute with no namespace prefix, what namespace
> > URIs should it be possible to retrieve that attribute with?
> >
> > For example, after parsing "" in a namespace
> > aware way, which should return "1.0":
> >
> > getAttributeNS(None, 'version')
> > getAttributeNS('', 'version')
>
> The former is correct according to past discussions in this mailing
> list.
Yes, and I was a proponent of the former as well, but we just haven't had a
chance to go throughout XSLT and make the needed changes. It's on our to-do
list, but any contributed patches can make this happen more quickly. To be
clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath
and 4XSLT that will eat up the sweat equity.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From dieter@handshake.de Tue May 8 22:21:06 2001
From: dieter@handshake.de (Dieter Maurer)
Date: Tue, 8 May 2001 23:21:06 +0200 (CEST)
Subject: [XML-SIG] 4xslt: bug and patch: variable import order
Message-ID: <15096.25410.753829.204197@lindm.dm>
--Multipart_Tue_May__8_23:21:06_2001-1
Content-Type: text/plain; charset=US-ASCII
The XSLT spec specifies that definitions and template rules
in an importing stylesheet take precedence over those from
an imported stylesheet. This is essential for easy customization
of imported stylesheets.
"4xslt" implements this feature only partially:
Top level variables in an importing stylesheet do not
take precedence over imported ones.
The attached patch hopefully fixes the problem.
It ensures that variables in importing style sheets
take precedence over those defined in imported style sheets
and that all style sheets use the same top level variables.
Dieter
--Multipart_Tue_May__8_23:21:06_2001-1
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="var_import_order.pat"
Content-Transfer-Encoding: 7bit
--- :Stylesheet.py Thu May 3 01:29:05 2001
+++ Stylesheet.py Tue May 8 23:19:29 2001
@@ -398,8 +398,16 @@
self._primedContext = context
#Note: key expressions can't have var refs, so we needn't worry about imports
self._updateKeys(contextNode, processor)
+ # DM: imported variables have lower precedence than that from
+ # the main style sheet.
+ d= {}
for imp in self._imports:
- self._primedContext.varBindings.update(imp.stylesheet._primedContext.varBindings)
+ d.update(imp.stylesheet._primedContext.varBindings)
+ d.update(self._primedContext.varBindings)
+ self._primedContext.varBindings= d
+ # DM: all use the same set of top level variables
+ for imp in self._imports:
+ imp.stylesheet._primedContext.varBindings= d
return topLevelParams
--Multipart_Tue_May__8_23:21:06_2001-1--
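The precedence rule the patch aims for can be sketched with plain dictionaries. This is an illustration only, not the 4XSLT code: merge_top_level_vars and the sample bindings are hypothetical, and the imports are assumed to be processed in list order, as in the loop in the patch.

```python
def merge_top_level_vars(main_bindings, imported_bindings_list):
    """Return one binding table in which the importing sheet wins.

    Imported bindings are applied first, in list order, and the
    importing stylesheet's own bindings are applied last so that
    they override any imported variable of the same name.
    """
    merged = {}
    for imported in imported_bindings_list:
        merged.update(imported)
    merged.update(main_bindings)
    return merged


# Hypothetical variable tables for one importing and two imported sheets.
main = {"color": "red"}
imports = [{"color": "blue", "size": "10pt"}, {"font": "serif"}]

shared = merge_top_level_vars(main, imports)
print(shared)
```

Handing the same merged dictionary to every stylesheet, as the patch does, is what makes all of them see a single set of top-level variables.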
From stuff4gary@hotmail.com Tue May 8 23:52:28 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Tue, 08 May 2001 22:52:28
Subject: [XML-SIG] What are the limits of soap and python?
Message-ID:
I am pretty confused by the SOAP discussion (it hasn't any connection with
the operas browser movement has it!!).
I imagine it is like OCX, Dynamic Data Exchange, Windows Script, applescript
!! Can it push buttons on the system and fill out text fields, with text
and automate through applications?
Can I set it up hot folders with it to send pictures through photoshop, OCR
and databases?? is that even possible in python?
Many thanks for any simple explanations!
Gary
_________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.
From karl@digicool.com Wed May 9 00:10:21 2001
From: karl@digicool.com (Karl Anderson)
Date: 08 May 2001 16:10:21 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
In-Reply-To: Uche Ogbuji's message of "Tue, 08 May 2001 14:53:02 -0600"
References: <200105082053.f48Kr2K10012@localhost.local>
Message-ID:
Uche Ogbuji writes:
[*NS('', ...)]
> Yes, and I was a proponent of the former as well, but we just haven't had a
> chance to go throughout XSLT and make the needed changes. It's on our to-do
> list, but any contributed patches can make this happen more quickly. To be
> clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath
> and 4XSLT that will eat up the sweat equity.
Well, a quick grep-find shows that they're all in XSLT, and they're
all getAttributeNS or setAttributeNS calls with actual empty strings,
nothing fancy.
Are there tests for 4XSLT? My install from a PyXML checkout didn't
install any, and I'm an XSLT newbie, so my testing is pretty limited.
I could supply patches, would they be useful without real testing at
this stage of 4XSLT development?
I'd love for this to be usable with our DOM.
--
Karl Anderson karl@digicool.com
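A rough sketch of the kind of search described above, for anyone who wants to repeat it. It is an assumption-laden helper, not part of 4Suite or PyXML: find_empty_ns_calls is a hypothetical name, and the pattern only catches calls whose first argument is a literal empty string, so empty strings passed through a variable or constant would be missed.

```python
import re

# Matches getAttributeNS( or setAttributeNS( followed by a literal
# empty string ('' or "") as the first argument.
EMPTY_NS_CALL = re.compile(r"\b(?:get|set)AttributeNS\(\s*(?:''|\"\")\s*,")


def find_empty_ns_calls(source_text):
    """Return (line number, stripped line) pairs for suspicious calls."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        if EMPTY_NS_CALL.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical source fragment used only to exercise the helper.
sample = (
    "a = node.getAttributeNS('', 'version')\n"
    "b = node.getAttributeNS(None, 'version')\n"
    "node.setAttributeNS(\"\", 'x', '1')\n"
)
for lineno, line in find_empty_ns_calls(sample):
    print(lineno, line)
```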
From uche.ogbuji@fourthought.com Wed May 9 00:18:30 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 08 May 2001 17:18:30 -0600
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires
'' NSURI
In-Reply-To: Message from Karl Anderson
of "08 May 2001 16:10:21 PDT."
Message-ID: <200105082318.f48NIVb19293@localhost.local>
> Uche Ogbuji writes:
>
> [*NS('', ...)]
>
> > Yes, and I was a proponent of the former as well, but we just haven't had a
> > chance to go throughout XSLT and make the needed changes. It's on our to-do
> > list, but any contributed patches can make this happen more quickly. To be
> > clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath
> > and 4XSLT that will eat up the sweat equity.
>
> Well, a quick grep-find shows that they're all in XSLT, and they're
> all getAttributeNS or setAttributeNS calls with actual empty strings,
> nothing fancy.
>
> Are there tests for 4XSLT? My install from a PyXML checkout didn't
> install any, and I'm an XSLT newbie, so my testing is pretty limited.
The test suite is in the documentation directory (e.g.
/usr/doc/4Suite-0.11.1a0/test_suite/4XSLT on my machine)
There are 160 test scripts, many of which have multiple tests each.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From noreply@sourceforge.net Wed May 9 03:08:53 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 08 May 2001 19:08:53 -0700
Subject: [XML-SIG] [ pyxml-Bugs-422528 ] can't import xpath
Message-ID:
Bugs item #422528, was updated on 2001-05-08 19:08
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=422528&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: can't import xpath
Initial Comment:
Installed PyXML 0.6.5 and 4Suite checkout with
setup.py install.
Can't import xpath, or run the test suites:
>>> import xml.xpath
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/xpath/__init__.py", line 107, in ?
    import Context, XPathParser
  File "/usr/lib/python1.5/site-packages/xml/xpath/Context.py", line 16, in ?
    import CoreFunctions
  File "/usr/lib/python1.5/site-packages/xml/xpath/CoreFunctions.py", line 18, in ?
    from xml.xpath import ExpandedNameWrapper
ImportError: cannot import name ExpandedNameWrapper
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=422528&group_id=6473
From karl@digicool.com Wed May 9 03:27:56 2001
From: karl@digicool.com (Karl Anderson)
Date: 08 May 2001 19:27:56 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
In-Reply-To: Uche Ogbuji's message of "Tue, 08 May 2001 17:18:30 -0600"
References: <200105082318.f48NIVb19293@localhost.local>
Message-ID:
Uche Ogbuji writes:
> > Are there tests for 4XSLT? My install from a PyXML checkout didn't
> > install any, and I'm an XSLT newbie, so my testing is pretty limited.
>
> The test suite is in the documentation directory (e.g.
> /usr/doc/4Suite-0.11.1a0/test_suite/4XSLT on my machine)
Oh, there's my problem, I was using a PyXML checkout.
I admit that I'm unclear about the relationship between 4Suite and
PyXML - I thought that once a module was added to PyXML, that
checking out PyXML would give you sufficiently bleeding edge code to
develop with and prod for bugs.
I'm also looking for the most vanilla version that I can tell users to
install and use with our code, when appropriate.
Once a module from 4Suite is added to PyXML, is the PyXML version a
checkout from the 4Suite CVS tree? Or is development moved to the
PyXML tree?
Why aren't the test suites part of PyXML? Do they rely on more of
4Suite?
--
Karl Anderson karl@digicool.com
From Mike.Olson@fourthought.com Wed May 9 05:31:58 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 08 May 2001 22:31:58 -0600
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires ''
NSURI
References: <200105082053.f48Kr2K10012@localhost.local>
Message-ID: <3AF8C83E.35A48C05@FourThought.com>
Karl Anderson wrote:
>
> Uche Ogbuji writes:
>
> [*NS('', ...)]
>
> > Yes, and I was a proponent of the former as well, but we just haven't had a
> > chance to go throughout XSLT and make the needed changes. It's on our to-do
> > list, but any contributed patches can make this happen more quickly. To be
> > clear: changing pDomlette and cDomlette themselves is quite easy: it's 4XPath
> > and 4XSLT that will eat up the sweat equity.
>
> Well, a quick grep-find shows that they're all in XSLT, and they're
> all getAttributeNS or setAttributeNS calls with actual empty strings,
> nothing fancy.
Nope. XPath uses these in the ParsedAxisSpecified, and in a decent handful
of functions. You would need to fix these as well.
>
> Are there tests for 4XSLT? My install from a PyXML checkout didn't
> install any, and I'm an XSLT newbie, so my testing is pretty limited.
You might need to install 4Suite to get the tests but I'm not sure.
>
> --
> Karl Anderson karl@digicool.com
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Wed May 9 07:59:16 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 9 May 2001 08:59:16 +0200
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
In-Reply-To: (message from Karl
Anderson on 08 May 2001 16:10:21 -0700)
References: <200105082053.f48Kr2K10012@localhost.local>
Message-ID: <200105090659.f496xGN00940@mira.informatik.hu-berlin.de>
> Well, a quick grep-find shows that they're all in XSLT, and they're
> all getAttributeNS or setAttributeNS calls with actual empty strings,
> nothing fancy.
>
> Are there tests for 4XSLT? My install from a PyXML checkout didn't
> install any, and I'm an XSLT newbie, so my testing is pretty limited.
I think a major source of confusion is the assumption that the xpath/xslt
directories, as checked out from PyXML CVS at the moment, are good for
any purpose. This is not the case: they don't work, and we know it.
If you want to *use* 4XSLT, you should install 4Suite, and not install
the xpath/xslt directories from PyXML (indeed, unless you modify
setup.py, they won't be installed).
> I could supply patches, would they be useful without real testing at
> this stage of 4XSLT development?
That said, if you want to contribute patches to make the xpath/xslt
packages useful, they are always appreciated. Of course, since you are
new to these packages, you might first want to look at how they are
supposed to function in 4Suite before fixing them in PyXML.
As for test suites: 4Suite does include test suites for these
packages.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Wed May 9 08:08:00 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 9 May 2001 09:08:00 +0200
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
In-Reply-To: (message from Karl
Anderson on 08 May 2001 19:27:56 -0700)
References: <200105082318.f48NIVb19293@localhost.local>
Message-ID: <200105090708.f49780n00963@mira.informatik.hu-berlin.de>
> I admit that I'm unclear about the relationship between 4Suite and
> PyXML - I thought that once a module was added to PyXML, that
> checking out PyXML would give you sufficiently bleeding edge code to
> develop with and prod for bugs.
It is absolutely bleeding edge, yes, and bug reports are welcome.
However, until PyXML is released with these packages, you should not
assume that they actually work. Indeed, one possible scenario is that
the next PyXML release does *not* include these subdirectories.
> Once a module from 4Suite is added to PyXML, is the PyXML version a
> checkout from the 4Suite CVS tree? Or is development moved to the
> PyXML tree?
Neither. 4XSLT uses a different XPath expression parser than the
copy in PyXML; the 4XSLT one is based on BisonGen/SWIG/bison/flex; the
PyXML one (dubbed PyXPath) uses Yapps/(s)re. The port to the other
parser, as well as other DOM implementations, is not complete.
> Why aren't the test suites part of PyXML?
A number of reasons. First of all, Fourthought has not contributed
them (although they might if asked). Then, the tests do require a
4Suite installation at the moment. Finally, the tests don't pass
without modifications; I'd like to minimize the necessary changes
before incorporating tests.
Regards,
Martin
From noreply@sourceforge.net Wed May 9 14:41:02 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 09 May 2001 06:41:02 -0700
Subject: [XML-SIG] [ pyxml-Patches-422641 ] NameError in RilParserImp.py
Message-ID:
Patches item #422641, was updated on 2001-05-09 06:41
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422641&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Alexandre Fayolle (afayolle)
Assigned to: Nobody/Anonymous (nobody)
Summary: NameError in RilParserImp.py
Initial Comment:
The parser uses undefined constants to report errors.
The attached patch adds definition of these constants
(and uses them properly).
Cheers
Alexandre Fayolle
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422641&group_id=6473
From uche.ogbuji@fourthought.com Wed May 9 15:02:52 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 09 May 2001 08:02:52 -0600
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires
'' NSURI
In-Reply-To: Message from "Martin v. Loewis"
of "Wed, 09 May 2001 09:08:00 +0200." <200105090708.f49780n00963@mira.informatik.hu-berlin.de>
Message-ID: <200105091402.f49E2qG06632@localhost.local>
> > Why aren't the test suites part of PyXML?
>
> A number of reasons. First of all, Fourthought has not contributed
> them (although they might if asked).
Of course.
> Then, the tests do require a 4Suite installation at the moment.
We have discussed making them use PyUnit, but this, like all other such good
intentions, is obstructed by time limitations.
> Finally, the tests don't pass
> without modifications; I'd like to minimize the necessary changes
> before incorporating tests.
We're hammering at the test suites all the while to fix and tweak them into
submission.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Wed May 9 15:13:42 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 09 May 2001 08:13:42 -0600
Subject: [XML-SIG] Curiouser and curiouser
In-Reply-To: Message from Rich Salz
of "Mon, 07 May 2001 23:59:00 EDT." <3AF76F04.E38CC24D@zolera.com>
Message-ID: <200105091413.f49EDgM06672@localhost.local>
> > The camps appear to be roughly:
>
> Interesting analysis, thanks!
>
> > * Just use SOAP as-is and rubber-stamp WSDL and UDDI to boot ...
> > * Take the good parts of SOAP, mix in a bit of "transactions" here, a dash of PKI there, a smidgen of EAI voodoo, and...
>
> These aren't mutually exclusive, since #2 is presumably a subset of #1.
No. Some want to change SOAP, which is different from #1. Also while the #1
folks want to call it a day when WSDL and UDDI are stabilized, the #2 folk
want much more.
> > * This is EDI + Internet transport + XML payload + semantic Web, folks: quit reinventing wheels (the camp I occupy)
>
> I got a bit lost in your sentence syntax. Can you explain what you mean
> here? Tnx.
Basically:
* take the business process, internationalization and authority-of-record work
hammered out in EDI.
* Use Internet transport (HTTP/SMTP) rather than VAN/BBS. Use XML as the
payload for human readability, inexpensive app integration and extensibility.
* Use a unified structured meta-data model for description and modeling.
I think this is the most attainable and viable approach to XML-based business
transactions, mostly because it avoids reinventing wheels as much as possible.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From rsalz@zolera.com Wed May 9 15:39:47 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 09 May 2001 10:39:47 -0400
Subject: [XML-SIG] Curiouser and curiouser
References: <200105091413.f49EDgM06672@localhost.local>
Message-ID: <3AF956B3.1146CC7C@zolera.com>
> * take the business process, internationalization and authority-of-record work
> hammered out in EDI.
> * Use Internet transport (HTTP/SMTP) rather than VAN/BBS. Use XML as the
> payload for human readability, inexpensive app integration and extensibility.
> * Use a unified structured meta-data model for description and modeling.
So you must be a big fan of ebXML.
Me, too. :)
From uche.ogbuji@fourthought.com Wed May 9 16:18:38 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Wed, 09 May 2001 09:18:38 -0600
Subject: [XML-SIG] Curiouser and curiouser
In-Reply-To: Message from Rich Salz
of "Wed, 09 May 2001 10:39:47 EDT." <3AF956B3.1146CC7C@zolera.com>
Message-ID: <200105091518.f49FIcg07346@localhost.local>
> > * take the business process, internationalization and authority-of-record work
> > hammered out in EDI.
> > * Use Internet transport (HTTP/SMTP) rather than VAN/BBS. Use XML as the
> > payload for human readability, inexpensive app integration and extensibility.
> > * Use a unified structured meta-data model for description and modeling.
>
> So you must be a big fan of ebXML.
>
> Me, too. :)
You got it. I should rather clarify that I prefer ebXML to the WSDL/UDDI
camp, because they are standing on the shoulders of the EDI giants. I have a
*great* deal of respect for EDI in general, and I think that the main problem
with it was the unfortunate power that the main VANs such as GEIS, Sterling
and Harbinger acquired, which strangled innovation and evolution.
I think that the UDDI camp's insistence on reinventing it all is horrid form,
and at the WSWS it looked to me as if the impetus behind reinventing it all
was for each vendor to make as much of a land grab as possible on
B2B-next-generation. I have no problem with the profit motive, but hypocrisy
sucks.
BTW, any luck on setting up that Web services SIG? We're pretty close to
off-topic in this discussion, but I'd like it to continue, especially with
regard to coordinating Python efforts in Web services.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From rsalz@zolera.com Wed May 9 16:44:05 2001
From: rsalz@zolera.com (Rich Salz)
Date: Wed, 09 May 2001 11:44:05 -0400
Subject: [XML-SIG] Curiouser and curiouser
References: <200105091518.f49FIcg07346@localhost.local>
Message-ID: <3AF965C5.73BF4B@zolera.com>
> BTW, any luck on setting up that Web services SIG? We're pretty close to
> off-topic in this discussion, but I'd like it to continue, especially with
> regard to coordinating Python efforts in Web services.
Sending a "can we create it now" note to the meta-sig was on my todo
list. I'll send it now.
/r$
From Alexandre.Fayolle@logilab.fr Wed May 9 16:56:58 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 9 May 2001 17:56:58 +0200 (CEST)
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
Message-ID:
Hello,
I would appreciate if someone could provide information about Sax/Sax2
interface in pyxml (or provide some pointer on some documentation).
Specifically, my understanding is that the prototype of the startElement
method of the ContentHandler interface in Sax2 is supposed to take 4
arguments (nsUri, localName, qName, attributes). However, in
xml.sax.handler, ContentHandler's startElement method has the same
prototype as xml.sax.saxlib's DocumentHandler (which should be used with a
SAX 1 parser), i.e. name, attributes.
I'm trying to write a parser for a non-xml document, that should behave as
a sax parser for the external world, especially to the various DOM reader
classes available around here. Some of these seem to be expecting calls to
startElementNS (I'm thinking specifically of FT's pDomletteReader), with
a signature similar to Java's SAX2 ContentHandler.startElement method.
Any help appreciated.
Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).
From larsga@garshol.priv.no Wed May 9 17:23:57 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 09 May 2001 18:23:57 +0200
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
In-Reply-To:
References:
Message-ID:
* Alexandre Fayolle
|
| Specifically, my understanding is that the prototype of the startElement
| method of the ContentHandler interface in Sax2 is supposed to take 4
| arguments (nsUri, localName, qName, attributes).
This is not correct. In SAX 2.0 there are two startElement methods:
startElement(name, attributes)
startElementNS(name, qname, attributes)
In the latter, name is a (nsuri, localname) tuple.
| However, in xml.sax.handler, ContentHandler's startElement method
| has the same prototype as xml.sax.saxlib's DocumentHandler (which
| should be used with a SAX 1 parser), i.e. name, attributes.
That is correct. This is used when the XML processor is not in
namespace mode.
| I'm trying to write a parser for a non-xml document, that should behave as
| a sax parser for the external world, especially to the various DOM reader
| classes available around here. Some of these seem to be expecting calls to
| startElementNS (I'm thinking specifically of FT's pDomletteReader), with
| a signature similar to Java's SAX2 ContentHandler.startElement method.
A good DOM builder should accept calls to both startElement and
startElementNS. It should also require applications to be consistent
and only use one or the other throughout a single document.
I hope this helps.
--Lars M.
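A small sketch of the two callbacks described above, using the xml.sax module from the Python standard library. The Recorder class, the record_events helper and the sample document are hypothetical names made up for this illustration; the point is only that the namespace feature decides which of the two methods the parser calls.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class Recorder(ContentHandler):
    """Collects which start-element callback was used, and with what name."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off; name is the raw tag name.
        self.events.append(("startElement", name))

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on; name is a
        # (namespace URI, local name) tuple.
        self.events.append(("startElementNS", name))


def record_events(xml_bytes, namespaces):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    handler = Recorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


# Hypothetical sample document with one prefixed and one plain element.
DOC = b'<p:doc xmlns:p="http://example.com/ns"><child/></p:doc>'

print(record_events(DOC, False))
print(record_events(DOC, True))
```

A handler that wants to work in both modes, such as a DOM builder, can simply implement both methods as above; a parser written by hand for a non-XML format would pick one of them and stick to it for the whole document.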
From noreply@sourceforge.net Wed May 9 17:29:00 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 09 May 2001 09:29:00 -0700
Subject: [XML-SIG] [ pyxml-Patches-422689 ] RIL parser fixes
Message-ID:
Patches item #422689, was updated on 2001-05-09 09:28
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422689&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Alexandre Fayolle (afayolle)
Assigned to: Nobody/Anonymous (nobody)
Summary: RIL parser fixes
Initial Comment:
The RilParser class makes some strange calls to
construct new Predicate classes, some of which do not
exist.
Here's an attempt to fix this. Not much tested. Please
examine carefully before applying. (the diff is against
a version patched with the patch I submitted earlier
today).
Cheers Alexandre Fayolle
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422689&group_id=6473
From fdrake@acm.org Wed May 9 17:35:23 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 May 2001 12:35:23 -0400 (EDT)
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
In-Reply-To:
References:
Message-ID: <15097.29131.401159.457645@cj42289-a.reston1.va.home.com>
Lars Marius Garshol writes:
> A good DOM builder should accept calls to both startElement and
> startElementNS. It should also require applications to be consistent
> and only use one or the other throughout a single document.
This is not clear; does the DOM specification indicate that only one
or the other can be used? I think it seems very careful to indicate
that both can be used, as long as expectations are limited.
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From Alexandre.Fayolle@logilab.fr Wed May 9 17:47:38 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 9 May 2001 18:47:38 +0200 (CEST)
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
In-Reply-To:
Message-ID:
On 9 May 2001, Lars Marius Garshol wrote:
>
> * Alexandre Fayolle
> |
> | Specifically, my understanding is that the prototype of the startElement
> | method of the ContentHandler interface in Sax2 is supposed to take 4
> | arguments (nsUri, localName, qName, attributes).
>
> This is not correct. In SAX 2.0 there are two startElement methods:
>
> startElement(name, attributes)
> startElementNS(name, qname, attributes)
>
> In the latter, name is a (nsuri, localname) tuple.
I use http://www.megginson.com/SAX/Java/Javadoc/ as a reference. In this
documentation, the ContentHandler interface has no startElementNS method,
only startElement(java.lang.String namespaceURI, java.lang.String
localName, java.lang.String qName, Attributes atts).
If I'm not using the right reference, could someone please give me a
pointer to the right one. Otherwise, I do not understand where the
startElementNS method comes from.
Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).
From fdrake@acm.org Wed May 9 17:59:03 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 May 2001 12:59:03 -0400 (EDT)
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
In-Reply-To:
References:
Message-ID: <15097.30551.770652.554305@cj42289-a.reston1.va.home.com>
Alexandre Fayolle writes:
> If I'm not using the right reference, could someone please give me a
> pointer to the right one. Otherwise, I do not understand where the
> startElementNS method comes from.
Documentation for the Python SAX2 bindings is given in the Python
Library Reference:
http://www.python.org/doc/current/lib/markup.html
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From Alexandre.Fayolle@logilab.fr Wed May 9 18:17:18 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 9 May 2001 19:17:18 +0200 (CEST)
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
In-Reply-To: <15097.30551.770652.554305@cj42289-a.reston1.va.home.com>
Message-ID:
On Wed, 9 May 2001, Fred L. Drake, Jr. wrote:
> Documentation for the Python SAX2 bindings is given in the Python
> Library Reference:
>
> http://www.python.org/doc/current/lib/markup.html
Thanks. This is what I was looking for. (I'm still stuck with Python 1.5.2,
and this is not part of the Python doc I'm used to reading daily).
Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).
From larsga@garshol.priv.no Wed May 9 18:19:00 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 09 May 2001 19:19:00 +0200
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
In-Reply-To: <15097.29131.401159.457645@cj42289-a.reston1.va.home.com>
References: <15097.29131.401159.457645@cj42289-a.reston1.va.home.com>
Message-ID:
* Lars Marius Garshol
|
| A good DOM builder should accept calls to both startElement and
| startElementNS. It should also require applications to be consistent
| and only use one or the other throughout a single document.
* Fred L. Drake, Jr.
|
| This is not clear; does the DOM specification indicate that only one
| or the other can be used? I think it seems very careful to indicate
| that both can be used, as long as expectations are limited.
You are right about that, but SAX makes it clear that you must
consistently use either startElement or startElementNS, so this is a
SAX 2.0 issue more than a DOM issue. I wouldn't get too upset if the
DOM doesn't check this, though.
--Lars M.
From karl@digicool.com Wed May 9 19:23:32 2001
From: karl@digicool.com (Karl Anderson)
Date: 09 May 2001 11:23:32 -0700
Subject: [XML-SIG] [ pyxml-Bugs-421553 ] stylesheet node reader requires '' NSURI
In-Reply-To: "Martin v. Loewis"'s message of "Wed, 9 May 2001 09:08:00 +0200"
References: <200105082318.f48NIVb19293@localhost.local> <200105090708.f49780n00963@mira.informatik.hu-berlin.de>
Message-ID:
Martin v. Loewis writes:
> However, until PyXML is released with these packages, you should not
> assume that they actually work. Indeed, one possible scenario is that
> the next PyXML release does *not* include these subdirectories.
Yeah, CVS checkouts and all, I know :) FYI, my motivation is estimating
the chances of a stable release in the near future that works with
ParsedXML.
--
Karl Anderson karl@digicool.com
From noreply@sourceforge.net Wed May 9 23:41:27 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 09 May 2001 15:41:27 -0700
Subject: [XML-SIG] [ pyxml-Patches-422801 ] CoreFunctions misusing ExpandedName
Message-ID:
Patches item #422801, was updated on 2001-05-09 15:41
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422801&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Karl Anderson (karlanderson)
Assigned to: Nobody/Anonymous (nobody)
Summary: CoreFunctions misusing ExpandedName
Initial Comment:
Using 4Suite cvs checkout.
xpath.CoreFunctions.py was looking at the wrong attrs
of ExpandedName.ExpandedName, causing tests to fail.
This patch uses the attrs in ExpandedName.
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=422801&group_id=6473
From fdrake@acm.org Thu May 10 03:03:52 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 May 2001 22:03:52 -0400 (EDT)
Subject: [XML-SIG] clarification request about Sax/Sax2 mappings
In-Reply-To:
References:
<15097.29131.401159.457645@cj42289-a.reston1.va.home.com>
Message-ID: <15097.63240.342240.594303@cj42289-a.reston1.va.home.com>
Lars Marius Garshol writes:
> You are right about that, but SAX makes it clear that you must
> consistently use either startElement or startElementNS, so this is a
> SAX 2.0 issue more than a DOM issue. I wouldn't get too upset if the
> DOM doesn't check this, though.
I'm not going to worry about it, either. I think there are two
problems here: that the Namespaces in XML specification is poorly
written and does not cover everything it should (interaction with DTDs
being a major issue in my book, though part of that may be a lack of
clarity in the text rather than the issues not having been
approached), and the conflation of NS and non-NS documents in the
DOM.
But neither of those issues is directly related to the Python
bindings for the APIs, so I guess we've strayed a little. ;-)
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From martin@loewis.home.cs.tu-berlin.de Thu May 10 06:56:45 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 10 May 2001 07:56:45 +0200
Subject: [XML-SIG] What are the limits of soap and python?
In-Reply-To:
(stuff4gary@hotmail.com)
References:
Message-ID: <200105100556.f4A5ujB01569@mira.informatik.hu-berlin.de>
> I am pretty confused by the SOAP discussion (it hasn't any connection with
> the operas browser movement has it!!).
I don't know what the operas browser movement is, but I guess it has
no connection to SOAP, no.
> I imagine it is like OCX, Dynamic Data Exchange, Windows Script, applescript
> !!
Not really. SOAP messages are typically received by Web servers; I
don't think anybody uses them to control applications on the same
machine.
> Can it push buttons on the system and fill out text fields, with
> text and automate through applications?
SOAP, on its own, is just a protocol for access to objects. Whether
the objects, when accessed, fill out text fields - that is up to the
objects being accessed. You cannot push buttons using SOAP.
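To make the point concrete, here is a minimal sketch of what a SOAP request looks like on the wire: it is just an XML document, usually sent to a web server over HTTP POST. The envelope namespace is the SOAP 1.1 one; the method name GetQuote, the namespace urn:example:stocks, and the helper build_request are made up for illustration and do not refer to any real service. Sending the message is deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(method, namespace, **params):
    """Return a SOAP 1.1 envelope (bytes) that calls `method` with `params`.

    `method` and `namespace` are whatever the remote object defines;
    the values used below are hypothetical.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{method}")
    for name, value in params.items():
        # Each parameter becomes a child element of the method element.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


message = build_request("GetQuote", "urn:example:stocks", symbol="XYZ")
print(message.decode("utf-8"))
```

What happens when the server receives such a message is entirely up to the object behind it, which is why SOAP by itself cannot push buttons or fill in fields.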
> Can I set it up hot folders with it to send pictures through
> photoshop, OCR and databases??
I'm not sure what a hot folder is, but I guess photoshop would not
react to or emit SOAP messages; nor do I know any database system that
supports SOAP directly (although many web servers may give indirect
access to a database through SOAP).
> is that even possible in python?
Doing all these things is possible in Python, I believe - but you'd
have to do them without SOAP.
Regards,
Martin
From mike@pdc.kth.se Thu May 10 13:25:41 2001
From: mike@pdc.kth.se (Mike Hammill)
Date: Thu, 10 May 2001 14:25:41 +0200
Subject: [XML-SIG] Help with removeChild()
Message-ID: <200105101225.f4ACPN8181353@ratatosk.pdc.kth.se>
Dear xml-sig,
I hope someone with a bit more experience can help me. I'm trying to use
xml.minidom to clean up an XML file. In brief, how does one walk through the
DOM tree and remove certain children using recursion? My attempt walks the
tree, but some children are skipped. I believe this is because when children
are removed, it is not reflected in the calling program's list of children. Here
is a simplified version of the problem.
XML file:
I would like to get rid of any element that has no attributes and whose text
element is just whitespace, tabs, or linefeeds. I wrote a little tree walker
that reduces the above to:
So far, so good. When I apply the following code, however, the result is:
That is only elements a, c, and e are eliminated. The code is:
def trim_dom_more(node):
    if node.hasChildNodes():
        for child in node.childNodes:
            trim_dom_more(child)
    else:
        if node.nodeType == node.ELEMENT_NODE:
            if (not node.hasAttributes()) and (not node.hasChildNodes()):
                node.parentNode.removeChild(node)
I think I understand that the problem is that node.childNodes gets evaluated
and put on the stack, but then after the removeChild, this stacked list is not
re-evaluated so not all children are iterated through. But how to solve that?
Any advice welcome!
Thanks
Mike
From tpassin@home.com Thu May 10 13:59:19 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 10 May 2001 08:59:19 -0400
Subject: [XML-SIG] Help with removeChild()
References: <200105101225.f4ACPN8181353@ratatosk.pdc.kth.se>
Message-ID: <000a01c0d951$076f7fe0$7cac1218@reston1.va.home.com>
I think your problem is the inverse- the childNodes list ***is*** getting
updated by the DOM after each removal.
[Mike Hammill]
> That is only elements a, c, and e are eliminated. The code is:
>
> def trim_dom_more(node):
>     if node.hasChildNodes():
>         for child in node.childNodes:
>             trim_dom_more(child)
>     else:
>         if node.nodeType == node.ELEMENT_NODE:
>             if (not node.hasAttributes()) and (not node.hasChildNodes()):
>                 node.parentNode.removeChild(node)
>
> I think I understand that the problem is that node.childNodes gets evaluated
> and put on the stack, but then after the removeChild, this stacked list is not
> re-evaluated so not all children are iterated through. But how to solve that?
Try this:
def trim_dom_more(node):
    if node.hasChildNodes():
        children=node.childNodes[:]
        for child in children:
            trim_dom_more(child)
Now you are iterating through a static copy of the list. It wouldn't work
if the child nodes could get changed by another thread, but I don't suppose
that's going to happen here.
Or you could do
    while node.hasChildNodes():
        trim_dom_more(node.childNodes[0])
That would execute slower, though. But it wouldn't get fooled by any other
activity in the DOM.
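For reference, a self-contained sketch of the two behaviours using the standard library's xml.dom.minidom. The sample document and the names trim_live, trim_snapshot, and remaining are made up for illustration (the original XML file is not shown in this thread). It uses the copy-then-iterate approach; as far as I can tell, the while-loop variant only terminates if every first child eventually gets removed, so it would spin on a child that has attributes.

```python
from xml.dom import minidom

SAMPLE = "<root><a/><b/><c/><d keep='yes'/><e/></root>"  # hypothetical input


def trim_live(node):
    # Iterates the live childNodes list while removeChild() shrinks it,
    # so every second sibling is skipped.
    if node.hasChildNodes():
        for child in node.childNodes:
            trim_live(child)
    elif node.nodeType == node.ELEMENT_NODE and not node.hasAttributes():
        node.parentNode.removeChild(node)


def trim_snapshot(node):
    # Same logic, but iterates over a copy taken before any removal.
    if node.hasChildNodes():
        for child in list(node.childNodes):
            trim_snapshot(child)
    elif node.nodeType == node.ELEMENT_NODE and not node.hasAttributes():
        node.parentNode.removeChild(node)


def remaining(trim):
    """Parse a fresh copy of SAMPLE, trim it, and list the surviving children."""
    doc = minidom.parseString(SAMPLE)
    trim(doc.documentElement)
    return [child.tagName for child in doc.documentElement.childNodes]


print(remaining(trim_live))      # ['b', 'd']  (a, c and e removed, b skipped)
print(remaining(trim_snapshot))  # ['d']
```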
Cheers,
Tom P
From noreply@sourceforge.net Thu May 10 14:53:47 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 10 May 2001 06:53:47 -0700
Subject: [XML-SIG] [ pyxml-Bugs-423027 ] startElementNS bug in pDomletteReader
Message-ID:
Bugs item #423027, was updated on 2001-05-10 06:53
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423027&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Logilab (ornicar)
Assigned to: Nobody/Anonymous (nobody)
Summary: startElementNS bug in pDomletteReader
Initial Comment:
Hi,
I've been trying to make a custom Sax parser work using
the startElementNS() method... No way, this function
needs some updates, and I don't exactly know how to fix
it. In fact endElementNS() tries to pop elements from
internal stacks which have not been pushed in before,
especially namespaces...
Cheers,
Bruno Van Frachem, Logilab.
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423027&group_id=6473
From mike@pdc.kth.se Thu May 10 15:03:40 2001
From: mike@pdc.kth.se (Michael Hammill)
Date: Thu, 10 May 2001 16:03:40 +0200
Subject: [XML-SIG] Help with removeChild()
In-Reply-To: <000a01c0d951$076f7fe0$7cac1218@reston1.va.home.com>
References: <200105101225.f4ACPN8181353@ratatosk.pdc.kth.se>
Message-ID: <5.1.0.14.2.20010510155431.02d383d0@localhost>
Dear Thomas,
Your solution below works great! I have discovered something else quite
instructive (at least to me). When I first saw your solution, I thought
"oh, I've tried that already". Silly of me. What I had tried was not
exactly the same, but seemingly close. I had set children =
node.childNodes *without* the final '[:]'. In testing the solution below,
I found that if the [:] is left out, the result is the same as I got before
(an incorrect trimming); however, with the [:] it works fine. I'm sorry if
this is a newbie kind of confusion. I had always thought "list" was
equivalent to "list[:]", but apparently not.
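The difference can be seen with plain lists, independent of the DOM: assigning a list to another name creates an alias for the same object, while slicing with [:] builds a new list holding the same items. A small sketch (the variable names are only illustrative):

```python
children = ["a", "b", "c"]
alias = children         # second name for the very same list object
snapshot = children[:]   # new list containing the same items

children.remove("a")     # mutate the original in place

print(alias)                  # ['b', 'c']       (the alias sees the removal)
print(snapshot)               # ['a', 'b', 'c']  (the copy does not)
print(alias is children)      # True
print(snapshot is children)   # False
```

So iterating over the alias is still iterating over the list that removeChild() is shrinking, whereas the slice is unaffected by later removals.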
Thank you again!
Mike
[...]
> Try this:
>
> def trim_dom_more(node):
>     if node.hasChildNodes():
>         children=node.childNodes[:]
>         for child in children:
>             trim_dom_more(child)
>
>Now you are iterating through a static copy of the list. It wouldn't work
>if the child nodes could get changed by another thread, but I don't suppose
>that's going to happen here.
[...]
From noreply@sourceforge.net Thu May 10 17:44:33 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 10 May 2001 09:44:33 -0700
Subject: [XML-SIG] [ pyxml-Bugs-423086 ] xml.xpath cannot be imported
Message-ID:
Bugs item #423086, was updated on 2001-05-10 09:44
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423086&group_id=6473
Category: 4Suite
Group: None
Status: Open
Resolution: None
Priority: 9
Submitted By: Lars Marius Garshol (larsga)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.xpath cannot be imported
Initial Comment:
When importing xml.xpath, xml.xpath.Conversions gets sucked in, and that attempts to import xml.utils.boolean, which does not exist. The result is that any attempt to import xml.xpath fails. Did someone forget to commit something?
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=423086&group_id=6473
From martin@loewis.home.cs.tu-berlin.de Thu May 10 18:08:35 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 10 May 2001 19:08:35 +0200
Subject: [XML-SIG] [ pyxml-Bugs-423086 ] xml.xpath cannot be imported
In-Reply-To:
(noreply@sourceforge.net)
References:
Message-ID: <200105101708.f4AH8Zu01871@mira.informatik.hu-berlin.de>
> When importing xml.xpath, xml.xpath.Conversions gets sucked in, and
> that attempts to import xml.utils.boolean, which does not exist. The
> result is that any attempt to import xml.xpath fails. Did someone
> forget to commit something?
xml.utils.boolean should be compiled from extensions/boolean.c, and
installed in xml/utils. Did you perform a 'setup.py install', and it
still did not work?
Regards,
Martin
From larsga@garshol.priv.no Thu May 10 18:20:45 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 10 May 2001 19:20:45 +0200
Subject: [XML-SIG] [ pyxml-Bugs-423086 ] xml.xpath cannot be imported
In-Reply-To: <200105101708.f4AH8Zu01871@mira.informatik.hu-berlin.de>
References: <200105101708.f4AH8Zu01871@mira.informatik.hu-berlin.de>
Message-ID:
* Martin v. Loewis
|
| xml.utils.boolean should be compiled from extensions/boolean.c, and
| installed in xml/utils. Did you perform a 'setup.py install', and it
| still did not work?
Arrrghh! No, I was so thick-headed I didn't even think of that. Sorry.
Will close the bug now.
--Lars M.
From noreply@sourceforge.net Thu May 10 19:53:02 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 10 May 2001 11:53:02 -0700
Subject: [XML-SIG] [ pyxml-Patches-423122 ] xml.sax.writer places chardata in tags
Message-ID:
Patches item #423122, was updated on 2001-05-10 11:53
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=423122&group_id=6473
Category: sax
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Lars Marius Garshol (larsga)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: xml.sax.writer places chardata in tags
Initial Comment:
writer produces output of the form
where the element name was 'doc' and the character
data 'content'.
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=306473&aid=423122&group_id=6473
From stuff4gary@hotmail.com Fri May 11 00:13:44 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Thu, 10 May 2001 23:13:44
Subject: [XML-SIG] XForms and SVG support in python?
Message-ID:
Dear All,
I am working part-time at a publishers where we do magazines in DTP packages
- Quark 5.0, Illustrator 9.0 and Photoshop 6.0 can now export as SVG for the
web (replacing EPS format which we currently use and has never been
supported on the web!!). Soon they are adding Xform fields for SVG as well!
I am wondering whether anyone could foresee any problems or opportunities
using FieldStorage() from Python's cgi module to process Xform data? or in
changing any parts of SVG on the fly, eg like boxes in our advert sections?
Gary
PS I found some good tutorials for XML etc. and programming at
http://www.w3schools.com - I am a bit disappointed :-( they had nothing on
python, is python obscure?
_________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.
From michael.clark@ntlworld.com Sat May 12 00:06:20 2001
From: michael.clark@ntlworld.com (michael.clark)
Date: Sat, 12 May 2001 00:06:20 +0100
Subject: [XML-SIG] FREE SMS Messaging Web Service
Message-ID: <000001c0da6e$fe533ea0$ec07ff3e@clarks>
For those who are interested in SOAP web services,
I've just located this new web service. It seems to be
the first commercial one I've come across that's
a) working & b) actually useful.
You can send SMS messages to supposedly any
mobile phone in the world, free of charge! We've tried
it and so far we've sent messages to people in USA,
UK and ASIA, pretty neat we thought!
http://www.salcentral.com/help/smsreg.htm
Mark
From greg.simmons@ntlworld.com Sun May 13 09:53:15 2001
From: greg.simmons@ntlworld.com (Greg Simmons)
Date: Sun, 13 May 2001 09:53:15 +0100
Subject: [XML-SIG] SOAP SMS Messaging Web Services for FREE
Message-ID: <001d01c0db8a$27d1bb00$e08f69d5@clarks>
For those who are interested in SOAP web services,
I've just located this new web service. It seems to be
the first commercial one I've come across that's
a) working & b) actually useful.
You can send SMS messages to supposedly any
mobile phone in the world, free of charge! We've tried
it and so far we've sent messages to people in USA,
UK and ASIA, pretty neat we thought!
http://www.salcentral.com/help/smsreg.htm
Greg.
From uche.ogbuji@fourthought.com Sun May 13 14:36:42 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 07:36:42 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
Message-ID: <3AFE8DEA.F92054CB@fourthought.com>
Just a note: I'm guessing you intended to x-post this to xml-sig, not
python-dev. I've changed the headers.
"Martin v. Loewis" wrote:
>
> Currently, 4XSLT has a dependency on the DOM implementation in terms
> of memory management (among other dependencies). I'd like to reduce
> this dependency, by providing a centralized function that knows how to
> release nodes.
>
> In PyXML, I currently use
>
> # Define ReleaseNode in a DOM-independent way
> import xml.dom.ext
> import xml.dom.minidom
> def _releasenode(n):
>     if isinstance(n, xml.dom.minidom.Node):
>         n.unlink()
>     else:
>         xml.dom.ext.ReleaseNode(n)
>
> try:
>     from Ft.Lib import pDomlette
>     def ReleaseNode(n):
>         if isinstance(n, pDomlette.Node):
>             pDomlette.ReleaseNode(n)
>         else:
>             _releasenode(n)
>     _XsltElementBase = pDomlette.Element
> except ImportError:
>     ReleaseNode = _releasenode
>     from minisupport import _XsltElementBase
Wouldn't it be better to make up a Reader class for minidom which
implements a releaseNode method similar to what you have above? The
idea behind the reader architecture is to manage such things.
There might be some places in 4XSLT that don't properly call releaseNode
on the reader instance itself, but I'd rather fix them to do so.
What's "minisupport" and "_XsltElementBase"?
> This code knows how to release minidom, 4DOM, and pDomlette nodes, and
> supports installations without 4Suite (i.e. without pDomlette). I've
> put this into xslt/__init__.py, so that all callers of
> Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode.
> If desired, I could produce a patch against the public Ft CVS.
>
> As a slightly independent question, such a function also ought to
> support DOM implementations not known to it; I'm thinking in
> particular of the Zope DOMs. I'd like to hear proposals on how such an
> interface should work; I see three options:
>
> a) it is an operation on the document node (or any node), as in minidom.
> b) it is an operation on the DOM implementation (almost as in 4Suite;
> you'd need to navigate from the node to the implementation, then
> you'd need a well-known operation on the implementation)
> c) the code assumes that no release activity is necessary for unknown
> DOMs, effectively believing in reference counting, garbage collection,
> acquisition, and other black art.
Maybe we need a general Reader class for unknown DOM classes. This
would require the unification of DOM factories we were discussing a few
months ago, but the releaseNode method could just be a NOP, i.e. your
(c) option.
> Any comments appreciated, in particular
> 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
> 2. from authors of other DOMs on a general memory management API for
> Python DOM.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Sun May 13 15:41:25 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 16:41:25 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFE8DEA.F92054CB@fourthought.com> (message from Uche Ogbuji on
Sun, 13 May 2001 07:36:42 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com>
Message-ID: <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
[yes, I indeed meant to cross-post to xml-sig]
> Wouldn't it be better to make up a Reader class for minidom which
> implements a releaseNode method similar to what you have above? The
> idea behind the reader architecture is to manage such things.
How would that work? Assume there was a reader class for minidom, and
the XSLT runtime had a node object. How can you release the node?
Or do you need to know the reader class which originally created that
node as well? That would be not so good: the node might not have been
created by a reader at all, as it might have come directly from the
DOM implementation.
> There might be some places in 4XSLT that don't properly call releaseNode
> on the reader instance itself, but I'd rather fix them to do so.
There are a number of those. Grepping for ReleaseNode in the public CVS
gives
Processor.py: pDomlette.ReleaseNode(rtfRoot)
Processor.py: xml.dom.ext.ReleaseNode(rtfRoot)
Processor.py: pDomlette.ReleaseNode(self._dummyDoc)
Stylesheet.py: pDomlette.ReleaseNode(imp.stylesheet.ownerDocument)
StylesheetReader.py: pDomlette.ReleaseNode(inc)
StylesheetReader.py: pDomlette.ReleaseNode(sheet.ownerDocument)
StylesheetReader.py: pDomlette.ReleaseNode(inc)
XsltContext.py: pDomlette.ReleaseNode(doc)
XsltContext.py: pDomlette.ReleaseNode(rtf)
XsltContext.py: xml.dom.ext.ReleaseNode(rtf)
> What's "minisupport" and "_XsltElementBase"?
minisupport is an emulation of pDomlette equivalents as used by 4XSLT,
implemented using pDomlette. There are various pieces that I found
necessary: readers, ReaderBase, and Element. The latter is there to
support pickling, and to support the __init__ signature expected from
XsltElement.
> Maybe we need a general Reader class for unknown DOM classes. This
> would require the unification of DOM factories we were discussing a few
> months ago, but the releaseNode method could just be a NOP, i.e. your
> (c) option.
I don't recall that discussion. Your comment seems to imply a
relationship between a DOM implementation and a Reader class, which I
can't find in the 4Suite code. What do I miss?
Regards,
Martin
From uche.ogbuji@fourthought.com Sun May 13 18:48:28 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 11:48:28 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
Message-ID: <3AFEC8EC.D6CFC2F2@fourthought.com>
I see what you mean. I was thinking about running 4XSLT on non-domlette
source nodes.
I'm guessing you've been working on code to allow XsltElements and
result-tree fragments to use minidom, so you're talking about calls to
releaseNode that handle these things.
Well, I think the best solution to this, rather than making a universal
ReleaseNode function, is to generalize the Reader architecture into a
general factory that can read, initialize and dispose of nodes. This
could be a Python DOM standard binding extension to DOMImplementation.
The earlier conversation I alluded to is the DOMImplementationFactory
discussion. If the DOMImplementation gets some standard add-ons, then
this can be used to determine the destruction mechanism in the general
case.
http://mail.python.org/pipermail/xml-sig/2001-February/004508.html
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Mike.Olson@fourthought.com Sun May 13 19:19:06 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 12:19:06 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com>
Message-ID: <3AFED01A.3FF0E6F9@FourThought.com>
Uche Ogbuji wrote:
>
> Wouldn't it be better to make up a Reader class for minidom which
> implements a releaseNode method similar to what you have above? The
> idea behind the reader architecture is to manage such things.
The thing I don't like about the reader, is that you need to pass it
around or store it in order to call the correct release. We could get
around this by having each node store a reference to its reader when it
is created.
node.reader.releaseNode(node)
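A toy sketch of that idea, with each node carrying a reference to the reader that created it. The Reader and Node classes below are stand-ins written for this illustration, not the 4Suite classes, and createNode is an assumed name.

```python
class Reader:
    """Hypothetical reader that creates nodes and knows how to release them."""

    def __init__(self):
        self.released = []

    def createNode(self, name):
        # Every node remembers which reader produced it.
        return Node(name, reader=self)

    def releaseNode(self, node):
        # Drop the back-reference so node and reader no longer point at each other.
        node.reader = None
        self.released.append(node.name)


class Node:
    """Minimal stand-in for a DOM node."""

    def __init__(self, name, reader):
        self.name = name
        self.reader = reader


reader = Reader()
node = reader.createNode("stylesheet")

# The caller no longer needs to pass the reader around:
node.reader.releaseNode(node)
```

The cost is one extra reference per node, and it does nothing for nodes that were created directly through the DOM implementation rather than through a reader.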
>
> There might be some places in 4XSLT that don't properly call releaseNode
> on the reader instance itself, but I'd rather fix them to do so.
Stylesheet nodes are the big ones (off head) because we don't keep track
of what reader the stylesheet was created with so we always call
pDomlette.releaseNode
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Mike.Olson@fourthought.com Sun May 13 19:27:00 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 12:27:00 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
Message-ID: <3AFED1F4.C11668EF@FourThought.com>
"Martin v. Loewis" wrote:
>
> [yes, I indeed meant to cross-post to xml-sig]
>
> > Wouldn't it be better to make up a Reader class for minidom which
> > implements a releaseNode method similar to what you have above? The
> > idea behind the reader architecture is to manage such things.
>
> How would that work? Assume there was a reader class for minidom, and
> the XSLT runtime had a node object. How can you release the node?
>
> Or do you need to know the reader class which originally created that
> node as well? That would be not so good: the node might not have been
> created by a reader at all, as it might have come directly from the
> DOM implementation.
This is why I vote for either the implementation has the releaseNode
function, or the node itself. Readers are great for an abstract way of
creating a DOM (at least until we all support level III), but without a
relationship between a node instance and its reader they don't work
very well for releasing them. I also did not think of your point of
nodes created without a reader but it is a good one.
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Sun May 13 20:02:20 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:02:20 +0200
Subject: [XML-SIG] Disentangling StylesheetReader from Ft.Lib
Message-ID: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de>
I've tried to update my 4XSLT port to use the 4Suite 0.11 code base,
only to discover that the StylesheetReader class is now much more strongly
connected to Ft.Lib than before, in particular to classes from
pDomletteReader, and their specific instance attributes.
I took the approach of providing alternative base classes to the ones
provided by pDomlette, but that soon became a disaster since none of
the minidom/pulldom classes bear any relationship to how the
PyExpatReader and Handler classes work.
I'd still like to pursue my attempt of integrating 4XSLT to work without
Ft.Lib, and pDomlette in particular, but I'd need some advice here. I
feel that I miss some grand picture in all these classes, and how they
are connected. It seems that the authors of the code lose track, too,
with code duplication all over the place.
So my question is: Is all this complexity really necessary? Would it
be possible to simplify things by breaking down processing in multiple
processing steps? It seems to me that all StylesheetReader does is to
create a DOM tree, except that it creates StylesheetElement nodes
where a normal DOM build would create Element nodes. If this is really
all it does, I could propose some dramatic code reduction.
Any proposals are welcome.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Sun May 13 20:04:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:04:39 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFEC8EC.D6CFC2F2@fourthought.com> (message from Uche Ogbuji on
Sun, 13 May 2001 11:48:28 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com>
Message-ID: <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de>
> Well, I think the best solution to this, rather than making a universal
> ReleaseNode function, is to generalize the Reader architecture into a
> general factory that can read, initialize and dispose of nodes. This
> could be a Python DOM standard binding extension to DOMImplementation.
That is a solution that I could easily accept; it would take some time
until all relevant implementations support the method, though, and
we'd need a name for it.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Sun May 13 20:12:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:12:39 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFED01A.3FF0E6F9@FourThought.com> (message from Mike Olson on
Sun, 13 May 2001 12:19:06 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com>
Message-ID: <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de>
> The thing I don't like about the reader, is that you need to pass it
> around or store it in order to call the correct release. We could get
> around this by having each node store a reference to its reader when it
> is created.
With regard to the reader, I'd also like to point you to the level 3
load-store interfaces,
http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010419/load-save.html
where they have a DOMBuilder interface. So while your Reader interface
is fine as Ft-provided API, I think the DOMBuilder interface has a
higher chance of getting accepted widely.
Regards,
Martin
From uche.ogbuji@fourthought.com Sun May 13 20:31:18 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 13:31:18 -0600
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de>
Message-ID: <3AFEE106.4C99F9FD@fourthought.com>
"Martin v. Loewis" wrote:
>
> I've tried to update my 4XSLT port to use the 4Suite 0.11 code base,
> only to discover that the StylesheetReader class is now much stronger
> connected to Ft.Lib than before, in particular to classes from
> pDomletteReader, and their specific instance attributes.
This is to provide shared code, which, oddly enough, you advocate
below. Some of the routines could indeed be moved into a generic
handler that goes into xml.utils.
> I took the approach of providing alternative base classes to the ones
> provided by pDomlette, but that soon became a disaster since none of
> the minidom/pulldom classes bear any relationship to how the
> PyExpatReader and Handler classes work.
This could all be helped by using mix-in classes in xml.utils. Note
that I mean *real* mix-in classes, that is, classes that provide
implementation but not interface (a disturbing chunk of the Python
community seems to think that mixing in is just plain old inheritance).
> I'd still like to pursue my attempt of integrating 4XSLT to work without
> Ft.Lib, and pDomlette in particular, but I'd need some advice here. I
> feel that I miss some grand picture in all these classes, and how they
> are connected. It seems that the authors of the code lose track, too,
> with code duplication all over the place.
Of course: the code is not all polished, but I must note that what you
complained about in your first paragraph was actually a step that eliminated
a *great* deal of duplicated code from StylesheetReader.
The solution is to move the common code somewhere accessible from PyXML.
> So my question is: Is all this complexity really necessary? Would it
> be possible to simplify things by breaking down processing in multiple
> processing steps? It seems to me that all StylesheetReader does is to
> create a DOM tree, except that it creates StylesheetElement nodes
> where a normal DOM build would create Element nodes.
Wow. I'd count this a huge oversimplification. The Stylesheet reader
does a great deal that most readers needn't worry about, as I'd think
would be obvious from a glance at the code.
> If this is really
> all it does, I could propose some dramatic code reduction.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Sun May 13 20:33:26 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 13:33:26 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de>
Message-ID: <3AFEE186.99AFB1EF@fourthought.com>
"Martin v. Loewis" wrote:
>
> > Well, I think the best solution to this, rather than making a universal
> > ReleaseNode function, is to generalize the Reader architecture into a
> > general factory that can read, initialize and dispose of nodes. This
> > could be a Python DOM standard binding extension to DOMImplementation.
>
> That is a solution that I could easily accept; it would take some time
> until all relevant implementations support the method, though, and
> we'd need a name for it.
I'd favor cleanUp().
And I'm not worried that implementations would need to catch up. The
desire for 4XSLT interop will accelerate this work.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Sun May 13 20:34:08 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 13:34:08 -0600
Subject: [XML-SIG] [Fwd: [4suite] ReleaseNode interface in 4XSLT]
Message-ID: <3AFEE1B0.5461D63D@fourthought.com>
This is a multi-part message in MIME format.
--------------28F27A9C4716E5DCE6516942
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
--------------28F27A9C4716E5DCE6516942
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Return-Path: <4suite-admin@dollar.fourthought.com>
Received: from dollar.fourthought.com ([204.144.146.184])
by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4DJLv719962;
Sun, 13 May 2001 13:21:57 -0600
Received: from dollar.fourthought.com (localhost.localdomain [127.0.0.1])
by dollar.fourthought.com (8.9.3/8.9.3) with ESMTP id NAA13808;
Sun, 13 May 2001 13:16:02 -0600
Received: from yen.fourthought.com (bastion.fourthought.com [204.144.146.185])
by dollar.fourthought.com (8.9.3/8.9.3) with ESMTP id NAA13772
for <4suite@dollar.fourthought.com>; Sun, 13 May 2001 13:15:50 -0600
Received: from mail.cs.tu-berlin.de (root@mail.cs.tu-berlin.de [130.149.17.13])
by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4DJKl719685;
Sun, 13 May 2001 13:20:48 -0600
Received: from mira.informatik.hu-berlin.de (loewis.home.cs.tu-berlin.de [130.149.147.34])
by mail.cs.tu-berlin.de (8.9.3/8.9.3) with ESMTP id VAA14169;
Sun, 13 May 2001 21:15:10 +0200 (MET DST)
Received: (from martin@localhost)
by mira.informatik.hu-berlin.de (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) id f4DJ8lh14249;
Sun, 13 May 2001 21:08:47 +0200
Message-Id: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>
From: "Martin v. Loewis"
To: Mike.Olson@fourthought.com
CC: 4suite@fourthought.com, python-dev@python.org
In-reply-to: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on
Sun, 13 May 2001 12:15:46 -0600)
Subject: Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com>
User-Agent: REMI/1.14.2 (=?ISO-8859-4?Q?Hokuhoku-=D2shima?=) Chao/1.14.1
(=?ISO-8859-4?Q?Rokujiz=F2?=) APEL/10.2 Emacs/20.7 (i386-suse-linux)
MULE/4.0 (HANANOEN)
MIME-Version: 1.0 (generated by REMI 1.14.2 - =?ISO-8859-4?Q?=22Hokuhoku-=D2?=
=?ISO-8859-4?Q?shima=22?=)
Content-Type: text/plain; charset=US-ASCII
Sender: 4suite-admin@dollar.fourthought.com
Errors-To: 4suite-admin@dollar.fourthought.com
X-BeenThere: 4suite@lists.fourthought.com
X-Mailman-Version: 2.0beta6
Precedence: bulk
List-Id: Users and support for 4Suite tools <4suite.lists.fourthought.com>
List-Archive: http://lists.fourthought.com/pipermail/4suite/
Date: Sun, 13 May 2001 21:08:47 +0200
> What if we put these on the implementation, that or came up with a
> standard interface on the node. Then, every DOM imp that wants to be
> compatible with xpath/xslt needs to support this interface?
>
>
> node.ownerDocument.implementation.releaseNode(node)
>
> or
>
> node.py_unlink()
releaseNode sounds good to me; it is unlikely that W3C would give an
operation that name but a different meaning. Any objections?
Regards,
Martin
_______________________________________________
4suite mailing list
4suite@lists.fourthought.com
http://lists.fourthought.com/mailman/listinfo/4suite
--------------28F27A9C4716E5DCE6516942--
From uche.ogbuji@fourthought.com Sun May 13 20:36:40 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 13:36:40 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de>
Message-ID: <3AFEE248.CA8C2BC4@fourthought.com>
"Martin v. Loewis" wrote:
>
> > The thing I don't like about the reader, is that you need to pass it
> > around or store it in order to call the correct release. We could get
> > around this by having each node store a reference to its reader when it
> > is created.
>
> With regard to the reader, I'd also like to point you to the level 3
> load-store interfaces,
>
> http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010419/load-save.html
>
> where they have a DOMBuilder interface. So while your Reader interface
> is fine as Ft-provided API, I think the DOMBuilder interface has a
> higher chance of getting accepted widely.
I'm quite familiar with DOM Level 3, but the Reader architecture
predates this, and there is no immediate prospect of time to move to the
Level 3 interfaces. Perhaps in a month or two. Of course, this could
be accelerated by contributions.
From martin@loewis.home.cs.tu-berlin.de Sun May 13 21:17:22 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 22:17:22 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFEE186.99AFB1EF@fourthought.com> (message from Uche Ogbuji on
Sun, 13 May 2001 13:33:26 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com>
Message-ID: <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de>
> I'd favor cleanUp().
On the node, or on the DOM implementation?
Martin
From rsalz@zolera.com Mon May 14 01:39:48 2001
From: rsalz@zolera.com (Rich Salz)
Date: Sun, 13 May 2001 20:39:48 -0400
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com>
Message-ID: <3AFF2954.32ACAD38@zolera.com>
> The thing I don't like about the reader, is that you need to pass it
> around or store it in order to call the correct release. We could get
> around this by having each node store a reference to its reader when it
> is created.
I'm in favor of this for exactly this reason.
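A minimal sketch of the idea being discussed, in which each node keeps a reference to the reader that created it so that release can be routed without passing the reader around. The Reader and Node classes, and the createNode/releaseNode names, are hypothetical illustrations and not the actual 4Suite API.

```python
class Node:
    """Hypothetical node that remembers which reader built it."""

    def __init__(self, name, reader):
        self.name = name
        self.children = []
        self._reader = reader  # back-reference stored at creation time

    def release(self):
        # Delegate to the reader that created this node.
        self._reader.releaseNode(self)


class Reader:
    """Hypothetical reader that both creates and disposes of nodes."""

    def __init__(self):
        self.released = []  # names of released nodes, kept for illustration

    def createNode(self, name):
        return Node(name, self)

    def releaseNode(self, node):
        # Release children first, then drop this node's own references.
        for child in node.children:
            self.releaseNode(child)
        node.children = []
        node._reader = None
        self.released.append(node.name)


reader = Reader()
root = reader.createNode("root")
root.children.append(reader.createNode("child"))
root.release()  # no reader needs to be passed in by the caller
```

The cost of this design is one extra reference per node, which is the trade-off against having to carry the reader alongside the tree.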
Since Python doesn't allow tilde in method names ~Node is out, so I'd go
along with releaseNode() as suggested elsewhere. :)
/r$
From Mike.Olson@fourthought.com Mon May 14 02:05:48 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 19:05:48 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de>
Message-ID: <3AFF2F6C.B1350B6D@FourThought.com>
"Martin v. Loewis" wrote:
>
> > I'd favor cleanUp().
>
> On the node, or on the DOM implementation?
I'm in favor of putting it on the node. It would be a lot easier to
access. If it were on the implementation, you would need more logic to
release an arbitrary node, as only the document has the implementation
reference (and documents don't have an owner document).
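The extra logic mentioned here can be sketched with the standard library's xml.dom.minidom, used only as a stand-in DOM. The helper name find_implementation is an assumption for illustration; minidom itself has no releaseNode method.

```python
from xml.dom import Node, minidom


def find_implementation(node):
    """Return the DOMImplementation for any node, Document included."""
    if node.nodeType == Node.DOCUMENT_NODE:
        # A Document has no ownerDocument, so it must be special-cased.
        doc = node
    else:
        doc = node.ownerDocument
    return doc.implementation


doc = minidom.parseString("<a><b/></a>")
elem = doc.documentElement.firstChild

impl_from_element = find_implementation(elem)
impl_from_document = find_implementation(doc)
```

A method placed directly on the node would avoid this branch, which is the ease-of-access argument made above.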
Mike
>
> Martin
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Mike.Olson@fourthought.com Mon May 14 02:14:17 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 19:14:17 -0600
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de>
Message-ID: <3AFF3169.29F2B6C8@FourThought.com>
"Martin v. Loewis" wrote:
>
> I've tried to update my 4XSLT port to use the 4Suite 0.11 code base,
> only to discover that the StylesheetReader class is now much more strongly
> connected to Ft.Lib than before, in particular to classes from
> pDomletteReader, and their specific instance attributes.
I was just in there as well and was quite surprised at how complex the
code has become. I thought of doing some work on it but figured, it
ain't broke.....
My thought was that the implementation should be able to handle it;
then there would be one reader. All of the code in the Stylesheet Reader
would be handled in StylesheetDocument.createElement, or at least the
majority of it. I haven't looked too closely to see if this is 100%
feasible, though.
>
> I took the approach of providing alternative base classes to the ones
> provided by pDomlette, but that soon became a disaster since none of
> the minidom/pulldom classes bear any relationship to how the
> PyExpatReader and Handler classes work.
Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette
into xml.utils? Better yet, let's merge pDomlette and minidom so there
is only one domlette. pDomlette has greatly outgrown its original
purpose, so I have no problem with moving it into XML-SIG.
>
> I'd still like to pursue my attempt of integrating 4XSLT to work without
> Ft.Lib, and pDomlette in particular, but I'd need some advice here. I
> feel that I miss some grand picture in all these classes, and how they
> are connected. It seems that the authors of the code lose track, too,
> with code duplication all over the place.
I agree. There was a lot of redundant code when I last looked into it.
I think there should be one XML-SIG "reader" that works off a
DOMImplementation to create actual instances. One thing to note is
that this would slow things down. One big speed increase that pDomlette
gives us by having its own reader is that it can create elements
directly and not have to use the createElementNS interface. The problem
with the interface is that we have to do a "prefix + ':' + localName"
just to satisfy the interface (and then the function itself does a
string.split(qname, ':')). Not really a time-consuming process, but when
you call it 10000 times it adds up.
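The join-then-split round trip described above can be sketched as follows. Both function names are hypothetical, and the sketch uses the str.split method of current Python rather than the old string module; it only mimics the string handling, not real node creation.

```python
import timeit


def create_via_qname(prefix, local_name):
    """Mimic the round trip: the caller joins, the factory splits again."""
    qname = prefix + ":" + local_name                 # done to satisfy the interface
    split_prefix, split_local = qname.split(":", 1)   # done inside the factory
    return split_prefix, split_local


def create_direct(prefix, local_name):
    """Mimic a reader that hands the parts straight to the node."""
    return prefix, local_name


# Both paths end up with the same information.
assert create_via_qname("xsl", "template") == create_direct("xsl", "template")

# Time 10000 calls of each path; absolute numbers depend on the machine.
round_trip = timeit.timeit(lambda: create_via_qname("xsl", "template"), number=10000)
direct = timeit.timeit(lambda: create_direct("xsl", "template"), number=10000)
print("round trip: %.4fs, direct: %.4fs" % (round_trip, direct))
```

The per-call difference is small, which matches the point above that the cost only matters in aggregate.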
>
> So my question is: Is all this complexity really necessary? Would it
> be possible to simplify things by breaking down processing in multiple
> processing steps? It seems to me that all StylesheetReader does is to
> create a DOM tree, except that it creates StylesheetElement nodes
> where a normal DOM build would create Element nodes. If this is really
> all it does, I could propose some dramatic code reduction.
It also does validation, processing of include and import elements,
namespace aliasing, extension element processing, and more.
Though, like I said, I think this could be handled in a createElementNS
of a StylesheetDocument class.
Mike
>
> Any proposals are welcome.
>
> Regards,
> Martin
From Mike.Olson@fourthought.com Mon May 14 02:20:48 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 19:20:48 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de>
Message-ID: <3AFF32F0.8AAAED0C@FourThought.com>
"Martin v. Loewis" wrote:
>
> > The thing I don't like about the reader, is that you need to pass it
> > around or store it in order to call the correct release. We could get
> > around this by having each node store a reference to its reader when it
> > is created.
>
> With regard to the reader, I'd also like to point you to the level 3
> load-store interfaces,
>
> http://www.w3.org/TR/2001/WD-DOM-Level-3-CMLS-20010419/load-save.html
>
> where they have a DOMBuilder interface. So while your Reader interface
> is fine as Ft-provided API, I think the DOMBuilder interface has a
> higher chance of getting accepted widely.
Agreed, same with xml.dom.ext.Print. In fact, all of the stuff in
xml.dom.ext was originally put there as "stuff the W3C will add
eventually", mainly the reader and printer interfaces. Back when it was
only level I, there were functions to get a node's namespace URI, prefix,
and local name in the ext directory. We moved to level II and those
were not needed. I think the same should happen with the printers and
readers.
However, are we ready to move to level III? Is level III ready to be
moved to? I don't think anyone here (at FT) will have much time to
work on it for a month or two. We are really trying to get 1.0 out.
4Suite has been in beta for 3 years as of June 1 :)
This isn't to say that someone else can't do it, and we'll help when
and where we can.
Mike
>
> Regards,
> Martin
From uche.ogbuji@fourthought.com Mon May 14 02:57:53 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 19:57:53 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de>
Message-ID: <3AFF3BA1.DB51A55A@fourthought.com>
"Martin v. Loewis" wrote:
>
> > I'd favor cleanUp().
>
> On the node, or on the DOM implementation?
DOMImplementation.
From uche.ogbuji@fourthought.com Mon May 14 02:59:44 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 19:59:44 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de> <3AFF2F6C.B1350B6D@FourThought.com>
Message-ID: <3AFF3C10.32E79FDA@fourthought.com>
Mike Olson wrote:
>
> "Martin v. Loewis" wrote:
> >
> > > I'd favor cleanUp().
> >
> > On the node, or on the DOM implementation?
>
> I'm in favor of putting it on the node. It would be a lot easier to
> access. If it were on the implementation, you would need more logic to
> release an arbitrary node, as only the document has the implementation
> reference (and documents don't have an owner document).
Fine with me.
From uche.ogbuji@fourthought.com Mon May 14 03:10:09 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 20:10:09 -0600
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com>
Message-ID: <3AFF3E81.7473BD6C@fourthought.com>
Mike Olson wrote:
>
> "Martin v. Loewis" wrote:
> >
> > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base,
> > only to discover that the StylesheetReader class is now much more strongly
> > connected to Ft.Lib than before, in particular to classes from
> > pDomletteReader, and their specific instance attributes.
>
> I was just in there as well and was quite surprised at how complex the
> code has become. I thought of doing some work on it but figured, it
> ain't broke.....
This is a false impression. The code is actually quite a bit simpler than
it was before. In the past, we had the code for mapping prefixes to NSUris
repeated in pDomlette/PyExpat, pDomlette/SAX and StylesheetReader. Now
it's in a single place. There are many other places where code is now
shared where before it was duplicated.
It certainly needs a lot of polish still: the main problem is that all
the reader systems have evolved separately, and mix-in based
implementation merging is probably the best solution.
> My thought was that the implementation should be able to handle it;
> then there would be one reader. All of the code in the Stylesheet Reader
> would be handled in StylesheetDocument.createElement, or at least the
> majority of it. I haven't looked too closely to see if this is 100%
> feasible, though.
I don't favor this. I think tight coupling with the parse mechanism is
important for efficiency. It would be better to have a separate
fall-back Stylesheet Reader that did things through the DOM interface only
(although I'm not sure what this would buy us, since the same amount of
work would then need to be done in the DOM implementation).
> > I took the approach of providing alternative base classes to the ones
> > provided by pDomlette, but that soon became a disaster since none of
> > the minidom/pulldom classes bear any relationship to how the
> > PyExpatReader and Handler classes work.
>
> Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette
> into xml.utils? Better yet, let's merge pDomlette and minidom so there
> is only one domlette. pDomlette has greatly outgrown its original
> purpose, so I have no problem with moving it into XML-SIG.
I disagree with the idea of merging pDomlette and minidom, but I have no
problem moving pDomlette to xml.utils.
> > I'd still like to pursue my attempt of integrating 4XSLT to work without
> > Ft.Lib, and pDomlette in particular, but I'd need some advice here. I
> > feel that I miss some grand picture in all these classes, and how they
> > are connected. It seems that the authors of the code lose track, too,
> > with code duplication all over the place.
>
> I agree. There was a lot of redundant code when I looked into it last.
> I think there should be one xml-sig "reader" that works off a
> DOMImplementation to create actual instances.
Disagree. See above. Things can be parameterized more using DOMImp,
but not at the parser interface level.
> Some things to note are that this would slow things down. One big speed
> increase the pDomlette gives us by having its own reader is that it can create
> elements directly and not have to use the createElementNS interface. The problem
> with the interface is that we have to do a "prefix + ':' + localName"
> just to satisfy the interface (and then the function itself does a
> string.split(qname, ':')). Not really a time-consuming process, but when
> you call it 10000 times it adds up.
There's more to it than just this. There is a lot about the DOM factory
interfaces that is very inefficient.
From uogbuji@fourthought.com Mon May 14 04:31:14 2001
From: uogbuji@fourthought.com (Uche Ogbuji)
Date: Sun, 13 May 2001 21:31:14 -0600
Subject: [XML-SIG] [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT (fwd)
Message-ID: <200105140331.f4E3VEt12406@localhost.local>
------- Forwarded Message
Return-Path:
Received: from mail.fourthought.com [204.144.146.185]
by localhost with IMAP (fetchmail-5.6.8)
for uogbuji@localhost (single-drop); Sun, 13 May 2001 20:10:58 -0600 (MDT)
Received: from mail.python.org (mail.python.org [63.102.49.29])
by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E18N706668
for ; Sun, 13 May 2001 19:08:23 -0600
Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org)
by mail.python.org with esmtp (Exim 3.21 #1)
id 14z6qB-0004Y8-00; Sun, 13 May 2001 21:08:03 -0400
Received: from [204.144.146.185] (helo=yen.fourthought.com)
by mail.python.org with esmtp (Exim 3.21 #1)
id 14z6q5-0004Wh-00
for python-dev@python.org; Sun, 13 May 2001 21:07:57 -0400
Received: from FourThought.com (IDENT:molson@usrtcc1-pool2-38.prolynx.com
[63.122.17.102])
by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E17k706656;
Sun, 13 May 2001 19:07:46 -0600
Message-ID: <3AFF2E8B.31B9ED97@FourThought.com>
From: Mike Olson
Organization: FourThought, Inc
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "Martin v. Loewis"
CC: 4suite@fourthought.com, python-dev@python.org
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.
hu-berlin.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
Sender: python-dev-admin@python.org
Errors-To: python-dev-admin@python.org
X-BeenThere: python-dev@python.org
X-Mailman-Version: 2.0.5 (101270)
Precedence: bulk
List-Id: Python core developers
Date: Sun, 13 May 2001 19:02:03 -0600
"Martin v. Loewis" wrote:
>
> > What if we put these on the implementation, that or came up with a
> > standard interface on the node. Then, every DOM imp that wants to be
> > compatible with xpath/xslt needs to support this interface?
> >
> >
> > node.ownerDocument.implementation.releaseNode(node)
> >
> > or
> >
> > node.py_unlink()
>
> releaseNode sounds good to me; it is unlikely that W3C would give an
> operation that name but a different meaning. Any objections?
Should we standardize all of the python xml extensions with a py
prefix? pyReleaseNode or py_releaseNode? Then we will never have to
worry about a name clash.
Mike
>
> Regards,
> Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
------- End of Forwarded Message
From martin@loewis.home.cs.tu-berlin.de Mon May 14 06:42:58 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 07:42:58 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFF32F0.8AAAED0C@FourThought.com> (message from Mike Olson on
Sun, 13 May 2001 19:20:48 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <3AFED01A.3FF0E6F9@FourThought.com> <200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de> <3AFF32F0.8AAAED0C@FourThought.com>
Message-ID: <200105140542.f4E5gwX01307@mira.informatik.hu-berlin.de>
> However, are we ready to move to level III? Is level III ready to be
> moved to?
No, and no. I would not actively change or drop existing code until
DOM Level 3 is almost finished (proposed recommendation, or some
such). It's just a thing to take into consideration when designing new
code.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon May 14 06:39:34 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 07:39:34 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFF2F6C.B1350B6D@FourThought.com> (message from Mike Olson on
Sun, 13 May 2001 19:05:48 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFE8DEA.F92054CB@fourthought.com> <200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de> <3AFEC8EC.D6CFC2F2@fourthought.com> <200105131904.f4DJ4d114214@mira.informatik.hu-berlin.de> <3AFEE186.99AFB1EF@fourthought.com> <200105132017.f4DKHMC20333@mira.informatik.hu-berlin.de> <3AFF2F6C.B1350B6D@FourThought.com>
Message-ID: <200105140539.f4E5dYx01305@mira.informatik.hu-berlin.de>
> >
> > > I'd favor cleanUp().
> >
> > On the node, or on the DOM implementation?
>
> I'm in favor of putting it on the node. It would be a lot easier to
> access. If it were on the implementation, you would need more logic to
> release an arbitrary node, as only the document has the implementation
> reference (and documents don't have an owner document).
In that case, I'd prefer unlink, since this is what is already
documented for minidom.
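For reference, a short example of the existing minidom call, using only the standard library. The document content is made up for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<root><child/></root>")
child = doc.documentElement.firstChild

# unlink() is the documented minidom way to break the reference cycles
# between nodes so the tree can be reclaimed promptly.
doc.unlink()

# After unlink() the tree should be treated as unusable; the former
# child no longer points at its parent.
detached = child.parentNode is None
```

Calling unlink() on the Document releases the whole tree; it can also be called on an individual node to release just that subtree.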
Regards,
Martin
From uche.ogbuji@fourthought.com Mon May 14 08:06:00 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 14 May 2001 01:06:00 -0600
Subject: [XML-SIG] [Fwd: [4suite] ReleaseNode interface in 4XSLT]
Message-ID: <3AFF83D8.AFF83E34@fourthought.com>
This is a multi-part message in MIME format.
--------------8D5F80E05CA0787A819F3271
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
--------------8D5F80E05CA0787A819F3271
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Return-Path: <4suite-admin@dollar.fourthought.com>
Received: from dollar.fourthought.com ([204.144.146.184])
by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E5vl722886;
Sun, 13 May 2001 23:57:47 -0600
Received: from dollar.fourthought.com (localhost.localdomain [127.0.0.1])
by dollar.fourthought.com (8.9.3/8.9.3) with ESMTP id XAA24241;
Sun, 13 May 2001 23:52:18 -0600
Received: from yen.fourthought.com (bastion.fourthought.com [204.144.146.185])
by dollar.fourthought.com (8.9.3/8.9.3) with ESMTP id XAA24066
for <4suite@dollar.fourthought.com>; Sun, 13 May 2001 23:50:08 -0600
Received: from mail.cs.tu-berlin.de (root@mail.cs.tu-berlin.de [130.149.17.13])
by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f4E5t7722581;
Sun, 13 May 2001 23:55:07 -0600
Received: from mira.informatik.hu-berlin.de (loewis.home.cs.tu-berlin.de [130.149.147.34])
by mail.cs.tu-berlin.de (8.9.3/8.9.3) with ESMTP id HAA28334;
Mon, 14 May 2001 07:54:00 +0200 (MET DST)
Received: (from martin@localhost)
by mira.informatik.hu-berlin.de (8.10.2/8.10.2/SuSE Linux 8.10.0-0.3) id f4E5cOb01301;
Mon, 14 May 2001 07:38:24 +0200
Message-Id: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de>
From: "Martin v. Loewis"
To: Mike.Olson@fourthought.com
CC: 4suite@fourthought.com, python-dev@python.org
In-reply-to: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on
Sun, 13 May 2001 19:02:03 -0600)
Subject: Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com>
User-Agent: REMI/1.14.2 (=?ISO-8859-4?Q?Hokuhoku-=D2shima?=) Chao/1.14.1
(=?ISO-8859-4?Q?Rokujiz=F2?=) APEL/10.2 Emacs/20.7 (i386-suse-linux)
MULE/4.0 (HANANOEN)
MIME-Version: 1.0 (generated by REMI 1.14.2 - =?ISO-8859-4?Q?=22Hokuhoku-=D2?=
=?ISO-8859-4?Q?shima=22?=)
Content-Type: text/plain; charset=US-ASCII
Sender: 4suite-admin@dollar.fourthought.com
Errors-To: 4suite-admin@dollar.fourthought.com
X-BeenThere: 4suite@lists.fourthought.com
X-Mailman-Version: 2.0beta6
Precedence: bulk
List-Id: Users and support for 4Suite tools <4suite.lists.fourthought.com>
List-Archive: http://lists.fourthought.com/pipermail/4suite/
Date: Mon, 14 May 2001 07:38:24 +0200
> Should we standardize all of the python xml extensions with a py
> prefix? pyReleaseNode or py_releaseNode? Then we will never have to
> worry about a name clash.
IMO, no. The entire interface together is the Python DOM mapping. In
the unlikely event of a name clash, we could still decide to rename
the DOM function, or find some other magic (e.g. overloading on the
argument count).
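One way to read "overloading on the argument count" in Python is a method that dispatches on how many arguments it receives. This is only a sketch of that reading: the Node class is hypothetical, and the one-argument form stands in for an imagined future W3C operation that does not exist.

```python
class Node:
    """Hypothetical node; releaseNode is not a real W3C DOM operation."""

    def __init__(self):
        self.log = []  # records which meaning was selected, for illustration

    def releaseNode(self, *args):
        if len(args) == 0:
            # Python-binding meaning: dispose of this node.
            self.log.append("python release")
        elif len(args) == 1:
            # Imagined later W3C meaning that takes one argument.
            self.log.append("w3c release of %r" % (args[0],))
        else:
            raise TypeError("releaseNode takes 0 or 1 arguments")


node = Node()
node.releaseNode()
node.releaseNode("other")
```

This only works if the two meanings differ in arity, which is why it is a fallback rather than a general answer to name clashes.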
Regards,
Martin
--------------8D5F80E05CA0787A819F3271--
From martin@loewis.home.cs.tu-berlin.de Mon May 14 08:26:46 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 09:26:46 +0200
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
In-Reply-To: <3AFEE106.4C99F9FD@fourthought.com> (message from Uche Ogbuji on
Sun, 13 May 2001 13:31:18 -0600)
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFEE106.4C99F9FD@fourthought.com>
Message-ID: <200105140726.f4E7QkI01878@mira.informatik.hu-berlin.de>
>> It seems to me that all StylesheetReader does is to
>> create a DOM tree, except that it creates StylesheetElement nodes
>> where a normal DOM build would create Element nodes.
> Wow. I'd count this as a huge oversimplification. The Stylesheet reader
> does a great deal that most readers needn't worry about, as I'd think
> would be obvious from a glance at the code.
I'd like to discuss specific aspects, then. Looking at the current
public CVS, I see:
fromStream: duplicates ReaderMixin.fromStream, then adds a call to
    sheet.setup() and some error handling
initParser: duplicates PyExpatReader.initParser. It uses
    Utf8OnlyHandler sometimes, but I could not find that class.
_completeTextNode: creates LiteralText instead of Text nodes. Also does
    not deal with top_node, but I'm not sure whether this is on
    purpose
_initializeSheet: has no equivalent elsewhere
_handleExtUris: has no equivalent elsewhere
processingInstruction: does *not* create PI nodes
comment: likewise
startElement: great similarities with Handler.startElement. The significant
    differences seem to be:
    - creates element nodes based on g_mappings[nsuri][localname],
      extension tables, or creates LiteralElement
    - processes xsl:include somehow (?)
    - passes attributes through _handleExtUris for xsl:stylesheet
endElement: great overlap with Handler.endElement; I could not tell
    whether differences are on purpose or by mistake
characters: does not deal with _includeDepth and force8Bit (again, this
    might be by mistake)
Did I miss aspects of the functionality relevant to proper operation
of the StylesheetReader?
So all in all, it still seems to me that the essential difference is
what nodes are created; the control logic and parsing data structures
seem to be duplicates of the code found in the handler.
That, in turn, suggests that using a standard DOM builder with a
different DOM implementation would achieve the same effect.
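A rough sketch of that suggestion: one generic builder driven by expat, with node creation delegated to a pluggable factory. Everything here (the Element classes, the MAPPINGS table modelled loosely on g_mappings[nsuri][localname], and the build function) is hypothetical and is not 4XSLT code. It also leaves out the validation, include/import handling, namespace aliasing, and extension elements that the real reader is said to perform.

```python
import xml.parsers.expat

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


class Element:
    """Minimal stand-in for a DOM element."""

    def __init__(self, nsuri, local_name):
        self.nsuri = nsuri
        self.local_name = local_name
        self.children = []


class LiteralElement(Element):
    """Stand-in for a literal result element."""


class StylesheetElement(Element):
    """Stand-in for xsl:stylesheet."""


class TemplateElement(Element):
    """Stand-in for xsl:template."""


# Hypothetical lookup table keyed by namespace URI, then local name.
MAPPINGS = {XSL_NS: {"stylesheet": StylesheetElement,
                     "template": TemplateElement}}


def stylesheet_factory(nsuri, local_name):
    # Unknown names fall back to a literal element.
    cls = MAPPINGS.get(nsuri, {}).get(local_name, LiteralElement)
    return cls(nsuri, local_name)


def plain_factory(nsuri, local_name):
    return Element(nsuri, local_name)


def build(text, factory):
    """Generic builder: the control logic is shared, only the factory varies."""
    # With a namespace separator, expat reports names as "uri local".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    root = Element(None, "#document")
    stack = [root]

    def start_element(name, attrs):
        nsuri, _, local_name = name.rpartition(" ")
        node = factory(nsuri or None, local_name)
        stack[-1].children.append(node)
        stack.append(node)

    def end_element(name):
        stack.pop()

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.Parse(text, True)
    return root


SHEET = ('<xsl:stylesheet xmlns:xsl="%s" version="1.0">'
         '<xsl:template match="/"><p/></xsl:template>'
         '</xsl:stylesheet>' % XSL_NS)
tree = build(SHEET, stylesheet_factory)
```

Swapping stylesheet_factory for plain_factory yields an ordinary tree from the same builder, which is the effect the paragraph above hopes for; whether this is fast enough is exactly the efficiency objection raised elsewhere in the thread.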
Regards,
Martin
From fdrake@acm.org Mon May 14 15:08:44 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 14 May 2001 10:08:44 -0400 (EDT)
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFEE248.CA8C2BC4@fourthought.com>
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<3AFED01A.3FF0E6F9@FourThought.com>
<200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de>
<3AFEE248.CA8C2BC4@fourthought.com>
Message-ID: <15103.59116.344325.572131@cj42289-a.reston1.va.home.com>
Uche Ogbuji writes:
> I'm quite familiar with DOM Level 3, but the Reader architecture
> predates this, and there is no immediate prospect of time to move to the
> Level 3 interfaces. Perhaps in a month or two. Of course, this could
> be accelerated by contributions.
Parsed XML is already starting to support the Level 3 interfaces,
most interestingly, the Load portion of the Load/Save "feature". (I
just haven't had time to spend on the Save portion.)
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From fdrake@acm.org Mon May 14 19:11:01 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 14 May 2001 14:11:01 -0400 (EDT)
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFF32F0.8AAAED0C@FourThought.com>
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<3AFED01A.3FF0E6F9@FourThought.com>
<200105131912.f4DJCdj14251@mira.informatik.hu-berlin.de>
<3AFF32F0.8AAAED0C@FourThought.com>
Message-ID: <15104.8117.108130.195638@cj42289-a.reston1.va.home.com>
Mike Olson writes:
> However, are we ready to move to level III? Is level III ready to be
> moved to?
I agree with Martin on this: it's not ready. The "Load"
specification is pretty reasonable, but it's still fairly preliminary
as well.
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From fdrake@acm.org Mon May 14 19:24:17 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 14 May 2001 14:24:17 -0400 (EDT)
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFED1F4.C11668EF@FourThought.com>
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
<3AFED1F4.C11668EF@FourThought.com>
Message-ID: <15104.8913.239603.628509@cj42289-a.reston1.va.home.com>
Mike Olson writes:
> This is why I vote for either the implementation has the releaseNode
> function, or the node itself.
Putting such a method on the node makes the most sense, if the
method makes sense at all. This allows different classes within an
implementation to do the right thing without the dispatching overhead,
and makes the most sense for implementations which can be subclassed.
I am a little concerned about the method, however, because I see two
different possibilities. One is the "I don't need you anymore; don't
bother me" option (equivalent to DECREF), and the other is "Break all
your internal links and die", equivalent to the minidom .unlink()
method. From the discussion so far, I'm getting the sense that the
latter is what is being discussed, and this is not always
appropriate. To build DOM trees to use with the XPath/XSLT engines,
would I need to provide an empty .releaseNode(), since the DOM trees
are persistent and have lifetimes far beyond the individual use for
them with a specific transformation?
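To make the second option concrete, here is a small sketch of what
minidom's unlink() does to a tree. It uses the minidom shipped with
current Python releases, and the document content is made up for the
example. It shows why unlinking a tree that someone else still wants
to use would be a problem.

```python
from xml.dom import minidom


def unlink_and_inspect():
    """Unlink a small document and report what is left of it."""
    doc = minidom.parseString("<doc><item/></doc>")
    root = doc.documentElement
    item = root.firstChild

    doc.unlink()  # "break all your internal links and die"

    # The Python objects still exist, but the tree structure is gone.
    return doc.documentElement, root.firstChild, item.parentNode
```

After the call, all three returned values are None: the nodes can
still be referenced, but they no longer form a document.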
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From jmurray@agyinc.com Mon May 14 19:22:05 2001
From: jmurray@agyinc.com (Joe Murray)
Date: Mon, 14 May 2001 11:22:05 -0700
Subject: [XML-SIG] building XML docs using ?
Message-ID: <3B00224D.AFB2057D@agyinc.com>
Dear All,
I am converting many large "legacy" text files to XML. Some of the
original text files are upwards of 100 MB. What is the most efficient,
using the speed/memory metrics, way to convert these text files to XML?
Currently, I parse through the text files and create a DOM Document
representation. However, the time and memory expenditure for conversion
is huge, using either xml.dom.minidom or xml.dom. Here's an example of
what I do:
----------
# import stuff
from xml.dom.minidom import Document
# create doc and documentElement node
doc = Document()
docelement = doc.appendChild(...)
f = open(...)
..
while 1:
    # get data from file
    line = f.readline()
    if not line:
        break
    line = line.strip()
    data = line.split(...)
    # create a new element node using data from file
    node = doc.createElement(...)
    node.setAttribute(...)
    node.appendChild(...)
    docelement.appendChild(node)
...
----------
Should I forgo the ease of using the DOM objects by simply
outputting "hand-generated" markup? I was doing this previously, it's
efficient, but definitely not as nice/clean as it could be...
So basically, is there a lightweight XML module which provides for (as a
graphics programmer would say) "immediate mode" output, with as nice an
interface as the DOM modules? Oh, and BTW, can XML solve all my
problems??? ;-)
Thanks much,
joe
--
Joseph Murray
Bioinformatics Specialist, AGY Therapeutics
290 Utah Avenue, South San Francisco, CA 94080
(650) 228-1146
From fdrake@acm.org Mon May 14 20:23:57 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 14 May 2001 15:23:57 -0400 (EDT)
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <3B00224D.AFB2057D@agyinc.com>
References: <3B00224D.AFB2057D@agyinc.com>
Message-ID: <15104.12493.360699.521399@cj42289-a.reston1.va.home.com>
Joe Murray writes:
> Currently, I parse through the text files and create a DOM Document
> representation. However, the time and memory expenditure for conversion
> is huge, using either xml.dom.minidom or xml.dom. Here's an example of
> what I do:
Instead of building a DOM tree, send events to a SAX output
generator. This avoids keeping your entire document in memory. The
xml.sax.writer module provides this, and there may be others. (Be
sure to get the xml.sax.writer from CVS though; I just fixed a really
stupid bug...)
> ----------
>
> # import stuff
> from xml.dom.minidom import Document
>
> # create doc and documentElement node
> doc = Document()
> docelement = doc.appendChild(...)
> f = open(...)
> ..
> while 1:
>
>     # get data from file
>     line = f.readline()
>     if not line:
>         break
>     line = line.strip()
>     data = line.split(...)
>
>     # create a new element node using data from file
>     node = doc.createElement(...)
>     node.setAttribute(...)
>     node.appendChild(...)
>     docelement.appendChild(node)
This would end up looking more like:
    writer = xml.sax.writer.XmlWriter(f)
    while 1:
        # get data from file
        ...
        # write new element to output:
        writer.startElement("item", {"attr": value})
        writer.characters(data)
        writer.endElement("item")
        writer.characters("\n")  # record separator, unless you're
                                 # using the PrettyPrinter version
    f.close()
> So basically, is there a lightweight XML module which provides for (as a
> graphics programmer would say) "immediate mode" output, with as nice an
> interface as the DOM modules? Oh, and BTW, can XML solve all my
> problems??? ;-)
XML is an acronym, and as everyone knows, acronyms solve problems.
All of them. So, yes, life will be perfect with your new-found TLA. ;)
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From martin@loewis.home.cs.tu-berlin.de Mon May 14 21:19:42 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 22:19:42 +0200
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <3B00224D.AFB2057D@agyinc.com> (message from Joe Murray on Mon,
14 May 2001 11:22:05 -0700)
References: <3B00224D.AFB2057D@agyinc.com>
Message-ID: <200105142019.f4EKJgR05670@mira.informatik.hu-berlin.de>
> I am converting many large "legacy" text files to XML. Some of the
> original text files are upwards of 100 MB. What is the most efficient,
> using the speed/memory metrics, way to convert these text files to XML?
The less markup, the less the memory overhead, and the faster the
processing. So if you have a plain text file with contents XXX, the
most efficient XML document you could get (from the viewpoint of
parsing speed) is
XXX
Provided there is no markup in XXX, this is also the smallest XML
document storing all bytes of XXX :-)
> Currently, I parse through the text files and create a DOM Document
> representation.
Ah, so you are apparently bound by some DTD. In that case, it very
much depends on how complex the transformation is.
> node = doc.createElement(...)
> node.setAttribute(...)
> node.appendChild(...)
> docelement.appendChild(node)
So you create one element per line, in a single pass over the file?
That is quite a simple conversion procedure.
> Should I forgo the ease of using the DOM objects by simply
> outputting "hand-generated" markup?
Yes, definitely.
> I was doing this previously, it's efficient, but definitely not as
> nice/clean as it could be...
Why is that? If you create the right template for a single line, e.g.
template = '%s'
then a simple print statement would suffice to fill out this template.
This also makes a nice separation of structure and content.
> So basically, is there a lightweight XML module which provides for (as a
> graphics programmer would say) "immediate mode" output, with as nice an
> interface as the DOM modules?
You could use the SAX interfaces, essentially implementing a Reader
class, and using an xml.sax.XMLGenerator as the content handler.
Then, you'd do proper startElement and endElement calls; the
XMLGenerator will do immediate output.
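A minimal sketch of that approach, using the XMLGenerator from
xml.sax.saxutils as found in current Python releases. The element
names ("records", "item"), the "key" attribute and the tab-separated
input format are assumptions made up for the example, not taken from
the original files.

```python
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def convert(lines, out):
    """Write one <item> element per non-empty input line to `out`.

    Each line is assumed to look like "key<TAB>value". Nothing is
    kept in memory beyond the current line.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("records", AttributesImpl({}))
    gen.characters("\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("\t")
        gen.startElement("item", AttributesImpl({"key": key}))
        gen.characters(value)  # escaping of &, < and > is done here
        gen.endElement("item")
        gen.characters("\n")   # record separator
    gen.endElement("records")
    gen.endDocument()
```

Because `lines` can be an open file object, the input is consumed one
line at a time and the output is written as each element closes.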
> Oh, and BTW, can XML solve all my problems??? ;-)
Almost. To get rich quick, you still need to write chain letters :-)
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon May 14 21:21:31 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 22:21:31 +0200
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <15104.12493.360699.521399@cj42289-a.reston1.va.home.com>
(fdrake@acm.org)
References: <3B00224D.AFB2057D@agyinc.com> <15104.12493.360699.521399@cj42289-a.reston1.va.home.com>
Message-ID: <200105142021.f4EKLVb05674@mira.informatik.hu-berlin.de>
> This would end up looking more like:
>
> writer = xml.sax.writer.XmlWriter(f)
That's a SAX1 class, right? The SAX2 class is
xml.sax.saxutils.XMLGenerator.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon May 14 21:09:50 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 22:09:50 +0200
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <15104.8913.239603.628509@cj42289-a.reston1.va.home.com>
(fdrake@acm.org)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
<3AFED1F4.C11668EF@FourThought.com> <15104.8913.239603.628509@cj42289-a.reston1.va.home.com>
Message-ID: <200105142009.f4EK9oP05647@mira.informatik.hu-berlin.de>
> Putting such a method on the node makes the most sense, if the
> method makes sense at all. This allows different classes within an
> implementation to do the right thing without the dispatching overhead,
> and makes the most sense for implementations which can be subclassed.
I agree. Making it a non-method is a suggestion you might get from a
C++ programmer; the C++ equivalent, "delete this;", is bad style since
you might run a method of the object that is being destroyed. Of
course, in Python, this is not a problem.
> I am a little concerned about the method, however, because I see two
> different possibilities. One is the "I don't need you anymore; don't
> bother me" option (equivalent to DECREF), and the other is "Break all
> your internal links and die", equivalent to the minidom .unlink()
> method.
I can't understand the value of the first option. If you don't need an
Element or a document anymore which somebody else might be holding
onto, you can just drop it, right?
> From the discussion so far, I'm getting the sense that the
> latter is what is being discussed, and this is not always
> appropriate. To build DOM trees to use with the XPath/XSLT engines,
> would I need to provide an empty .releaseNode(), since the DOM trees
> are persistent and have lifetimes far beyond the individual use for
> them with a specific transformation?
Not necessarily. Currently, 4XSLT uses ReleaseNode e.g. to release a
style sheet, in a data flow:
- read the style sheet using the StylesheetReader from an XML document
(i.e. a byte stream)
- process the style sheet
- release it
Another application is with result tree fragments: when instantiating
an element, nodes get cloned over and over, and temporary results need
to be released.
There may be also cases where 4XSLT releases elements it did not
create; I'd consider that a bug.
I don't think we should introduce explicit reference counters for
documents or some such; we should strive for less memory management,
not more.
Regards,
Martin
From fdrake@acm.org Mon May 14 21:26:24 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 14 May 2001 16:26:24 -0400 (EDT)
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <200105142021.f4EKLVb05674@mira.informatik.hu-berlin.de>
References: <3B00224D.AFB2057D@agyinc.com>
<15104.12493.360699.521399@cj42289-a.reston1.va.home.com>
<200105142021.f4EKLVb05674@mira.informatik.hu-berlin.de>
Message-ID: <15104.16240.140204.456352@cj42289-a.reston1.va.home.com>
Martin v. Loewis writes:
> That's a SAX1 class, right? The SAX2 class is
> xml.sax.saxutils.XMLGenerator.
That's right.
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From fdrake@acm.org Mon May 14 21:37:09 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 14 May 2001 16:37:09 -0400 (EDT)
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <200105142009.f4EK9oP05647@mira.informatik.hu-berlin.de>
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
<3AFED1F4.C11668EF@FourThought.com>
<15104.8913.239603.628509@cj42289-a.reston1.va.home.com>
<200105142009.f4EK9oP05647@mira.informatik.hu-berlin.de>
Message-ID: <15104.16885.755115.164847@cj42289-a.reston1.va.home.com>
Martin v. Loewis writes:
> I can't understand the value of the first option. If you don't need an
> Element or a document anymore which somebody else might be holding
> onto, you can just drop it, right?
You can't do that in minidom without requiring cyclic GC, and that's
not available for all projects thanks to users of legacy Python
versions. I'm really learning to dislike Python 1.5.2. ;-(
> Not necessarily. Currently, 4XSLT uses ReleaseNode e.g. to release a
> style sheet, in a data flow:
> - read the style sheet using the StylesheetReader from an XML document
> (i.e. a byte stream)
> - process the style sheet
> - release it
>
> Another application is with result tree fragments: when instantiating
> an element, nodes get cloned over and over, and temporary results need
> to be released.
OK, this makes sense. As long as it only releases nodes that it
creates and does not use as part of the result, that's fine. As long
as I can create a stylesheet and store it as a persistent object,
create and store a bunch of documents, and then process them over &
over without damaging them, and make the results persistent and usable
in the same fashion, I'm happy. ;-)
> There may be also cases where 4XSLT releases elements it did not
> create; I'd consider that a bug.
Agreed!
> I don't think we should introduce explicit reference counters for
> documents or some such; we should strive for less memory management,
> not more.
Agreed as well. If we can rely on GC, then I'm all for it. I just
wanted to be sure that we were clear on the semantics of
.releaseNode(), since it has a large potential for disaster.
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From larsga@garshol.priv.no Mon May 14 22:57:13 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 14 May 2001 23:57:13 +0200
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <3B00224D.AFB2057D@agyinc.com>
References: <3B00224D.AFB2057D@agyinc.com>
Message-ID:
* Joe Murray
|
| So basically, is there a lightweight XML module which provides for
| (as a graphics programmer would say) "immediate mode" output, with
| as nice an interface as the DOM modules?
As Martin says SAX has the advantage that it does not store the entire
document in memory and so can be used to write applications that
operate with a fixed amount of memory (more or less). Unless your
document structure is too complex I would go for this.
minidom also has mechanisms that can be used to build only parts of
the tree at a time and throw them away afterwards. This may or may not
work for your processing. These mechanisms are not documented, either,
so it may be tricky to get them to work.
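The mechanism meant here is presumably xml.dom.pulldom (an assumption;
the module is not named above). A small sketch of how it looks in
current Python releases, where it is documented: events are pulled
one at a time, and only the elements of interest are expanded into
minidom subtrees. The tag name and the input are made up for the
example.

```python
from xml.dom import pulldom


def iter_items(xml_text, tag):
    """Yield the serialized form of each <tag> element in turn.

    Only one expanded subtree is needed at any time; the rest of the
    document is never built as a tree.
    """
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            events.expandNode(node)  # build just this subtree
            yield node.toxml()
```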
Pyxie also has support for building partial trees and discarding them
as you go. As an additional benefit it has an API that, IMHO, is far
nicer than the DOM API. It's unlikely to be very fast, though.
| Oh, and BTW, can XML solve all my problems??? ;-)
I'm afraid not. You'll need topic maps for that... :-)
--Lars M.
From tpassin@home.com Mon May 14 23:38:20 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Mon, 14 May 2001 18:38:20 -0400
Subject: [XML-SIG] building XML docs using ?
References: <3B00224D.AFB2057D@agyinc.com>
Message-ID: <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com>
[Lars Marius Garshol]
>
> * Joe Murray
> ...
> > Oh, and BTW, can XML solve all my problems??? ;-)
>
> I'm afraid not. You'll need topic maps for that... :-)
>
Hey, the man needs speed here .... :-)
Tom P
From Mike.Olson@fourthought.com Tue May 15 05:44:51 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 14 May 2001 22:44:51 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
<3AFED1F4.C11668EF@FourThought.com> <15104.8913.239603.628509@cj42289-a.reston1.va.home.com>
Message-ID: <3B00B443.18219553@FourThought.com>
"Fred L. Drake, Jr." wrote:
>
> Mike Olson writes:
> > This is why I vote for either the implementation has the releaseNode
> > function, or the node itself.
>
> I am a little concerned about the method, however, because I see two
> different possibilities. One is the "I don't need you anymore; don't
> bother me" option (equivalent to DECREF), and the other is "Break all
> your internal links and die", equivalent to the minidom .unlink()
> method. From the discussion so far, I'm getting the sense that the
> latter is what is being discussed, and this is not always
> appropriate. To build DOM trees to use with the XPath/XSLT engines,
> would I need to provide an empty .releaseNode(), since the DOM trees
> are persistent and have lifetimes far beyond the individual use for
> them with a specific transformation?
It depends on the interface into the XSLT/XPath engine. The way
4XSLT/4XPath is set up, if you pass us a DOM node to process, we won't
touch it. It is your DOM node, your job to release it. However, if you
call appendStylesheetUri (as an example) we create a DOM node, and we
will release it when processing is done. Currently, you can call
"setDocumentReader" on the 4XSLT processor to use anything that conforms
to the Reader interface when fromUri, fromString, fromStream are
called. We then call the corresponding releaseNode on the document
reader to free the DOM tree when we are done with it.
So, I guess I still see plenty of cases where "unlink" makes sense.
When would you want to use the DECREF equiv.?
Mike
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Digital Creations
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From larsga@garshol.priv.no Tue May 15 08:17:10 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 15 May 2001 09:17:10 +0200
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com>
References: <3B00224D.AFB2057D@agyinc.com> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com>
Message-ID:
* Lars Marius Garshol
|
| I'm afraid not. You'll need topic maps for that... :-)
* Thomas B. Passin
|
| Hey, the man needs speed here .... :-)
SMOO. :-)
--Lars M.
From rsalz@zolera.com Tue May 15 15:02:21 2001
From: rsalz@zolera.com (Rich Salz)
Date: Tue, 15 May 2001 10:02:21 -0400
Subject: [XML-SIG] Parsing namespace attributes (e.g., xml.dom.ext.GetAllNs)
Message-ID: <3B0136ED.EC1EE700@zolera.com>
According to my reading of the namespace spec, "xmlns" is not a
namespace identifier, but is instead just lexically significant. Yet
xml.dom (cf Document.py and ext/__init__.py) treats it as if it were a
namespace, and uses it to find namespace nodes. Is that just an
implementation technique?
Where is the "xmlns" defined in a W3 recommendation? For example, in
dom/__init__.py:
XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"
I can't find that value in W3C docs -- what am I missing?
I'm asking for a couple of reasons. First, I might be missing something
on the specs. Second, I need to add this to xml/ns.py if it's really
there, and third, it seems that if I'm right, then there's a
(minor/obscure) bug.
value
What namespace is "testit" really in? I believe uri:zolera.com
/r$
From fdrake@acm.org Tue May 15 15:06:01 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 May 2001 10:06:01 -0400 (EDT)
Subject: [XML-SIG] Parsing namespace attributes (e.g., xml.dom.ext.GetAllNs)
In-Reply-To: <3B0136ED.EC1EE700@zolera.com>
References: <3B0136ED.EC1EE700@zolera.com>
Message-ID: <15105.14281.196876.100997@cj42289-a.reston1.va.home.com>
Rich Salz writes:
> Where is the "xmlns" defined in a W3 recommendation? For example, in
> dom/__init__.py:
> XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"
> I can't find that value in W3C docs -- what am I missing?
AFAICR, this is noted in the DOM Level 2 specification, with a note
that it was an oversight in the Namespaces in XML recommendation that
the W3C intends to correct in some future version. I haven't checked
the errata for the Namespaces recommendation, however.
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From rsalz@zolera.com Tue May 15 15:35:34 2001
From: rsalz@zolera.com (Rich Salz)
Date: Tue, 15 May 2001 10:35:34 -0400
Subject: [XML-SIG] Parsing namespace attributes (e.g., xml.dom.ext.GetAllNs)
References: <3B0136ED.EC1EE700@zolera.com> <15105.14281.196876.100997@cj42289-a.reston1.va.home.com>
Message-ID: <3B013EB6.A9EE8032@zolera.com>
> AFAICR, this is noted in the DOM Level 2 specification
Aha, found it.
"Note: In the DOM, all namespace declaration attributes are by
definition bound to the namespace URI: "http://www.w3.org/2000/xmlns/".
These are the attributes whose namespace prefix or qualified name is
"xmlns". Although, at the time of writing, this is not part of the XML
Namespaces specification [Namespaces], it is planned to be incorporated
in a future revision."
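A quick way to see this rule in action, sketched with the minidom in
current Python releases (not the 2001 xml.dom code under discussion).
The sample document and the "uri:example" namespace are made up for
the example.

```python
from xml.dom import minidom, XMLNS_NAMESPACE

# Hypothetical sample document.
DOC = '<root xmlns:z="uri:example"><z:child/></root>'


def namespace_report(xml_text=DOC):
    """Return the namespace URI of the declaration attribute, the
    declared URI, and the namespace of the prefixed child element."""
    root = minidom.parseString(xml_text).documentElement
    decl = root.getAttributeNode("xmlns:z")
    child = root.firstChild
    return decl.namespaceURI, decl.value, child.namespaceURI
```

The declaration attribute itself reports XMLNS_NAMESPACE
("http://www.w3.org/2000/xmlns/") as its namespace, while the child
element is in the namespace that the declaration binds to the prefix.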
I won't hold my breath waiting for a revision of the XML Namespace spec,
which seems pretty clear that xmlns is lexical, so I'd anticipate a
fight. :)
Thanks.
/r$
From fdrake@acm.org Tue May 15 16:40:26 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 May 2001 11:40:26 -0400 (EDT)
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3B00B443.18219553@FourThought.com>
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
<3AFED1F4.C11668EF@FourThought.com>
<15104.8913.239603.628509@cj42289-a.reston1.va.home.com>
<3B00B443.18219553@FourThought.com>
Message-ID: <15105.19946.344121.580203@cj42289-a.reston1.va.home.com>
Mike Olson writes:
> It depends on the interface into the XSLT/XPath engine. The way
> 4XSLT/4XPath is set up, if you pass us a DOM node to process, we won't
> touch it. It is your DOM node, your job to release it. However, if you
> call appendStylesheetUri (as an example) we create a DOM node, and we
> will release it when processing is done. Currently, you can call
> "setDocumentReader" on the 4XSLT processor to use anything that conforms
> to the Reader interface when fromUri, fromString, fromStream are
> called. We then call the corresponding releaseNode on the document
> reader to free the DOM tree when we are done with it.
This sounds pretty reasonable to me.
> So, I guess I still see plenty of cases where "unlink" makes sense.
> When would you want to use the DECREF equiv.?
If you're using something that isn't GC friendly, such as minidom,
you need explicit incref/decref machinery to be able to discard the
document when it is no longer being used. This is less of an issue
with the cycle detector introduced in "modern" Python releases, but is
still a real problem with Python 1.5.2. And there are still a fair
number of users of the older version, for a variety of reasons.
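The underlying problem can be sketched in a few lines. The Node class
and unlink() function below are hypothetical stand-ins for a DOM with
parent/child links; the cycle collector is switched off to imitate an
interpreter without one, and the behaviour shown relies on CPython's
reference counting.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node with parent and child links (a cycle)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def unlink(node):
    """Break all internal links so reference counting can free the tree."""
    for child in node.children:
        unlink(child)
    node.children = []
    node.parent = None


def survives_drop(break_cycles):
    """Return True if the tree is still alive after dropping our
    last reference to it, with the cycle collector disabled."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        root = Node()
        Node(root)                  # child points back at root
        probe = weakref.ref(root)
        if break_cycles:
            unlink(root)
        del root
        return probe() is not None
    finally:
        if was_enabled:
            gc.enable()
```

Without the explicit unlink the tree stays alive after the last
external reference is dropped; with it, the memory is reclaimed
immediately.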
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From Mike.Olson@fourthought.com Tue May 15 16:48:22 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 15 May 2001 09:48:22 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
<3AFE8DEA.F92054CB@fourthought.com>
<200105131441.f4DEfPe12921@mira.informatik.hu-berlin.de>
<3AFED1F4.C11668EF@FourThought.com>
<15104.8913.239603.628509@cj42289-a.reston1.va.home.com>
<3B00B443.18219553@FourThought.com> <15105.19946.344121.580203@cj42289-a.reston1.va.home.com>
Message-ID: <3B014FC6.FCE8CE4F@FourThought.com>
"Fred L. Drake, Jr." wrote:
>
>
> If you're using something that isn't GC friendly, such as minidom,
> you need explicit incref/decref machinery to be able to discard the
> document when it is no longer being used. This is less of an issue
> with the cycle detector introduced in "modern" Python releases, but is
> still a real problem with Python 1.5.2. And there are still a fair
> number of users of the older version, for a variety of reasons.
So you're saying a smarter unlink: either flag that I am no longer using
this document, or completely destroy it if I was the last external
reference to the document. I think I see what you're saying.
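A rough sketch of what such a "smarter unlink" could look like. The
SharedDocument wrapper and its acquire()/release() methods are
hypothetical names for illustration, not an existing 4Suite or
minidom interface.

```python
from xml.dom import minidom


class SharedDocument:
    """Hypothetical wrapper: counts users, unlinks on the last release."""

    def __init__(self, document):
        self._document = document
        self._users = 0

    def acquire(self):
        """Register one more user and hand out the document."""
        if self._document is None:
            raise ValueError("document has already been destroyed")
        self._users += 1
        return self._document

    def release(self):
        """Drop one user; destroy the tree when nobody is left."""
        if self._users <= 0:
            raise ValueError("release() without matching acquire()")
        self._users -= 1
        if self._users == 0:
            self._document.unlink()  # break the internal cycles
            self._document = None

    @property
    def alive(self):
        return self._document is not None
```

A processor would call acquire() before a transformation and
release() afterwards; a persistent store would simply hold one
acquire() of its own for as long as it wants the tree kept intact.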
Mike
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Digital Creations
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From noreply@sourceforge.net Tue May 15 17:24:17 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 15 May 2001 09:24:17 -0700
Subject: [XML-SIG] [ pyxml-Bugs-424260 ] error importing Xhtml2HtmlPrinter
Message-ID:
Bugs item #424260, was updated on 2001-05-15 09:24
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=424260&group_id=6473
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: error importing Xhtml2HtmlPrinter
Initial Comment:
>>> import xml.dom.ext.XHtml2HtmlPrinter
Traceback (innermost last):
  File "", line 1, in ?
  File "/usr/lib/python1.5/site-packages/xml/dom/ext/XHtml2HtmlPrinter.py", line 3, in ?
    from xml.dom.html import HTML_FORBIDDEN_END, XHTML_NAMESPACE
ImportError: cannot import name XHTML_NAMESPACE
Patch for the bug is:
--- XHtml2HtmlPrinter.py Tue Apr 24 20:31:42 2001
+++ /home/alf/XHtml2HtmlPrinter.py Tue May 15 18:18:18 2001
@@ -1,6 +1,7 @@
 import string
 import Printer
-from xml.dom.html import HTML_FORBIDDEN_END, XHTML_NAMESPACE
+from xml.dom.html import HTML_FORBIDDEN_END
+from xml.dom import XHTML_NAMESPACE
 class HtmlDocType:
     name = 'HTML'
Cheers
Alexandre Fayolle
(I could not log in, because it seems there's some
problem with SF and their ssl server)
----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=424260&group_id=6473
From Alexandre.Fayolle@logilab.fr Tue May 15 17:41:33 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 15 May 2001 18:41:33 +0200 (CEST)
Subject: [XML-SIG] Python newbie question
Message-ID:
Hi there,
I really feel dumb for asking this... Well here comes anyway.
In xml.dom.ext.Xhtml2HtmlPrinter, there's the following statement:
import Printer
There's also a file called Printer in xml/dom/ext, but xml/dom/ext is not,
as far as I know, in my PYTHONPATH. So how does this work (a pointer to
the right page of TFM is fine by me)?
TIA
Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).
From fdrake@acm.org Tue May 15 18:06:36 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 May 2001 13:06:36 -0400 (EDT)
Subject: [XML-SIG] pyexpat interface issue
Message-ID: <15105.25116.646987.317835@cj42289-a.reston1.va.home.com>
The pyexpat module defines two wrappers for handlers which are
expected to return integers (NotStandaloneHandler and
ExternalEntityRefHandler). What stands out about these handlers is
that Expat is expecting a return value (the others have void
returns). The wrappers will propagate an exception if one is raised
by the Python handler implementation, but then assumes that the return
value is actually an integer. They use PyInt_AsLong() to convert the
return value to an integer, but don't check the return value: if
PyInt_AsLong() returns -1 and PyErr_Occurred() is non-NULL, a
TypeError was raised by PyInt_AsLong() because the value passed to it
was not an integer object. The -1 will be passed to Expat, which will
happily continue parsing since it expects a false value to tell it to
stop parsing. This has been this way for a while.
Should the documentation for these interfaces be modified to reflect
this (strange) behavior, with some code cleanup to avoid having unused
exception state lying around (which *can* show up later in unrelated
code), or should the implementation be fixed to propagate the
exception, or something else? I'm concerned that changing the actual
behavior will adversely affect existing code that uses pyexpat.
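For reference, a small sketch of the documented contract for one of
these handlers, written against the xml.parsers.expat module in
current Python releases. The handler returns an explicit integer,
which sidesteps the non-integer case described above entirely. The
sample document and the "doc.dtd" system identifier are made up; the
DTD is never loaded.

```python
from xml.parsers import expat

# Hypothetical document that is not declared standalone.
DOC = '<?xml version="1.0"?><!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'


def parses_ok(handler_result):
    """Parse DOC with a NotStandaloneHandler returning the given
    integer; report whether parsing ran to completion."""
    parser = expat.ParserCreate()
    parser.NotStandaloneHandler = lambda: handler_result
    try:
        parser.Parse(DOC, True)
    except expat.ExpatError:
        return False
    return True
```

Returning 1 lets parsing continue; returning 0 makes Expat stop with
a "not standalone" error, which pyexpat reports as an ExpatError.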
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From jmurray@agyinc.com Tue May 15 18:17:41 2001
From: jmurray@agyinc.com (Joe Murray)
Date: Tue, 15 May 2001 10:17:41 -0700
Subject: [XML-SIG] building XML docs using ?
References: <3B00224D.AFB2057D@agyinc.com> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com>
Message-ID: <3B0164B5.BFB9EB46@agyinc.com>
Thanks to everyone for their helpful responses. And to probe even
further, into this technology that will "solve all my problems"...
"Thomas B. Passin" wrote:
>
> [Lars Marius Garshol]
> >
> > * Joe Murray
> > ...
> > > Oh, and BTW, can XML solve all my problems??? ;-)
> >
> > I'm afraid not. You'll need topic maps for that... :-)
> >
> Hey, the man needs speed here .... :-)
So, with regard to speed, is there an XSLT processor (python or not)
which takes a SAX-like event-driven approach to transforming XML? I know
this doesn't deal fully with the dynamicity of an XSL doc, but it would
be useful. I checked some old xml-dev, xml-sig... I can't vouch for the
people who were discussing such a processor and given the fact that most
of the posts were circa 1999... I couldn't find a straightforward
answer. Does Sablotron support this? It seems as if the Oracle XML
parsers packages do... but after some surfin', I ain't certain...
"Martin v. Loewis" wrote:
> > Should I forgo the ease of using the DOM objects by simply
> > outputting "hand-generated" markup?
>
> Yes, definitely.
>
> > I was doing this previously, it's efficient, but definitely not as
> > nice/clean as it could be...
>
> Why is that? If you create the right template for a single line, e.g.
>
> template = '%s'
>
> then a simple print statement would suffice to fill out this template.
> This also makes a nice separation of structure and content.
Indeed, this is the route I have gone. I'm using
xml.sax.saxutils.escape, a handy function, in lieu of the SAX writer
interfaces.
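
The template-plus-escape route described here can be sketched as follows. This is only an illustration, not the code used in the thread: the `line` element name and the `render_line` helper are hypothetical, and only `xml.sax.saxutils.escape` comes from the discussion.

```python
from xml.sax.saxutils import escape

# Hypothetical one-element template; the element name is an assumption.
TEMPLATE = "<line>%s</line>"

def render_line(content):
    """Fill the template with escaped character data."""
    # escape() replaces &, < and > so the content cannot break the markup.
    return TEMPLATE % escape(content)

if __name__ == "__main__":
    print(render_line("a < b & c"))
```

Keeping the markup in the template and the data in the argument gives the separation of structure and content mentioned above, without building any DOM objects.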
All you guys are a helpful bunch!
Regards,
joe
--
Joseph Murray
Bioinformatics Specialist, AGY Therapeutics
290 Utah Avenue, South San Francisco, CA 94080
(650) 228-1146
From larsga@garshol.priv.no Tue May 15 18:35:29 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 15 May 2001 19:35:29 +0200
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <3B0164B5.BFB9EB46@agyinc.com>
References: <3B00224D.AFB2057D@agyinc.com> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> <3B0164B5.BFB9EB46@agyinc.com>
Message-ID:
* Joe Murray
|
| So, with regard to speed, is there an XSLT processor (python or not)
| which takes a SAX-like event-driven approach to transforming XML?
Currently there is not, and part of the reason for that is that some
parts of XSLT require the entire document to be available to the
processor at the same time. If you use only a subset of XSLT one can
use an event-based approach, but currently nobody has implemented
anything like this.
However, SAXON has some extensions that can enable you to build only
parts of the tree at a time. This puts some constraints on what you
are able to do, but you may be able to live with it.
| Does Sablotron support this?
It does not.
| It seems as if the Oracle XML parsers packages do... but after some
| surfin', I ain't certain...
I don't think they do, though there is a chance that I might be wrong.
You should in any case distinguish carefully between XML parsers
(nearly all of which have event-based interfaces) and XSLT engines.
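
For transformations that never need to look at other parts of the document, an event-based pipeline can be sketched with the SAX filter classes in the Python standard library. This is not XSLT and is not taken from the thread; `RenameFilter` and `rename_elements` are hypothetical names, and the sketch assumes a current Python 3.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class RenameFilter(XMLFilterBase):
    """Pass SAX events through, renaming one element on the way."""

    def __init__(self, parent, old, new):
        super().__init__(parent)
        self._old, self._new = old, new

    def _map(self, name):
        return self._new if name == self._old else name

    def startElement(self, name, attrs):
        super().startElement(self._map(name), attrs)

    def endElement(self, name):
        super().endElement(self._map(name))

def rename_elements(xml_bytes, old, new):
    """Stream xml_bytes through the filter and return the serialized result."""
    out = io.StringIO()
    filt = RenameFilter(xml.sax.make_parser(), old, new)
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_bytes))
    return out.getvalue()

if __name__ == "__main__":
    print(rename_elements(b"<a><b>x</b></a>", "b", "c"))
```

Each event is forwarded as soon as it arrives, so no tree is built. Anything that needs to reach across the document, as much of XSLT does, falls outside what this approach can express.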
--Lars M.
From martin@loewis.home.cs.tu-berlin.de Tue May 15 19:43:04 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 20:43:04 +0200
Subject: [XML-SIG] Python newbie question
In-Reply-To:
(message from Alexandre Fayolle on Tue, 15 May 2001 18:41:33 +0200
(CEST))
References:
Message-ID: <200105151843.f4FIh4Z01461@mira.informatik.hu-berlin.de>
> There's also a file called Printer in xml/dom/ext, but xml/dom/ext is not,
> as far as I know, in my PYTHONPATH. So how does this work (a pointer to
> the right page of TFM is fine by me)?
I don't think the package import procedure is documented anywhere; the
best you can get is
http://www.python.org/doc/essays/packages.html
For your specific question, see Intra-package References.
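
As a small illustration of the point: only the directory that contains the top-level `xml` package has to be on the search path, and submodules are then located through each package's `__path__`. The sketch uses the standard-library `xml.dom` package, since `xml.dom.ext` comes from PyXML and may not be installed; the variable names are arbitrary.

```python
import importlib
import os
import sys

pkg = importlib.import_module("xml.dom")
mod = importlib.import_module("xml.dom.minidom")

# Directory of the xml.dom package itself.
pkg_dir = os.path.dirname(os.path.abspath(pkg.__file__))

# The submodule lives in that directory and is found through the
# package's __path__, not through a sys.path / PYTHONPATH entry.
mod_dir = os.path.dirname(os.path.abspath(mod.__file__))

if __name__ == "__main__":
    print("package __path__:", list(pkg.__path__))
    print("package dir listed in sys.path:",
          pkg_dir in [os.path.abspath(p) for p in sys.path])
    print("submodule file:", mod.__file__)
```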
Regards,
Martin
From tpassin@home.com Wed May 16 00:55:15 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Tue, 15 May 2001 19:55:15 -0400
Subject: [XML-SIG] building XML docs using ?
References: <3B00224D.AFB2057D@agyinc.com> <003a01c0dcc6$94210080$7cac1218@reston1.va.home.com> <3B0164B5.BFB9EB46@agyinc.com>
Message-ID: <002801c0dd9a$7d011960$7cac1218@reston1.va.home.com>
[Joe Murray]
> So, with regard to speed, is there an XSLT processor (python or not)
> which takes a SAX-like event-driven approach to transforming XML? I know
> this doesn't deal fully with the dynamicity of an XSL doc, but it would
> be useful. I checked some old xml-dev, xml-sig... I can't vouch for the
> people who were discussing such a processor and given the fact that most
> of the posts were circa 1999... I couldn't find a straightforward
> answer. Does Sablotron support this? It seems as if the Oracle XML
> parsers packages do... but after some surfin', I ain't certain...
>
>
Some processors can do lazy evaluation and thereby avoid computing branches
that aren't used in a particular transformation. I'm pretty sure Xalan does
this, and I think Saxon can be asked to. Of course, if your source document
and transform need to pull together nodes from all parts of the document,
this won't help.
Otherwise, some processors can ingest the xml via SAX as well as the/a DOM,
but they then build their own DOM model.
Cheers,
Tom P
From Alexandre.Fayolle@logilab.fr Wed May 16 10:06:03 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 16 May 2001 11:06:03 +0200 (CEST)
Subject: [XML-SIG] Python newbie question
In-Reply-To: <200105151843.f4FIh4Z01461@mira.informatik.hu-berlin.de>
Message-ID:
On Tue, 15 May 2001, Martin v. Loewis wrote:
> > There's also a file called Printer in xml/dom/ext, but xml/dom/ext is not,
> > as far as I know, in my PYTHONPATH. So how does this work (a pointer to
> > the right page of TFM is fine by me)?
>
> I don't think the package import procedure is documented anywhere; the
> best you can get is
>
> http://www.python.org/doc/essays/packages.html
Thanks for this very interesting pointer. It clarifies a number of notions
for which I only had an intuitive grasp.
Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).
From uche.ogbuji@fourthought.com Thu May 17 07:59:15 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 17 May 2001 00:59:15 -0600
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
In-Reply-To: Message from "Martin v. Loewis"
of "Mon, 14 May 2001 09:26:46 +0200." <200105140726.f4E7QkI01878@mira.informatik.hu-berlin.de>
Message-ID: <200105170659.f4H6xFF13604@localhost.local>
> >> It seems to me that all StylesheetReader does is to
> >> create a DOM tree, except that it creates StylesheetElement nodes
> >> where a normal DOM build would create Element nodes.
>
> > Wow. I'd count this a huge oversimplification. The Stylesheet reader
> > does a great deal that most readers needn't worry about, as I'd think
> > would be obvious from a glance at the code.
>
> I'd like to discuss specific aspects, then. Looking at the current
> public CVS, I see:
>
> fromStream: duplicates ReaderMixin.fromStream, then adds call to
> sheet.setup(), and some error handling
>
> initParser: duplicates PyExpatReader.initParser. It uses
> Utf8OnlyHandler sometimes, but I could not find that class.
>
> _completeTextNode: creates LiteralText instead of Text nodes. Also does
> not deal with top_node, but I'm not sure whether this is on
> purpose
>
> _initializeSheet: has no equivalent elsewhere
> _handleExtUris: has no equivalent elsewhere
>
> processingInstruction: Does *not* create PI nodes
> comment: Likewise
>
> startElement: great similarities with Handler.startElement. The significant
> differences seem to be:
> - creates element nodes based on g_mappings[nsuri][localname],
> extension tables, or creates LiteralElement
> - processes xsl:include somehow (?)
> - passes attributes through _handleExtUris for xsl:stylesheet
>
> endElement: great overlap with Handler.endElement; I could not tell
> whether differences are on purpose or by mistake
>
> characters: does not deal with _includeDepth and force8Bit (again, this
> might be by mistake)
>
> Did I miss aspects of the functionality relevant to proper operation
> of the StylesheetReader?
>
> So all in all, it still seems to me that the essential difference is
> what nodes are created; the control logic and parsing data structures
> seem to be duplicates of the code found in the handler.
>
> That, in turn, suggests that using a standard DOM builder with a
> different DOM implementation would achieve the same effect.
There is a lot of state that the StylesheetReader manages that other readers
don't.
This would be very cumbersome to shoe-horn into a standard DOM reader.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Thu May 17 08:03:07 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 17 May 2001 01:03:07 -0600
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: Message from Joe Murray
of "Mon, 14 May 2001 11:22:05 PDT." <3B00224D.AFB2057D@agyinc.com>
Message-ID: <200105170703.f4H738F13626@localhost.local>
> Dear All,
>
> I am converting many large "legacy" text files to XML. Some of the
> original text files are upwards of 100 MB. What is the most efficient,
> using the speed/memory metrics, way to convert these text files to XML?
1) Using SAX
2) Cutting the output docs to reasonable size
I can guarantee you you want nothing to do with XML files in the hundreds of
MB. You don't even want them in the MB, period.
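
Point 1 can be sketched with the SAX writer in the standard library: each input line is written out as soon as it is read, so memory use does not grow with the size of the file. The `records`/`record` element names and the `convert_lines` helper are hypothetical, and splitting the output into several documents (point 2) is not shown.

```python
import io
from xml.sax.saxutils import XMLGenerator

def convert_lines(lines, out):
    """Write each input line as one <record> element, one at a time."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("records", {})
    for line in lines:              # 'lines' may be an open text file
        gen.startElement("record", {})
        gen.characters(line.rstrip("\n"))   # the generator escapes the text
        gen.endElement("record")
    gen.endElement("records")
    gen.endDocument()

if __name__ == "__main__":
    buf = io.StringIO()
    convert_lines(["a < b\n", "x & y\n"], buf)
    print(buf.getvalue())
```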
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Thu May 17 08:04:48 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 17 May 2001 01:04:48 -0600
Subject: [XML-SIG] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: Message from "Fred L. Drake, Jr."
of "Mon, 14 May 2001 16:37:09 EDT." <15104.16885.755115.164847@cj42289-a.reston1.va.home.com>
Message-ID: <200105170704.f4H74mr13635@localhost.local>
> > There may be also cases where 4XSLT releases elements it did not
> > create; I'd consider that a bug.
>
> Agreed!
I'm not aware of any such case.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From tony.mcdonald@ncl.ac.uk Thu May 17 08:18:29 2001
From: tony.mcdonald@ncl.ac.uk (Tony McDonald)
Date: Thu, 17 May 2001 08:18:29 +0100
Subject: [XML-SIG] Advice needed: RTF->XML conversions
Message-ID:
Hi all,
I'm currently using Omnimark to convert RTF files into a usable form of XML,
ready for uploading into our SQL database.
Omnimark is no longer free, so this means I can't pass on our software to
other HE institutions in the UK.
Can anyone suggest some (preferably python based) tools I can use to get
from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
to an XML form?
If someone has written something that takes that (dreadful) 'XML' output
that Word 2001 outputs and cleans it up into valid XML that would be a great
start for me.
Many thanks
Tone.
--
Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope
From Alexandre.Fayolle@logilab.fr Thu May 17 08:49:08 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Thu, 17 May 2001 09:49:08 +0200 (CEST)
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To:
Message-ID:
On Thu, 17 May 2001, Tony McDonald wrote:
> Can anyone suggest some (preferably python based) tools I can use to get
> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
> to an XML form?
>
> If someone has written something that takes that (dreadful) 'XML' output
> that Word 2001 outputs and cleans it up into valid XML that would be a great
> start for me.
I don't have a coded solution, but if I were to do such a thing, I'd use the
Automation interface of Word together with python's COM interface on
windows to have Word parse the document for me using the various iterators
available in the Word Document interface and building my own XML.
This can be very simple if your document only uses the basic styles in
word (title 1, text body, toc... [I don't know the english names, only
guessing here]), or dreadful if your document features images, tables,
floating text sections, etc.
Alexandre Fayolle
--
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).
From tony.mcdonald@ncl.ac.uk Thu May 17 09:14:34 2001
From: tony.mcdonald@ncl.ac.uk (Tony McDonald)
Date: Thu, 17 May 2001 09:14:34 +0100
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To:
Message-ID:
On 17/5/01 8:49 am, "Alexandre Fayolle"
wrote:
> On Thu, 17 May 2001, Tony McDonald wrote:
>
>> Can anyone suggest some (preferably python based) tools I can use to get
>> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
>> to an XML form?
>>
>> If someone has written something that takes that (dreadful) 'XML' output
>> that Word 2001 outputs and cleans it up into valid XML that would be a great
>> start for me.
>
> I don't have a coded solution, but if I were to do such a thing, I'd use the
> Automation interface of Word together with python's COM interface on
> windows to have Word parse the document for me using the various iterators
> available in the Word Document interface and building my own XML.
>
We have very little experience of doing things this way - we're a Unix and
Zope shop and try not to get too involved with the inner workings of
Microsoft software (if at all possible).
> This can be very simple if your document only uses the basic styles in
> word (title 1, text body, toc... [I don't know the english names, only
> guessing here]), or dreadful if your document features images, tables,
> floating text sections, etc.
>
> Alexandre Fayolle
Thanks for the advice Alexandre, but it's the latter case I'm afraid :(
Our documents have tables, images, superscripts/subscripts, greek characters
(ie simple formulas), page breaks and more besides.
Cheers
Tone.
--
Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope
From Mike.Olson@fourthought.com Thu May 17 09:24:37 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Thu, 17 May 2001 02:24:37 -0600
Subject: [XML-SIG] Advice needed: RTF->XML conversions
References:
Message-ID: <3B038AC5.B205328F@FourThought.com>
Tony McDonald wrote:
>
> On 17/5/01 8:49 am, "Alexandre Fayolle"
> wrote:
>
> > On Thu, 17 May 2001, Tony McDonald wrote:
> >
> >> Can anyone suggest some (preferably python based) tools I can use to get
> >> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
> >> to an XML form?
Can you send me a sample of the Word XML output, and the format you're
looking for? You can probably do it with a stylesheet as long as what
Word spits out really is XML.
Mike
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From larsga@garshol.priv.no Thu May 17 09:45:05 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 May 2001 10:45:05 +0200
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: <200105170703.f4H738F13626@localhost.local>
References: <200105170703.f4H738F13626@localhost.local>
Message-ID:
* Uche Ogbuji
|
| I can guarantee you you want nothing to do with XML files in the
| hundreds of MB. You don't even want them in the MB, period.
Why ever not? I've worked with lots of XML files of that size over the
last years and see nothing wrong with that. If the amount of data you
need to move around or work with is large, then your XML documents
will be large.
I see no reason why this should be considered somehow suspect or wrong.
If you use SAX there is really no reason why you shouldn't be able to
handle such documents.
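
A minimal sketch of why SAX copes with large inputs: the handler below keeps only a small dictionary of counts, whatever the size of the document. `ElementCounter` and `count_elements` are hypothetical names, written for a current Python 3.

```python
import io
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Count element names without ever building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

def count_elements(stream):
    """Parse a byte stream (or file name) and return the per-name counts."""
    handler = ElementCounter()
    xml.sax.parse(stream, handler)
    return handler.counts

if __name__ == "__main__":
    print(count_elements(io.BytesIO(b"<a><b/><b/></a>")))
```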
--Lars M.
From larsga@garshol.priv.no Thu May 17 09:48:20 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 May 2001 10:48:20 +0200
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To:
References:
Message-ID:
* Tony McDonald
|
| Can anyone suggest some (preferably python based) tools I can use to get
| from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
| to an XML form?
These are the only ones I know of:
--Lars M.
From mal@lemburg.com Thu May 17 11:12:40 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 12:12:40 +0200
Subject: [XML-SIG] Advice needed: RTF->XML conversions
References:
Message-ID: <3B03A418.5871B67@lemburg.com>
Lars Marius Garshol wrote:
>
> * Tony McDonald
> |
> | Can anyone suggest some (preferably python based) tools I can use to get
> | from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
> | to an XML form?
>
> These are the only ones I know of:
>
>
If you want to invest some time, you may want to look at the
RTF.py example in mxTextTools (see Python Software link below)
and extend it to whatever you need as basis for generating XML
from the RTF input.
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
From DShriyash@pun.cognizant.com Thu May 17 11:35:30 2001
From: DShriyash@pun.cognizant.com (Shriyash, Divekar (CTS))
Date: Thu, 17 May 2001 16:05:30 +0530
Subject: [XML-SIG] Small problem in XML parsing
Message-ID: <49532EE860A3D411812A00508B690B29FBC264@ctsinpunsxua>
This is a multi-part message in MIME format.
--------------InterScan_NT_MIME_Boundary
Content-Type: text/plain;
charset="iso-8859-1"
Hi Folks,
Have got a small problem in XML parsing.
I wish to append a new element to my XML file without creating new Elements.
General methodology is to first remove all the available tags & then by
'document.createElement', create the new required element.
My requirement is to point to already available element and append new child
to it.
e.g.
New Role abc def
---------
---------
Here, will go on adding.
I wish to point to already available ' New Role'
tags & then append new principals to it.
XML does not differentiate between & tags.
This may look like a very simple problem, but it is causing us a bit more effort.
We would be very happy if anybody can throw light on it.
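
One way to approach this with the DOM is to locate the existing element and append to it, leaving the rest of the tree untouched. The markup in the message did not survive, so the `roles`, `role` and `principal` names and the `name` attribute below are purely hypothetical, as is the `add_principal` helper. The new child itself still has to be created; the existing elements are not removed or rebuilt.

```python
from xml.dom import minidom

# Hypothetical document shape; the real element names are unknown.
SOURCE = ("<roles><role name='New Role'>"
          "<principal>abc</principal><principal>def</principal>"
          "</role></roles>")

def add_principal(doc, role_name, principal):
    """Append a <principal> child to the existing <role> named role_name."""
    for role in doc.getElementsByTagName("role"):
        if role.getAttribute("name") == role_name:
            child = doc.createElement("principal")
            child.appendChild(doc.createTextNode(principal))
            role.appendChild(child)
            return role
    raise LookupError("no role named %r" % role_name)

if __name__ == "__main__":
    doc = minidom.parseString(SOURCE)
    add_principal(doc, "New Role", "ghi")
    print(doc.documentElement.toxml())
```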
Thanx in advance
Regards
Shri
--------------InterScan_NT_MIME_Boundary
Content-Type: text/plain;
name="InterScan_Disclaimer.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
filename="InterScan_Disclaimer.txt"
--------------InterScan_NT_MIME_Boundary--
From tony.mcdonald@ncl.ac.uk Thu May 17 12:05:39 2001
From: tony.mcdonald@ncl.ac.uk (Tony McDonald)
Date: Thu, 17 May 2001 12:05:39 +0100
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To:
Message-ID:
On 17/5/01 9:48 am, "Lars Marius Garshol" wrote:
>
> * Tony McDonald
> |
> | Can anyone suggest some (preferably python based) tools I can use to get
> | from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML pages)
> | to an XML form?
>
> These are the only ones I know of:
>
>
>
> --Lars M.
>
Thanks for that Lars,
However, the first program is based on Omnimark (it's actually what I'm
using now), and the second is a Java based program, and I *think* the java
program I've mentioned in my other post (wh2fo) does a good enough job to
get initially to XML.
I still need to do my other machinations on the resultant XML however.
Thanks
Tone.
--
Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope
From tony.mcdonald@ncl.ac.uk Thu May 17 12:05:39 2001
From: tony.mcdonald@ncl.ac.uk (Tony McDonald)
Date: Thu, 17 May 2001 12:05:39 +0100
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To: <3B038AC5.B205328F@FourThought.com>
Message-ID:
On 17/5/01 9:24 am, "Mike Olson" wrote:
> Tony McDonald wrote:
>>
>> On 17/5/01 8:49 am, "Alexandre Fayolle"
>> wrote:
>>
>>> On Thu, 17 May 2001, Tony McDonald wrote:
>>>
>>>> Can anyone suggest some (preferably python based) tools I can use to get
>>>> from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML
>>>> pages)
>>>> to an XML form?
>
> Can you send me a sample of the Word XML output, and the format you're
> looking for? You can probably do it with a stylesheet as long as what
> Word spits out really is XML.
>
> Mike
>
Thanks for the offer Mike - I *was* under the impression that what word spat
out was not real XML, but I found this (sorry, Java) based program;
http://www-uk.hpl.hp.com/people/fabgia/wh2fo/wh2fo.html
which generates XML and XSL from the html files that word 2000 generates.
Frankly, I'm amazed, as I thought that constructs such as
where attributes aren't quoted or, if they are, they're quoted with " or '
inconsistently, were very bad XML. I guess I was wrong.
I still need to do some work with the XML that the above program uses and
would like to use Python for that as I'm *far* more comfortable with it than
java.
If I've read you right, are you saying I could apply a stylesheet to this
XML to get to my output XML which is then ok for my (finally!) python based
sgmlop processor that makes SQL? If so, I'll be very happy indeed!
Essentially I need to 'stack' the headings in the original document so that
this;
Heading 1 "Title"
heading 2 "Overview"
heading 3 "Core Content"
heading 2 "Theme 1"
Goes to
If you're saying that I can use XSL stylesheets to get this to work, then I
need to do some reading!
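
Independently of XSLT, the "stacking" of headings can be sketched in plain Python with a stack of open levels. The target markup in the message did not survive, so the `section` and `title` element names, the `level` attribute and the `nest_headings` helper are all hypothetical.

```python
from xml.sax.saxutils import escape

def nest_headings(headings):
    """Turn a flat list of (level, title) pairs into nested <section> markup."""
    out = []
    open_levels = []                     # levels of the sections still open
    for level, title in headings:
        # Close every open section at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            out.append("</section>")
            open_levels.pop()
        out.append('<section level="%d"><title>%s</title>'
                   % (level, escape(title)))
        open_levels.append(level)
    # Close whatever is still open at the end of the document.
    out.extend("</section>" for _ in open_levels)
    return "".join(out)

if __name__ == "__main__":
    flat = [(1, "Title"), (2, "Overview"), (3, "Core Content"), (2, "Theme 1")]
    print(nest_headings(flat))
```

With the four headings from the message, "Core Content" ends up inside "Overview", and "Theme 1" becomes a sibling of "Overview" under "Title".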
Thanks for the comments,
Tone.
--
Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope
From tony.mcdonald@ncl.ac.uk Thu May 17 12:05:40 2001
From: tony.mcdonald@ncl.ac.uk (Tony McDonald)
Date: Thu, 17 May 2001 12:05:40 +0100
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To: <3B03A418.5871B67@lemburg.com>
Message-ID:
On 17/5/01 11:12 am, "M.-A. Lemburg" wrote:
> Lars Marius Garshol wrote:
>>
>> * Tony McDonald
>> |
>> | Can anyone suggest some (preferably python based) tools I can use to get
>> | from Word RTF (or even, gasp, the 'XML' Word 2001 expels as its HTML
>> | pages) to an XML form?
>>
>> These are the only ones I know of:
>>
>>
>
> If you want to invest some time, you may want to look at the
> RTF.py example in mxTextTools (see Python Software link below)
> and extend it to whatever you need as basis for generating XML
> from the RTF input.
Thanks for the pointer Marc,
I did look at the RTF.py files a while back, but at the time I was ok with
Omnimark and the code was a bit over my head, so I had to put it on the back
burner.
Cheers
Tone.
--
Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope
From larsga@garshol.priv.no Thu May 17 12:18:21 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 May 2001 13:18:21 +0200
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To:
References:
Message-ID:
* Tony McDonald
|
| However, the first program is based on Omnimark (it's actually what I'm
| using now),
Uh - sorry, should have seen that.
| and the second is a Java based program, and I *think* the java
| program I've mentioned in my other post (wh2fo) does a good enough
| job to get initially to XML.
Thanks for that pointer. I've put it in the inbox to my site.
| I still need to do my other machinations on the resultant XML however.
Well, that's an ordinary XML processing job, so Python should have all
the tools you need for that task.
--Lars M.
From tpassin@home.com Thu May 17 14:51:39 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Thu, 17 May 2001 09:51:39 -0400
Subject: [XML-SIG] Advice needed: RTF->XML conversions
References:
Message-ID: <002801c0ded8$810ba4a0$7cac1218@reston1.va.home.com>
[Tony McDonald]
>
> If someone has written something that takes that (dreadful) 'XML' output
> that Word 2001 outputs and cleans it up into valid XML that would be a great
> start for me.
>
HTML-tidy has an option to clean up Word 2000 xml. You can get it from the
W3C site, or in a GUI editor, as part of HTML-kit (free), from
www.chami.org.
Cheers,
Tom P
From rsalz@zolera.com Thu May 17 14:57:08 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 17 May 2001 09:57:08 -0400
Subject: [XML-SIG] XML Canonicalization
Message-ID: <3B03D8B4.9108432D@zolera.com>
This is a multi-part message in MIME format.
--------------4C8E83122B2EF8C15F82E15C
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Someone had asked for code to do XML C14N (canonicalization) a couple of
weeks ago. I finally got around to cleaning up my code; it's attached.
I would be more than happy to add this to PyXML if there's interest.
Since it operates on DOM nodes, perhaps xml.dom.utils ? I'd probably
also need to upgrade the documentation -- the docstrings in the code
should tell you all you need.
Hope this helps -- looking forward to feedback.
/r$
--------------4C8E83122B2EF8C15F82E15C
Content-Type: text/plain; charset=us-ascii;
name="c14n.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="c14n.py"
#! /usr/bin/env python
'''XML C14N
Perform XML Canonicalization. Not fully conformant to the spec
in a couple of ways (mostly minor):
    Comments are always stripped
    Whitespace preservation/stripping not totally correct
    Processing Instruction nodes aren't handled
    The nodeset must start with an element and includes all descendants
Fixing the last one would be non-trivial.
'''
_copyright = '''Copyright 2001, Zolera Systems Inc. All Rights Reserved.
Distributed under the terms of the Python 2.0 Copyright.'''
from xml.dom import Node
import re
import StringIO
_attrs = lambda E: E._get_attributes() or []
_children = lambda E: E._get_childNodes() or []
_sorter = lambda n1, n2: cmp(n1._get_nodeName(), n2._get_nodeName())
xmlns_base = "http://www.w3.org/2000/xmlns/"
class _implementation:
    # Handlers for each node, by node type.
    handlers = {}
    # pattern/replacement list for whitespace stripping.
    repats = (
        ( re.compile(r'^[ \t]+', re.MULTILINE), '' ),
        ( re.compile(r'[ \t]+$', re.MULTILINE), '' ),
        ( re.compile(r'[\r\n]+'), '\n' ),
    )
    def __init__(self, node, write, nsdict={}, stripspace=0):
        '''Create and run the implementation.'''
        if node._get_nodeType() != Node.ELEMENT_NODE:
            raise TypeError, 'Non-element node'
        self.write, self.ns_stack, self.stripspace = \
                write, [nsdict], stripspace
        self._do_element(node)
        self.ns_stack.pop()
    def _do_text(self, node):
        'Output a text node in canonical form.'
        # Escape "&" first so the "&" in "&#xD;" is not escaped again.
        s = node._get_data() \
                .replace("&", "&amp;") \
                .replace("<", "&lt;") \
                .replace(">", "&gt;") \
                .replace("\015", "&#xD;")
        if self.stripspace:
            for pat,repl in _implementation.repats:
                s = re.sub(pat, repl, s)
        if s: self.write(s)
    handlers[Node.TEXT_NODE] =_do_text
    handlers[Node.CDATA_SECTION_NODE] =_do_text
    def _do_pi(self, node):
        'Output a processing instruction in canonical form.'
        pass # XXX
    handlers[Node.PROCESSING_INSTRUCTION_NODE] =_do_pi
    def _do_comment(self, node):
        'Output a comment node in canonical form.'
        pass # XXX
    handlers[Node.COMMENT_NODE] =_do_comment
    def _do_attr(self, n, value):
        'Output an attribute in canonical form.'
        W = self.write
        W(' ')
        W(n)
        W('="')
        s = value \
                .replace("&", "&amp;") \
                .replace("<", "&lt;") \
                .replace('"', '&quot;') \
                .replace('\011', '&#x9;') \
                .replace('\012', '&#xA;') \
                .replace('\015', '&#xD;')
        W(s)
        W('"')
    def _do_element(self, node):
        'Output an element (and its children) in canonical form.'
        name = node._get_nodeName()
        parent_ns = self.ns_stack[-1]
        my_ns = { 'xmlns': parent_ns.get('xmlns', '') }
        W = self.write
        W('<')
        W(name)
        # Divide attributes to NS definitions and others.
        nsnodes, others = [], []
        for a in _attrs(node):
            if a._get_namespaceURI() == xmlns_base:
                nsnodes.append(a)
            else:
                others.append(a)
        # Namespace attributes: update dictionary; if not already
        # in parent, output it.
        nsnodes.sort(_sorter)
        for a in nsnodes:
            n = a._get_nodeName()
            if n == "xmlns:":
                key, n = "", "xmlns"
            else:
                key = a._get_localName()
            v = my_ns[key] = a._get_nodeValue()
            pval = parent_ns.get(key, None)
            if v != pval: self._do_attr(n, v)
        # Other attributes: sort and output.
        others.sort(_sorter)
        for a in others:
            self._do_attr(a._get_nodeName(), a._get_value())
        W('>')
        self.ns_stack.append(my_ns)
        for c in _children(node):
            handler = _implementation.handlers.get(c._get_nodeType(), None)
            if handler: handler(self, c)
        self.ns_stack.pop()
        W('</%s>' % (name,))
    handlers[Node.ELEMENT_NODE] =_do_element
def XMLC14N(node, output=None, **kw):
    '''Canonicalize a DOM element node and everything underneath it.
    Return the text; if output is specified then output.write will
    be called to output the text and the return value will be None.
    Keyword parameters:
        stripspace -- remove extra (almost all) whitespace from text nodes
        nsdict -- a dictionary of prefix/uri namespace entries assumed
            to exist in the surrounding context.
    '''
    if output:
        s = None
    else:
        output = s = StringIO.StringIO()
    _implementation(node,
        output.write,
        stripspace=kw.get('stripspace', 0),
        nsdict=kw.get('nsdict', {})
    )
    if s: return (s.getvalue(), s.close())[0]
    return None
    if s == None: return None
    ret = s.getvalue()
    s.close()
    return ret
if __name__ == '__main__':
    text = '''3444This is the nameHello]]>a GVsbG8=Red12rich salz3F2041The value of n3content'''
    print _copyright
    from xml.dom.ext.reader import PyExpat
    reader = PyExpat.Reader()
    dom = reader.fromString(text)
    for e in _children(dom):
        if e._get_nodeType() != Node.ELEMENT_NODE: continue
        for ee in _children(e):
            if ee._get_nodeType() != Node.ELEMENT_NODE: continue
            print '\n', '=' * 60
            print XMLC14N(ee, nsdict={'spare':'foo'}, stripspace=1)
            print '-' * 60
            print XMLC14N(ee, stripspace=0)
            print '=' * 60
--------------4C8E83122B2EF8C15F82E15C--
From rsalz@zolera.com Thu May 17 15:13:48 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 17 May 2001 10:13:48 -0400
Subject: [XML-SIG] XML Canonicalization
References: <3B03D8B4.9108432D@zolera.com>
Message-ID: <3B03DC9C.D12A6B91@zolera.com>
Oops. I didn't save-file in the other window before I sent...
> def XMLC14N(node, output=None, **kw):
...
> if s: return (s.getvalue(), s.close())[0]
> return None
> if s == None: return None **
> ret = s.getvalue() **
> s.close() **
> return ret **
Obviously those last four lines can be deleted.
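
For readers who want something to compare the attached module against, recent Python versions (3.8 and later) ship a canonicalizer in the standard library. Note the assumptions: `xml.etree.ElementTree.canonicalize` implements C14N 2.0 rather than the Canonical XML 1.0 rules this module follows, and it works on serialized XML rather than DOM nodes, so it is a point of comparison and not a replacement. The sample document is made up.

```python
from xml.etree.ElementTree import canonicalize

# Made-up input: attributes out of order, an empty-element tag,
# and an escaped ampersand in the text.
doc = '<doc b="2" a="1"><empty/>text &amp; more</doc>'

# With no 'out' argument, canonicalize() returns the result as a string.
canonical = canonicalize(doc)

if __name__ == "__main__":
    print(canonical)
```

As in the attached code, attributes come out sorted and empty elements are written as a start tag followed by an end tag.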
From uche.ogbuji@fourthought.com Thu May 17 18:09:28 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 17 May 2001 11:09:28 -0600
Subject: [XML-SIG] building XML docs using ?
In-Reply-To: Message from Lars Marius Garshol
of "17 May 2001 10:45:05 +0200."
Message-ID: <200105171709.f4HH9SX17328@localhost.local>
>
> * Uche Ogbuji
> |
> | I can guarantee you you want nothing to do with XML files in the
> | hundreds of MB. You don't even want them in the MB, period.
>
> Why ever not? I've worked with lots of XML files of that size over the
> last years and see nothing wrong with that. If the amount of data you
> need to move around or work with is large, then your XML documents
> will be large.
>
> I see no reason why this should be considered somehow suspect or wrong.
> If you use SAX there is really no reason why you shouldn't be able to
> handle such documents.
Why not? Because most XML handling tools are not very scalable, XSLT being
the foremost example.
Also because XML eliminates the need, which I think quite unnecessary, of
storing mountains of data in a single file. Inclusion, transclusion, other
linking mechanisms, and many tools are available for breaking XML into
manageable packets.
So, in my opinion, it's suspect *and* wrong to be dealing with 100MB XML
files. Opinion of others might vary, of course.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Thu May 17 18:13:06 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 17 May 2001 11:13:06 -0600
Subject: [XML-SIG] XML Canonicalization
In-Reply-To: Message from Rich Salz
of "Thu, 17 May 2001 09:57:08 EDT." <3B03D8B4.9108432D@zolera.com>
Message-ID: <200105171713.f4HHD6Z17352@localhost.local>
> Someone had asked for code to do XML C14N (canonicalization) a couple of
> weeks ago. I finally got around to cleaning up my code; it's attached.
>
> I would be more than happy to add this to PyXML if there's interest.
> Since it operates on DOM nodes, perhaps xml.dom.utils ? I'd probably
> also need to upgrade the documentation -- the docstrings in the code
> should tell you all you need.
Brilliant! I heartily vote for its inclusion in PyXML.
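As an illustration of what canonicalization buys you, here is a small sketch using xml.etree.ElementTree.canonicalize from the Python 3.8+ standard library. This is an assumption-laden stand-in, not Rich's code: it implements C14N 2.0 and works on serialized text rather than DOM nodes. The two sample documents are invented; they differ in attribute order, quoting style, and empty-element syntax, yet canonicalize to the same string.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

# Two serializations of the same logical document (hypothetical samples).
doc_a = "<doc b='2' a=\"1\"><empty/></doc>"
doc_b = '<doc a="1" b="2"><empty></empty></doc>'

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# Equal canonical forms mean the documents are equivalent for purposes
# such as signing or comparison.
print(canon_a == canon_b)
```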
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Thu May 17 19:15:13 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 20:15:13 +0200
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To: <3B038AC5.B205328F@FourThought.com> (message from Mike Olson on
Thu, 17 May 2001 02:24:37 -0600)
References: <3B038AC5.B205328F@FourThought.com>
Message-ID: <200105171815.f4HIFDF01101@mira.informatik.hu-berlin.de>
> Can you send me a sample of the Word XML output, and the format you're
> looking for. You can probably do it with a stylesheet as long as what
> word spits out really is XML.
It isn't. Most notably, attribute values are not enclosed in quotes.
I found that sgmlop can parse what Word produces, though.
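A rough sketch of the same idea with a tolerant parser: sgmlop is not assumed to be installed here, so this uses html.parser.HTMLParser from the Python 3 standard library as a stand-in. The markup sample is invented, not real Word output. The point is only that a forgiving tokenizer accepts unquoted attribute values and unclosed tags that an XML parser would reject.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


collector = AttributeCollector()
# Unquoted attribute values and an unclosed <br>, as in the complaint above.
collector.feed("<p class=MsoNormal align=left>Hello<br></p>")
collector.close()
print(collector.seen)
```

From the collected events one could then emit well-formed XML for a stylesheet to work on.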
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Thu May 17 20:06:53 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 21:06:53 +0200
Subject: [XML-SIG] XML Canonicalization
In-Reply-To: <200105171713.f4HHD6Z17352@localhost.local> (message from Uche
Ogbuji on Thu, 17 May 2001 11:13:06 -0600)
References: <200105171713.f4HHD6Z17352@localhost.local>
Message-ID: <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de>
> Brilliant! I heartily vote for its inclusion in PyXML.
It's fine with me, too. Rich, could you please check it in?
Thanks,
Martin
From rsalz@zolera.com Thu May 17 20:20:05 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 17 May 2001 15:20:05 -0400
Subject: [XML-SIG] XML Canonicalization
References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de>
Message-ID: <3B042465.1DCA826D@zolera.com>
> Rich, could you please check it in?
Gladly. Just tell me where (xml.dom.utils?) and where are the docs that
I should update.
/r$
From uche.ogbuji@fourthought.com Thu May 17 20:27:31 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 17 May 2001 13:27:31 -0600
Subject: [XML-SIG] XML Canonicalization
References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de>
Message-ID: <3B042623.157DD7F1@fourthought.com>
"Martin v. Loewis" wrote:
>
> > Brilliant! I heartily vote for its inclusion in PyXML.
>
> It's fine with me, too. Rich, could you please check it in?
Rich did ask about the best place to put it.
He suggested xml.dom.utils, but I wonder if there's any prospect of
generalizing it so that it would work with SAX streams. Based on his
DOM ops, I guess probably not.
So maybe xml.dom.ext.c14n
I think this will be handy for RDF (parseType="literal", ya know).
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From chris@hddesign.com Thu May 17 21:01:47 2001
From: chris@hddesign.com (Chris Meyers)
Date: Thu, 17 May 2001 15:01:47 -0500
Subject: [XML-SIG] newbie question
Message-ID: <20010517150147.A5471@hddesign.com>
Ok I have been looking at PyXML for a couple of days now, and I still can't really find a good example of the basic stuff I need to do. I want to read in an XML file, traverse the tree and pull out information. For example I would like to go through this xml:
123John>Smith>
From this XML I would like to pull out the id attributes and the values from the elements. I can do this in Jython with JDOM easily enough, but I need to use Python for my current application.
If someone could point me in the right direction as to where to look to find an example similar to what I am trying to do, I would really appreciate it.
Thanks,
Chris
From martin@loewis.home.cs.tu-berlin.de Thu May 17 21:12:36 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 22:12:36 +0200
Subject: [XML-SIG] XML Canonicalization
In-Reply-To: <3B042623.157DD7F1@fourthought.com> (message from Uche Ogbuji on
Thu, 17 May 2001 13:27:31 -0600)
References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042623.157DD7F1@fourthought.com>
Message-ID: <200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de>
> He suggested xml.dom.utils, but I wonder if there's any prospect of
> generalizing it so that it would work with SAX streams. Based on his
> DOM ops, I guess probably not.
>
> So maybe xml.dom.ext.c14n
xml.dom.ext sounds better than xml.dom.utils, since I dislike packages
with only a single module, and because it is also an extension.
I'm not sure whether people can make sense out of c14n - I certainly
couldn't, although it is a cute name. 'normalize' would not be
appropriate, would it?
Regards,
Martin
From Mike.Olson@fourthought.com Thu May 17 23:18:52 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Thu, 17 May 2001 16:18:52 -0600
Subject: [XML-SIG] newbie question
References: <20010517150147.A5471@hddesign.com>
Message-ID: <3B044E4C.37A2F38C@FourThought.com>
Chris Meyers wrote:
>
> Ok I have been looking at PyXML for a couple of days now, and I still can't really find a good example of the basic stuff I need to do. I want to read in an XML file, traverse the tree and pull out information. For example I would like to go through this xml:
>
>
>
>
>
> 123
> John>
> Smith>
>
>
>
There are a couple of ways:
1. Use DOM

    from xml.dom.ext.reader import PyExpat
    reader = PyExpat.Reader()
    # XML_SRC is a string holding the XML document
    dom = reader.fromString(XML_SRC)
    flds = dom.documentElement.getElementsByTagName('fld')
    for f in flds:
        print f.getAttribute('id')
        print f.firstChild.data

2. Use XPath

    from xml import xpath
    from xml.dom.ext.reader import PyExpat
    reader = PyExpat.Reader()
    dom = reader.fromString(XML_SRC)
    # '//fld' selects every fld element anywhere in the document
    flds = xpath.Evaluate('//fld', contextNode=dom)
    for f in flds:
        print f.getAttribute('id')
        print f.firstChild.data
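For anyone without PyXML installed, roughly the same DOM traversal can be sketched with xml.dom.minidom from the Python 3 standard library. The XML_SRC below is hypothetical: the original sample did not survive in the archive, so the rec wrapper element and the id values are guesses made purely for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; only the 'fld' name and 'id' attribute
# come from the thread, the rest is assumed.
XML_SRC = """<rec>
  <fld id="1">123</fld>
  <fld id="2">John</fld>
  <fld id="3">Smith</fld>
</rec>"""

dom = parseString(XML_SRC)

# Collect (id attribute, text content) for every fld element.
fields = [(f.getAttribute("id"), f.firstChild.data)
          for f in dom.documentElement.getElementsByTagName("fld")]

for field_id, value in fields:
    print(field_id, value)
```

Note that firstChild.data assumes each element has a text node as its first child; an empty element would need a check for None.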
Mike
>
> From this xml I would like to pull out the id attributes and the values from the elements. I can do this in jython with jdom easily enough, but I need to use python for my current application
>
> If someone could point me in the right direction as to where to look to find an example similar to what I am trying to do, I would really appreciate it.
>
> Thanks,
> Chris
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From chris@hddesign.com Thu May 17 23:47:42 2001
From: chris@hddesign.com (Chris Meyers)
Date: Thu, 17 May 2001 17:47:42 -0500
Subject: [XML-SIG] newbie question
In-Reply-To: <3B044E4C.37A2F38C@FourThought.com>; from Mike.Olson@fourthought.com on Thu, May 17, 2001 at 04:18:52PM -0600
References: <20010517150147.A5471@hddesign.com> <3B044E4C.37A2F38C@FourThought.com>
Message-ID: <20010517174742.A5790@hddesign.com>
Thanks a lot, that did the trick.
Chris
On Thu, May 17, 2001 at 04:18:52PM -0600, Mike Olson wrote:
> Chris Meyers wrote:
> >
> > Ok I have been looking at PyXML for a couple of days now, and I still can't really find a good example of the basic stuff I need to do. I want to read in an XML file, traverse the tree and pull out information. For example I would like to go through this xml:
> >
> >
> >
> >
> >
> > 123
> > John>
> > Smith>
> >
> >
> >
>
> There are a couple of ways:
>
> 1. Use DOM
>
> from xml.dom.ext.reader import PyExpat
> reader = PyExpat.Reader()
>
> dom = reader.fromString(XML_SRC)
>
> flds = dom.documentElement.getElementsByTagName('fld')
>
> for f in flds:
>     print f.getAttribute('id')
>     print f.firstChild.data
>
>
> 2. Use XPath
>
> from xml import xpath
> from xml.dom.ext.reader import PyExpat
> reader = PyExpat.Reader()
>
> dom = reader.fromString(XML_SRC)
>
> flds = xpath.Evaluate('//fld',contextNode = dom)
>
> for f in flds:
>     print f.getAttribute('id')
>     print f.firstChild.data
>
>
> Mike
>
>
> >
> > From this xml I would like to pull out the id attributes and the values from the elements. I can do this in jython with jdom easily enough, but I need to use python for my current application
> >
> > If someone could point me in the right direction as to where to look to find an example similar to what I am trying to do, I would really appreciate it.
> >
> > Thanks,
> > Chris
> >
>
> --
> Mike Olson Principal Consultant
> mike.olson@fourthought.com (303)583-9900 x 102
> Fourthought, Inc. http://Fourthought.com
> Software-engineering, knowledge-management, XML, CORBA, Linux, Python
>
--
Chris Meyers
7941 Tree Lane Suite 200
Madison WI 53717
From jsydik@virtualparadigm.com Fri May 18 00:14:30 2001
From: jsydik@virtualparadigm.com (Jeremy J. Sydik)
Date: Thu, 17 May 2001 18:14:30 -0500
Subject: [XML-SIG] Advice needed: RTF->XML conversions
In-Reply-To: <200105171815.f4HIFDF01101@mira.informatik.hu-berlin.de>
Message-ID:
---------------------------------------------------------------------------
Martin is right. The Office/Word 'XML' can be a difficult thing to work
with. It's been a while since I've thought about it, but you will probably
need to account for the following:
* Not all attributes are quoted
* Singleton tags aren't closed. (This can be dealt with fairly easily,
however: it's simply the 'standard' singleton HTML tags that
occur this way, such as br and img.)
* There are a few microsoft namespaces to deal with, as well as
special tags. The documentation for these is found in:
http://msdn.microsoft.com/library/officedev/ofxml2k/ofhtml9.exe
The primary ones you'll probably encounter are o: and w:
* Also described in this document are
and
...
pairs. These break most SGML
and XML implementations. (It would be good to think of a regex
solution, since you'll probably need one to properly enclose
the attributes anyway).
Once those issues are addressed, you SHOULD have valid XML. If you don't,
chances are you haven't hit everything in this list :)
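A minimal sketch of the regex idea for the first item in the list, quoting bare attribute values. The pattern names and the sample markup are made up for this illustration, and the approach is deliberately naive: it assumes no attribute value contains "<" or ">", and it does nothing about unclosed singleton tags or the Microsoft-specific constructs.

```python
import re
from xml.dom.minidom import parseString

# A start tag: "<", a letter, then anything up to the next ">".
TAG = re.compile(r"<[A-Za-z][^<>]*>")
# "=" followed by a double-quoted, single-quoted, or bare value.
ATTR_VALUE = re.compile(r"""=\s*("[^"]*"|'[^']*'|[^\s"'<>]+)""")


def _quote_value(match):
    value = match.group(1)
    if value[0] in "\"'":
        return "=" + value          # already quoted, leave untouched
    return '="' + value + '"'       # bare value, wrap in double quotes


def quote_attributes(markup):
    """Quote bare attribute values inside start tags only."""
    return TAG.sub(lambda m: ATTR_VALUE.sub(_quote_value, m.group(0)), markup)


fixed = quote_attributes("<p class=MsoNormal align='left' lang=\"EN-GB\">a=b</p>")
print(fixed)

# The repaired fragment now parses as XML.
parsed = parseString(fixed)
```

Restricting the substitution to start tags keeps text such as "a=b" in element content from being rewritten.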
Good Luck,
Jeremy
-----Original Message-----
From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org]On
Behalf Of Martin v. Loewis
Sent: Thursday, May 17, 2001 1:15 PM
To: Mike.Olson@fourthought.com
Cc: tony.mcdonald@ncl.ac.uk; Alexandre.Fayolle@logilab.fr;
xml-sig@python.org
Subject: Re: [XML-SIG] Advice needed: RTF->XML conversions
> Can you send me a sample of the Word XML output, and the format you're
> looking for. You can probably do it with a stylesheet as long as what
> word spits out really is XML.
It isn't. Most notably, attribute values are not enclosed in quotes.
I found that sgmlop can parse what word produces, though.
Regards,
Martin
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
From rsalz@zolera.com Fri May 18 01:09:59 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 17 May 2001 20:09:59 -0400
Subject: [XML-SIG] newbie question
References: <20010517150147.A5471@hddesign.com>
Message-ID: <3B046857.2D18B6B4@zolera.com>
Mike's already posted a solution.
I've found the code in dom.ext useful for examples.
/r$
From rsalz@zolera.com Fri May 18 01:40:34 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu, 17 May 2001 20:40:34 -0400
Subject: [XML-SIG] XML Canonicalization
References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042623.157DD7F1@fourthought.com> <200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de>
Message-ID: <3B046F82.3F306701@zolera.com>
> xml.dom.ext sounds better than xml.dom.utils, since I dislike packages
> with only a single module
Me too.
> and because it is also an extension.
I think it's a matter of very detailed use of English. :) I view it as
a utility. But it doesn't matter.
> I'm not whether people can make sense out of c14n - I certainly
> couldn't, although it is a cute name. 'normalize' would not be
> appropriate, would it?
No, the proper term really is canonicalization. I agree, the name is
somewhat cute, but within the community C14N is as well-known as I18N.
How about
from xml.dom.ext import Canonicalize
and in ext/__init__.py I add
from c14n import Canonicalize
So the filename is c14n.py, but the exported name is more user-friendly.
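The re-export pattern proposed here can be sketched as follows. Everything in it is hypothetical: demo_ext stands in for xml.dom.ext, the Canonicalize body is a placeholder, and the package is written to a temporary directory just so the example runs on its own. One difference from the 2001 proposal: on Python 3 the __init__.py line must be the explicit relative import "from .c14n import Canonicalize".

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway package: demo_ext/__init__.py and demo_ext/c14n.py.
    pkg = pathlib.Path(tmp) / "demo_ext"
    pkg.mkdir()
    (pkg / "c14n.py").write_text(
        "def Canonicalize(node):\n"
        "    return 'canonical form of %r' % (node,)\n"
    )
    # The package re-exports the function under the friendlier name.
    (pkg / "__init__.py").write_text("from .c14n import Canonicalize\n")

    sys.path.insert(0, tmp)
    try:
        demo_ext = importlib.import_module("demo_ext")
        exported = demo_ext.Canonicalize      # what callers import
        defined_in = exported.__module__      # where it actually lives
    finally:
        sys.path.remove(tmp)

print(defined_in)
```

Callers see only the package-level name, so the module file can keep its terse name without leaking it into user code.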
From martin@loewis.home.cs.tu-berlin.de Thu May 17 22:36:38 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 23:36:38 +0200
Subject: [XML-SIG] newbie question
In-Reply-To: <20010517150147.A5471@hddesign.com> (message from Chris Meyers on
Thu, 17 May 2001 15:01:47 -0500)
References: <20010517150147.A5471@hddesign.com>
Message-ID: <200105172136.f4HLacH02948@mira.informatik.hu-berlin.de>
> From this xml I would like to pull out the id attributes and the
> values from the elements. I can do this in jython with jdom
> easily enough, but I need to use python for my current application
In PyXML, it works mostly the same way. The only different thing is
how to obtain a DOM Document; you use xml.dom.ext.reader.Sax2.FromXml*
for this. Once you have a DOM tree, you proceed just as with jython,
i.e. using getElementsByTagName, etc.
You probably need to be aware of the Python DOM mapping, see
http://www.python.org/doc/current/lib/module-xml.dom.html
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Fri May 18 05:08:54 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 18 May 2001 06:08:54 +0200
Subject: [XML-SIG] XML Canonicalization
In-Reply-To: <3B046F82.3F306701@zolera.com> (message from Rich Salz on Thu, 17
May 2001 20:40:34 -0400)
References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042623.157DD7F1@fourthought.com> <200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de> <3B046F82.3F306701@zolera.com>
Message-ID: <200105180408.f4I48s000954@mira.informatik.hu-berlin.de>
> How about
> from xml.dom.ext import Canonicalize
> and in ext/__init__.py I add
> from c14n import Canonicalize
>
> So the filename is c14n.py, but the exported name is more use-friendly.
That sounds good.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Fri May 18 05:17:00 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 18 May 2001 06:17:00 +0200
Subject: [XML-SIG] XML Canonicalization
In-Reply-To: <3B042465.1DCA826D@zolera.com> (message from Rich Salz on Thu, 17
May 2001 15:20:05 -0400)
References: <200105171713.f4HHD6Z17352@localhost.local> <200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de> <3B042465.1DCA826D@zolera.com>
Message-ID: <200105180417.f4I4H0p00981@mira.informatik.hu-berlin.de>
> Gladly. Just tell me where (xml.dom.utils?) and where are the docs that
> I should update.
As for the docs, it would be IMO best to put a
\section{xml.dom.ext.c14n} into doc/xml-ref.tex. You'll notice that
much of the content of that file is outdated. Since updating the
documentation consists of removing most of the stuff first, adding new
sections contributes to that update process.
Regards,
Martin
From rsalz@zolera.com Fri May 18 13:42:53 2001
From: rsalz@zolera.com (Rich Salz)
Date: Fri, 18 May 2001 08:42:53 -0400
Subject: [XML-SIG] newbie question
References: <20010517150147.A5471@hddesign.com> <200105172136.f4HLacH02948@mira.informatik.hu-berlin.de>
Message-ID: <3B0518CD.79A4D3D9@zolera.com>
> You probably need to be aware of the Python DOM mapping, see
>
> http://www.python.org/doc/current/lib/module-xml.dom.html
That brings up a question I meant to ask last week.
What's better, the "raw" mapping documented above, or the Corba-style
mapping? That is, self.nodeType or self._get_nodeType() ?
I am mainly interested to know which is most portable across Python DOM
implementations, but I also care a bit about efficiency.
Since Python has documented its own DOM interface, having an official
Corba->Python mapping doesn't matter all that much to me, although it is
convenient to be able to read Corba IDL and write Python without any
intermediate docs.
/r$
From fdrake@acm.org Fri May 18 15:22:23 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 18 May 2001 10:22:23 -0400 (EDT)
Subject: [XML-SIG] XML Canonicalization
In-Reply-To: <3B046F82.3F306701@zolera.com>
References: <200105171713.f4HHD6Z17352@localhost.local>
<200105171906.f4HJ6rZ01295@mira.informatik.hu-berlin.de>
<3B042623.157DD7F1@fourthought.com>
<200105172012.f4HKCaR02192@mira.informatik.hu-berlin.de>
<3B046F82.3F306701@zolera.com>
Message-ID: <15109.12319.311051.900182@cj42289-a.reston1.va.home.com>
Rich Salz writes:
> How about
> from xml.dom.ext import Canonicalize
> and in ext/__init__.py I add
> from c14n import Canonicalize
How about calling the module "canon":
from xml.dom.ext import canon
def main():
... = canon.Canonicalize(...)
-Fred
--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
From martin@loewis.home.cs.tu-berlin.de Fri May 18 21:52:46 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 18 May 2001 22:52:46 +0200
Subject: [XML-SIG] newbie question
In-Reply-To: <3B0518CD.79A4D3D9@zolera.com> (message from Rich Salz on Fri, 18
May 2001 08:42:53 -0400)
References: <20010517150147.A5471@hddesign.com> <200105172136.f4HLacH02948@mira.informatik.hu-berlin.de> <3B0518CD.79A4D3D9@zolera.com>
Message-ID: <200105182052.f4IKqkF01843@mira.informatik.hu-berlin.de>
> What's better, the "raw" mapping documented above, or the Corba-style
> mapping? That is, self.nodeType or self._get_nodeType() ?
>
> I am mainly interested to know which is most portable across Python DOM
> implementations, but I also care a bit about efficiency.
It's mainly a matter of personal taste. Some people believe in
accessor functions, some in attributes.
If you want to care about portability and speed, you should use
attributes. Whether you go through __getattr__ or not varies depending
on DOM implementation and attribute; most attributes will be directly
available, though.
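To make the two access styles concrete, here is a toy sketch, not taken from any real DOM implementation. The Node class is invented; it stores nodeType as a plain attribute and synthesizes the CORBA-style _get_ accessor through __getattr__, which Python calls only when normal attribute lookup fails.

```python
class Node:
    """Toy node exposing nodeType as an attribute and via an accessor."""

    ELEMENT_NODE = 1

    def __init__(self, node_type):
        self.nodeType = node_type  # direct attribute, the fast path

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so plain attribute
        # access never pays for this method.
        if name.startswith("_get_"):
            attr = name[len("_get_"):]
            if attr in self.__dict__:
                return lambda: self.__dict__[attr]
        raise AttributeError(name)


node = Node(Node.ELEMENT_NODE)
direct = node.nodeType            # attribute style
via_accessor = node._get_nodeType()  # accessor style, routed via __getattr__
print(direct, via_accessor)
```

In this sketch the accessor costs an extra failed lookup plus a function call, which is one reason the attribute style tends to be cheaper; real implementations may wire things the other way around.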
Regards,
Martin
From tony.mcdonald@ncl.ac.uk Sun May 20 10:17:20 2001
From: tony.mcdonald@ncl.ac.uk (Tony McDonald)
Date: Sun, 20 May 2001 10:17:20 +0100
Subject: [XML-SIG] Problems with 'multiple definitions'
Message-ID:
Hi all,
This isn't strictly an XML thing, but as the packages I really want to use
are the XML ones, I thought the group might be able to help.
I'm working with python2.1 and MacOS X and compiling up packages such as
PyXML and 4Suite (although this happens with packages such as MySQLdb
too).
I use the standard procedure to build and install these packages, ie
% python2.1 setup.py install
But, when I test out 4Suite (for example), ie
% cd /usr/local/doc/4Suite-0.11/test_suite/4XSLT
% python2.1 basic_test.py
I get this;
dyld: python2.1 multiple definitions of symbol _XML_DefaultCurrent
python2.1 definition of _XML_DefaultCurrent
/usr/local/lib/python2.1/site-packages/Ft/Lib/cDomlettec.so definition
of _XML_DefaultCurrent
I get similar errors with other packages such as PyXML and MySQLdb.
I've managed to install MySQLdb by stripping out an offending symbol
from libmysqlclient.a, but surely there's a cleaner way of doing this?
Is there some compiler flag I can set that gets around this?
The python is a pre-compiled version from http://tony.lownds.com/macosx/
Any help would be appreciated; this effectively stops me using any
compiled modules under MacOS X (which is, in almost all other respects,
excellent!).
TIA
tone
--
Dr Tony McDonald, Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE http://www.fmcc.org.uk/mailman/listinfo/zope
From karl@digicool.com Tue May 22 00:38:55 2001
From: karl@digicool.com (Karl Anderson)
Date: 21 May 2001 16:38:55 -0700
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
In-Reply-To: Mike Olson's message of "Sun, 13 May 2001 19:14:17 -0600"
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com>
Message-ID:
Mike Olson writes:
> "Martin v. Loewis" wrote:
> >
> > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base,
> > only to discover that the StylesheetReader class is now much stronger
> > connected to Ft.Lib than before, in particular to classes from
> > pDomletteReader, and their specific instance attributes.
>
> I was just in there as well and quite surprised how complex the code has
> become. I thought of doing some work on it but figured, it ain't
> broke.....
> Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette
> into xml.utils? Better yet, let's merge pDomlette and minidom so there
> is only one domlette. pDomlette has greatly outgrown its original
> purpose so I have no problems with moving it into XML-Sig.
If you're suggesting that the DOMs should be consolidated so that
tools like PyXML's XSLT could support only that DOM, I hope you'll
reconsider. I'd like Zope's DOM to be usable by PyXML's XSLT and
XPath implementations.
There are some hurdles to this, though. The tests are only usable
with 4Suite, which makes it harder to find inconsistencies.
Submitting patches to 4Suite's implementations wouldn't be helpful for
my goals, because 4Suite's XSLT and XPath processors have become more
reliant on its particular DOM since these modules were forked to
PyXML.
--
Karl Anderson karl@digicool.com
From Mike.Olson@fourthought.com Tue May 22 00:57:53 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 21 May 2001 17:57:53 -0600
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com>
Message-ID: <3B09AB81.910B27F2@FourThought.com>
Karl Anderson wrote:
>
> Mike Olson writes:
>
> > "Martin v. Loewis" wrote:
> > >
> > > I've tried to update my 4XSLT port to use the 4Suite 0.11 code base,
> > > only to discover that the StylesheetReader class is now much stronger
> > > connected to Ft.Lib than before, in particular to classes from
> > > pDomletteReader, and their specific instance attributes.
> >
> > I was just in there as well and quite surprised how complex the code has
> > become. I thought of doing some work on it but figured, it ain't
> > broke.....
>
> > Is pDomlette the only import from Ft.Lib? If so, why not move pDomlette
> > into xml.utils? Better yet, let's merge pDomlette and minidom so there
> > is only one domlette. pDomlette has greatly outgrown its original
> > purpose so I have no problems with moving it into XML-Sig.
>
> If you're suggesting that the DOMs should be consolidated so that
> tools like PyXML's XSLT could support only that DOM, I hope you'll
> reconsider. I'd like Zope's DOM to be usable by PyXML's XSLT and
> XPath implementations.
Not at all. I was suggesting that both minidom and pDomlette are
lightweight Python DOM implementations and I don't think we need two of
them. If Zope's DOM supports the Python DOM interface, then it should
work in xslt/xpath. If not, it is a bug in xslt/xpath.
However, I don't know if this will always be the case. 4XSLT is about
to get a _big_ rewrite and we might not support a "runNode" interface
anymore. If we do, it will probably not be the most efficient way to use
4XSLT, as we will have to translate from DOM into the internal data
structure.
>
> There are some hurdles to this, though. The tests are only usable
> with 4Suite, which makes it harder to find inconsistencies.
> Submitting patches to 4Suite's implementations wouldn't be helpful for
> my goals, because 4Suite's XSLT and XPath processors have become more
> reliant on its particular DOM since these modules were forked to
> PyXML.
Actually, the tests would be easy to fix to use another DOM (though I'm
not sure how you would do it in Zope, as I ran into many hurdles
executing ZDOM outside of the Zope environment). To do this,
edit the file test_harness.py. It is used by every 4XSLT test script.
Either add a test for ParsedXML, or replace all of the existing tests
with a ParsedXML test. Then just run test.py and all of the 4XSLT tests
will use ParsedXML.
I don't understand the "more reliant" part. How have we become more
reliant? Are you talking about the fact that Martin did a lot of work
when he first moved 4XSLT into PyXML to disentangle 4XSLT from Ft.Lib?
Then it's not really more reliant, just not ported yet.
FYI to all, we will be syncing 4XSLT with Martin's changes in the near
future.
>
> --
> Karl Anderson karl@digicool.com
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From karl@digicool.com Tue May 22 03:07:25 2001
From: karl@digicool.com (Karl Anderson)
Date: 21 May 2001 19:07:25 -0700
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
In-Reply-To: Mike Olson's message of "Mon, 21 May 2001 17:57:53 -0600"
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <3B09AB81.910B27F2@FourThought.com>
Message-ID:
Mike Olson writes:
> Karl Anderson wrote:
> >
> > There are some hurdles to this, though. The tests are only usable
> > with 4Suite, which makes it harder to find inconsistencies.
> > Submitting patches to 4Suite's implementations wouldn't be helpful for
> > my goals, because 4Suite's XSLT and XPath processors have become more
> > reliant on its particular DOM since these modules were forked to
> > PyXML.
>
> Actually, the tests would be easy to fix to use another DOM, (though I'm
> not sure how you would do it in Zope as I ran into many hurdles
> executing ZDOM outside of the Zope environment).
I don't know that ZDOM is a good measure of usefulness with other
DOMs - I haven't really looked at it, much less tested it. Right now
I'm concentrating on ParsedXML's DOM.
For a simple example of using PyXML's XPath with ParsedXML:
http://www.zope.org/Wikis/DevSite/Projects/ParsedXML/ParsedXMLWith4XPath
You do need a Zope installation with ParsedXML, although you don't
need to actually run Zope :)
If you want to use ParsedXML to test usability with other DOM
implementations, I'd be glad to help.
> However, to do this,
> edit the file test_harness.py. It is used by every 4XSLT test script.
> Either add a test for ParsedXML, or replace all of the existing tests
> with a parsedXML test. Then just run test.py and all of the 4XSLT tests
> will use Parsed XML.
Thanks, I'll look into this when I can.
> I don't understand the more reliant part. How have we become more
> reliant? Are you talking about the fact that Martin did a lot of work
> when he first moved 4XSLT into PyXML to disentangle 4XSLT from Ft.Lib?
> Then it's not really more reliant, just not ported yet.
Perhaps I misread the CVS histories. I was looking into how PyXML and
4Suite depended on the included DOM implementations, and I thought
that 4XPath was copied over to PyXML, and that after that updates to
4Suite's tree made it dependent on its DOM. But looking again (I was
running into trouble with XPath/Conversions.py), there seems to have
been some syncing and stuff going on; I'd have to do some work to
convince myself that I was correct.
--
Karl Anderson karl@digicool.com
From martin@loewis.home.cs.tu-berlin.de Tue May 22 06:15:11 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 22 May 2001 07:15:11 +0200
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
In-Reply-To: (message from Karl
Anderson on 21 May 2001 19:07:25 -0700)
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <3B09AB81.910B27F2@FourThought.com>
Message-ID: <200105220515.f4M5FBi00961@mira.informatik.hu-berlin.de>
> Perhaps I misread the CVS histories. I was looking into how PyXML and
> 4Suite depended on the included DOM implementations, and I thought
> that 4XPath was copied over to PyXML, and that after that updates to
> 4Suite's tree made it dependent on its DOM. But looking again (I was
> running into trouble with XPath/Conversions.py), there seems to have
> been some syncing and stuff going on, I'd have to do some work to
> convince myself that I was correct.
Before I first checked 4XPath/4XSLT into PyXML, I had already
significantly modified it; see README.4XPath for an outline of the
changes.
Some of these changes have been integrated into 4Suite. To continue to
keep the two branches similar, I've now integrated the changes of
4Suite 0.11 into PyXML. I have not yet modified them to work
stand-alone, since I got stuck updating the stylesheet reader. I
think I will write a new stylesheet reader from scratch which only
uses a SAX DOM builder and a DOM implementation, but I haven't started
with that, yet.
Regards,
Martin
From sam@webslingerZ.com Tue May 22 14:55:13 2001
From: sam@webslingerZ.com (Sam Brauer)
Date: Tue, 22 May 2001 09:55:13 -0400 (EDT)
Subject: [XML-SIG] ANN: new release of maki
In-Reply-To:
Message-ID:
I've released a new version of maki at http://maki.sourceforge.net
maki is a mod_python handler which uses various 4Suite components to serve
XML with Apache. It allows a web developer to specify processing rules
based on path-matching regular expressions. Each rule describes a
pipeline with any number of XSLT steps and/or custom processing steps. A
processor that evaluates embedded Python source to dynamically modify the
document is included. maki also supports time-based caching of output.
Also included are two example "logicsheets": one that adds HTTP request
data to the document and another that executes SQL queries and creates
elements from the results.
The overall functionality is similar (though intentionally not identical)
to Cocoon.
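The rule-and-pipeline idea described above can be sketched in a few lines. This is not maki's configuration format or API, which is documented in its manual; the RULES table, the step functions, and the paths are all invented for illustration. Each rule pairs a path-matching regular expression with an ordered list of processing steps, and the first matching rule wins.

```python
import re


def uppercase_step(document):
    """Hypothetical custom processing step."""
    return document.upper()


def wrap_step(document):
    """Hypothetical stand-in for an XSLT step that wraps the content."""
    return "<page>" + document + "</page>"


# Ordered rules: (path pattern, pipeline of steps). First match wins.
RULES = [
    (re.compile(r"^/reports/.*\.xml$"), [uppercase_step, wrap_step]),
    (re.compile(r"\.xml$"), [wrap_step]),
]


def process(path, document):
    """Run the document through the pipeline of the first matching rule."""
    for pattern, steps in RULES:
        if pattern.search(path):
            for step in steps:
                document = step(document)
            return document
    return document  # no rule matched: serve unchanged


report = process("/reports/q1.xml", "<r/>")
index = process("/index.xml", "<r/>")
image = process("/logo.png", "raw")
print(report, index, image)
```

Putting the more specific pattern first matters, since a later general rule would otherwise never be shadowed.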
For more info, please take a look at the online documentation at
http://maki.sourceforge.net/manual/
Thank you,
Sam
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sam Brauer : sbrauer@users.sourceforge.net
From karl@digicool.com Tue May 22 20:05:36 2001
From: karl@digicool.com (Karl Anderson)
Date: 22 May 2001 12:05:36 -0700
Subject: [XML-SIG] Re: [4suite] Disentangling StylesheetReader from Ft.Lib
In-Reply-To: "Martin v. Loewis"'s message of "Tue, 22 May 2001 07:15:11 +0200"
References: <200105131902.f4DJ2KK14103@mira.informatik.hu-berlin.de> <3AFF3169.29F2B6C8@FourThought.com> <3B09AB81.910B27F2@FourThought.com> <200105220515.f4M5FBi00961@mira.informatik.hu-berlin.de>
Message-ID:
Martin v. Loewis writes:
> > Perhaps I misread the CVS histories. I was looking into how PyXML and
> > 4Suite depended on the included DOM implementations, and I thought
> > that 4XPath was copied over to PyXML, and that after that updates to
> > 4Suite's tree made it dependent on its DOM. But looking again (I was
> > running into trouble with XPath/Conversions.py), there seems to have
> > been some syncing and stuff going on, I'd have to do some work to
> > convince myself that I was correct.
>
> Before I first checked 4XPath/4XSLT into PyXML, I had already
> significantly modified it; see README.4XPath for an outline of the
> changes.
Thanks for clearing that up.
> Some of these changes have been integrated into 4Suite. To continue to
> keep the two branches similar, I've now integrated the changes of
> 4Suite 0.11 into PyXML. I have not yet modified them to work
> stand-alone, yet, since I got stuck updating the Stylesheet reader. I
> think I will write a new stylesheet reader from scratch which only
> uses a SAX DOM builder and a DOM implementation, but I haven't started
> with that, yet.
Just to be clear, is PyXML's XSLT intended to work with already
created DOM trees as well?
--
Karl Anderson karl@digicool.com
From Mike.Olson@fourthought.com Tue May 22 21:43:33 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Tue, 22 May 2001 14:43:33 -0600
Subject: [XML-SIG] ANN: 4Suite and 4SuiteServer 0.11.1 release candidate 1
Message-ID: <3B0ACF75.9694866B@FourThought.com>
All,
Here is the first release candidate for our 0.11.1 release. It includes a
handful of new features and many bug fixes. Please give it a try
as we work out the documentation and packaging bugs for the
0.11.1 final release (expected later this week).
Please see http://4Suite.org/download.html for the packages.
4Suite new features:
    Pure Python parser for XSLT, XPath, and XPointer
    Support for Unicode in the C-based XSLT, XPath and XPointer parsers
    ODS dictionaries and type definitions
    ODS bug fixes and optimizations
4Suite Server new features:
    FTP server
    Text indexing using swish
    More CORBA support
    Backup and Restore command line tools
    Better security
    True access control lists
Mike
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Tue May 22 21:54:49 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Tue, 22 May 2001 14:54:49 -0600
Subject: [XML-SIG] Re: [4suite] ANN: 4Suite and 4SuiteServer 0.11.1 release candidate 1
References: <3B0ACF75.9694866B@FourThought.com>
Message-ID: <3B0AD219.CB493E17@fourthought.com>
Mike Olson wrote:
> 4Suite Server new features
>
> FTP server
> text indexing using swish
> more CORBA support
> Backup and Restore command line tools
> Better security
> True access control lists
One note. The new 4SS requires a re-initialization of all databases.
We specifically made backup and restore facilities a priority for this
release so that in future, we will provide a smooth migration path
whenever new releases break data.
Hopefully no one has accumulated irreplaceable data in 4SS yet, and if
you have, let us know and we should be able to help with the migration.
Migration testing will be a standard part of every 4SS release from now
on, so you needn't fear for your data in future.
I apologize for any inconvenience.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From mnot@mnot.net Tue May 22 23:06:41 2001
From: mnot@mnot.net (Mark Nottingham)
Date: Tue, 22 May 2001 15:06:41 -0700
Subject: [XML-SIG] XML and Unicode
Message-ID: <20010522150638.C22396@mnot.net>
How does one detect the charset used in an XML document from a SAX2
parser (PyXML 0.6.5)?
Also, if I have an XML document encoded ISO-8859-1 (and properly
identified), should I have a reasonable expectation that the output
of a SAX processor, post- .encode('utf-8'), should be correct if
viewed in a Web browser with UTF-8 selected as a character encoding?
In other words, is the post-parse unicode string a neutral
representation of the 8859-x string, which can then be encoded as
utf-8? Or, is it in the charset of the original XML document (my
testing seems to indicate the latter - what was an 8859 character in
the original text does not successfully come out the other side)?
(Sorry if this is obtuse - just getting into i18n, and Python docs
are thin on the ground)
--
Mark Nottingham
http://www.mnot.net/
From mal@lemburg.com Tue May 22 23:38:34 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 23 May 2001 00:38:34 +0200
Subject: [XML-SIG] XML and Unicode
References: <20010522150638.C22396@mnot.net>
Message-ID: <3B0AEA6A.9CCD2A1F@lemburg.com>
Mark Nottingham wrote:
>
> How does one detect the charset used in an XML document from a SAX2
> parser (PyXML 0.6.5)?
>
> Also, if I have an XML document encoded ISO-8859-1 (and properly
> identified), should I have a reasonable expectation that the output
> of a SAX processor, post- .encode('utf-8'), should be correct if
> viewed in a Web browser with UTF-8 selected as a character encoding?
This should work...
> In other words, is the post-parse unicode string a neutral
> representation of the 8859-x string, which can then be encoded as
> utf-8?
Unicode is encoding neutral in the sense that it provides
space for the characters of most scripts. If the parser returns
Unicode, then you can encode it as UTF-8 and have the original
contents of the attribute/element represented as UTF-8 string.
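A minimal sketch of that round trip, written for a current Python 3 (where str is the Unicode type) rather than the Python 2.0 u'...' strings discussed in this thread. The sample word and variable names are assumptions chosen for illustration only.

```python
# Bytes as they would sit in an ISO-8859-1 file: "cafe" with an
# e-acute stored as the single byte 0xE9.
latin1_bytes = b"caf\xe9"

# Decoding produces an encoding-neutral Unicode string.
text = latin1_bytes.decode("iso-8859-1")

# From the Unicode string, any target encoding can be produced.
utf8_bytes = text.encode("utf-8")        # e-acute becomes 0xC3 0xA9
roundtrip = text.encode("iso-8859-1")    # back to the original bytes

# If the decode step is skipped and the raw Latin-1 bytes are simply
# labelled as UTF-8, they are not valid UTF-8 at all.
try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
```

If a page displays correctly as Latin-1 but not as UTF-8, one possible explanation is that the bytes were never decoded to Unicode in the first place, as in the last step of the sketch.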
> Or, is it in the charset of the original XML document (my
> testing seems to indicate the latter - what was an 8859 character in
> the original text does not successfully come out the other side)?
>
> (Sorry if this is obtuse - just getting into i18n, and Python docs
> are thin on the ground)
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
From jeremy.kloth@fourthought.com Tue May 22 23:53:45 2001
From: jeremy.kloth@fourthought.com (Jeremy J Kloth)
Date: Tue, 22 May 2001 16:53:45 -0600
Subject: [XML-SIG] New parsers in 4XPath and 4XSLT
Message-ID: <003101c0e312$0e4519e0$f803a8c0@dhcp.fourthought.comfourthought.com>
The new generated parsers in XPath and XSLT are now created in a more
factory-like way. The parsers are now referenced from
xml.(xpath|xslt).parser, which makes it easy to change parsers.
To create a runtime parser, call parser.new(). To parse expressions,
simply use the parse() method on the created object.
Hopefully this change will help ease the integration into PyXML.
--
Jeremy Kloth Consultant
jeremy.kloth@fourthought.com (303)583-9900 x 105
Fourthought, Inc. http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From mnot@mnot.net Wed May 23 03:33:18 2001
From: mnot@mnot.net (Mark Nottingham)
Date: Tue, 22 May 2001 19:33:18 -0700
Subject: [XML-SIG] XML and Unicode
In-Reply-To: <3B0AEA6A.9CCD2A1F@lemburg.com>; from mal@lemburg.com on Wed, May 23, 2001 at 12:38:34AM +0200
References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com>
Message-ID: <20010522193314.E22396@mnot.net>
OK, so I'm not getting something then. The attached test script (and
data file) is the problem pared down - if u'string' is a neutral
encoding, and .encode('utf-8') generates a utf-8 encoded string of
that encoding, then the utf-8.html output file should display
correctly; however, it doesn't, while the latin-1 output does
(because the input is latin-1).
It seems like the XML parser isn't converting the ISO-8859-1 to
Unicode; does this make sense?
Thanks,
On Wed, May 23, 2001 at 12:38:34AM +0200, M.-A. Lemburg wrote:
> Mark Nottingham wrote:
> >
> > How does one detect the charset used in an XML document from a SAX2
> > parser (PyXML 0.6.5)?
> >
> > Also, if I have an XML document encoded ISO-8859-1 (and properly
> > identified), should I have a reasonable expectation that the output
> > of a SAX processor, post- .encode('utf-8'), should be correct if
> > viewed in a Web browser with UTF-8 selected as a character encoding?
>
> This should work...
>
> > In other words, is the post-parse unicode string a neutral
> > representation of the 8859-x string, which can then be encoded as
> > utf-8?
>
> Unicode is encoding neutral in the sense that it provides
> space for the characters of most scripts. If the parser returns
> Unicode, then you can encode it as UTF-8 and have the original
> contents of the attribute/element represented as UTF-8 string.
>
> > Or, is it in the charset of the original XML document (my
> > testing seems to indicate the latter - what was an 8859 character in
> > the original text does not successfully come out the other side)?
> >
> > (Sorry if this is obtuse - just getting into i18n, and Python docs
> > are thin on the ground)
>
> --
> Marc-Andre Lemburg
> CEO eGenix.com Software GmbH
> ______________________________________________________________________
> Company & Consulting: http://www.egenix.com/
> Python Software: http://www.lemburg.com/python/
--
Mark Nottingham
http://www.mnot.net/
Content-Disposition: attachment; filename="testuni.py"
#!/usr/bin/env python2.0
from xml import sax
import string
def run(i, e):
    dh = Parser()
    p = sax.sax2exts.make_parser()
    p.setContentHandler(dh)
    p.setFeature(sax.handler.feature_namespaces, 1)
    p.parse(i + '.xml')
    content = dh.content.encode(e)
    file = open(e + ".html", 'w')
    file.write(template % (e, content))
    file.close()

class Parser(sax.handler.ContentHandler):
    def __init__(self):
        self._tmp_buf = ''
        self.content = None

    def startElementNS(self, name, qname, attrs):
        pass

    def endElementNS(self, name, qname):
        if name[1] == 'content':
            self.content = string.strip(self._tmp_buf)

    def characters(self, content):
        self._tmp_buf = self._tmp_buf + content

template = """\
%s
Net 21 – The Survivors
From mal@lemburg.com Wed May 23 08:38:14 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 23 May 2001 09:38:14 +0200
Subject: [XML-SIG] XML and Unicode
References: <20010522150638.C22396@mnot.net> <3B0AEA6A.9CCD2A1F@lemburg.com> <20010522193314.E22396@mnot.net>
Message-ID: <3B0B68E6.9CBF7689@lemburg.com>
Mark Nottingham wrote:
>
> OK, so I'm not getting something then. The attached test script (and
> data file) is the problem pared down - if u'string' is a neutral
> encoding, and .encode('utf-8') generates a utf-8 encoded string of
> that encoding, then the utf-8.html output file should display
> correctly; however, it doesn't, while the latin-1 output does
> (because the input is latin-1).
>
> It seems like the XML parser isn't converting the ISO-8859-1 to
> Unicode; does this make sense?
That's a possibility (even though I don't see any funny characters
in your example XML file); looking through the pyexpat.c code
it seems as if the parser assumes that the XML file is encoded
as UTF-8 -- at least all Unicode conversions are done using UTF-8.
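One way to test whether a parser honours the encoding declaration is to feed it a small ISO-8859-1 document and inspect what the content handler receives. The sketch below uses the standard library's xml.sax in a current Python 3, not PyXML 0.6.5 on Python 2.0, so it shows the technique rather than the behaviour of the versions in this thread. TextCollector and the sample document are names introduced here for illustration.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Collect all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces.
        self.chunks.append(content)

# A small ISO-8859-1 document; the e-acute is the single byte 0xE9.
document = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<content>caf\u00e9</content>'
).encode("iso-8859-1")

handler = TextCollector()
xml.sax.parseString(document, handler)

# If the declared encoding was honoured, this is a Unicode string
# containing U+00E9, and encoding it yields valid UTF-8.
text = "".join(handler.chunks)
utf8_output = text.encode("utf-8")
```

If text came back with a wrong or missing character instead of U+00E9, that would point at the parser's handling of the declared encoding rather than at the later .encode('utf-8') step.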
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
From hansv@net4all.be Wed May 23 08:44:20 2001
From: hansv@net4all.be (Hans verschooten)
Date: Wed, 23 May 2001 09:44:20 +0200
Subject: [XML-SIG] HTML parsing on Python 2.1
Message-ID:
Hi,
I am using a freshly installed MacPython 2.1 and would like to know what
else I should install to use the following script:
[uogbuji@borgia one-offs]$ cat html-to-xhtml-converter.py
import sys
from xml.dom.ext.reader import HtmlLib
import xml.dom.ext
#set up a re-usable reader object
reader = HtmlLib.Reader()
#parse HTML from file or URI given on command line. Return the DOM document
doc = reader.fromUri(sys.argv[1])
#Just for kicks, write it out as XHTML, i.e. all lowercase, XML syntax for
#empty tags, all attributes with given value, etc.
xml.dom.ext.XHtmlPrettyPrint(doc)
Could anybody point me in the right direction? I've tried installing PyXML
but keep getting end-of-line errors. After trying to correct these I keep
running into errors like: ReleaseNode not found; HtmlLib has no module named
Reader.
Any help as to how and what should be installed on MacPython 2.1 would be
greatly appreciated.
Hans
From Alexandre.Fayolle@logilab.fr Wed May 23 10:57:19 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Wed, 23 May 2001 11:57:19 +0200 (CEST)
Subject: [XML-SIG] ANN: Narval 1.0
Message-ID:
Logilab (www.logilab.com) announces the release of
Narval 1.0
GPL'd Intelligent Personal Assistant Framework
http://www.logilab.org/narval
News
----
The engine is now stable as it has been working nicely for the past three
months. It's also much faster.
The Horn GUI features lots of usability improvements.
The infopal application (available separately) is now usable.
Description
-----------
Narval is a framework (language + interpreter + GUI/IDE) dedicated to the
setting up of intelligent personal assistants (IPAs).
An Intelligent Personal Assistant is a companion that will help you in your daily
work in the information world. It runs on your machine or on a remote server,
and you can communicate with it via all standard means (email, web, telnet,
phone, specific GUI, etc). It executes recipes (sequences of actions) you wrote,
to perform a wide range of tasks, such as preparing your morning newspaper, helping
you surf the web by filtering out junk ads, searching the web day after day
for things you want, participating in on-line auctions, learning your interests and
bringing you back valuable information, taking care of repetitive chores, answering
e-mail, negotiating the date and time of a meeting, and much more... It is easy to
extend the built-in action library by writing new actions in Python.
Infopal, your information pal, is a Narval application that implements part of
the above, but Narval makes it easy for you to set up new assistants. Other
applications will soon be available from Logilab.
Logilab S.A. is a French company that specializes in the fields of artificial
intelligence, knowledge management, data analysis and natural language
processing.
More info
---------
Please see
http://www.logilab.org/narval
http://www.logilab.com
http://www.logilab.fr
or contact contact@logilab.fr
--
Alexandre Fayolle
http://www.logilab.com
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).
From stuff4gary@hotmail.com Wed May 23 14:40:41 2001
From: stuff4gary@hotmail.com (gary cor)
Date: Wed, 23 May 2001 13:40:41
Subject: [XML-SIG] XLST - Can't show JPEG image from XML abstraction to rendition
Message-ID:
Hi,
I hope someone can help! I have set up some XSL files which use XSLT
to produce tables of information about images, which works great
using just the MSXML 4.0 parser with Explorer 5.5. However, I can't get the
cells which are supposed to show my image thumbnails to display any images at all
(the transformations for the tables won't work when the XML contains my x:link
for them).
**** IN XSL *****
etc.
**** IN XML *****