From Mike.Olson@fourthought.com Sun Oct 1 21:06:44 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 01 Oct 2000 14:06:44 -0600
Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests
References: <200009300152.TAA12572@localhost.localdomain>
Message-ID: <39D79954.342F7478@FourThought.com>
uche.ogbuji@fourthought.com wrote:
>
> > On Fri, Sep 29, 2000 at 11:22:54PM +0200, Martin v. Loewis wrote:
> > >The missing piece for a 0.6.1 release is the test suite. Specifically,
> > >the dom tests are not executed, as xml.dom.core is not available
> > >anymore. Should these tests be ported to 4DOM?
> >
> > I vaguely recall that someone at FourThought once asked me if that
> > would be OK, but don't know if anyone actually did it. It would be a
> > good idea to port them, since they made some attempt at being
> > exhaustive (trying the various error cases, etc.).
>
> Yes, we did check in the test suites, but as Martin points out, they're broken
> because they use the now defunct TraceOut module (that's the "Ft" stuff). Mike
> has agreed to tackle some of the task of removing traceouts tonight so I'll
> ask him if he can start with the 4DOM test suites. If so, we'll check in the
> fix tomorrow.
All of the traceout stuff has been removed. There still is the problem
of our traceout library. I suppose we can install it to 2 locations so
that xml.dom is not dependent on Ft. I'll work on that today so it
should be in the next snapshot.
Mike
>
> I'll also move the demos as discussed.
>
> --
> Uche Ogbuji Principal Consultant
> uche.ogbuji@fourthought.com +1 303 583 9900 x 101
> Fourthought, Inc. http://Fourthought.com
> 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
> Software-engineering, knowledge-management, XML, CORBA, Linux, Python
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From clarence@netlojix.com Sun Oct 1 21:52:42 2000
From: clarence@netlojix.com (Clarence Gardner)
Date: Sun, 1 Oct 2000 13:52:42 -0700
Subject: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought
Message-ID: <20001001135242.B29839@liberty.sba2.netlojix.net>
Context: I'm using PyXML-0.5.5.1, and interestingly, I never compiled
any of the C code; I just use the .py and it seems to work fine.
So I'm storing some arbitrary textual data under an arbitrarily-named
element node. My test code created the xml and dumped it to a file,
where it looked like this:
This document is read and updated, and I noticed that each time I
added a new username (i.e., read the xml source, inserted a new username
node via DOM, and wrote back to the file), the previous ones changed from
CDATA to TEXT. This seemed like a bug to me. I thought I would see what
would happen if I added a username of "<markup test>". The first time,
it appeared in the file as
<![CDATA[<markup test>]]>
as expected, then after one more addition, it was now
&lt;markup test&gt;
. But now I see that, if I read that document, the username has not
one TEXT child, but three ('<', 'markup test', and '>').
Does all this seem right to people? That last implies, of course, that
in order to get what I expect to be the text value of a node, I actually
have to get all of the text children and concatenate their values. Which
would seem to be a problem if (I haven't tried this) I originally stored
two separate text children of the username node, because this would cause
them to be merged into one.
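The "concatenate all of the text children" idea can be sketched as follows. This is only an illustration written against the standard-library xml.dom.minidom in current Python, not the PyXML 0.5.5.1 DOM discussed here, and the two may differ in detail. The helper name text_value and the sample document are hypothetical.

```python
from xml.dom import minidom, Node


def text_value(element):
    """Join the data of every Text and CDATA child of element."""
    wanted = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    return "".join(child.data for child in element.childNodes
                   if child.nodeType in wanted)


# Hypothetical sample: the same string stored once as CDATA, once escaped.
SAMPLE = ("<users>"
          "<username><![CDATA[<markup test>]]></username>"
          "<username>&lt;markup test&gt;</username>"
          "</users>")

doc = minidom.parseString(SAMPLE)
values = [text_value(node) for node in doc.getElementsByTagName("username")]
```

Both entries in values come out as the same string, whichever way the text was stored and however many Text nodes the parser produced.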
--
Clarence Gardner
Software Engineer
NetLojix Communications
clarence@netlojix.com
From martin@loewis.home.cs.tu-berlin.de Sun Oct 1 23:05:50 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 2 Oct 2000 00:05:50 +0200
Subject: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought
In-Reply-To: <20001001135242.B29839@liberty.sba2.netlojix.net> (message from
Clarence Gardner on Sun, 1 Oct 2000 13:52:42 -0700)
References: <20001001135242.B29839@liberty.sba2.netlojix.net>
Message-ID: <200010012205.AAA00753@loewis.home.cs.tu-berlin.de>
> Does all this seem right to people?
I can't see a problem here. AFAIK, <![CDATA[<markup test>]]> is really
equivalent to &lt;markup test&gt; from an XML point of view.
You did not say exactly how this reading or writing was achieved (SAX,
DOM, something else). Whatever procedure was used to write this, I
guess it should be possible to change it so that Text nodes (in the
DOM sense) come out as CDATA sections - if that is really a
requirement.
Regards,
Martin
From clarence@netlojix.com Sun Oct 1 23:33:03 2000
From: clarence@netlojix.com (Clarence Gardner)
Date: Sun, 1 Oct 2000 15:33:03 -0700
Subject: [martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought]
Message-ID: <20001001153303.E29839@liberty.sba2.netlojix.net>
From: "Martin v. Loewis"
>I can't see a problem here. AFAIK, <![CDATA[<markup test>]]> is really
>equivalent to &lt;markup test&gt; from an XML point of view.
Sure. I did that test to make sure it didn't change it to a text
node containing the literal "". But what really got me
was that [text] got changed to
[]. On further reflection, I can
see that my previous concern about two original TEXT children of <username>
was nonsensical (if they were really distinct, they should be elements),
but nonetheless, the lesson about having to concatenate all TEXT children
to get the original text value seems to be true.
>
>You did not say exactly how this reading or writing was achieved (SAX,
>DOM, something else). Whatever procedure was used to write this, I
>guess it should be possible to change it so that Text nodes (in the
>DOM sense) come out as CDATA sections - if that is really a
>requirement.
It's not that I have a love affair with CDATA, I just wanted to be sure
that arbitrary text wouldn't cause a problem.
I'm doing what is presumably the vanilla use of the package:
p = saxexts.make_parser()
dh = SaxBuilder()
p.setDocumentHandler(dh)
p.parseFile(f)
...
dh.doctoxml()
--
Clarence Gardner
Software Engineer
NetLojix Communications
clarence@netlojix.com
From m.favas@per.dem.csiro.au Mon Oct 2 00:20:52 2000
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Mon, 02 Oct 2000 07:20:52 +0800
Subject: [XML-SIG] distutils bug with PyXML 0.6.1, Python 2.0b2 (CVS)
Message-ID: <39D7C6D4.FBF17094@per.dem.csiro.au>
With the current (Oct 2) CVS versions of PyXML and Python (2), running
"python setup.py install" produces the following glitch: it copies all the
relevant files to /usr/local/lib/python2.0/site-packages/_xmlplus and
then tries to compile them all. Unfortunately, it tries to byte-compile
sgmlop.so, leading to the traceback below. Is this a PyXML mis-setup of
setup.py or a distutils (Python core version) bug?
byte-compiling /usr/local/lib/python2.0/site-packages/_xmlplus/parsers/xmlproc/xmlutils.py to xmlutils.pyc
byte-compiling /usr/local/lib/python2.0/site-packages/_xmlplus/parsers/xmlproc/xmlval.py to xmlval.pyc
Traceback (most recent call last):
  File "setup.py", line 94, in ?
    ext_modules = ext_modules
  File "/usr/local/lib/python2.0/distutils/core.py", line 138, in setup
    dist.run_commands()
  File "/usr/local/lib/python2.0/distutils/dist.py", line 829, in run_commands
    self.run_command(cmd)
  File "/usr/local/lib/python2.0/distutils/dist.py", line 849, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python2.0/distutils/command/install.py", line 470, in run
    self.run_command(cmd_name)
  File "/usr/local/lib/python2.0/distutils/cmd.py", line 328, in run_command
    self.distribution.run_command(command)
  File "/usr/local/lib/python2.0/distutils/dist.py", line 849, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python2.0/distutils/command/install_lib.py", line 61, in run
    self.bytecompile(outfiles)
  File "/usr/local/lib/python2.0/distutils/command/install_lib.py", line 88, in bytecompile
    verbose=self.verbose, dry_run=self.dry_run)
  File "/usr/local/lib/python2.0/distutils/util.py", line 381, in byte_compile
    raise ValueError, \
ValueError: invalid filename: '/usr/local/lib/python2.0/site-packages/_xmlplus/parsers/sgmlop.so' doesn't end with '.py'
--
Mark Favas - m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA
From martin@loewis.home.cs.tu-berlin.de Mon Oct 2 00:33:12 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 2 Oct 2000 01:33:12 +0200
Subject: [martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought]
In-Reply-To: <20001001153303.E29839@liberty.sba2.netlojix.net> (message from
Clarence Gardner on Sun, 1 Oct 2000 15:33:03 -0700)
References: <20001001153303.E29839@liberty.sba2.netlojix.net>
Message-ID: <200010012333.BAA00983@loewis.home.cs.tu-berlin.de>
> On further reflection, I can see that my previous concern about two
> original TEXT children of <username> was nonsensical (if they were
> really distinct, they should be elements), but nonetheless, the
> lesson about having to concatenate all TEXT children to get the
> original text value seems to be true.
I think you have a point on splitting a text fragment into multiple
Text nodes; the DOM spec says about the interface Text:
# If there is no markup inside an element's content, the text is
# contained in a single object implementing the Text interface that is
# the only child of the element. If there is markup, it is parsed into
# a list of elements and Text nodes that form the list of children of
# the element.
# When a document is first made available via the DOM, there is only
# one Text node for each block of text. Users may create adjacent Text
# nodes that represent the contents of a given element without any
# intervening markup, but should be aware that there is no way to
# represent the separations between these nodes in XML or HTML, so
# they will not (in general) persist between DOM editing sessions. The
# normalize() method on Element [p.38] merges any such adjacent Text
# objects into a single node for each block of text; this is
# recommended before employing operations that depend on a particular
# document structure, such as navigation with XPointers.
[from REC-DOM-Level-1-19981001]
I'm not sure what that means for parsing &lt;hallo&gt; - is it
permitted that these are split into three Text nodes, is it required
that they are split, or is it disallowed?
Section 2.4 of XML 1.0 [REC-xml-19980210] says that an
entity reference is markup; 4.1 says that &gt; is an entity reference
(*not* a character reference) - so it appears permitted that multiple
Text nodes are created.
You *should* be able to merge them by calling normalize() on the tree;
I'm not sure whether that worked in 0.5.5.1, it does work with 4DOM in
PyXML 0.6. Please note that normalize won't merge CDATA sections.
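A small sketch of what normalize() does, using the standard-library xml.dom.minidom in current Python as a stand-in for 4DOM (an assumption; the behaviour of PyXML 0.5.5.1 is not shown here). The element name and the text pieces are just illustrative values taken from this thread.

```python
from xml.dom import minidom

doc = minidom.parseString("<username/>")
elem = doc.documentElement

# Build three adjacent Text nodes, as in the '<', 'markup test', '>' case.
for piece in ("<", "markup test", ">"):
    elem.appendChild(doc.createTextNode(piece))
count_before = len(elem.childNodes)

# normalize() merges adjacent Text nodes into one.
elem.normalize()
count_after = len(elem.childNodes)
merged_text = elem.firstChild.data

# Adjacent CDATA sections are left alone by minidom's normalize().
other = doc.createElement("username")
other.appendChild(doc.createCDATASection("a"))
other.appendChild(doc.createCDATASection("b"))
other.normalize()
cdata_count = len(other.childNodes)
```

In minidom the three Text nodes collapse into one, while the two CDATA sections stay separate, which matches the note above about CDATA sections.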
Regards,
Martin
From gward@python.net Mon Oct 2 00:50:19 2000
From: gward@python.net (Greg Ward)
Date: Sun, 1 Oct 2000 19:50:19 -0400
Subject: [XML-SIG] Re: distutils bug with PyXML 0.6.1, Python 2.0b2 (CVS)
In-Reply-To: <39D7C6D4.FBF17094@per.dem.csiro.au>; from m.favas@per.dem.csiro.au on Mon, Oct 02, 2000 at 07:20:52AM +0800
References: <39D7C6D4.FBF17094@per.dem.csiro.au>
Message-ID: <20001001195018.A11937@beelzebub>
On 02 October 2000, Mark Favas said:
> With the current (Oct 2) CVS versions of PyXML and Python (2), running
> "python setup.py install" produces the following glitch: copies all the
> relevant files to /usr/local/lib/python2.0/site-packages/_xmlplus and
> then tries to compile them all. Unfortunately, it tries to byte-compile
> sgmlop.so, leading to the traceback below. Is this a PyXML mis-setup of
> setup.py or a distutils (Python core version) bug?
D'ohh! That's a Distutils bug, introduced last night. Just checked in
a fix -- thanks for the quick report!
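For illustration only, the general idea of skipping non-source files before byte-compiling might look like the sketch below. It is not the fix that was checked in to Distutils; it uses the standard py_compile module in current Python, and the function name byte_compile_sources and the temporary install tree are hypothetical.

```python
import os
import py_compile
import tempfile


def byte_compile_sources(paths):
    """Byte-compile only the .py files in paths; skip everything else."""
    compiled = []
    for path in paths:
        if not path.endswith(".py"):
            continue  # e.g. sgmlop.so: an extension module, not source
        compiled.append(py_compile.compile(path, doraise=True))
    return compiled


# Hypothetical install tree with one source file and one extension module.
with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "xmlutils.py")
    extension = os.path.join(root, "sgmlop.so")
    with open(source, "w") as handle:
        handle.write("VALUE = 1\n")
    with open(extension, "wb") as handle:
        handle.write(b"\x7fELF")
    results = byte_compile_sources([source, extension])
    compiled_count = len(results)
    compiled_exists = os.path.exists(results[0])
```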
BTW, it doesn't matter if you follow the Python CVS or the Distutils
CVS, you'll get my latest code either way.
Greg
--
Greg Ward gward@python.net
http://starship.python.net/~gward/
From clarence@netlojix.com Mon Oct 2 03:17:54 2000
From: clarence@netlojix.com (Clarence Gardner)
Date: Sun, 1 Oct 2000 19:17:54 -0700
Subject: [martin@loewis.home.cs.tu-berlin.de: Re: [martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought]]
Message-ID: <20001001191754.F29839@liberty.sba2.netlojix.net>
From: "Martin v. Loewis"
>I think you have a point on splitting a text fragment into multiple
>Text nodes; the DOM spec says about the interface Text:
>
># If there is no markup inside an element's content, the text is
># contained in a single object implementing the Text interface that is
># the only child of the element. If there is markup, it is parsed into
># a list of elements and Text nodes that form the list of children of
># the element.
>
>[from REC-DOM-Level-1-19981001]
>
>I'm not sure what that means for parsing &lt;hallo&gt; - is it
>permitted that these are split into three Text nodes, is it required
>that they are split, or is it disallowed?
>
>Section 2.4 of XML 1.0 [REC-xml-19980210] says that an
>entity reference is markup; 4.1 says that &gt; is an entity reference
>(*not* a character reference) - so it appears permitted that multiple
>Text nodes are created.
Thanks, Martin. (And please accept my apologies for posting from a
state of abysmal ignorance regarding XML. Being a person who actually
enjoys reading standards documents, I'm going to read through the
document you referenced.)
>You *should* be able to merge them by calling normalize() on the tree;
>I'm not sure whether that worked in 0.5.5.1, it does work with 4DOM in
>PyXML 0.6. Please note that normalize won't merge CDATA sections.
It does work, at least on my test data.
--
Clarence Gardner
Software Engineer
NetLojix Communications
clarence@netlojix.com
From uche.ogbuji@fourthought.com Mon Oct 2 19:05:35 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 02 Oct 2000 12:05:35 -0600
Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests
In-Reply-To: Message from "Martin v. Loewis"
of "Sat, 30 Sep 2000 09:34:43 +0200." <200009300734.JAA00694@loewis.home.cs.tu-berlin.de>
Message-ID: <200010021805.MAA11294@localhost.localdomain>
I still get masses of errors working with the pyxml CVS. I won't clutter the
list, but I've placed a (long) transcript of my last commit effort, errors and
all, at
ftp://ftp.fourthought.com/pub/etc/pyxml-cvs-errors.txt
Some excerpts:
cvs diff: [10:57:45] waiting for uche's lock in /cvsroot/pyxml/xml/test/dom
Mailed xml-checkins@python.org
Traceback (innermost last):
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 321, in ?
    blast_mail(mailcmd, specs[1:])
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 198, in blast_mail
    fp = os.popen(cmd, 'w')
os.error: (11, 'Resource temporarily unavailable')
Mailed xml-checkins@python.org
Traceback (innermost last):
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 321, in ?
    blast_mail(mailcmd, specs[1:])
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 193, in blast_mail
    if not os.fork():
os.error: (11, 'Resource temporarily unavailable')
Mailed xml-checkins@python.org
Traceback (innermost last):
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 321, in ?
    blast_mail(mailcmd, specs[1:])
  File "/cvsroot/pyxml/CVSROOT/syncmail", line 203, in blast_mail
    fp.write(calculate_diff(file))
IOError: (32, 'Broken pipe')
cvs [diff aborted]: no such tag NONE
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Mon Oct 2 19:06:22 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 02 Oct 2000 12:06:22 -0600
Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests
In-Reply-To: Message from "Martin v. Loewis"
of "Sat, 30 Sep 2000 09:34:43 +0200." <200009300734.JAA00694@loewis.home.cs.tu-berlin.de>
Message-ID: <200010021806.MAA11305@localhost.localdomain>
I've updated the structure to eliminate duplicate demos and test suites and
I've updated the DOM code-base.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Mon Oct 2 22:35:05 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 2 Oct 2000 23:35:05 +0200
Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests
In-Reply-To: <200010021805.MAA11294@localhost.localdomain>
(uche.ogbuji@fourthought.com)
References: <200010021805.MAA11294@localhost.localdomain>
Message-ID: <200010022135.XAA01128@loewis.home.cs.tu-berlin.de>
> I still get masses of errors working with the pyxml CVS. I won't
> clutter the list, but I've placed a (long) transcript of my last
> commit effort, errors and all at
It seems that your check-ins made it through, anyway - Thanks!
You may want to check whether there were changes that did not
get committed (just update, and see whether any conflicts are
reported; cvs will store the original files with your modifications in
a .# name in that case).
> ftp://ftp.fourthought.com/pub/etc/pyxml-cvs-errors.txt
# fp = os.popen(diffcmd)
# os.error: (11, 'Resource temporarily unavailable')
fork(2) will give error 11 if no more processes can be started. It
seems that the machine was running out of processes, so the problem is
hopefully indeed temporary.
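As a rough illustration of treating this error as temporary, a caller could retry when it sees EAGAIN (which is errno 11 on Linux). This is a sketch in current Python, not what the syncmail script does; retry_on_eagain and flaky_send are hypothetical names, and flaky_send merely simulates a fork/popen call that fails twice.

```python
import errno
import os
import time


def retry_on_eagain(action, attempts=3, delay=0.01):
    """Call action(); retry if it fails with EAGAIN, otherwise re-raise."""
    for attempt in range(attempts):
        try:
            return action()
        except OSError as exc:
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            time.sleep(delay)


# Hypothetical stand-in for os.fork()/os.popen(): fails twice, then works.
calls = []


def flaky_send():
    calls.append(1)
    if len(calls) < 3:
        raise OSError(errno.EAGAIN, os.strerror(errno.EAGAIN))
    return "sent"


outcome = retry_on_eagain(flaky_send)
```

Any other error, such as the "Broken pipe" in the third traceback, is re-raised immediately rather than retried.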
In any case - this was just an attempt to send a commit email message;
the main operation was not affected. To see the code of the syncmail
script, just do 'cvs co CVSROOT' in the xml toplevel directory.
If the problem persists, SF would need to kill some processes, or
reboot the machine - I believe none of us could actually log into it.
Regards,
Martin
From dwallace@udel.edu Wed Oct 4 03:26:43 2000
From: dwallace@udel.edu (Dave)
Date: Tue, 03 Oct 2000 22:26:43 -0400
Subject: [XML-SIG] Can't make more than one parser
Message-ID: <39DA9563.1000602@delanet.com>
Hello,
I apologize if it is premature to report this on your development tree
but I have been seeing it for several days now and wanted you to be aware.
Using the latest PyXML CVS checkout, the following code throws an
exception at the second make_parser call. This happens in Python2.0
(also latest CVS) and Python1.6.
from xml.sax.saxexts import make_parser
p = make_parser()
if p:
    print "* got one *"
q = make_parser()
if q:
    print "* got the other *"
The exception thrown is:
Traceback (most recent call last):
  File "test_xml.py", line 8, in ?
    q = make_parser()
  File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py", line 158, in make_parser
    return XMLParserFactory.make_parser(parser_list)
  File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py", line 63, in make_parser
    return self._create_parser(parser_name)
  File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py", line 43, in _create_parser
    return drv_module.create_parser()
AttributeError: create_parser
Dave.
From martin@loewis.home.cs.tu-berlin.de Wed Oct 4 07:41:06 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 4 Oct 2000 08:41:06 +0200
Subject: [XML-SIG] Can't make more than one parser
In-Reply-To: <39DA9563.1000602@delanet.com> (message from Dave on Tue, 03 Oct
2000 22:26:43 -0400)
References: <39DA9563.1000602@delanet.com>
Message-ID: <200010040641.IAA00909@loewis.home.cs.tu-berlin.de>
> File "/usr/local/lib/python1.6/site-packages/_xmlplus/sax/saxexts.py",
> line 43, in _create_parser
> return drv_module.create_parser()
> AttributeError: create_parser
Apparently, it attempts to load a parser module which does not
implement a create_parser function. Before line 43, could you please
insert a line "print drv_module", and report what it prints? (of
course, you can also run it in the IDLE debugger to see what happens)
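The same check can be sketched in current Python: import the driver module, report which module was actually loaded, and fail with a clearer message if it lacks create_parser. This is an illustration using the standard library's xml.sax drivers, not the saxexts code from PyXML; load_driver is a hypothetical helper.

```python
import importlib


def load_driver(module_name):
    """Import a SAX driver module and return a parser created by it."""
    module = importlib.import_module(module_name)
    if not hasattr(module, "create_parser"):
        # Name the offending module, much like "print drv_module" would.
        raise AttributeError("%s (loaded from %s) has no create_parser()"
                             % (module.__name__,
                                getattr(module, "__file__", "?")))
    return module.create_parser()


# xml.sax.expatreader is a real driver; xml.sax.handler is not a driver.
parser = load_driver("xml.sax.expatreader")
try:
    load_driver("xml.sax.handler")
except AttributeError as exc:
    failure = str(exc)
```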
Regards,
Martin
From bkc@murkworks.com Wed Oct 4 15:45:50 2000
From: bkc@murkworks.com (Brad Clements)
Date: Wed, 4 Oct 2000 10:45:50 -0400
Subject: [XML-SIG] Build problem on Win2k
Message-ID: <39DB0A5C.7184.8D6529D@localhost>
I have Python 1.5.2, trying to install XML v0.5.2
I have dist-utils installed (perhaps it's too old).
python setup.py build produces this traceback.. I'll start looking to see
what's happening, but here's the info in case someone knows what's up.
creating build\lib.win32\xml\utils
copying xml\utils\iso8601.py -> build\lib.win32\xml\utils
copying xml\utils\qp_xml.py -> build\lib.win32\xml\utils
copying xml\utils\__init__.py -> build\lib.win32\xml\utils
running build_ext
warning: build_ext: old-style (ext_name, build_info) tuple found in ext_modules
for extension 'sgmlop'-- please convert to Extension instance
warning: build_ext: old-style (ext_name, build_info) tuple found in ext_modules
for extension 'xml.unicode.wstrop'-- please convert to Extension instance
warning: build_ext: old-style (ext_name, build_info) tuple found in ext_modules
for extension 'xml.parsers.pyexpat'-- please convert to Extension instance
building 'sgmlop' extension
creating build\temp.win32
creating build\temp.win32\Release
creating build\temp.win32\Release\extensions
D:\Program Files\Microsoft Visual Studio\VC98\BIN\cl.exe /c /nologo /Ox /MD /W3
-Id:\progra~1\python\Include /Tcextensions/sgmlop.c /Fobuild\temp.win32\Release\
extensions/sgmlop.obj
sgmlop.c
Traceback (innermost last):
  File "setup.py", line 53, in ?
    ext_modules = wstr_modules +
  File "d:\Program Files\Python\Lib\distutils\core.py", line 112, in setup
    dist.run_commands ()
  File "d:\Program Files\Python\Lib\distutils\dist.py", line 776, in run_commands
    self.run_command (cmd)
  File "d:\Program Files\Python\Lib\distutils\dist.py", line 797, in run_command
    cmd_obj.run ()
  File "d:\Program Files\Python\Lib\distutils\command\build.py", line 117, in run
    self.run_command ('build_ext')
  File "d:\Program Files\Python\Lib\distutils\cmd.py", line 310, in run_command
    self.distribution.run_command (command)
  File "d:\Program Files\Python\Lib\distutils\dist.py", line 797, in run_command
    cmd_obj.run ()
  File "d:\Program Files\Python\Lib\distutils\command\build_ext.py", line 224, in run
    self.build_extensions ()
  File "d:\Program Files\Python\Lib\distutils\command\build_ext.py", line 428, in build_extensions
    libraries=self.get_libraries(ext),
  File "d:\Program Files\Python\Lib\distutils\command\build_ext.py", line 571, in get_libraries
    return ext.libraries + [pythonlib]
TypeError: bad operand type(s) for +
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
netmeeting: ils://ils.murkworks.com AOL-IM: BKClements
From bkc@murkworks.com Wed Oct 4 15:53:18 2000
From: bkc@murkworks.com (Brad Clements)
Date: Wed, 4 Oct 2000 10:53:18 -0400
Subject: [XML-SIG] Cancel that build error on win2k
Message-ID: <39DB0C1B.11449.8DD27BB@localhost>
Apparently I had an out-of-date distutils. Upgrading to 1.0 fixed the
problem.
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
netmeeting: ils://ils.murkworks.com AOL-IM: BKClements
From brian@watchmark.com Wed Oct 4 17:17:52 2000
From: brian@watchmark.com (Brian Fritz)
Date: Wed, 04 Oct 2000 09:17:52 -0700
Subject: [XML-SIG] Q: Any Solaris users that have successfully installed PyXML?
Message-ID: <39DB5830.FA2A3507@watchmark.com>
Hi,
I reviewed the archives for the last couple of months and didn't notice any
posts that seemed relevant to installing PyXML on a Sun SparcStation running
Solaris. My apologies if I stopped looking too soon.
I ftp'd the PyXML-0.5.5.1 source yesterday and quickly discovered that
I apparently also needed to install Distutils-1.0. Reading through the
install instructions for the Distutils I noticed that it said:
> To use the Distutils under Unix, you must have a *complete* Python
> installation, including the Makefile and config.h used to build Python.
Do I have to build Python from source to install the Distutils to install
the PyXML modules?
Has anyone else been down this path for Solaris and can warn me in advance
of any more "traps", or suggest shortcuts?
TIA!
Brian
From akuchlin@mems-exchange.org Wed Oct 4 17:39:43 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 4 Oct 2000 12:39:43 -0400
Subject: [XML-SIG] Q: Any Solaris users that have successfully installed PyXML?
In-Reply-To: <39DB5830.FA2A3507@watchmark.com>; from brian@watchmark.com on Wed, Oct 04, 2000 at 09:17:52AM -0700
References: <39DB5830.FA2A3507@watchmark.com>
Message-ID: <20001004123908.A5080@kronos.cnri.reston.va.us>
On Wed, Oct 04, 2000 at 09:17:52AM -0700, Brian Fritz wrote:
>Do I have to build Python from source to install the Distutils to install
>the PyXML modules?
If whoever installed Python for you did the job correctly, no, since
the Makefile and Setup should have been installed at that time. You
don't need to have a complete Python source tree lying around.
Distutils needs /usr/local/lib/python1.5/config/Makefile
(/python1.5/config/Makefile, to be more general), so check if
it's there in your Python installation. If it is, the Distutils
should work fine.
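A quick way to make that check from Python itself, sketched for current Python where the sysconfig module reports the expected Makefile location (an assumption about the reader's setup; Python 1.5 had no such module, so there one would simply look for the file by hand):

```python
import os
import sysconfig

# Path where this Python installation expects its build Makefile to live.
makefile = sysconfig.get_makefile_filename()

# True if the file was actually installed alongside the interpreter.
have_makefile = os.path.exists(makefile)

print("Makefile expected at:", makefile)
print("present:", have_makefile)
```

If the file is missing, extension modules cannot be built against that installation until the development files are installed.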
>Has anyone else been down this path for Solaris and can warn me in advance
>of any more "traps", or suggest shortcuts?
I develop on Solaris part of the time, and there aren't any special
problems that I'm aware of.
--amk
From tgagne@efinnet.com Wed Oct 4 21:04:39 2000
From: tgagne@efinnet.com (Thomas Gagne)
Date: Wed, 04 Oct 2000 16:04:39 -0400
Subject: [XML-SIG] Accessing DOM nodes in Python
Message-ID: <39DB8D57.F59A80D8@ix.netcom.com>
I was just going through the XML howto about creating DOMs. I have a buffer
that looks like:
And I want to get the value of the "status" attribute. My subroutine looks
like:
def isResultValue(buffer):
    print buffer
    parser = saxexts.make_parser()
    dh = SaxBuilder()
    parser.setDocumentHandler(dh)
    fh = StringIO.StringIO(buffer)
    parser.parseFile(fh)
    print dh.get_parentNode()
    parser.close()
    fh.close()
Now, the problem is, I don't know how to get the first node from dh. I
usually try to print variables to see what they can do, but I'm not seeing
anything when I try "print dh". I've tried printing dh.parentNode and
dh.get_parentNode() without success. I think if someone could just point me
in the right direction I'd be zooming right along.
--
.tom
From akuchlin@mems-exchange.org Wed Oct 4 21:11:38 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Wed, 4 Oct 2000 16:11:38 -0400
Subject: [XML-SIG] Accessing DOM nodes in Python
In-Reply-To: <39DB8D57.F59A80D8@ix.netcom.com>; from tgagne@ix.netcom.com on Wed, Oct 04, 2000 at 04:04:39PM -0400
References: <39DB8D57.F59A80D8@ix.netcom.com>
Message-ID: <20001004161138.A7962@kronos.cnri.reston.va.us>
On Wed, Oct 04, 2000 at 04:04:39PM -0400, Thomas Gagne wrote:
>anything when I try "print dh". I've tried printing dh.parentNode and
>dh.get_parentNode() without success. I think if someone could just point me
>in the right direction I'd be zooming right along.
dh is a SAX document handler, not a DOM tree, so I wouldn't expect
get_parentNode() to work. Instead say "doc = dh.document"; doc is
then a DOM tree, so you can call doc.getElementsByTagName() or
whatever.
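For comparison, here is a sketch of reading the "status" attribute from a DOM tree using the standard-library xml.dom.minidom in current Python rather than the saxexts/SaxBuilder pair. The buffer contents are hypothetical (the original buffer is not shown above); only the element names isResult and isResultInfo and the attribute name come from this thread.

```python
from xml.dom import minidom

# Hypothetical buffer; the real one was not preserved.
BUFFER = """<isResult>
  <isResultInfo status="0"/>
</isResult>"""

# parseString returns the Document node, i.e. the DOM tree itself.
doc = minidom.parseString(BUFFER)

# Look up the element by tag name, then read its attribute.
matches = doc.getElementsByTagName("isResultInfo")
status = matches[0].getAttribute("status")
```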
--amk
From tgagne@efinnet.com Wed Oct 4 21:20:59 2000
From: tgagne@efinnet.com (Thomas Gagne)
Date: Wed, 04 Oct 2000 16:20:59 -0400
Subject: [XML-SIG] Accessing DOM nodes in Python
References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us>
Message-ID: <39DB912B.DC401BDA@ix.netcom.com>
Andrew Kuchling wrote:
> On Wed, Oct 04, 2000 at 04:04:39PM -0400, Thomas Gagne wrote:
> >anything when I try "print dh". I've tried printing dh.parentNode and
> >dh.get_parentNode() without success. I think if someone could just point me
> >in the right direction I'd be zooming right along.
>
> dh is a SAX document handler, not a DOM tree, so I wouldn't expect
> get_parentNode() to work. Instead say "doc = dh.document"; doc is
> then a DOM tree, so you can call doc.getElementsByTagName() or
> whatever.
That sounded good so I tried it. The code now looks like:
def isResultValue(buffer):
    print buffer
    parser = saxexts.make_parser()
    dh = SaxBuilder()
    parser.setDocumentHandler(dh)
    fh = StringIO.StringIO(buffer)
    parser.parseFile(fh)
    doc = dh.document
    print doc.getElementsByTagName("isResultsInfo")
    parser.close()
    fh.close()
and when I run it, I get:
Now, looking at the buffer I can see there's an <isResultInfo> node, but it
*never* seems to show up. It's driving me nuts!!!
--
.tom
From tgagne@efinnet.com Wed Oct 4 21:24:49 2000
From: tgagne@efinnet.com (Thomas Gagne)
Date: Wed, 04 Oct 2000 16:24:49 -0400
Subject: [XML-SIG] Accessing DOM nodes in Python
References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us>
Message-ID: <39DB9211.6ED019E0@ix.netcom.com>
One last comment: after creating the doc, if I try "print doc.toxml()"
I get:
This shows that isResultInfo, the child of isResult, is missing completely.
Where did it go???
--
.tom
From tgagne@efinnet.com Wed Oct 4 21:40:33 2000
From: tgagne@efinnet.com (Thomas Gagne)
Date: Wed, 04 Oct 2000 16:40:33 -0400
Subject: [XML-SIG] Accessing DOM nodes in Python
References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB9211.6ED019E0@ix.netcom.com>
Message-ID: <39DB95C1.F0C1F5F5@ix.netcom.com>
Here's something curious:
If I try the same thing on
I get the output:
Which shows the childnode. Is it possible we have a problem with newline
characters in the buffer? Is this a parser problem or a StringIO problem?
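One way to test the newline theory in isolation is sketched below with the standard-library xml.dom.minidom and io.StringIO in current Python. It says nothing about the PyXML parser in use here; it only shows what a conforming parser does with newlines. The two sample buffers are hypothetical.

```python
import io
from xml.dom import minidom, Node

# Hypothetical buffers: the same document with and without newlines.
multi = '<isResult>\n  <isResultInfo status="0"/>\n</isResult>'
single = '<isResult><isResultInfo status="0"/></isResult>'


def child_summary(text):
    """List the root's children, labelling Text nodes as 'text'."""
    root = minidom.parseString(text).documentElement
    return ["text" if child.nodeType == Node.TEXT_NODE else child.nodeName
            for child in root.childNodes]


multi_children = child_summary(multi)
single_children = child_summary(single)

# StringIO hands the buffer back unchanged, newlines included.
round_trip = io.StringIO(multi).read()
```

With minidom the newlines only add whitespace Text nodes around the child element; the element itself is still there.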
--
.tom
From dag@orion.no Wed Oct 4 21:50:07 2000
From: dag@orion.no (Dag Sunde)
Date: Wed, 4 Oct 2000 22:50:07 +0200
Subject: [XML-SIG] Accessing DOM nodes in Python
References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB912B.DC401BDA@ix.netcom.com>
Message-ID: <009b01c02e44$aee1a090$43145c3e@orion.no>
Check your argument to "getElementsByTagName"...
You ask for "isResultsInfo" (plural, with an "s")...
but your tag is defined as "<isResultInfo>" (singular).
Remove the "s" in: doc.getElementsByTagName("isResult_s_Info")
and you should be ok... :-)
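The reason such a typo is easy to miss is that a lookup for an unknown tag name returns an empty list instead of raising an error. A sketch with the standard-library xml.dom.minidom in current Python (the sample document and the helper require_one are hypothetical):

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<isResult><isResultInfo status="0"/></isResult>')

# The misspelled name matches nothing, silently.
wrong = doc.getElementsByTagName("isResultsInfo")
right = doc.getElementsByTagName("isResultInfo")


def require_one(document, tag_name):
    """Return the single element named tag_name, or fail loudly."""
    found = document.getElementsByTagName(tag_name)
    if len(found) != 1:
        raise LookupError("expected one <%s>, found %d"
                          % (tag_name, len(found)))
    return found[0]


info = require_one(doc, "isResultInfo")
```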
Dag.
From tgagne@efinnet.com Wed Oct 4 21:59:26 2000
From: tgagne@efinnet.com (Thomas Gagne)
Date: Wed, 04 Oct 2000 16:59:26 -0400
Subject: [XML-SIG] Re: The typo doesn't seem to change things...
References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB912B.DC401BDA@ix.netcom.com> <009b01c02e44$aee1a090$43145c3e@orion.no>
Message-ID: <39DB9A2E.D3103629@ix.netcom.com>
Yes, it was a typo, but the node still seems to disappear after parsing.
--
.tom
From martin@loewis.home.cs.tu-berlin.de Thu Oct 5 02:41:11 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 5 Oct 2000 03:41:11 +0200
Subject: [XML-SIG] Accessing DOM nodes in Python
In-Reply-To: <39DB95C1.F0C1F5F5@ix.netcom.com> (message from Thomas Gagne on
Wed, 04 Oct 2000 16:40:33 -0400)
References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB9211.6ED019E0@ix.netcom.com> <39DB95C1.F0C1F5F5@ix.netcom.com>
Message-ID: <200010050141.DAA00782@loewis.home.cs.tu-berlin.de>
> Which shows the childnode. Is it possible we have a problem with newline
> characters in the buffer? Is this a parser problem or a StringIO problem?
It's likely not a StringIO problem. Can you find out what parser you
are using? I.e. print parser.
Regards,
Martin
From tgagne@efinnet.com Thu Oct 5 03:31:12 2000
From: tgagne@efinnet.com (Thomas Gagne)
Date: Wed, 04 Oct 2000 22:31:12 -0400
Subject: [XML-SIG] Accessing DOM nodes in Python
References: <39DB8D57.F59A80D8@ix.netcom.com> <20001004161138.A7962@kronos.cnri.reston.va.us> <39DB9211.6ED019E0@ix.netcom.com> <39DB95C1.F0C1F5F5@ix.netcom.com> <200010050141.DAA00782@loewis.home.cs.tu-berlin.de>
Message-ID: <39DBE7F0.A35A303B@ix.netcom.com>
Can you tell which parser you're using?
print parser
When I check /usr/lib/python1.5/site-packages/xml/dom, it appears to be
PyXML-0.5.5.1.
--
.tom
From tgagne@efinnet.com Thu Oct 5 04:18:09 2000
From: tgagne@efinnet.com (Thomas Gagne)
Date: Wed, 04 Oct 2000 23:18:09 -0400
Subject: [XML-SIG] FIXED: Accessing DOM nodes in Python
References: <39DB8D57.F59A80D8@ix.netcom.com>
Message-ID: <39DBF2F0.30675B1A@ix.netcom.com>
The buffer I was getting back came from a middleware routine that retrieves
the data one line at a time. Since the middleware's API is C based, strings are
returned with a trailing NULL byte. Because the API doesn't care whether the
data is text or binary, it dutifully passes the trailing NULL byte on to the
Python interface, which perturbs string processing: when one string is
appended to another, the NULL bytes remain embedded in the result.
I have to figure out where the appropriate place is to trim the NULL byte from
the end of each line and things should be cool.
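A minimal sketch of that trimming step, assuming the middleware hands each line to Python as a string ending in a single NUL ("\0") byte. The helper name join_lines and the sample lines are hypothetical; the real fix may belong in the C wrapper instead.

```python
def join_lines(raw_lines):
    """Strip trailing NUL bytes from each line, then concatenate the lines."""
    # rstrip("\0") only touches NULs at the end of a line, so any
    # legitimate content earlier in the line is left alone.
    return "".join(line.rstrip("\0") for line in raw_lines)


# Hypothetical lines as they might arrive from a C-based API.
raw = ["<doc>\0", "<item>ok</item>\0", "</doc>\0"]
buffer = join_lines(raw)
```

Trimming before concatenation matters here: once the lines are joined, the NUL bytes sit in the middle of the buffer, where an XML parser rejects them.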
I'd like to thank everyone for their help.
--
.tom
From martin@loewis.home.cs.tu-berlin.de Thu Oct 5 08:32:26 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 5 Oct 2000 09:32:26 +0200
Subject: [XML-SIG] FIXED: Accessing DOM nodes in Python
In-Reply-To: <39DBF2F0.30675B1A@ix.netcom.com> (message from Thomas Gagne on
Wed, 04 Oct 2000 23:18:09 -0400)
References: <39DB8D57.F59A80D8@ix.netcom.com> <39DBF2F0.30675B1A@ix.netcom.com>
Message-ID: <200010050732.JAA00717@loewis.home.cs.tu-berlin.de>
> I have to figure out where the appropriate place is to trim the NULL
> byte from the end of each line and things should be cool.
I'd say it's in the C API, when it creates Python objects. The
terminating 0 should not be counted towards the size of the string if
you are using PyString_FromStringAndSize; you better use
PyString_FromString in this case.
Regards,
Martin
From christian@ellguth.de Thu Oct 5 10:26:46 2000
From: christian@ellguth.de (Christian Ellguth)
Date: Thu, 5 Oct 2000 11:26:46 +0200
Subject: [XML-SIG] German Umlauts
Message-ID: <00100511264600.11550@cellguth>
Hi,
I have some trouble using named entities like &auml; .
Every time the parser encounters an entity like this it stops parsing with the
message "unknown entity at ... " .
If I use the numerical representation like &#228; for &auml; everything works
fine.
Is this a bug in the python sax-parser, or did I omit to tell the
documenthandler what to do with named entities ?
Thank you for your reply,
Christian
--
Universitaetsbibliothek Braunschweig
Christian Ellguth
Pockelsstr. 13
38106 Braunschweig
From Juergen Hermann"
Message-ID:
On Thu, 5 Oct 2000 11:26:46 +0200, Christian Ellguth wrote:
>Is this a bug in the python sax-parser, or did I omit to tell the
>documenthandler what to do with named entities ?
No, you omitted to read the XML specs thoroughly. ;) XML knows exactly
FIVE entities by default, anything else has to be defined.
You have 3 options:
1) use encoding="iso-8859-1" and then literal äöüÄÖÜ chars
2) define the entities yourself (or better, include the latin1.ent
file, which is part of XHTML for example (see the w3c site))
3) use the numerical representation
Option 1 is what is normally used, option 2 is when you want to re-use
"old" HTML that is converted to XHTML, option 3 is for quick hacks.
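The three options can be compared with a short sketch. It assumes a current Python 3 and the standard-library xml.sax (expat) parser rather than the PyXML release discussed in this thread; TextCollector and text_of are hypothetical helper names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Collect all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # characters() may be called several times for one text node.
        self.chunks.append(content)


def text_of(document):
    """Parse a byte string and return its concatenated character data."""
    handler = TextCollector()
    xml.sax.parseString(document, handler)
    return "".join(handler.chunks)


# Option 1: declare the encoding and use the literal character (U+00E4).
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><doc>\u00e4</doc>'
).encode("iso-8859-1")

# Option 2: define the entity yourself, here in the internal DTD subset.
entity_doc = b'<!DOCTYPE doc [<!ENTITY auml "&#228;">]><doc>&auml;</doc>'

# Option 3: use the numeric character reference directly.
numeric_doc = b"<doc>&#228;</doc>"

# Without a definition, &auml; is not one of the five predefined entities.
undefined_doc = b"<doc>&auml;</doc>"
```

With these definitions, text_of() returns the same single character for the first three documents, while the last one is rejected with a SAXParseException.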
Ciao, Jürgen
--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/
From jeremy@beopen.com Thu Oct 5 17:32:34 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Thu, 5 Oct 2000 12:32:34 -0400 (EDT)
Subject: [XML-SIG] SAX exceptions are odd
Message-ID: <14812.44322.692362.12640@bitdiddle.concentric.net>
I am just learning how to use SAX and am a bit puzzled by a few of the
exceptions that get raised or not raised.
If I call parse on an empty file, I get no exception. Is this
desirable? I assume it means that "" is well-formed XML, but that
doesn't seem like a very helpful definition. Is this right?
If I get almost any other exception I get an error message that says
something like: "not well-formed at None:1:7"
Why is None being printed? It gave me the initial impression that my
error was not setting up the parse call correctly. I assumed that the None
was the cause of the exception and that under normal circumstances it
would have said something like "not well-formed at foo.xml:1:7".
What is a system identifier and why should it be reported in an
exception when it is None?
I also think the format is odd. There are three different pieces of
information separated by colons. I am accustomed to the notation
filename:line number, but not another colon for the cursor position.
It would have been clearer, I think, if the message were more verbose
and explained what each field was.
Jeremy
From larsga@garshol.priv.no Thu Oct 5 17:51:56 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 05 Oct 2000 18:51:56 +0200
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <14812.44322.692362.12640@bitdiddle.concentric.net>
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
Message-ID:
* Jeremy Hylton
|
| If I call parse on an empty file, I get no exception. Is this
| desirable? I assume it means that "" is well-formed XML, but that
| doesn't seem like a very helpful definition. Is this right?
No, it's not right. You should get an error telling you that the
document element is required.
| If I get almost any other exception I get an error message that says
| something like: "not well-formed at None:1:7"
Expat is not very good at providing informative error messages, so I
don't think you can expect much more. If you want better error
messages you should probably use xmlproc or xmllib.
As for the None that should imply that you just gave the parser a
string to parse and didn't provide it with a system identifier (ie:
URL or file name).
| Why is None being printed? It gave me the initial impression that my
| error was not setting up the parse call correctly. I assumed that the None
| was the cause of the exception and that under normal circumstances it
| would have said something like "not well-formed at foo.xml:1:7".
If you told it that you were parsing from foo.xml it should definitely
return that information in the error message. Can you show us the
exact call to parse?
| What is a system identifier and why should it be reported in an
| exception when it is None?
The system identifier is SGML-speak (and XML-speak) for the location
of the document being parsed. I guess we could leave it out in the
cases where it is None, if people prefer that. (I personally have no
opinion on that.)
| I also think the format is odd. There are three different pieces of
| information separated by colons. I am accustomed to the notation
| filename:line number, but not another colon for the cursor position.
| It would have been clearer, I think, if the message were more
| verbose and explained what each field was.
How about this:
"Not well-formed in foo.xml at line %d, column %d."
If you prefer that I'd be happy to change both that and the lost
system identifier (if that is indeed the problem).
--Lars M.
From jeremy@beopen.com Thu Oct 5 21:59:08 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Thu, 5 Oct 2000 16:59:08 -0400 (EDT)
Subject: [XML-SIG] Re: SAX exceptions are odd
Message-ID: <14812.60316.775448.910249@bitdiddle.concentric.net>
[Lars M. writes:]
>* Jeremy Hylton
>|
>| If I call parse on an empty file, I get no exception. Is this
>| desirable? I assume it means that "" is well-formed XML, but that
>| doesn't seem like a very helpful definition. Is this right?
>
>No, it's not right. You should get an error telling you that the
>document element is required.
Ok. Then consider it a bug report :-). Can you fix this and add a
test case to the test suite?
>
>| If I get almost any other exception I get an error message that says
>| something like: "not well-formed at None:1:7"
>
>Expat is not very good at providing informative error messages, so I
>don't think you can expect much more. If you want better error
>messages you should probably use xmlproc or xmllib.
I think the explanation part of the error message is okay, could be
better but not terrible. The part that's confusing is the
formatting.
>As for the None that should imply that you just gave the parser a
>string to parse and didn't provide it with a system identifier (ie:
>URL or file name).
How does it know when I pass it a string and when I pass it a system
identifier? In Python, system identifiers are strings?!? What if I
have a file called "" will it open that file or attempt to parse
it as a string?
>| Why is None being printed? It gave me the initial impression that my
>| error was not setting up the parse call correctly. I assumed that the None
>| was the cause of the exception and that under normal circumstances it
>| would have said something like "not well-formed at foo.xml:1:7".
>
>If you told it that you were parsing from foo.xml it should definitely
>return that information in the error message. Can you show us the
>exact call to parse?
I have a file foo in my current directory. I fire up Python:
> ls -l foo
-rw-rw-r-- 1 jeremy admin 0 Oct 5 16:57 foo
c> python
Python 2.0b2 (#18, Oct 5 2000, 09:53:11)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from xml.sax import parse, ContentHandler
>>> parse("foo", ContentHandler())
>>>
>| What is a system identifier and why should it be reported in an
>| exception when it is None?
>
>The system identifier is SGML-speak (and XML-speak) for the location
>of the document being parsed. I guess we could leave it out in the
>cases where it is None, if people prefer that. (I personally have no
>opinion on that.)
I personally prefer that.
>
>| I also think the format is odd. There are three different pieces of
>| information separated by colons. I am accustomed to the notation
>| filename:line number, but not another colon for the cursor position.
>| It would have been clearer, I think, if the message were more
>| verbose and explained what each field was.
>
>How about this:
>
> "Not well-formed in foo.xml at line %d, column %d."
>
>If you prefer that I'd be happy to change both that and the lost
>system identifier (if that is indeed the problem).
I would like this a lot better. It will be appreciated by novice
programmers and whiners like me.
Jeremy
From fdrake@beopen.com Fri Oct 6 03:04:07 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Thu, 5 Oct 2000 22:04:07 -0400 (EDT)
Subject: [XML-SIG] xml/dom/ext/reader/test_suite/ ?
Message-ID: <14813.13079.166524.629646@cj42289-a.reston1.va.home.com>
Will this test remain in its current location? This seems like the
wrong place for it, and it isn't a package. I suspect the test/
directory would provide a better home, but I don't want to move it
without the 4Thought team having a chance to object. ;)
-Fred
--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member
From Juergen Hermann"
On Thu, 5 Oct 2000 21:18:27 +0200, Martin v. Loewis wrote:
>Actually, there is a fourth one which I believe is the officially
>preferred one: Encode your text as UTF-8 (i.e. no encoding=
>attribute). That will remove the need to have any character entities,
>except for the five predefined ones.
That is a variation of 1), which you can use when you have a UTF-enabled
editor (or the files are machine-generated anyway).
Ciao, Jürgen
--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/
From martin@loewis.home.cs.tu-berlin.de Thu Oct 5 22:36:18 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 5 Oct 2000 23:36:18 +0200
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <14812.44322.692362.12640@bitdiddle.concentric.net> (message from
Jeremy Hylton on Thu, 5 Oct 2000 12:32:34 -0400 (EDT))
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
Message-ID: <200010052136.XAA01520@loewis.home.cs.tu-berlin.de>
> If I call parse on an empty file, I get no exception. Is this
> desirable? I assume it means that "" is well-formed XML, but that
> doesn't seem like a very helpful definition. Is this right?
No, that looks like a bug in the expat parser. The xmlproc parser (in
PyXML) properly reports
FATAL ERROR in /tmp/foo:1:0: Premature document end, no root element
(when foo is an empty file)
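For comparison, a minimal sketch of the same check, assuming a current Python 3 with the standard-library xml.sax, where empty input is reported as an error. The function name is_well_formed is hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(data):
    """Return True if the byte string parses without a fatal error."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


empty_ok = is_well_formed(b"")        # no root element at all
minimal_ok = is_well_formed(b"<doc/>")  # smallest complete document
```

A check like this is also a natural regression test for the empty-file case raised in this thread.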
> If I get almost any other exception I get an error message that says
> something like: "not well-formed at None:1:7"
>
> Why is None being printed? It gave me the initial impression that my
> error was not setting up the parse call correctly. I assumed that the None
> was the cause of the exception and that under normal circumstances it
> would have said something like "not well-formed at foo.xml:1:7".
If the InputSource object has a proper system identifier, it should
print it. It may be useful to print something different if it is None,
e.g.
"not well-formed at :1:7"
If you did provide a file name, and it got lost somewhere - then that
is a bug.
> What is a system identifier and why should it be reported in an
> exception when it is None?
I believe it is the SGML term for "file name". In SGML, documents may
have "public identifiers", in which case a globally well-known string
refers to the name of the document, and a system identifier - whose
meaning is understood only on the local computer system.
I also believe XML more specifically thinks of system identifiers as
URLs - although it is common to allow strings which are not URLs
(according to the RFC).
> There are three different pieces of
> information separated by colons. I am accustomed to the notation
> filename:line number, but not another colon for the cursor position.
That's a matter of taste - you can write your own ErrorHandler if you
don't like the output. I personally understood immediately that
notation, as this is what Emacs supports as file locations.
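A rough sketch of such an ErrorHandler, assuming a current Python 3 xml.sax. The class name VerboseErrorHandler and the exact wording of the message are illustrative choices, not an agreed format.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class VerboseErrorHandler(ErrorHandler):
    """Record a spelled-out message for each fatal error, then re-raise."""

    def __init__(self):
        self.messages = []

    def fatalError(self, exception):
        system_id = exception.getSystemId()
        # Leave the location out entirely when no system identifier is known,
        # instead of printing "None".
        prefix = "Error in %s at" % system_id if system_id else "Error at"
        self.messages.append(
            "%s line %d, column %d: %s"
            % (
                prefix,
                exception.getLineNumber(),
                exception.getColumnNumber(),
                exception.getMessage(),
            )
        )
        raise exception


handler = VerboseErrorHandler()
try:
    # The closing tag does not match <a>, so parsing fails on line 1.
    xml.sax.parseString(b"<doc><a></doc>", ContentHandler(), handler)
except xml.sax.SAXParseException:
    pass
```

The exception object itself is unchanged, so callers that inspect it programmatically are not affected by the wording.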
> It would have been clearer, I think, if the message were more
> verbose and explained what each field was.
For reproducibility, it is probably best if it is terse - we would
probably have a long debate on what it should look like if it had to
change.
Regards,
Martin
From jeremy@beopen.com Fri Oct 6 16:26:37 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Fri, 6 Oct 2000 11:26:37 -0400 (EDT)
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <200010052136.XAA01520@loewis.home.cs.tu-berlin.de>
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
<200010052136.XAA01520@loewis.home.cs.tu-berlin.de>
Message-ID: <14813.61229.663642.454479@bitdiddle.concentric.net>
>>>>> "MvL" == Martin v Loewis writes:
>> If I get almost any other exception I get an error message that
>> says something like: "not well-formed at None:1:7"
>>
>> Why is None being printed? It gave me the initial impression
>> that my error was not setting up the parse call correctly. I assumed
>> that the None was the cause of the exception and that under
>> normal circumstances it would have said something like "not
>> well-formed at foo.xml:1:7".
MvL> If the InputSource object has a proper system identifier, it
MvL> should print it. It may be useful to print something different
MvL> if it is None, e.g.
MvL> "not well-formed at :1:7"
MvL> If you did provide a file name, and it got lost somewhere -
MvL> then that is a bug.
(It looks like you may have missed my second message on this subject.)
I did pass a filename that was lost.
>> There are three different pieces of information separated by
>> colons. I am accustomed to the notation filename:line number,
>> but not another colon for the cursor position.
MvL> That's a matter of taste - you can write your own ErrorHandler
MvL> if you don't like the output. I personally understood
MvL> immediately that notation, as this is what Emacs supports as
MvL> file locations.
It is a matter of taste. We have been trying to improve the quality
and verbosity of error messages raised by Python code so that novices
have a better chance of understanding them. It is no help to tell a
beginner: "The error messages produced by the xml packages are a tad
obscure. Just write a subclass that makes the errors clearer."
>> It would have been clearer, I think, if the message were more
>> verbose and explained what each field was.
MvL> For reproducibility, it is probably best if it is terse - we
MvL> would probably have a long debate on what it should look like
MvL> if it had to change.
I don't understand what you mean by reproducibility. The ability to
reproduce an error message has nothing to do with whether it is terse
or verbose.
I liked the suggested error message that Lars proposed a *lot* better.
Jeremy
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 16:26:38 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 6 Oct 2000 17:26:38 +0200
Subject: [XML-SIG] Re: SAX exceptions are odd
In-Reply-To: <14812.60316.775448.910249@bitdiddle.concentric.net> (message
from Jeremy Hylton on Thu, 5 Oct 2000 16:59:08 -0400 (EDT))
References: <14812.60316.775448.910249@bitdiddle.concentric.net>
Message-ID: <200010061526.RAA00826@loewis.home.cs.tu-berlin.de>
> How does it know when I pass it a string and when I pass it a system
> identifier? In Python, system identifiers are strings?!? What if I
> have a file called "" will it open that file or attempt to parse
> it as a string?
If you invoke xml.sax.parse, it will always be understood as a system
identifier - you should invoke parseString if you have a "here"
document.
These are convenience functions - the full API has the notion of
InputSource objects, which are the primary means to tell a parser what
to process. There is some magic telling file names apart from file
objects, but that can't also tell apart system identifiers and here
documents - hence the two functions.
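As an illustration of the InputSource route, here is a sketch assuming a current Python 3 xml.sax: it parses in-memory bytes while still telling the parser which system identifier to report. The function name parse_named_bytes and the name "foo.xml" are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


def parse_named_bytes(data, system_id):
    """Parse in-memory bytes, reporting system_id in any error location."""
    source = InputSource(system_id)
    # Because a byte stream is attached, the parser reads from it and does
    # not try to open system_id as a file or URL.
    source.setByteStream(io.BytesIO(data))
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    parser.parse(source)


try:
    parse_named_bytes(b"<doc><a></doc>", "foo.xml")
    reported_id = None
except xml.sax.SAXParseException as exc:
    reported_id = exc.getSystemId()
    reported_text = str(exc)
```

With parseString no system identifier is attached, which is why the location part of the message has nothing better to show.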
> >How about this:
> >
> > "Not well-formed in foo.xml at line %d, column %d."
> >
> >If you prefer that I'd be happy to change both that and the lost
> >system identifier (if that is indeed the problem).
>
> I would like this a lot better. It will be appreciated by novice
> programmers and whiners like me.
I'd like to caution again: No matter what string is taken now, it will
have to stay forever. Other tools will expect that a certain Python
application formats its XML error messages in a certain way, and they
will whine if that is ever changed.
If that consequence is accepted, then it's fine with me to change that
string...
Regards,
Martin
From jeremy@beopen.com Fri Oct 6 16:41:49 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Fri, 6 Oct 2000 11:41:49 -0400 (EDT)
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <200010052136.XAA01520@loewis.home.cs.tu-berlin.de>
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
<200010052136.XAA01520@loewis.home.cs.tu-berlin.de>
Message-ID: <14813.62141.233397.345264@bitdiddle.concentric.net>
Here is another potential problem with xml exceptions. There may not
be anything to do about it, because the sax package is, by design,
very clever about imports.
>>> from xml import sax
>>> sax.parse("", sax.ContentHandler())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "./../Lib/xml/sax/__init__.py", line 29, in parse
    parser = make_parser()
  File "./../Lib/xml/sax/__init__.py", line 79, in make_parser
    raise SAXException("No parsers found", None)
xml.sax._exceptions.SAXException: No parsers found
This puzzled me for quite a while, because I was sure I had a parser.
[continuing same session:]
>>> import pyexpat
>>>
I start poking around in the internals of the sax implementation. I
see that I ought to be able to import xml.sax.expatreader. So I fire
up a new session and try it:
>>> import xml.sax.expatreader
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/jeremy/src/python/dist/src/Lib/xml/sax/expatreader.py", line 10, in ?
    from xml.sax import xmlreader, saxutils, handler
  File "/home/jeremy/src/python/dist/src/Lib/xml/sax/saxutils.py", line 6, in ?
    import os, urlparse, urllib, types
  File "/home/jeremy/src/python/dist/src/Lib/urllib.py", line 26, in ?
    import socket
  File "/home/jeremy/src/python/dist/src/Lib/socket.py", line 41, in ?
    from _socket import *
ImportError: libssl.so.0: cannot open shared object file: No such file or directory
So the problem is a bogus local configuration for shared libraries.
On the one hand, I haven't installed Python properly, so I shouldn't
expect things to work. On the other hand, it would be helpful if
unexpected exceptions could be reported. Is there any way to provide
an informative error message in this case?
Jeremy
From jeremy@beopen.com Fri Oct 6 16:44:31 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Fri, 6 Oct 2000 11:44:31 -0400 (EDT)
Subject: [XML-SIG] Re: SAX exceptions are odd
In-Reply-To: <200010061526.RAA00826@loewis.home.cs.tu-berlin.de>
References: <14812.60316.775448.910249@bitdiddle.concentric.net>
<200010061526.RAA00826@loewis.home.cs.tu-berlin.de>
Message-ID: <14813.62303.243577.289385@bitdiddle.concentric.net>
>>>>> "MvL" == Martin v Loewis writes:
MvL> I'd like to caution again: No matter what string is taken now,
MvL> it will have to stay forever. Other tools will expect that a
MvL> certain Python application formats its XML error messages in a
MvL> certain way, and they will whine if that is ever changed.
MvL> If that consequence is accepted, then it's fine with me to
MvL> change that string...
What tools are there that depend on the string representation of the
exception object raised by a Python library? The proposal is not to
change the API of the exception object, just what gets printed when
the program exits.
It would be bad form for an external program to depend in some way on
the error messages Python prints. There is definitely no guarantee
that they will remain unchanged from version to version.
Jeremy
From noreply@sourceforge.net Fri Oct 6 16:47:06 2000
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 6 Oct 2000 08:47:06 -0700
Subject: [XML-SIG] [Bug #116246] 2.0b2: the Windows installer installs empty .py files
Message-ID: <200010061547.IAA23110@bush.i.sourceforge.net>
Bug #116246, was updated on 2000-Oct-06 08:47
Here is a current snapshot of the bug.
Project: Python/XML
Category: None
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Summary: 2.0b2: the Windows installer installs empty .py files
Details: Running PyXML-0.6.0.win32.exe (on NT4 SP6, Python 2.0b2
installation) appears to run correctly but all of the whatever.py files it places in _xmlplus appear to be empty (zero-length).
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=116246&group_id=6473
From akuchlin@mems-exchange.org Fri Oct 6 16:50:01 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Fri, 6 Oct 2000 11:50:01 -0400
Subject: [XML-SIG] Re: SAX exceptions are odd
In-Reply-To: <200010061526.RAA00826@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Fri, Oct 06, 2000 at 05:26:38PM +0200
References: <14812.60316.775448.910249@bitdiddle.concentric.net> <200010061526.RAA00826@loewis.home.cs.tu-berlin.de>
Message-ID: <20001006115001.A27789@kronos.cnri.reston.va.us>
On Fri, Oct 06, 2000 at 05:26:38PM +0200, Martin v. Loewis wrote:
>have to stay forever. Other tools will expect that a certain Python
>application formats its XML error messages in a certain way, and they
>will whine if that is ever changed.
Shouldn't the exception class have attributes for .filename, .line,
.column, though, which is all an application needs to be concerned
with? In fact I thought SAX exceptions already had this, but perhaps
I'm misremembering.
--amk
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 16:55:10 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 6 Oct 2000 17:55:10 +0200
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <14813.61229.663642.454479@bitdiddle.concentric.net> (message
from Jeremy Hylton on Fri, 6 Oct 2000 11:26:37 -0400 (EDT))
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
<200010052136.XAA01520@loewis.home.cs.tu-berlin.de> <14813.61229.663642.454479@bitdiddle.concentric.net>
Message-ID: <200010061555.RAA01020@loewis.home.cs.tu-berlin.de>
> MvL> If you did provide a file name, and it got lost somewhere -
> MvL> then that is a bug.
>
> (It looks like you may have missed my second message on this subject.)
> I did pass a filename that was lost.
Actually, part of Germany was cut off from part of the US for the last day,
so some messages got sent later.
> I don't understand what you mean by reproducibility. The ability to
> reproduce an error message has nothing to do with whether it is terse
> or verbose.
If people write test suites, then they expect a certain output to
determine that a test failed. Any changes to the format of the output
will break the test case. The more verbose the text is, the more
likely are people to change it from release to release - not
considering that they may break things by changing some error message
formats.
Likewise, I expect that programs will parse the output of Python
programs, and expect a certain formatting. Such programs won't work if
people change strings.
> I liked the suggested error message that Lars proposed a *lot* better.
Reviewing the formats that Emacs' compilation-error-regexp-alist
supports, I found that file:line:column:error message is quite
common (and GNU standard), and that it would also recognize an
additional 'program name', so it correctly parses
xml.sax._exceptions.SAXParseException: a.c:10:16:not well-formed
(it does not parse the current text, as the error message precedes the
line information)
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 17:03:06 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 6 Oct 2000 18:03:06 +0200
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <14813.62141.233397.345264@bitdiddle.concentric.net> (message
from Jeremy Hylton on Fri, 6 Oct 2000 11:41:49 -0400 (EDT))
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
<200010052136.XAA01520@loewis.home.cs.tu-berlin.de> <14813.62141.233397.345264@bitdiddle.concentric.net>
Message-ID: <200010061603.SAA01153@loewis.home.cs.tu-berlin.de>
> So the problem is a bogus local configuration for shared libraries.
> On the one hand, I haven't installed Python properly, so I shouldn't
> expect things to work. On the other hand, it would be helpful if
> unexpected exceptions could be reported. Is there any way to provide
> an informative error message in this case?
One idea is that drivers should specifically distinguish between
"expected" import errors and unexpected ones, e.g.
class MissingFeature(ImportError):
    pass
Then, drivers should catch ImportError when they expect a failure, and
a plain (unexpected) ImportError would get through (*).
I can try to come up with a patch for that, as this is repeatedly
causing problems.
Regards,
Martin
(*) Actually, we have to separate the case that the driver module
itself does not exist, and that processing it caused an
ImportError. That can be done by looking at sys.modules.
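One way this distinction could look today, as a sketch only: it assumes Python 3.6 or later and uses ModuleNotFoundError and its name attribute in place of the sys.modules check mentioned in the footnote. The function load_first_driver and the driver names passed to it are hypothetical.

```python
import importlib


class MissingFeature(ImportError):
    """Raised when none of the candidate driver modules is installed."""


def load_first_driver(module_names):
    """Return the first importable driver; let unexpected errors through."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            missing = exc.name or ""
            # Expected case: the driver itself (or its parent package)
            # is simply not installed, so try the next candidate.
            if name == missing or name.startswith(missing + "."):
                continue
            # Unexpected case: the driver exists but one of its own
            # imports is missing, so report that instead of hiding it.
            raise
    raise MissingFeature("no driver available among: " + ", ".join(module_names))


# "json" stands in for a driver that is known to be importable.
driver = load_first_driver(["hypothetical_driver_that_does_not_exist", "json"])
```

A plain ImportError, such as a shared library that cannot be loaded, is not a ModuleNotFoundError and therefore propagates to the caller unchanged.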
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 17:07:42 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 6 Oct 2000 18:07:42 +0200
Subject: [XML-SIG] Re: SAX exceptions are odd
In-Reply-To: <14813.62303.243577.289385@bitdiddle.concentric.net> (message
from Jeremy Hylton on Fri, 6 Oct 2000 11:44:31 -0400 (EDT))
References: <14812.60316.775448.910249@bitdiddle.concentric.net>
<200010061526.RAA00826@loewis.home.cs.tu-berlin.de> <14813.62303.243577.289385@bitdiddle.concentric.net>
Message-ID: <200010061607.SAA01157@loewis.home.cs.tu-berlin.de>
> What tools are there that depend on the string representation of the
> exception object raised by a Python library?
None at the moment (although I wish Emacs would recognize these
strings - it would with a slight change).
Once Python 2.1 is released, it is quite possible that tools
might rely on that - at which time it won't be possible anymore to
change the string. At the moment, it still is.
> It would be bad form for an external program to depend in some way on
> the error messages Python prints. There is definitely no guarantee
> that they will remain unchanged from version to version.
The external programs may have no other option - stdin/stdout/stderr is
the typical way of communicating with a program.
Nobody would try to parse a genuine Python traceback, as that often is
a bug in the script. However, SAX exceptions are raised for errors in
the XML, so this is different.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 17:45:13 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 6 Oct 2000 18:45:13 +0200
Subject: [XML-SIG] Re: SAX exceptions are odd
In-Reply-To: <20001006115001.A27789@kronos.cnri.reston.va.us> (message from
Andrew Kuchling on Fri, 6 Oct 2000 11:50:01 -0400)
References: <14812.60316.775448.910249@bitdiddle.concentric.net> <200010061526.RAA00826@loewis.home.cs.tu-berlin.de> <20001006115001.A27789@kronos.cnri.reston.va.us>
Message-ID: <200010061645.SAA01323@loewis.home.cs.tu-berlin.de>
> Shouldn't the exception class have attributes for .filename, .line,
> .column, though, which is all an application needs to be concerned
> with? In fact I thought SAX exceptions already had this, but perhaps
> I'm misremembering.
They certainly do. The discussion is about what __str__ should return for them.
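A short sketch of reading those fields, assuming a current Python 3 xml.sax, where they are exposed through the accessor methods getSystemId(), getLineNumber(), getColumnNumber() and getMessage(). The helper names gnu_style and check, and the "<input>" placeholder, are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def gnu_style(exc, default_name="<input>"):
    """Format a SAXParseException as file:line:column:message."""
    name = exc.getSystemId() or default_name
    return "%s:%d:%d:%s" % (
        name,
        exc.getLineNumber(),
        exc.getColumnNumber(),
        exc.getMessage(),
    )


def check(data):
    """Return a formatted error for malformed input, or None if it parses."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return gnu_style(exc)
    return None


good = check(b"<doc/>")
bad = check(b"<doc><a></doc>")
```

An application that formats its own output this way does not depend on whatever __str__ returns.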
Regards,
Martin
From alf@logilab.com Fri Oct 6 22:17:39 2000
From: alf@logilab.com (Alexandre Fayolle)
Date: Fri, 6 Oct 2000 23:17:39 +0200 (CEST)
Subject: [XML-SIG] Problem parsing the xhtml dtd
Message-ID:
Hi,
I'm trying to parse the XHTML DTD () with xmlproc (as of PyXML
0.5.5.1). The python code I use is the following:
from xml.parsers.xmlproc.dtdparser import DTDParser
from xml.parsers.xmlproc.xmldtd import CompleteDTD
parser = DTDParser()
dtd = CompleteDTD(parser)
parser.set_dtd_consumer(dtd)
parser.set_dtd_object(dtd)
parser.parse_resource('xhtml1-strict.dtd')
parser.deref()
I get the following message:
ERROR: xml:space must have exactly the values 'default' and 'preserve' at
xhtml1-strict.dtd:315:47
TEXT: '>
The problem occurs on the following block :
The correction involved modifying the last line of the block:
xml:space (default|preserve) #FIXED "preserve" >
Is this a bug in xmlproc or in the W3C DTD ?
--
Alexandre Fayolle
http://www.logilab.com - "Mais où est donc Ornicar ?" -
LOGILAB, Paris (France).
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 22:43:08 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 6 Oct 2000 23:43:08 +0200
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <14813.62141.233397.345264@bitdiddle.concentric.net> (message
from Jeremy Hylton on Fri, 6 Oct 2000 11:41:49 -0400 (EDT))
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
<200010052136.XAA01520@loewis.home.cs.tu-berlin.de> <14813.62141.233397.345264@bitdiddle.concentric.net>
Message-ID: <200010062143.XAA27398@loewis.home.cs.tu-berlin.de>
> So the problem is a bogus local configuration for shared libraries.
> On the one hand, I haven't installed Python properly, so I shouldn't
> expect things to work. On the other hand, it would be helpful if
> unexpected exceptions could be reported. Is there any way to provide
> an informative error message in this case?
I've just committed a change that will give you an ImportError in this
case - only a failure to import xml.parsers.expat will be ignored.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 22:44:49 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 6 Oct 2000 23:44:49 +0200
Subject: [XML-SIG] SAX exceptions are odd
In-Reply-To: <14812.44322.692362.12640@bitdiddle.concentric.net> (message from
Jeremy Hylton on Thu, 5 Oct 2000 12:32:34 -0400 (EDT))
References: <14812.44322.692362.12640@bitdiddle.concentric.net>
Message-ID: <200010062144.XAA27414@loewis.home.cs.tu-berlin.de>
> If I call parse on an empty file, I get no exception. Is this
> desirable? I assume it means that "" is well-formed XML, but that
> doesn't seem like a very helpful definition. Is this right?
I have installed a patch to fix this.
> If I get almost any other exception I get an error message that says
> something like: "not well-formed at None:1:7"
>
> Why is None being printed?
I have also installed a patch to fix that.
> I also think the format is odd.
I did not (and will not) change that - somebody else might go
ahead, though.
Regards,
Martin
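The behaviour the first patch aims for, an empty input being reported as an
error rather than silently accepted, can be checked with a few lines. This
is a sketch under the assumption that the xml.sax module of a current Python
standard library is used; is_well_formed is a hypothetical helper name, not
something provided by the package.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(data):
    """Return True if data parses without a SAX error, else False.

    is_well_formed is an illustrative helper, not an xml.sax function.
    """
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXException:
        # SAXParseException is a subclass, so parse errors land here.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(b""))        # empty input has no root element
    print(is_well_formed(b"<doc/>"))  # smallest well-formed document
```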
From martin@loewis.home.cs.tu-berlin.de Fri Oct 6 23:53:17 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 7 Oct 2000 00:53:17 +0200
Subject: [XML-SIG] Problem parsing the xhtml dtd
In-Reply-To: (message
from Alexandre Fayolle on Fri, 6 Oct 2000 23:17:39 +0200 (CEST))
References:
Message-ID: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de>
> Is this a bug in xmlproc or in the W3C DTD ?
XML 1.0 says
# A special attribute named xml:space may be attached to an element to
# signal an intention that in that element, white space should be
# preserved by applications. In valid documents, this attribute, like
# any other, must be declared if it is used. When declared, it must be
# given as an enumerated type whose only possible values are "default"
# and "preserve".
As a non-native speaker of English, that sentence sounds ambiguous to
me: Does it mean that xml:space must have no more, no less than
"default" and "preserve" as possible values, or does it mean it may
have less than these values?
Regards,
Martin
From tpassin@home.com Sat Oct 7 04:29:12 2000
From: tpassin@home.com (tpassin@home.com)
Date: Fri, 6 Oct 2000 23:29:12 -0400
Subject: [XML-SIG] Problem parsing the xhtml dtd
References: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de>
Message-ID: <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com>
Martin v. Loewis asks -
> XML 1.0 says
>
> # A special attribute named xml:space may be attached to an element to
> # signal an intention that in that element, white space should be
> # preserved by applications. In valid documents, this attribute, like
> # any other, must be declared if it is used. When declared, it must be
> # given as an enumerated type whose only possible values are "default"
> # and "preserve".
>
> As a non-native speaker of English, that sentence sounds ambiguous to
> me: Does it mean that xml:space must have no more, no less than
> "default" and "preserve" as possible values, or does it mean it may
> have less than these values?
>
There is an illustration in the Rec, right after the section that Martin
quoted:
To this native speaker of English, the text seems to mean exactly the same
thing as the example does. To arrive at this conclusion, I must take the
text in a very literal (or 'formal') way. A colloquial reading could give
the impression that one of the two values could be omitted.
Cheers,
Tom Passin
From martin@loewis.home.cs.tu-berlin.de Sat Oct 7 07:42:11 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 7 Oct 2000 08:42:11 +0200
Subject: [XML-SIG] Problem parsing the xhtml dtd
In-Reply-To: <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com>
(tpassin@home.com)
References: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de> <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com>
Message-ID: <200010070642.IAA00667@loewis.home.cs.tu-berlin.de>
>
>
> To this native speaker of English, the text seems to mean exactly the same
> thing as the example does. To arrive at this conclusion, I must take the
> text in a very literal (or 'formal') way. A colloquial reading could give
> the impression that one of the two values could be omitted.
Very interesting. That means that the W3C XHTML DTD is ill-formed, and
that xmlproc properly detected that error.
Regards,
Martin
From alf@logilab.com Sat Oct 7 09:59:56 2000
From: alf@logilab.com (Alexandre Fayolle)
Date: Sat, 7 Oct 2000 10:59:56 +0200 (CEST)
Subject: [XML-SIG] Problem parsing the xhtml dtd
In-Reply-To: <200010070642.IAA00667@loewis.home.cs.tu-berlin.de>
Message-ID:
On Sat, 7 Oct 2000, Martin v. Loewis wrote:
> >
> >
> > To this native speaker of English, the text seems to mean exactly the same
> > thing as the example does. To arrive at this conclusion, I must take the
> > text in a very literal (or 'formal') way. A colloquial reading could give
> > the impression that one of the two values could be omitted.
>
> Very interesting. That means that the W3C XHTML DTD is ill-formed, and
> that xmlproc properly detected that error.
Has anyone contacted the person responsible for this DTD at W3C to inform
them of the problem yet? I guess some people here have better contacts
with W3C members than me ;o)
--
Alexandre Fayolle
http://www.logilab.com - "Mais où est donc Ornicar ?" -
LOGILAB, Paris (France).
From larsga@garshol.priv.no Sat Oct 7 13:51:16 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 07 Oct 2000 14:51:16 +0200
Subject: [XML-SIG] Problem parsing the xhtml dtd
In-Reply-To:
References:
Message-ID:
* Alexandre Fayolle
|
| Is this a bug in xmlproc or in the W3C DTD ?
It's a bug in xmlproc. This is one of several cases where I guessed
wrong about ambiguities in the XML 1.0 spec, according to the second
edition of that spec.
* Martin v. Loewis
|
| XML 1.0 says
|
| # A special attribute named xml:space may be attached to an element to
| # signal an intention that in that element, white space should be
| # preserved by applications. In valid documents, this attribute, like
| # any other, must be declared if it is used. When declared, it must be
| # given as an enumerated type whose only possible values are "default"
| # and "preserve".
|
| As a non-native speaker of English, that sentence sounds ambiguous to
| me: Does it mean that xml:space must have no more, no less than
| "default" and "preserve" as possible values, or does it mean it may
| have less than these values?
I had exactly the same problem as you with this part of the spec when
I implemented this. However, the second edition of the XML
specification has improved this section and is now crystal clear:
# A special attribute named xml:space may be attached to an element to
# signal an intention that in that element, white space should be
# preserved by applications. In valid documents, this attribute, like
# any other, must be declared if it is used. When declared, it must be
# given as an enumerated type whose values are one or both of "default"
# and "preserve". ^^^^^^^^^^^
Once Python 2.0 is out I'm planning to improve xmlproc by
- writing a full-featured SAX 2.0 driver with lots of features and
properties
- updating it to conform to the XML 1.0 2nd edition spec
- adding full Unicode support
The order and timing of these releases is still unclear.
I've fixed this particular problem now in my private CVS tree.
--Lars M.
From MichaelDyck@home.com Sat Oct 7 22:23:53 2000
From: MichaelDyck@home.com (Michael Dyck)
Date: Sat, 07 Oct 2000 14:23:53 -0700
Subject: [XML-SIG] Problem parsing the xhtml dtd
References: <200010062253.AAA27989@loewis.home.cs.tu-berlin.de> <004501c0300e$c34db8e0$7cac1218@reston1.va.home.com>
Message-ID: <39DF9469.76A822ED@home.com>
Martin v. Loewis wrote:
>
> XML 1.0 says
>
> # A special attribute named xml:space may be attached to an element to
> # signal an intention that in that element, white space should be
> # preserved by applications. In valid documents, this attribute, like
> # any other, must be declared if it is used. When declared, it must be
> # given as an enumerated type whose only possible values are "default"
> # and "preserve".
>
> As a non-native speaker of English, that sentence sounds ambiguous to
> me: Does it mean that xml:space must have no more, no less than
> "default" and "preserve" as possible values, or does it mean it may
> have less than these values?
Apparently, it sounded ambiguous to others as well. The errata for
XML 1.0 (http://www.w3.org/XML/xml-19980210-errata#E81) rewords it:
---------------------------
Section 2.10
In the third paragraph, replace the sentence:
When declared, it must be given as an enumerated type whose
only possible values are "default" and "preserve".
with:
When declared, it must be given as an enumerated type whose
values are one or both of "default" and "preserve".
Add an example after the existing one (in the same table):
Rationale
The wording in the spec was ambiguous on whether the value of the
xml:space attribute could be limited to one of the two possible
values.
----------------------------
The change has been incorporated into the 2nd edition of XML 1.0
(see http://www.w3.org/TR/2000/REC-xml-20001006#sec-white-space).
-Michael Dyck
From martin@loewis.home.cs.tu-berlin.de Sun Oct 8 09:33:35 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 8 Oct 2000 10:33:35 +0200
Subject: [XML-SIG] 4DOM bugs
Message-ID: <200010080833.KAA00815@loewis.home.cs.tu-berlin.de>
While porting PyXML's test_dom to 4DOM, I noticed a number of
problems, which I believe are bugs in 4DOM. Consider
from xml.dom import implementation
doc = implementation.createDocument(None,None,None)
n1 = doc.createElement('n1') ; n2 = doc.createElement('n2')
pi = doc.createProcessingInstruction("Processing", "Instruction")
doc.appendChild(pi)
doc.appendChild(n1)
#doc.appendChild(n1) # fails, but shouldn't
doc.replaceChild(n2, n1)
doc.replaceChild(pi, n2)
print doc.documentElement
The line "doc.appendChild(n1)" raises a hierarchy exception, as n1 is
already in the tree. However, this is incorrect: it should first
remove n1, then reinsert it.
The second fragment does not cause an exception. However, in the end,
the "documentElement" of the document is a processing
instruction. That is very strange - it should always be an element.
I've been using the 4DOM version that is currently in the PyXML CVS.
Regards,
Martin
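The DOM rule behind the first complaint (a node that is already in the tree
is removed first and then reinserted) can be illustrated as follows. Note
the assumption: this sketch uses xml.dom.minidom from the Python standard
library as a stand-in, not 4DOM, and it appends to an element rather than
to the document node; the element names are invented.

```python
from xml.dom.minidom import getDOMImplementation

doc = getDOMImplementation().createDocument(None, "root", None)
root = doc.documentElement
first = root.appendChild(doc.createElement("first"))
second = root.appendChild(doc.createElement("second"))

# Appending a node that is already a child must not raise; the node is
# detached from its current position and re-added at the end.
root.appendChild(first)

order = [child.tagName for child in root.childNodes]
print(order)
```

After the second appendChild call the parent still has two children, with
"first" now in last position.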
From loewis@informatik.hu-berlin.de Sun Oct 8 15:57:35 2000
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Sun, 8 Oct 2000 16:57:35 +0200 (MET DST)
Subject: [XML-SIG] PyXML 0.6.1 release
Message-ID: <200010081457.QAA09543@pandora.informatik.hu-berlin.de>
Version 0.6.1 of the Python/XML distribution is now available. It
should be considered a beta release, and can be downloaded from
the following URLs:
http://download.sourceforge.net/pyxml/PyXML-0.6.1.tar.gz
http://download.sourceforge.net/pyxml/PyXML-0.6.1.win32-py1.5.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.1.win32-py2.0.exe
http://download.sourceforge.net/pyxml/PyXML-0.6.1-1.5.2.i386.rpm
http://download.sourceforge.net/pyxml/PyXML-0.6.1-2.0b2.i386.rpm
Changes in this version, compared to 0.6.0:
* Support for Python 1.5.2 was restored, as long as no character
set recoding is required
* The 4DOM package was updated.
* Most of the test suite now passes again.
* The tutorial was updated.
Changes of version 0.6.0, compared to 0.5.x:
* The 4DOM package has been integrated into PyXML.
* The package now supports SAX2 interfaces in addition to the
SAX1 interfaces. Currently, pyexpat and xmlproc can serve as SAX2
drivers.
* The proprietary Unicode type has been removed. Instead,
PyXML now relies on the standard Python Unicode type. In turn, PyXML
0.6.0 will not work with Python 1.5. It has been tested with 2.0b1.
* PyXML now operates on top of the XML package coming in
Python 2.
The Python/XML distribution contains the basic tools required for
processing XML data using the Python programming language, assembled
into one easy-to-install package. The distribution includes parsers
and standard interfaces such as SAX and DOM, along with various other
useful modules.
The package currently contains:
* XML parsers: Pyexpat (Jack Jansen), xmlproc (Lars Marius
Garshol), xmllib.py (Sjoerd Mullender) using the sgmlop.c accelerator
module (Fredrik Lundh).
* SAX interface (Lars Marius Garshol)
* DOM interface (Stefane Fermigier, A.M. Kuchling)
* 4DOM interface from Fourthought (Uche Ogbuji, Mike Olson)
* xmlarch.py, for architectural forms processing (Geir Ove Grønmo)
* Various utility modules and functions (various people)
* Documentation and example programs (various people)
The code is being developed bazaar-style by contributors from the
Python XML Special Interest Group, so please send comments, questions,
or bug reports to .
For more information about Python and XML, see:
http://www.python.org/topics/xml/
--
Martin v. Löwis http://www.informatik.hu-berlin.de/~loewis
From el@buch.biblio.etc.tu-bs.de Mon Oct 9 11:33:46 2000
From: el@buch.biblio.etc.tu-bs.de (Christian Ellguth)
Date: Mon, 9 Oct 2000 12:33:46 +0200
Subject: [XML-SIG] Parsers and their behaviours
Message-ID: <00100912334600.27248@cellguth>
Is there any documentation on the various XML parsers and their
capabilities? If so, where can I find it?
I am using the
drv_xmlproc.SAX_XPParser
parser.
I am a newbie to the Python/XML library and have tried to understand the
symple_appl.py script from Simon Pepping.
If the script encounters the numeric representation of a German umlaut, it
displays the umlaut correctly, but the parser adds an additional \n to
the character and then continues parsing the XML file.
If I use the drv_xmlproc.SAX_XPParser in my script, the parser continues to
parse the file, but all strings containing numeric representations of
entities are cut off after the entity and the rest of the string is lost.
Thank you for your replies
Christian
--
Universitaetsbibliothek Braunschweig
Christian Ellguth
Pockelsstr. 13
38106 Braunschweig
From larsga@garshol.priv.no Mon Oct 9 11:43:14 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 09 Oct 2000 12:43:14 +0200
Subject: [XML-SIG] Parsers and their behaviours
In-Reply-To: <00100912334600.27248@cellguth>
References: <00100912334600.27248@cellguth>
Message-ID:
* Christian Ellguth
|
| Is there any documentation on the various XML-parsers and their
| capabilities ?
No, not really. It would be nice to produce this as part of the SAX
2.0 effort, but I'm afraid that will take some time.
| If I use the drv_xmlproc.SAX_XPParser in my script the parser
| continues to parse the file but all strings containing numerical
| representations of entities are shortened after the entity and the
| rest of the string is lost.
Most likely this is a bug in your script. SAX allows parsers to call
the characters() method more than once for a single block of character
data, and character references (and entity references) are just the
sort of thing that will cause a parser to call it more than once.
So most likely your script does not handle this case correctly.
--Lars M.
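A handler that copes with repeated characters() calls collects the pieces
and joins them only when the element ends. The following is a minimal
sketch, assuming the xml.sax module of a current Python standard library
instead of the drv_xmlproc driver named in the question; the class name
TextCollector and the element name "item" are invented for the example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Collect the complete text of each <item> element (made-up name)."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished strings, one per <item>
        self._chunks = None  # pieces seen so far, or None outside <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text, so only
        # append here; never treat a single call as the whole string.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._chunks))
            self._chunks = None


handler = TextCollector()
# &#252; is the numeric character reference for the umlaut u-diaeresis.
xml.sax.parseString(b"<doc><item>Br&#252;cke und mehr</item></doc>", handler)
print(handler.items)
```

A script that instead assigns the argument of characters() directly to a
variable keeps only one piece, which matches the reported symptom of
strings being cut off around the character reference.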
From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 20:06:00 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 9 Oct 2000 21:06:00 +0200
Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests
In-Reply-To: <20000929180212.C20008@kronos.cnri.reston.va.us> (message from
Andrew Kuchling on Fri, 29 Sep 2000 18:02:12 -0400)
References: <200009292122.XAA01777@loewis.home.cs.tu-berlin.de> <20000929180212.C20008@kronos.cnri.reston.va.us>
Message-ID: <200010091906.VAA00791@loewis.home.cs.tu-berlin.de>
> I vaguely recall that someone at FourThought once asked me if that
> would be OK, but don't know if anyone actually did it. It would be
> a good idea to port them, since they made some attempt at being
> exhaustive (trying the various error cases, etc.).
I've started doing that (test_dom.py), but found that 4DOM simply
won't pass the tests. I'd appreciate it if somebody could look at the
current failures in the code, and tell me whether the test case is
overly strict (or simply broken), or whether there are genuine bugs in
4DOM.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 20:11:46 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 9 Oct 2000 21:11:46 +0200
Subject: [XML-SIG] PyXML 0.6.1 status, demos, tests
In-Reply-To: <39D79954.342F7478@FourThought.com> (message from Mike Olson on
Sun, 01 Oct 2000 14:06:44 -0600)
References: <200009300152.TAA12572@localhost.localdomain> <39D79954.342F7478@FourThought.com>
Message-ID: <200010091911.VAA00840@loewis.home.cs.tu-berlin.de>
> All of the traceout stuff has been removed. There still is the problem
> of our traceout library. I suppose we can install it to 2 locations so
> that xml.dom is not dependent on Ft. I'll work on that today so it
> should be in the next snapshot.
Thanks! I hope I've properly updated the MANIFEST.in so that
everything that should get shipped actually is - I'd appreciate it if you
could verify this based on the PyXML 0.6.1 tar file.
Also, what is the status of xml/dom/html/test_suite? Running test.py
in that directory gives
Traceback (most recent call last):
File "test.py", line 76, in ?
test(fileList)
File "test.py", line 69, in test
_mod.test();
File "test_element.py", line 20, in test
e._set_ID('1');
File "/usr/local/lib/python2.0/site-packages/_xmlplus/dom/Node.py", line 84, in __getattr__
return getattr(Node, name)
AttributeError: _set_ID
Is that a known problem?
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 21:50:49 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 9 Oct 2000 22:50:49 +0200
Subject: [XML-SIG] documentation
In-Reply-To: <14791.56264.4267.959626@cj42289-a.reston1.va.home.com>
(fdrake@beopen.com)
References: <14791.56264.4267.959626@cj42289-a.reston1.va.home.com>
Message-ID: <200010092050.WAA01029@loewis.home.cs.tu-berlin.de>
> I'm starting documentation for the xml.sax package. We already have
> the material that's part of the PyXML package, and I'm currently
> working on the xml.sax package module itself. If anyone else would
> like to take a portion of the documentation to work on, I'd certainly
> appreciate some help!
Hi Fred,
I'm currently working on the major body of the SAX interfaces, mostly
by extracting the doc strings into TeX.
Regards,
Martin
From fdrake@beopen.com Mon Oct 9 21:55:58 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 9 Oct 2000 16:55:58 -0400 (EDT)
Subject: [XML-SIG] documentation
In-Reply-To: <200010092050.WAA01029@loewis.home.cs.tu-berlin.de>
References: <14791.56264.4267.959626@cj42289-a.reston1.va.home.com>
<200010092050.WAA01029@loewis.home.cs.tu-berlin.de>
Message-ID: <14818.12510.721242.680124@cj42289-a.reston1.va.home.com>
[CC'd to the Doc-SIG as well.]
Martin v. Loewis writes:
> I'm currently working on the major body of the SAX interfaces, mostly
> by extracting the doc strings into TeX.
Great, since I was planning to work on them this week! I'll merge
some of your xml.dom documentation with some that Paul sent directly
to me, and get that in later this week. (Hopefully by Wed.)
I've told Paul that I'd really like to have everything that will be
in the final release in by Thursday, and will have a doc freeze on
Friday.
-Fred
--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member
From fdrake@beopen.com Mon Oct 9 22:18:30 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Mon, 9 Oct 2000 17:18:30 -0400 (EDT)
Subject: [XML-SIG] test cases
Message-ID: <14818.13862.787442.476097@cj42289-a.reston1.va.home.com>
I just commented to Jeremy that I'd like to see more test cases for
the XML package. I don't know how much time there will be for that
this week, but if anyone has time to create some good tests, I'd
certainly be interested in getting more tests into the regression
test. I've definitely not had enough time to write test cases; there
are a lot I'd like to add and extend across the standard library as a
whole. We won't be able to get more tests into 2.0, but we can extend
the tests in PyXML and Python 2.1.
-Fred
--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member
From martin@loewis.home.cs.tu-berlin.de Mon Oct 9 23:31:33 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 10 Oct 2000 00:31:33 +0200
Subject: [XML-SIG] test cases
In-Reply-To: <14818.13862.787442.476097@cj42289-a.reston1.va.home.com>
(fdrake@beopen.com)
References: <14818.13862.787442.476097@cj42289-a.reston1.va.home.com>
Message-ID: <200010092231.AAA05930@loewis.home.cs.tu-berlin.de>
> I just commented to Jeremy that I'd like to see more test cases for
> the XML package. I don't know how much time there will be for that
> this week
I'll be out of town for the rest of the week, so I won't be able to do
anything more...
I don't feel that things are too bad - most of the problems we've seen
are "border cases" (e.g. reporting errors on empty files, properly
setting all attributes in minidom which is a DOM subset anyway); I'd
hope that the core functionality is working.
Still, volunteers for writing new test cases would certainly be
welcome - even reporting things that you feel are not right would
help; others might then fix the problem and submit a test case.
Regards,
Martin
From clarence@netlojix.com Tue Oct 10 22:44:55 2000
From: clarence@netlojix.com (Clarence Gardner)
Date: Tue, 10 Oct 2000 14:44:55 -0700
Subject: [XML-SIG] Moving DOM node hierarchies
Message-ID: <20001010144455.B12546@liberty.sba2.netlojix.net>
I have a program in which a bunch of unrelated functions return a
node hierarchy to a central function, which then wants to package them
all up into a document and send it on its way. Unfortunately, I'm
getting WRONG_DOCUMENT errors when I do this, because each of the trees
was generated using a throw-away document. It would be obnoxious to have
to create the result document first and then provide it everywhere that
a subtree is created.
Does this seem like a strange way to be going about things? It seemed
quite reasonable to me. In fact, I don't understand the reasoning behind
the only-append-in-the-creating-document restriction. Can anyone shed
light on this?
Thanks.
--
Clarence Gardner
Software Engineer
NetLojix Communications
clarence@netlojix.com
From jsydik@BINARY.NET Wed Oct 11 00:39:17 2000
From: jsydik@BINARY.NET (Jeremy J. Sydik)
Date: Tue, 10 Oct 2000 18:39:17 -0500
Subject: [XML-SIG] Moving DOM node hierarchies
References: <20001010144455.B12546@liberty.sba2.netlojix.net>
Message-ID: <39E3A8A5.F58BD069@BINARY.NET>
I had a similar problem a while back, but that was before the XML-SIG/4DOM
integration. It would be helpful to see the code that is failing or a test
case that shows the same error. In the meantime, here are the sig archive
messages related to my problems:
http://www.python.org/pipermail/xml-sig/2000-March/003656.html
http://www.python.org/pipermail/xml-sig/2000-March/003668.html
http://www.python.org/pipermail/xml-sig/2000-April/003747.html
http://www.python.org/pipermail/xml-sig/2000-April/003748.html
Clarence Gardner wrote:
>
> I have a program in which a bunch of unrelated functions return a
> node hierarchy to a central function, which then wants to package them
> all up into a document and send it on its way. Unfortunately, I'm
> getting WRONG_DOCUMENT errors when I do this, because each of the trees
> was generated using a throw-away document. It would be obnoxious to have
> to create the result document first and then provide it everywhere that
> a subtree is created.
>
> Does this seem like a strange way to be going about things? It seemed
> quite reasonable to me. In fact, I don't understand the reasoning behind
> the only-append-in-the-creating-document restriction. Can anyone shed
> light on this?
>
> Thanks.
>
> --
> Clarence Gardner
> Software Engineer
> NetLojix Communications
> clarence@netlojix.com
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig
From clarence@netlojix.com Wed Oct 11 02:19:53 2000
From: clarence@netlojix.com (Clarence Gardner)
Date: Tue, 10 Oct 2000 18:19:53 -0700
Subject: [XML-SIG] Moving DOM node hierarchies
In-Reply-To: <39E3A8A5.F58BD069@BINARY.NET>
References: <20001010144455.B12546@liberty.sba2.netlojix.net> <39E3A8A5.F58BD069@BINARY.NET>
Message-ID: <20001010181953.C12546@liberty.sba2.netlojix.net>
Thanks. I read the references, but my posting was more oriented toward
why the insertion was disallowed in the first place. Of course, the
people writing the specs put a lot more thought into these things than
I do, but I would think this would come up quite often and they might
have rationalized it. I was particularly amazed, after coming across
this, that DocumentFragments couldn't even be inserted. I see in the
spec for Level 2 references to how default attributes are handled between
the two documents, which is certainly an issue. I guess maybe this was
in fact just overlooked the first time. Oh well.
On Tue, Oct 10, 2000 at 06:39:17PM -0500, Jeremy J. Sydik wrote:
> I had a similar problem a while back, but that was before the XML-SIG/4DOM
> integration. It would be helpful to see the code that is failing or a test
> case that shows the same error. In the meantime, here are the sig archive
> messages related to my problems:
>
> http://www.python.org/pipermail/xml-sig/2000-March/003656.html
> http://www.python.org/pipermail/xml-sig/2000-March/003668.html
>
> http://www.python.org/pipermail/xml-sig/2000-April/003747.html
> http://www.python.org/pipermail/xml-sig/2000-April/003748.html
>
> Clarence Gardner wrote:
> >
> > I have a program in which a bunch of unrelated functions return a
> > node hierarchy to a central function, which then wants to package them
> > all up into a document and send it on its way. Unfortunately, I'm
> > getting WRONG_DOCUMENT errors when I do this, because each of the trees
> > was generated using a throw-away document. It would be obnoxious to have
> > to create the result document first and then provide it everywhere that
> > a subtree is created.
> >
> > Does this seem like a strange way to be going about things? It seemed
> > quite reasonable to me. In fact, I don't understand the reasoning behind
> > the only-append-in-the-creating-document restriction. Can anyone shed
> > light on this?
> >
> > Thanks.
> >
> > --
> > Clarence Gardner
> > Software Engineer
> > NetLojix Communications
> > clarence@netlojix.com
> >
> > _______________________________________________
> > XML-SIG maillist - XML-SIG@python.org
> > http://www.python.org/mailman/listinfo/xml-sig
--
Clarence Gardner
Software Engineer
NetLojix Communications
clarence@netlojix.com
From alf@logilab.com Wed Oct 11 11:38:15 2000
From: alf@logilab.com (Alexandre Fayolle)
Date: Wed, 11 Oct 2000 12:38:15 +0200 (CEST)
Subject: [XML-SIG] Moving DOM node hierarchies
In-Reply-To: <20001010181953.C12546@liberty.sba2.netlojix.net>
Message-ID:
On Tue, 10 Oct 2000, Clarence Gardner wrote:
>
> Thanks. I read the references, but my posting was more oriented toward
> why the insertion was disallowed in the first place. Of course, the
> people writing the specs put a lot more thought into these things than
> I do, but I would think this would come up quite often and they might
> have rationalized it. I was particularly amazed, after coming across
> this, that DocumentFragments couldn't even be inserted.
They can be inserted. The DOM core spec says:
insertBefore :
Inserts the node newChild before the existing child node refChild. If
refChild is null, insert newChild at the end of the list of children. If
newChild is a DocumentFragment object, all of its children are inserted,
in the same order, before refChild. If the newChild is already in the
tree, it is first removed.
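The DocumentFragment rule quoted above can be tried out with the standard library's xml.dom.minidom. This is only a small sketch, not 4DOM code; the element names (list, first, second, last) are made up for illustration.

```python
from xml.dom import minidom

# A document with one existing child that serves as the reference node.
doc = minidom.parseString("<list><last/></list>")
root = doc.documentElement
ref_child = root.firstChild

# Build a fragment holding two new elements (hypothetical names).
fragment = doc.createDocumentFragment()
for name in ("first", "second"):
    fragment.appendChild(doc.createElement(name))

# Inserting the fragment inserts its children, in order, before ref_child.
root.insertBefore(fragment, ref_child)

child_names = [child.tagName for child in root.childNodes]
print(child_names)                # ['first', 'second', 'last']
print(len(fragment.childNodes))   # 0, the fragment has been emptied
```

Only the fragment's children end up in the tree; the fragment itself is left empty and can be reused.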
DOM3 should provide facilities for moving nodes across documents. In the
meantime, you have to call Document.importNode(node, deep) with deep set to
true and insert the resulting copy in the tree. If you really want to mimic
the DOM2 insertion behaviour, where a node that is already in a tree is first
removed, you also have to manually remove the original node from the first
document.
This gives something like
def appendFromOtherDoc(node1, node2):
    # copy node1 (deep) into the document that owns node2
    imported = node2.ownerDocument.importNode(node1, 1)
    node2.appendChild(imported)
    # detach the original node from its own document
    node1.parentNode.removeChild(node1)
    # optionally, with 4DOM, you may want to remove circular references
    from xml.dom.ext import ReleaseNode
    ReleaseNode(node1)
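For readers without 4DOM, the same import-then-remove approach can be sketched with the standard library's xml.dom.minidom. The helper name move_to_other_document and the sample documents are assumptions made for this example, and minidom's unlink() stands in for 4DOM's ReleaseNode.

```python
from xml.dom import minidom


def move_to_other_document(node, new_parent):
    """Copy node into new_parent's document, then drop the original."""
    target_doc = new_parent.ownerDocument
    # importNode makes a deep copy owned by the target document.
    imported = target_doc.importNode(node, True)
    new_parent.appendChild(imported)
    # importNode leaves the source untouched, so detach it by hand.
    if node.parentNode is not None:
        node.parentNode.removeChild(node)
    # Break internal references held by the discarded original.
    node.unlink()
    return imported


source = minidom.parseString("<src><item id='1'>text</item></src>")
target = minidom.parseString("<dst/>")

item = source.documentElement.firstChild
moved = move_to_other_document(item, target.documentElement)

print(moved.ownerDocument is target)
print(len(source.documentElement.childNodes))
```

The returned node is a copy, so any references to the original node held elsewhere no longer point into a live tree.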
Note that this does not answer your question of why it is done the way
it is. I'm sometimes as baffled as you are. My biggest grudge is: why don't
unqualified attributes inherit the default namespace of their element?
--
Alexandre Fayolle
http://www.logilab.com - "Mais où est donc Ornicar ?" -
LOGILAB, Paris (France).
From pblanchette@pixelsystems.com Wed Oct 11 16:02:07 2000
From: pblanchette@pixelsystems.com (Patrick Blanchette)
Date: Wed, 11 Oct 2000 11:02:07 -0400
Subject: [XML-SIG] Generating XML documents
Message-ID: <39E480EF.5C56215E@pixelsystems.com>
Hi,
I'm new to XML. I want to generate new XML documents using
Python code. The HOWTO document mentions an "xml.dom.builder" class, but
that class does not seem to be part of PyXML 0.6.1.
Where can I find a Python base class for generating XML documents?
From jeremy.kloth@fourthought.com Wed Oct 11 18:40:41 2000
From: jeremy.kloth@fourthought.com (Jeremy Kloth)
Date: Wed, 11 Oct 2000 11:40:41 -0600
Subject: [XML-SIG] Re: [4suite] 4DOM bugs
References: <200010080833.KAA00815@loewis.home.cs.tu-berlin.de>
Message-ID: <39E4A619.281CBE77@fourthought.com>
"Martin v. Loewis" wrote:
>
> While porting PyXML's test_dom to 4DOM, I noticed a number of
> problems, which I believe are bugs in 4DOM. Consider
>
> from xml.dom import implementation
>
> doc = implementation.createDocument(None,None,None)
>
> n1 = doc.createElement('n1') ; n2 = doc.createElement('n2')
> pi = doc.createProcessingInstruction("Processing", "Instruction")
> doc.appendChild(pi)
> doc.appendChild(n1)
>
> #doc.appendChild(n1) # fails, but shouldn't
> doc.replaceChild(n2, n1)
> doc.replaceChild(pi, n2)
> print doc.documentElement
>
> The line "doc.appendChild(n1)" raises a hierarchy exception, as n1 is
> already in the tree. However, this is incorrect: it should first
> remove n1, then reinsert it.
>
> The second fragment does not cause an exception. However, in the end,
> the "documentElement" of the document is a processing
> instruction. That is very strange - it should always be an element.
>
> I've been using the 4DOM version that is currently in the PyXML CVS.
>
We do remove the child first for regular elements, but apparently didn't
propagate that change into the code that modifies the children of the Document.
We'll get this fixed and checked into the PyXML CVS as soon as possible.
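For comparison, the remove-then-reinsert behaviour Martin describes can be observed in the standard library's xml.dom.minidom. This is a sketch against minidom only, not against the 4DOM code under discussion; it reuses the node names from the report above.

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, None, None)

pi = doc.createProcessingInstruction("Processing", "Instruction")
n1 = doc.createElement("n1")
doc.appendChild(pi)
doc.appendChild(n1)

# n1 is already a child of doc: it is removed first, then appended again,
# so no hierarchy error is raised.
doc.appendChild(n1)

# Re-appending the processing instruction moves it behind n1.
doc.appendChild(pi)

order = [child.nodeName for child in doc.childNodes]
print(order)
print(doc.documentElement.tagName)
```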
--
Jeremy Kloth Consultant
jeremy.kloth@fourthought.com (303)583-9900 x 105
Fourthought, Inc. http://www.fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From bwiegert@learningbyte.com Wed Oct 11 19:55:15 2000
From: bwiegert@learningbyte.com (Ben Wiegert)
Date: Wed, 11 Oct 2000 13:55:15 -0500
Subject: [XML-SIG] Getting DOCTYPE information using SAX
Message-ID: <10227D9D561DD31181E100A0C9655237012E4E11@gofastc3h.gofast.net>
I am a Python newbie. I have gotten my code to read in and parse XML using
SAXLIB from PyXML. I can also manipulate what I read in and output it to
XML. The only thing that I cannot seem to grab is the DOCTYPE line (or the
XML header line, but I am mainly concerned with the DOCTYPE). I need to
specify my DTD in the outbound XML file. Is there an event in SAX that
allows me to get that info? Any help is appreciated.
Ben
From larsga@garshol.priv.no Wed Oct 11 23:20:33 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Oct 2000 00:20:33 +0200
Subject: [XML-SIG] Getting DOCTYPE information using SAX
In-Reply-To: <10227D9D561DD31181E100A0C9655237012E4E11@gofastc3h.gofast.net>
References: <10227D9D561DD31181E100A0C9655237012E4E11@gofastc3h.gofast.net>
Message-ID:
* Ben Wiegert
|
| I am a Python newbie. I have gotten my code to read in and parse
| XML using SAXLIB from PyXML. I can also manipulate what I read in
| and output it to XML. The only thing that I can not seem to grab is
| the DOCTYPE line (or the XML header Line, but I am mainly concerned
| with the DOCTYPE). I need to specify my DTD in the outbound XML
| file? Is there an event in SAX that allows me to get that info?
In SAX 2.0 as it exists in Python 2.0 and PyXML there is not.
There is an extension handler (in SAX 2.0 ext) known as LexicalHandler
that does have an event for this. xmlproc will support this, once
Python 2.0 is out the door and I have time to sit down and write a SAX
driver for it. (There is one written already, but it's for an older
form of SAX.)
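For illustration, here is how the LexicalHandler idea looks in later versions of the Python standard library, where the expat-based SAX driver accepts a lexical handler through a parser property. This is a sketch, not the xmlproc driver mentioned above; the class name DoctypeRecorder and the sample document are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    property_lexical_handler,
)


class DoctypeRecorder:
    """Hypothetical lexical handler that remembers the DOCTYPE declaration."""

    def __init__(self):
        self.doctype = None

    def startDTD(self, name, public_id, system_id):
        self.doctype = (name, public_id, system_id)

    # The remaining lexical events are not needed here.
    def endDTD(self):
        pass

    def comment(self, text):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


document = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE note SYSTEM "note.dtd">\n'
    b"<note>hello</note>"
)

parser = xml.sax.make_parser()
# Do not try to fetch the external DTD; only the declaration is wanted.
parser.setFeature(feature_external_ges, False)
parser.setContentHandler(ContentHandler())

recorder = DoctypeRecorder()
parser.setProperty(property_lexical_handler, recorder)
parser.parse(io.BytesIO(document))

print(recorder.doctype)
```

The recorded name and system identifier are what is needed to write the same DOCTYPE line into the outbound file.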
--Lars M.
From akuchlin@mems-exchange.org Thu Oct 12 03:46:15 2000
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Wed, 11 Oct 2000 22:46:15 -0400
Subject: [XML-SIG] What's New section on XML
Message-ID: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
Here's draft text for a section that briefly discusses the new XML
support in Python 2.0. Criticisms and comments, please...
--amk
13 XML Modules
Python 1.5.2 included a simple XML parser in the form of the xmllib
module, contributed by Sjoerd Mullender. Since 1.5.2's release, two
different interfaces for processing XML have become common: SAX2
(version 2 of the Simple API for XML) provides an event-driven
interface with some similarities to xmllib, and the DOM (Document
Object Model) provides a tree-based interface, transforming an XML
document into a tree of nodes that can be traversed and modified.
Python 2.0 includes a SAX2 interface and a stripped-down DOM interface
as part of the xml package. Here we will give a brief overview of
these new interfaces; consult the Python documentation or the source
code for complete details. The Python XML SIG is also working on
improved documentation.
13.1 SAX2 Support
SAX defines an event-driven interface for parsing XML. To use SAX, you
must write a SAX handler class. Handler classes inherit from various
classes provided by SAX, and override various methods that will then
be called by the XML parser. For example, the startElement() and
endElement() methods are called for every starting and ending tag
encountered by the parser, the characters() method is called for every
chunk of character data, and so forth.
The advantage of the event-driven approach is that the whole
document doesn't have to be resident in memory at any one time, which
matters if you are processing really huge documents. However, writing
the SAX handler class can get very complicated if you're trying to
modify the document structure in some elaborate way.
For example, this little program defines a handler that prints
a message for every starting and ending tag, and then parses the file
hamlet.xml using it:
from xml import sax
class SimpleHandler(sax.ContentHandler):
    def startElement(self, name, attrs):
        print 'Start of element:', name, attrs.keys()
    def endElement(self, name):
        print 'End of element:', name
# Create a parser object
parser = sax.make_parser()
# Tell it what handler to use
handler = SimpleHandler()
parser.setContentHandler( handler )
# Parse a file!
parser.parse( 'hamlet.xml' )
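A variation on the same idea, as a sketch for current Python 3 rather than the Python 2.0 syntax used in this draft: the handler below also overrides characters() to gather text and counts start tags per element name. TextCollector and the inline SAMPLE document are illustrative names, not part of the library.

```python
from xml import sax


class TextCollector(sax.ContentHandler):
    """Counts start tags and gathers all character data."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.chunks = []

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        # The parser may deliver one run of text in several chunks,
        # so the pieces are collected and joined afterwards.
        self.chunks.append(content)


# Small stand-in for hamlet.xml so the example is self-contained.
SAMPLE = (b"<PLAY><TITLE>Hamlet</TITLE>"
          b"<PERSONA>CLAUDIUS</PERSONA><PERSONA>HAMLET</PERSONA></PLAY>")

handler = TextCollector()
sax.parseString(SAMPLE, handler)
text = "".join(handler.chunks)
print(handler.counts, text)
```

Only the counters and the text chunks are kept in memory, which is the property that makes the event-driven style attractive for very large inputs.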
For more information, consult the Python documentation, or the XML
HOWTO at http://www.python.org/doc/howto/xml/.
13.2 DOM Support
The Document Object Model is a tree-based representation for an XML
document. A top-level Document instance is the root of the tree, and
has a single child which is the top-level Element instance. This
Element has children nodes representing character data and any
sub-elements, which may have further children of their own, and so
forth. Using the DOM you can traverse the resulting tree any way you
like, access element and attribute values, insert and delete nodes,
and convert the tree back into XML.
The DOM is useful for modifying XML documents, because you can create
a DOM tree, modify it by adding new nodes or rearranging subtrees, and
then produce a new XML document as output. You can also construct a
DOM tree manually and convert it to XML, which can be a more flexible
way of producing XML output than simply writing <tag1>...</tag1> to a
file.
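To make the "construct a tree manually" case concrete, here is a small sketch using xml.dom.minidom as it exists in current Python 3. The element names and the text are invented for the example.

```python
from xml.dom import minidom

# Build a two-level tree by hand.
doc = minidom.Document()
play = doc.createElement("PLAY")
doc.appendChild(play)

title = doc.createElement("TITLE")
title.appendChild(doc.createTextNode("Hamlet & friends"))
play.appendChild(title)

# Serialize the root element; special characters are escaped for us.
xml_text = play.toxml()
print(xml_text)
```

One practical benefit over writing tags by hand is visible here: the ampersand in the text is escaped during serialization, so the output stays well-formed.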
The DOM implementation included with Python lives in the
xml.dom.minidom module. It's a lightweight implementation of the Level
1 DOM with support for XML namespaces. The parse() and parseString()
convenience functions are provided for generating a DOM tree:
from xml.dom import minidom
doc = minidom.parse('hamlet.xml')
doc is a Document instance. Document, like all the other DOM classes
such as Element and Text, is a subclass of the Node base class. All
the nodes in a DOM tree therefore support certain common methods, such
as toxml() which returns a string containing the XML representation of
the node and its children. Each class also has special methods of its
own; for example, Element and Document instances have a method to find
all child elements with a given tag name. Continuing from the previous
2-line example:
perslist = doc.getElementsByTagName( 'PERSONA' )
print perslist[0].toxml()
print perslist[1].toxml()
For the Hamlet XML file, the above few lines output:
<PERSONA>CLAUDIUS, king of Denmark.</PERSONA>
<PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
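For readers without hamlet.xml at hand, the same lookup can be tried on an inline document. This is a sketch for current Python 3; the SAMPLE string is a cut-down stand-in for the real file.

```python
from xml.dom import minidom

# Cut-down stand-in for hamlet.xml.
SAMPLE = ("<PLAY>"
          "<PERSONA>CLAUDIUS, king of Denmark.</PERSONA>"
          "<PERSONA>HAMLET, son to the late king.</PERSONA>"
          "</PLAY>")

doc = minidom.parseString(SAMPLE)
perslist = doc.getElementsByTagName("PERSONA")

# toxml() serializes each element together with its own tags.
lines = [persona.toxml() for persona in perslist]
for line in lines:
    print(line)
```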
The root element of the document is available as doc.documentElement,
and its children can be easily modified by removing, adding, or
moving nodes:
root = doc.documentElement
# Remove the first child
root.removeChild( root.childNodes[0] )
# Move the new first child to the end
root.appendChild( root.childNodes[0] )
# Insert the new first child (originally,
# the third child) before the child at index 20.
root.insertBefore( root.childNodes[0], root.childNodes[20] )
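The effect of these calls is easier to follow on a tiny document. The sketch below (current Python 3, with a made-up four-child root) applies the same three operations and records the resulting order; note that appendChild() and insertBefore() move a node that is already in the tree rather than copying it.

```python
from xml.dom import minidom

doc = minidom.parseString("<r><a/><b/><c/><d/></r>")
root = doc.documentElement

# Remove the first child: a b c d -> b c d
root.removeChild(root.childNodes[0])

# Move the new first child to the end: b c d -> c d b
root.appendChild(root.childNodes[0])

# Move the new first child (c) in front of the last child (b):
# c d b -> d c b
root.insertBefore(root.childNodes[0], root.childNodes[2])

names = [node.tagName for node in root.childNodes]
print(names)
```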
Again, I will refer you to the Python documentation for a complete
listing of the different Node classes and their various methods.
13.3 Relationship to PyXML
The XML Special Interest Group has been working on XML-related Python
code for a while. Its code distribution, called PyXML, is available
from the SIG's Web pages at http://www.python.org/sigs/xml-sig/. The
PyXML distribution also used the package name "xml". If you've written
programs that used PyXML, you're probably wondering about its
compatibility with the 2.0 xml package.
The answer is that Python 2.0's xml package isn't compatible with
PyXML, but can be made compatible by installing a recent version of
PyXML. Many applications can get by with the XML support that is
included with Python 2.0, but more complicated applications will
require that the full PyXML package be installed. When installed,
PyXML versions 0.6.0 or greater will replace the xml package shipped
with Python, and will be a strict superset of the standard package,
adding a bunch of additional features. Some of the additional features
in PyXML include:
* 4DOM, a full DOM implementation from Fourthought, Inc.
* The xmlproc validating parser, written by Lars Marius Garshol.
* The sgmlop parser accelerator module, written by Fredrik Lundh.
From uche.ogbuji@fourthought.com Thu Oct 12 08:30:17 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 12 Oct 2000 01:30:17 -0600
Subject: [XML-SIG] ANN: 4Suite 0.9.1
Message-ID: <200010120730.BAA18082@localhost.localdomain>
Fourthought, Inc. (http://Fourthought.com) announces the release of
4Suite 0.9.1
---------------------------
Open source tools for standards-based XML, DOM, XPath, XSLT, RDF,
XPointer and object-database development in Python
4Suite is a collection of Python tools for XML processing and object
database management. It is an integrated package of several formerly
separately distributed components (4DOM, 4XPath, 4XSLT, 4RDF and 4ODS)
and features the new 4XPointer.
More info and Obtaining 4Suite
------------------------------
Please see
http://Fourthought.com/4Suite
Or you can download 4Suite source from
ftp://Fourthought.com/pub/4Suite
Windows packages, Linux RPMs and Linux binaries are also available at
ftp://Fourthought.com/pub/4Suite
4Suite is distributed under a license similar to that of Python.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +01 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From alf@logilab.com Thu Oct 12 09:25:37 2000
From: alf@logilab.com (Alexandre Fayolle)
Date: Thu, 12 Oct 2000 10:25:37 +0200 (CEST)
Subject: [XML-SIG] Generating XML documents
In-Reply-To: <39E480EF.5C56215E@pixelsystems.com>
Message-ID:
On Wed, 11 Oct 2000, Patrick Blanchette wrote:
> Hi,
> I'm a newbie in xml. I want to generate new xml documents using
> python code. In the HOWTO doc, there is a "xml.dom.builder" class but
> this class did not seem to be part of the PyXML 0.6.1.
> Where can I found a python base class for generating xml documents?
Actually, generating an XML document is pretty easy: you just have to build a
string which begins with '<?xml version="1.0"?>' and add your tags in the
string, and flush it to disk. This works well if you know in advance what
you want in your XML document.
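A minimal sketch of the string-building approach, written for current Python 3. The function name make_document and the list/item element names are invented for the example; escape() and quoteattr() come from xml.sax.saxutils and take care of characters such as & and < that would otherwise break the output.

```python
from xml.sax.saxutils import escape, quoteattr


def make_document(title, items):
    """Return a small XML document as a string (hypothetical layout)."""
    parts = ['<?xml version="1.0"?>',
             "<list title=%s>" % quoteattr(title)]
    for item in items:
        # escape() replaces &, < and > in character data.
        parts.append("  <item>%s</item>" % escape(item))
    parts.append("</list>")
    return "\n".join(parts)


xml_text = make_document("Fruit & veg", ["apples", "1 < 2"])
print(xml_text)
```

The string can then be written to a file with an ordinary open() and write().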
OTOH, if you want to build your document incrementally, you can use DOM.
You've got to import the DOM implementation first:
>>> from xml.dom import implementation
and build a document from the implementation:
>>> docType = implementation.createDocumentType('','','')
>>> doc = implementation.createDocument('',None,docType)
Then you can set your root element:
>>> root = doc.createElementNS('','docRoot')
>>> doc.appendChild(root)
And then you can use the createXXX methods of your document to create new
nodes and use appendChild to add them to other nodes. If you want to set
attributes, you can use the setAttributeNS method in Element. Beware: if
you're using Python 1.5.2, you cannot use non-ASCII characters as arguments
to createTextNode() or setAttributeNS() (at least not if you intend to
save your file to disk), since these expect UTF-8 strings, and not
iso-8859-1 strings.
When you're done creating your document, use Print or PrettyPrint to save
it to disk:
>>> from xml.dom.ext import PrettyPrint
>>> f=open('/tmp/test.xml','w')
>>> PrettyPrint(doc,f)
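For readers without PyXML, roughly the same steps can be sketched with xml.dom.minidom from the standard library. This is a translation under assumptions, not the 4DOM API used above: it is written for Python 3, the entry element and lang attribute are invented, and toprettyxml() stands in for PrettyPrint.

```python
from xml.dom.minidom import getDOMImplementation


def build_document():
    """Build a small DOM tree (element names are illustrative only)."""
    impl = getDOMImplementation()
    # No namespace URI, root element "docRoot", no document type.
    doc = impl.createDocument(None, "docRoot", None)
    root = doc.documentElement
    entry = doc.createElement("entry")
    entry.setAttribute("lang", "fr")
    # Python 3 strings are Unicode, so non-ASCII text needs no special care.
    entry.appendChild(doc.createTextNode("déjà vu"))
    root.appendChild(entry)
    return doc


doc = build_document()
# Serialize with indentation; write this string to a file opened with
# encoding="utf-8" to save it to disk.
pretty = doc.toprettyxml(indent="  ")
```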
If you want to reload your file, use the readers:
>>> from xml.dom.ext.reader import Sax2
>>> doc = Sax2.FromXmlFile('/tmp/test.xml')
If you saved it with PrettyPrint, the new dom tree will have text nodes
full of whitespace. You can use the StripXml extension to clean the tree:
>>> from xml.dom.ext import StripXml
>>> StripXml(doc)
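The idea behind stripping can be sketched for a standard-library minidom tree as well. The helper below is hypothetical (it is not the StripXml implementation) and assumes Python 3. It simply removes text nodes that contain nothing but whitespace, which is what pretty-printing tends to introduce.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def strip_whitespace_nodes(node):
    """Recursively drop whitespace-only text nodes below `node`."""
    # Iterate over a copy, since children are removed while walking.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        else:
            strip_whitespace_nodes(child)


doc = parseString("<docRoot>\n  <entry>text</entry>\n</docRoot>")
strip_whitespace_nodes(doc)
```

Note that this also discards whitespace that may be significant in mixed content, so it is only appropriate for data-oriented documents.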
I'd suggest that you check the W3C DOM spec (at least core DOM), since the
DOM implementation in PyXML 0.6 is a good implementation of the spec. I'll
be glad to help you if you have further questions about DOM.
--
Alexandre Fayolle
http://www.logilab.com - "Mais où est donc Ornicar ?" -
LOGILAB, Paris (France).
From alf@logilab.com Thu Oct 12 09:42:10 2000
From: alf@logilab.com (Alexandre Fayolle)
Date: Thu, 12 Oct 2000 10:42:10 +0200 (CEST)
Subject: [XML-SIG] What's New section on XML
In-Reply-To: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
Message-ID:
On Wed, 11 Oct 2000, A.M. Kuchling wrote:
> Here's draft text for a section that briefly discusses the new XML
> support in Python 2.0. Criticisms and comments, please...
> The DOM implementation included with Python lives in the
> xml.dom.minidom module. It's a lightweight implementation of the Level
> 1 DOM with support for XML namespaces.
Is minidom a *strict* implementation of DOM L1, with some extensions that
would point towards DOM L2 (namespaces) ?
--
Alexandre Fayolle
http://www.logilab.com - "Mais où est donc Ornicar ?" -
LOGILAB, Paris (France).
From alf@logilab.com Thu Oct 12 11:37:41 2000
From: alf@logilab.com (Alexandre Fayolle)
Date: Thu, 12 Oct 2000 12:37:41 +0200 (CEST)
Subject: [XML-SIG] packaging issue
Message-ID:
Isn't there a packaging issue with 4DOM being possibly provided by two
means (PyXml and 4Suite) ? I don't know about RPMs, but I understand that
this will cause major headaches to .deb maintainers.
--
Alexandre Fayolle
http://www.logilab.com - "Mais où est donc Ornicar ?" -
LOGILAB, Paris (France).
From akuchlin@mems-exchange.org Thu Oct 12 15:37:48 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Thu, 12 Oct 2000 10:37:48 -0400
Subject: [XML-SIG] What's New section on XML
In-Reply-To: ; from alf@logilab.com on Thu, Oct 12, 2000 at 10:42:10AM +0200
References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
Message-ID: <20001012103748.B8959@kronos.cnri.reston.va.us>
On Thu, Oct 12, 2000 at 10:42:10AM +0200, Alexandre Fayolle wrote:
>Is minidom a *strict* implementation of DOM L1, with some extensions that
>would point towards DOM L2 (namespaces) ?
Beats me. It doesn't seem to be a strict L1 implementation, since
last night I found some non-compliances. See bugs #116677 and #116678
on SourceForge. I've assigned them to Paul for fixing, but if someone
else wants to tackle them, feel free; I may attempt to write a patch
myself.
--amk
From larsga@garshol.priv.no Thu Oct 12 16:48:02 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 12 Oct 2000 17:48:02 +0200
Subject: [XML-SIG] What's New section on XML
In-Reply-To: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
Message-ID:
* A. M. Kuchling
|
| Here's draft text for a section that briefly discusses the new XML
| support in Python 2.0. Criticisms and comments, please...
I like it!
However, I think it would be worth mentioning that expat underlies SAX and
the DOM, comes with 2.0 and supports Unicode. Adding that both SAX
and the DOM are parser-independent may also be worthwhile.
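To make the parser-independence point concrete, here is a minimal sketch (Python 3, standard library only). The handler class and function names are invented for the example. The application code talks only to the SAX interfaces, while make_parser() picks whichever driver is available, which in a stock installation is the expat one.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count how often each element name occurs (illustrative handler)."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_text):
    handler = ElementCounter()
    # make_parser() returns the first available SAX driver.
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.counts


counts = count_elements("<doc><para/><para/></doc>")
```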
--Lars M.
From jeremy@beopen.com Thu Oct 12 17:11:53 2000
From: jeremy@beopen.com (Jeremy Hylton)
Date: Thu, 12 Oct 2000 12:11:53 -0400 (EDT)
Subject: [XML-SIG] test_minidom non-failure failure? (take 2)
Message-ID: <14821.58057.358947.271778@bitdiddle.concentric.net>
Sorry about the previous message; a mail munger somewhere between my
display and python.org choked on a very long line...
I am getting an occasional, hard-to-reproduce error in test_minidom.
When I run the test, it displays about a thousand lines of garbage,
but the test suite does not report test_minidom as failed or skipped.
The output I see during the test run is this:
test_minidom
garbage: [{'nodeValue': u'Obsolete but implemented...', 'nextSibling':
, 'childNodes': None, 'attributes': None,
'parentNode': None, 'data': u'Obsolete but implemented...',
'previousSibling': None}, ,
{'nodeValue': u'\012', 'nextSibling': None, 'childNodes': None, 'a
[... many hundreds of lines omitted]
At the end of the test, I get a pretty normal result:
95 tests OK.
13 tests skipped: test_al test_cd test_cl test_dbm test_dl test_gl
test_imgfile test_largefile test_nis test_sunaudiodev test_timing
test_winreg test_winsound
So two questions:
Why is test_minidom producing all this output? And why is it only
happening intermittently?
Why does regrtest.py think that test_minidom is working correctly when
it produces all this output?
Jeremy
From brian@watchmark.com Thu Oct 12 17:15:16 2000
From: brian@watchmark.com (Brian Fritz)
Date: Thu, 12 Oct 2000 09:15:16 -0700
Subject: [XML-SIG] Q: Post install testing errors
Message-ID: <39E5E394.84051CDA@watchmark.com>
I just installed the PyXML-0.5.5.1 software on my SparcStation here
at work.
I ran the PyXML-0.5.5.1/test/testxml.py script and got the following
results:
> blackriver /export/home/PyXML-0.5.5.1/test> python testxml.py
> test_dom
> test_dom2
> Warning: can't open ./output/test_dom2
> >
> test_domu
> test_howto
> test_htmlb
> test_marshal
> test_pyexpat
> test test_pyexpat failed -- Writing: 'Summary of XML parser upcalls:', expected:
> 'Parser returned 1\012Summary of X'
> test_sax
> test_unicode
> test_utils
> test_xmllib
> test test_xmllib skipped -- an optional feature could not be imported
> 9 tests OK.
> 1 test failed: test_pyexpat
> 1 test skipped: test_xmllib
> blackriver /export/home/PyXML-0.5.5.1/test>
I then ran the test_xmllib.py script and received the following error:
> blackriver /export/home/PyXML-0.5.5.1/test> python test_xmllib.py
> Traceback (innermost last):
> File "test_xmllib.py", line 25, in ?
> from xml.parsers import xmllib
> ImportError: cannot import name xmllib
> blackriver /export/home/PyXML-0.5.5.1/test>
Are these errors worth worrying about, or should I just get on with using
(learning) Python and XML?
Thanks in Advance!
Brian
From nas@arctrix.com Thu Oct 12 10:31:34 2000
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 12 Oct 2000 02:31:34 -0700
Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2)
In-Reply-To: <14821.58057.358947.271778@bitdiddle.concentric.net>; from jeremy@beopen.com on Thu, Oct 12, 2000 at 12:11:53PM -0400
References: <14821.58057.358947.271778@bitdiddle.concentric.net>
Message-ID: <20001012023134.A18254@glacier.fnational.com>
On Thu, Oct 12, 2000 at 12:11:53PM -0400, Jeremy Hylton wrote:
> I am getting an occasional, hard-to-reproduce error in test_minidom.
> When I run the test, it displays about a thousand lines of garbage,
> but the test suite does not report test_minidom as failed or skipped.
>
> The output I see during the test run is this:
>
> test_minidom
> garbage: [{'nodeValue': u'Obsolete but implemented...', 'nextSibling':
This is most likely the garbage collector. regrtest.py contains
the following code:
    if findleaks:
        gc.collect()
        if gc.garbage:
            print "garbage:", repr(gc.garbage)
            found_garbage.extend(gc.garbage)
            del gc.garbage[:]
findleaks is true if the -l option is specified (TESTOPS in the
makefile includes it). Something is producing cyclic garbage.
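The effect described here can be reproduced in a few lines. The sketch below assumes Python 3 and uses a made-up TreeNode class in place of the minidom nodes. With gc.DEBUG_SAVEALL set, every unreachable object the collector finds is appended to gc.garbage instead of being freed, which is why a perfectly collectable parent/child cycle shows up in the list.

```python
import gc


class TreeNode:
    """Stand-in for a DOM node: parent and child refer to each other."""

    def __init__(self):
        self.parent = None
        self.children = []


def make_cycle():
    parent, child = TreeNode(), TreeNode()
    parent.children.append(child)
    child.parent = parent
    # On return both local names disappear, but the objects still
    # reference each other, so reference counting cannot free them.


def saved_garbage():
    """Return the TreeNode objects the collector saved in gc.garbage."""
    gc.collect()
    del gc.garbage[:]
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        make_cycle()
        gc.collect()
        return [obj for obj in gc.garbage if isinstance(obj, TreeNode)]
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]


found = saved_garbage()
```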
Neil
From guido@python.org Thu Oct 12 18:39:40 2000
From: guido@python.org (Guido van Rossum)
Date: Thu, 12 Oct 2000 12:39:40 -0500
Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2)
In-Reply-To: Your message of "Thu, 12 Oct 2000 02:31:34 MST."
<20001012023134.A18254@glacier.fnational.com>
References: <14821.58057.358947.271778@bitdiddle.concentric.net>
<20001012023134.A18254@glacier.fnational.com>
Message-ID: <200010121739.MAA07968@cj20424-a.reston1.va.home.com>
> On Thu, Oct 12, 2000 at 12:11:53PM -0400, Jeremy Hylton wrote:
> > I am getting an occasional, hard-to-reproduce error in test_minidom.
> > When I run the test, it displays about a thousand lines of garbage,
> > but the test suite does not report test_minidom as failed or skipped.
> >
> > The output I see during the test run is this:
> >
> > test_minidom
> > garbage: [{'nodeValue': u'Obsolete but implemented...', 'nextSibling':
[Neil]
> This is most likely the garbage collector. regrtest.py contains
> the following code:
>
>     if findleaks:
>         gc.collect()
>         if gc.garbage:
>             print "garbage:", repr(gc.garbage)
>             found_garbage.extend(gc.garbage)
>             del gc.garbage[:]
>
> findleaks is true if the -l option is specified (TESTOPS in the
> makefile includes it). Something is producing cyclic garbage.
Of course something is producing cyclic garbage!
The DOM tree is full of parent and child links.
Does this output mean that the GC works correctly? Or does it mean
that there is a reason why this garbage cannot be disposed of?
In the latter case, could that be because there are __del__ methods?
--Guido van Rossum (home page: http://www.python.org/~guido/)
From fdrake@beopen.com Thu Oct 12 17:55:19 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Thu, 12 Oct 2000 12:55:19 -0400 (EDT)
Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2)
In-Reply-To: <20001012023134.A18254@glacier.fnational.com>
References: <14821.58057.358947.271778@bitdiddle.concentric.net>
<20001012023134.A18254@glacier.fnational.com>
Message-ID: <14821.60663.213246.179325@cj42289-a.reston1.va.home.com>
Neil Schemenauer writes:
> This is most likely the garbage collector. regrtest.py contains
> the following code:
...
> findleaks is true if the -l option is specified (TESTOPS in the
> makefile includes it). Something is producing cyclic garbage.
This is definitely the problem.
Lars, Paul: This looks like a problem in the unlink() method of the
DOM. Could you please check that the unlink() method is updated to
handle the latest version of the other changes?
Thanks!
-Fred
--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member
From fdrake@beopen.com Thu Oct 12 17:59:33 2000
From: fdrake@beopen.com (Fred L. Drake, Jr.)
Date: Thu, 12 Oct 2000 12:59:33 -0400 (EDT)
Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2)
In-Reply-To: <14821.58057.358947.271778@bitdiddle.concentric.net>
References: <14821.58057.358947.271778@bitdiddle.concentric.net>
Message-ID: <14821.60917.429141.652655@cj42289-a.reston1.va.home.com>
Jeremy Hylton writes:
> Why is test_minidom producing all this output? And why is it only
> happening intermittently?
It isn't. See Neil's excellent explanation.
> Why does regrtest.py think that test_minidom is working correctly when
> it produces all this output?
The test is passing just fine, and is complete before the test for
garbage is performed. The unlink() method on DOM objects is the
culprit; it is updating the Node.allnodes dictionary correctly, but
not the Node instances. I've already asked Paul & Lars to fix this;
it should work just fine with or without GC once they've seen the
report.
-Fred
--
Fred L. Drake, Jr.
BeOpen PythonLabs Team Member
From uche.ogbuji@fourthought.com Thu Oct 12 18:16:14 2000
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Thu, 12 Oct 2000 11:16:14 -0600
Subject: [XML-SIG] Re: [4suite] ANN: 4Suite 0.9.1
References:
Message-ID: <39E5F1DE.14311B@fourthought.com>
Alexandre Fayolle wrote:
>
> Congratulations.
>
> It works great here.
Phew! No one uncovers bugs as diligently as you do, so that's good to
hear. Of course, it's only been a few hours, eh?
> > 4Suite is a collection of Python tools for XML processing and object
> > database management. An integrated packaging of several formerly
> > separately-distributed components: 4DOM, 4XPath and 4XSLT, 4RDF, 4ODS
> > and featuring the new 4XPointer.
>
> Why does 4XPointer live in Ft/ and not in xml/ ? Do you plan to move xpath
> and xslt to Ft/ too ?
The Python XML-SIG really "owns" the "xml" Python package namespace and
we don't intrude on it without its permission. DOM moved there for
obvious reasons (it's now the official full DOM of the SIG). Earlier
this year the decision was made to move xslt and xpath there as part of
initial moves towards incorporating them into the xml-sig package but I
think concerns over their cross-platform compatibility are holding up
outright adoption. We haven't discussed 4RDF or 4XPointer (or the
coming 4XLink).
Is it time to reopen this discussion?
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From nas@arctrix.com Thu Oct 12 11:24:48 2000
From: nas@arctrix.com (Neil Schemenauer)
Date: Thu, 12 Oct 2000 03:24:48 -0700
Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2)
In-Reply-To: <200010121739.MAA07968@cj20424-a.reston1.va.home.com>; from guido@python.org on Thu, Oct 12, 2000 at 12:39:40PM -0500
References: <14821.58057.358947.271778@bitdiddle.concentric.net> <20001012023134.A18254@glacier.fnational.com> <200010121739.MAA07968@cj20424-a.reston1.va.home.com>
Message-ID: <20001012032448.A18407@glacier.fnational.com>
On Thu, Oct 12, 2000 at 12:39:40PM -0500, Guido van Rossum wrote:
> Of course something is producing cyclic garbage!
>
> The DOM tree is full of parent and child links.
>
> Does this output mean that the GC works correctly? Or does it
> mean that there is a reason why this garbage cannot be disposed
> of? In the latter case, could that be because there are
> __del__ methods?
The -l option tries to find any cyclic garbage produced by the
tests. I don't think that option should be enabled by default.
The output means that the GC is working and is finding stuff that
would not be freed by reference counting alone.
I can't tell if the GC would free this garbage. The -l option
sets the DEBUG_SAVEALL option which causes all garbage found to
end up in gc.garbage, not just garbage that can't be cleaned up.
I don't have pyexpat installed here so I can't test it. If you
want to find out if test_minidom is creating garbage the
collector can't free you should comment out the:
gc.set_debug(gc.DEBUG_SAVEALL)
line in regrtest.py and run:
regrtest.py -l test_minidom
If that does what I think it does and you still get the "garbage: "
line then the test is creating evil things. :)
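The check suggested here, collecting without DEBUG_SAVEALL and seeing whether anything is left over, can be sketched as follows. This assumes Python 3 and a made-up TreeNode class; a weak reference is used as a probe to confirm the cycle really was freed.

```python
import gc
import weakref


class TreeNode:
    """Plain object with no __del__ method (illustrative only)."""


def cycle_is_reclaimed():
    """Return True if a simple two-node cycle is freed by gc.collect()."""
    old_flags = gc.get_debug()
    gc.set_debug(0)  # make sure DEBUG_SAVEALL is off
    try:
        a, b = TreeNode(), TreeNode()
        a.sibling, b.sibling = b, a
        probe = weakref.ref(a)
        del a, b
        gc.collect()
        leftover = [o for o in gc.garbage if isinstance(o, TreeNode)]
        # The weak reference is dead and nothing was parked in gc.garbage.
        return probe() is None and not leftover
    finally:
        gc.set_debug(old_flags)


reclaimed = cycle_is_reclaimed()
```

In the Python 2.0 discussed in this thread, cycles containing objects with __del__ methods were the ones left in gc.garbage. Since Python 3.4 (PEP 442) such cycles are normally collected as well.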
Neil
From jumpytom@yahoo.com Fri Oct 13 16:20:24 2000
From: jumpytom@yahoo.com (Jack Greene)
Date: Fri, 13 Oct 2000 08:20:24 -0700 (PDT)
Subject: [XML-SIG] sax import error on WinNT and Py-1.5.2
Message-ID: <20001013152024.35406.qmail@web9704.mail.yahoo.com>
I use sax (L.M. Garshol's saxlib 1.0) in an
application written in Python 1.5.2 (on WinNT4 SP5).
sax is imported using the standard "from xml.sax
import ...".
That used to work fine until this week when, after a
hard drive failure, I had to reinstall everything on
this machine. Now I get a "ImportError: No module
named xml.sax"
sax is installed in /Lib which is where, I
think, I had it last time. The problem is, I can't
remember what I did when I installed sax originally to
get it to work.
What am I doing wrong (besides not writing down
important stuff like that)?
Jack
From larsga@garshol.priv.no Fri Oct 13 16:26:59 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 13 Oct 2000 17:26:59 +0200
Subject: [XML-SIG] sax import error on WinNT and Py-1.5.2
In-Reply-To: <20001013152024.35406.qmail@web9704.mail.yahoo.com>
References: <20001013152024.35406.qmail@web9704.mail.yahoo.com>
Message-ID:
* Jack Greene
|
| That used to work fine until this week when, after a hard drive
| failure, I had to reinstall everything on this machine. Now I get a
| "ImportError: No module named xml.sax"
Hmmm. One thing I would check for is whether both xml/ and xml/sax/
contain the needed __init__.py files.
One thing you might also try is to import xml and print xml.__file__.
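A small diagnostic along these lines might look like the sketch below. It is written for Python 3 (on 1.5.2 a plain "import xml; print xml.__file__" does the same job), and the helper name where_is is invented. It reports which file a package was loaded from, or the import error if it cannot be found.

```python
import importlib


def where_is(name):
    """Describe where a module or package was imported from."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return "cannot import %s: %s" % (name, exc)
    location = getattr(module, "__file__", "(built-in)")
    return "%s loaded from %s" % (name, location)


for name in ("xml", "xml.sax"):
    print(where_is(name))
```

If the path printed for xml points somewhere unexpected, another directory named xml earlier on sys.path is probably shadowing the intended package.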
I can see no obvious mistakes or things that you should have done that
you haven't.
--Lars M.
From nuno.simoes@ruido-visual.pt Fri Oct 13 16:32:35 2000
From: nuno.simoes@ruido-visual.pt (Nuno Simoes)
Date: Fri, 13 Oct 2000 16:32:35 +0100
Subject: [XML-SIG] sax import error on WinNT and Py-1.5.2
References: <20001013152024.35406.qmail@web9704.mail.yahoo.com>
Message-ID: <39E72B13.92A8B307@ruido-visual.pt>
Jack Greene wrote:
Hi.
[...]
> What am I doing wrong (besides not writing down
> important stuff like that)?
You should have a directory tree like this:
~/lib/xml
~/lib/xml/dom
~/lib/xml/marshal
~/lib/xml/parsers
~/lib/xml/sax
~/lib/xml/unicode
~/lib/xml/utils
Remember to do a "python setup.py build" to build the pyc files.
Another thing, in Win32, you must/should copy the files under ~/windows
to ~/lib/xml/parsers/ .
Nuno Simões,
RVTI
From larsga@garshol.priv.no Fri Oct 13 21:08:05 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 13 Oct 2000 22:08:05 +0200
Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure? (take 2)
In-Reply-To: <14821.60663.213246.179325@cj42289-a.reston1.va.home.com>
References: <14821.58057.358947.271778@bitdiddle.concentric.net> <20001012023134.A18254@glacier.fnational.com> <14821.60663.213246.179325@cj42289-a.reston1.va.home.com>
Message-ID:
* Fred L. Drake, Jr.
|
| This looks like a problem in the unlink() method of the DOM. Could
| you please check that the unlink() method is updated to handle the
| latest version of the other changes?
It seems that the current unlink() does not remove sibling cycles.
Patch #101897 adds a line to set sibling references to None, which
seems to make regrtest.py -l happy.
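The kind of fix described here can be illustrated with a toy node class. The code below is not the minidom source or the contents of patch #101897; TinyNode is a hypothetical class (Python 3) that shows why unlink() has to clear sibling references as well as parent and child links before reference counting alone can free the tree.

```python
class TinyNode:
    """Toy tree node with parent, child and sibling links."""

    def __init__(self, name):
        self.name = name
        self.parentNode = None
        self.previousSibling = None
        self.nextSibling = None
        self.childNodes = []

    def appendChild(self, child):
        if self.childNodes:
            last = self.childNodes[-1]
            # These two assignments create a sibling cycle.
            last.nextSibling = child
            child.previousSibling = last
        child.parentNode = self
        self.childNodes.append(child)
        return child

    def unlink(self):
        """Break every reference cycle in the subtree rooted here."""
        self.parentNode = None
        # Without these two lines, neighbouring children keep each
        # other alive even after the parent links are gone.
        self.previousSibling = None
        self.nextSibling = None
        for child in self.childNodes:
            child.unlink()
        self.childNodes = []
```

After root.unlink(), dropping the last external references to the nodes lets them be freed immediately, with or without the cycle collector.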
--Lars M.
From prescod@prescod.net Fri Oct 13 21:12:38 2000
From: prescod@prescod.net (Paul)
Date: Fri, 13 Oct 2000 15:12:38 -0500 (CDT)
Subject: [XML-SIG] Re: [Python-Dev] test_minidom non-failure failure?
(take 2)
In-Reply-To:
Message-ID:
Right, I just checked in the fix to that.
Paul Prescod
On 13 Oct 2000, Lars Marius Garshol wrote:
>
> * Fred L. Drake, Jr.
> |
> | This looks like a problem in the unlink() method of the DOM. Could
> | you please check that the unlink() method is updated to handle the
> | latest version of the other changes?
>
> It seems that the current unlink() does not remove sibling cycles.
> Patch #101897 adds a line to set sibling references to None, which
> seems to make regrtest.py -l happy.
>
> --Lars M.
From uche.ogbuji@fourthought.com Sun Oct 15 06:25:08 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sat, 14 Oct 2000 23:25:08 -0600
Subject: [XML-SIG] 4RDF on XML.com
Message-ID: <200010150525.XAA24601@localhost.localdomain>
I just wanted to note that my write-up on 4RDF is this week's feature article
on XML.com
It's been an interesting time since we offered up the first public glimpse of
4RDF. A lot of quite excited response. Dan Brickley, editor of the RDF
Schema spec, in an e-mail exchange with me, mentioned he thought the feature
set was good enough that he'd consider learning Python. Indeed I went on to
take a look at the other RDF systems out there and I think 4RDF is way ahead
of the pack. It avoids the horrid contortions of the SIRPAC API and provides
full support from models, containers, full serialization/deserialization,
reification, etc. through schemas. It provides extra amenities such as
pluggable back ends and Inference through RDF Inference Language.
At any rate, there's much more to read at
http://www.xml.com/pub/2000/10/11/rdf/index.html
and until next Thursday, simply
http://www.xml.com
Will suffice.
Comments are welcome, especially from this group.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From pingl_l@yahoo.com Mon Oct 16 00:27:07 2000
From: pingl_l@yahoo.com (Ping Li)
Date: Sun, 15 Oct 2000 18:27:07 -0500
Subject: [XML-SIG] Web Site Translation for Ligand-Protein Docking
Message-ID: <20001015232702.YBDN2291.mtiwmhc21.worldnet.att.net@ea>
From: Ping Li
To: Ligand-Protein Docking
Dear Web Manager,
I visited your Web site at http://www.scripps.edu/pub/olson-web/people/gmm
and would like to let you know that your Web site could also be presented
in other languages for broader recognition. If you feel that my suggestion
has no value, please kindly ignore this message and accept my apology.
According to research done by International Data Corporation, non-English
speaking users will make up over 50% of the total online population by 2002,
and 70% by 2004. Business Web users are three times more likely to buy when
addressed in their own languages (survey by Forrester Research).
We specialize in Web Site Translation and Global URL Submission in 11
languages - English, Spanish, French, German, Italian, Portuguese, Dutch,
Russian, Chinese, Japanese and Korean. Our customized service package
includes:
1. Web page translation and Web programming localization -- We not only
convert HTML pages and graphics, but also ensure that the translated Web
sites fully function in the target language environment.
2. URL submission and re-submission to leading search engines and business
directories in the target languages -- Our Web promotion specialists optimize
the keywords and descriptions of the translated Web sites for ideal search
engine rankings.
3. Company profile translation and a free listing in GlobalListing, a
multi-language business directory.
The translation is conducted by our professional translators who are native
speakers and have years of Web translation experience. We are not hired for
the ability to take a word in one language and convert it into an equivalent
word in another language. Instead, we get to the heart of communication and
express the true meaning of your message, because we are aware that improper
translation may cause unrecoverable damages to your company's image. At
GlobalListing, we are in the business of ensuring that you are 100% satisfied
with our quality services.
Thank you very much for your time. Should you be interested in our services,
please contact me for a free estimate or any further information.
Best regards,
Ping Lee
Director, Web Translation
GlobalListing
Phone: (604) 324-4638
Fax: (413) 431-2597
Office Hour: 9:00AM - 5:00PM (Pacific Time)
From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:22:26 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 16 Oct 2000 09:22:26 +0200
Subject: [XML-SIG] What's New section on XML
In-Reply-To: (message
from Alexandre Fayolle on Thu, 12 Oct 2000 10:42:10 +0200 (CEST))
References:
Message-ID: <200010160722.JAA00853@loewis.home.cs.tu-berlin.de>
> Is minidom a *strict* implementation of DOM L1, with some extensions that
> would point towards DOM L2 (namespaces) ?
No, minidom does not support all of DOM L1. See my patch on minidom
documentation on SF for a detailed list of things it does and doesn't.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:19:50 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 16 Oct 2000 09:19:50 +0200
Subject: [XML-SIG] Q: Post install testing errors
In-Reply-To: <39E5E394.84051CDA@watchmark.com> (message from Brian Fritz on
Thu, 12 Oct 2000 09:15:16 -0700)
References: <39E5E394.84051CDA@watchmark.com>
Message-ID: <200010160719.JAA00836@loewis.home.cs.tu-berlin.de>
> Are these errors worth worrying about, or should I just get on with using
> (learning) Python and XML?
These errors are nothing to worry about. Please note that parts of the
package have seen some changes recently (in particular, the DOM
implementation); if you want to live on the "cutting edge", you should
install PyXML 0.6.1 (from http://sourceforge.net/projects/pyxml)
instead.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:31:24 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 16 Oct 2000 09:31:24 +0200
Subject: [XML-SIG] What's New section on XML
In-Reply-To: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
(amk@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com)
References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
Message-ID: <200010160731.JAA00943@loewis.home.cs.tu-berlin.de>
> parser.setContentHandler( handler )
I've always wondered where this style of spacing originates. The BDFL
says he hates it when he sees a space in this place
(http://www.python.org/doc/essays/styleguide.html); so do I.
Otherwise, it looks fine to me.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 08:36:57 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 16 Oct 2000 09:36:57 +0200
Subject: [XML-SIG] packaging issue
In-Reply-To: (message
from Alexandre Fayolle on Thu, 12 Oct 2000 12:37:41 +0200 (CEST))
References:
Message-ID: <200010160736.JAA00987@loewis.home.cs.tu-berlin.de>
> Isn't there a packaging issue with 4DOM being possibly provided by two
> means (PyXml and 4Suite) ? I don't know about RPMs, but I understand that
> this will cause major headaches to .deb maintainers.
I think the problem is beyond what those package systems are designed
to do. It's not that the distributions coincidently include the same
files - they have the same file names on purpose. An intelligent
decision of the system administrator is required to decide which one
to use - the packaging system can't (and shouldn't) make this decision.
Distributors can make a decision for the maintainer; I can see a
number of intelligent decisions: Include the 4DOM modules only in one
of them; produce an independent 4DOM package; provide a single package
containing both PyXML and 4Suite.
In any case: this is a packaging problem; I don't think such a problem
should change what the XML-SIG or Fourthought maintains in their
source code repositories. Patches to setup.py of PyXML are certainly
welcome.
Regards,
Martin
From larsga@garshol.priv.no Mon Oct 16 09:35:59 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 16 Oct 2000 10:35:59 +0200
Subject: [XML-SIG] How to proceed
Message-ID:
Now that SAX 2.0 is more-or-less done I am ready to start working on
other Python-XML projects. I'd like to hear the opinion of those of
you here on how to proceed. Below is a list of projects that I am
thinking about (I can't actually promise that I will get round to all
of them).
The question is, where do you think development of these packages
ought to happen? As part of the XML-SIG work, as a separate
SourceForge project or privately, the way I've done so far.
xmlproc
This needs to be updated to XML 1.0 2nd ed, extended with Unicode
support and a SAX 2.0 driver (I have 95% of one ready) and also
improved in various ways.
dtddoc
This has not been taken very far yet, but could become a useful
package if more thought and effort were put into it.
saxlib
I plan for this package to contain lots of SAX 2.0-related
utilities, like DOM2SAX walkers, XInclude and XBase filters, more
advanced parser instantiation tools, more drivers etc.
RSS-kit
This is a toolkit for working with RSS documents that I have had
lying around for more than a year. It's now getting much closer to
being useful; the question is where I should develop it further.
--Lars M.
From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 09:39:43 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 16 Oct 2000 10:39:43 +0200
Subject: [XML-SIG] Re: [4suite] ANN: 4Suite 0.9.1
In-Reply-To: <39E5F1DE.14311B@fourthought.com> (message from Uche Ogbuji on
Thu, 12 Oct 2000 11:16:14 -0600)
References: <39E5F1DE.14311B@fourthought.com>
Message-ID: <200010160839.KAA01656@loewis.home.cs.tu-berlin.de>
> Is it time to reopen this discussion?
Certainly, yes.
> The Python XML-SIG really "owns" the "xml" Python package namespace and
> we don't intrude on it without its permission. DOM moved there for
> obvious reasons (it's now the official full DOM of the SIG). Earlier
> this year the decision was made to move xslt and xpath there as part of
> initial moves towards incorporating them into the xml-sig package but I
> think concerns over their cross-platform compatibility are holding up
> outright adoption.
Could somebody please summarize what these concerns were? I
understand to fully build 4XPath from source, you need BisonGen,
bison, flex, SWIG, ... what else? This is indeed an impressive list of
prerequisites, but I can't see anything inherently platform-dependent
in it.
Since 4Suite 0.9.1 comes with these files prebuilt, it can't be too
difficult to adjust the PyXML build procedure to also assume they are
generated.
Would patches to the build process be accepted? It seems that it
should be easy to get at least SWIG out of the picture, by properly
changing BisonGen. In the long run, I wish Python had a standard
parser generator so the dependency on Bison could be removed; that's
beyond reach at the moment.
Does anybody think it is funny that you need EBNF parsers in XML tools?-)
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Mon Oct 16 09:47:01 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 16 Oct 2000 10:47:01 +0200
Subject: [XML-SIG] What's New section on XML
In-Reply-To: <20001012103748.B8959@kronos.cnri.reston.va.us> (message from
Andrew Kuchling on Thu, 12 Oct 2000 10:37:48 -0400)
References: <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com> <20001012103748.B8959@kronos.cnri.reston.va.us>
Message-ID: <200010160847.KAA01704@loewis.home.cs.tu-berlin.de>
> Beats me. It doesn't seem to be a strict L1 implementation, since
> last night I found some non-compliances. See bugs #116677 and #116678
> on SourceForge. I've assigned them to Paul for fixing, but if someone
> else wants to tackle them, feel free; I may attempt to write a patch
> myself.
minidom is quite a limited implementation of the DOM, with many
details missing. It seems the general rule is not to provide
"convenience functions", i.e. if something can be achieved by other
means, then don't provide this function. It is sufficient for building
a tree from a document and analyzing it; I probably wouldn't attempt
heavy structural manipulations on the tree (*).
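The "build a tree and analyze it" use case looks roughly like the sketch below. It assumes Python 3 and an invented document layout (a feed with item/title elements), and the function name titles is hypothetical. Only a handful of DOM calls are needed, which fits the point that a small implementation is sufficient for this kind of work.

```python
from xml.dom.minidom import parseString


def titles(xml_text):
    """Return the text of every <title> element in the document."""
    doc = parseString(xml_text)
    try:
        result = []
        for node in doc.getElementsByTagName("title"):
            # Concatenate the text children; minidom has no shortcut for it.
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            result.append(text)
        return result
    finally:
        # Break internal cycles so the tree can be freed promptly.
        doc.unlink()


sample = "<feed><item><title>One</title></item><item><title>Two</title></item></feed>"
found_titles = titles(sample)
```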
We'll have to see how well users accept that approach. It may be
desirable to fully conform to DOM Core (of some level) for a later
Python release, even if that means that minidom will grow in size and
perhaps even slow down.
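As a rough illustration of the "build a tree from a document and analyze it" use described above, here is a minimal sketch against xml.dom.minidom as shipped with Python. The sample document and the names SAMPLE and summarize are made up for the example; nothing in it comes from the bug reports mentioned.

```python
from xml.dom import minidom

# Made-up sample document, for illustration only.
SAMPLE = "<list><item id='a'>one</item><item id='b'>two</item></list>"


def summarize(xml_text):
    """Return (id, text) pairs for every <item> element in the document."""
    doc = minidom.parseString(xml_text)
    pairs = []
    for item in doc.getElementsByTagName("item"):
        # firstChild is the text node holding the item's content
        pairs.append((item.getAttribute("id"), item.firstChild.data))
    return pairs


print(summarize(SAMPLE))
```

This only reads the tree; it does not attempt the heavier structural manipulation that the paragraph above advises against.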
Regards,
Martin
(*) It would be interesting to survey what people *do* use the DOM
for; it's not all that clear to me that all features of the DOM are
really in use.
From loewis@informatik.hu-berlin.de Mon Oct 16 12:18:16 2000
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 16 Oct 2000 13:18:16 +0200 (MET DST)
Subject: [XML-SIG] PyXML home page on SF
Message-ID: <200010161118.NAA10370@pandora.informatik.hu-berlin.de>
I've added a new page on http://pyxml.sourceforge.net/, and made this
the project home page. Sometimes, people had run into the page, and
got what still is in http://pyxml.sourceforge.net/index.php.
Please let me know what you think. Patches to the page are welcome;
comments that having that page is a stupid idea will be considered :-)
Regards,
Martin
From akuchlin@mems-exchange.org Mon Oct 16 15:08:37 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Mon, 16 Oct 2000 10:08:37 -0400
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: <200010161118.NAA10370@pandora.informatik.hu-berlin.de>; from loewis@informatik.hu-berlin.de on Mon, Oct 16, 2000 at 01:18:16PM +0200
References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de>
Message-ID: <20001016100837.B9235@kronos.cnri.reston.va.us>
On Mon, Oct 16, 2000 at 01:18:16PM +0200, Martin von Loewis wrote:
>I've added a new page on http://pyxml.sourceforge.net/, and made this
>the project home page. Sometimes, people had run into the page, and
>got what still is in http://pyxml.sourceforge.net/index.php.
Should the XML topic guide be moved to a set of pages on
pyxml.sourceforge.net? I'm really the only person left who can update
the topic guide, and having the Web pages accessible through CVS would
mean more people could keep them up to date.
This would require 2 steps: 1) check the pages into CVS, along with
the required scripts, and 2) set up a redirect from
www.python.org/topics/xml/ to pyxml.sourceforge.net.
--amk
From uche.ogbuji@fourthought.com Mon Oct 16 15:08:55 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 16 Oct 2000 08:08:55 -0600
Subject: [XML-SIG] How to proceed
In-Reply-To: Message from Lars Marius Garshol
of "16 Oct 2000 10:35:59 +0200."
Message-ID: <200010161408.IAA01441@localhost.localdomain>
>
> Now that SAX 2.0 is more-or-less done I am ready to start working on
> other Python-XML projects. I'd like to hear the opinion of those of
> you here on how to proceed. Below is a list of projects that I am
> thinking about (I can't actually promise that I will get round to all
> of them).
>
> The question is, where do you think development of these packages
> ought to happen? As part of the XML-SIG work, as a separate
> SourceForge project or privately, the way I've done so far.
>
> xmlproc
> This needs to be updated to XML 1.0 2nd ed, extended with Unicode
> support and a SAX 2.0 driver (I have 95% of one ready) and also
> improved in various ways.
This has my vote, easily.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Mon Oct 16 15:16:35 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 16 Oct 2000 08:16:35 -0600
Subject: [XML-SIG] What's New section on XML
In-Reply-To: Message from "A.M. Kuchling"
of "Wed, 11 Oct 2000 22:46:15 EDT." <200010120246.WAA07736@207-172-113-151.s151.tnt5.ann.va.dialup.rcn.com>
Message-ID: <200010161416.IAA01501@localhost.localdomain>
> * 4DOM, a full DOM implementation from FourThought LLC.
Fourthought, Inc., actually.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From larsga@garshol.priv.no Mon Oct 16 15:22:51 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 16 Oct 2000 16:22:51 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: <200010161408.IAA01441@localhost.localdomain>
References: <200010161408.IAA01441@localhost.localdomain>
Message-ID:
* uche ogbuji
|
| This has my vote, easily.
What has? I was asking where you (and the others) think development
should happen, in the XML-SIG, as separate projects on SourceForge or
privately (as has been done so far).
All your email told me was that you have some opinion about xmlproc,
but I haven't a clue what it was. :-)
--Lars M.
From larsga@garshol.priv.no Mon Oct 16 15:23:29 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 16 Oct 2000 16:23:29 +0200
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: <20001016100837.B9235@kronos.cnri.reston.va.us>
References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> <20001016100837.B9235@kronos.cnri.reston.va.us>
Message-ID:
* Andrew Kuchling
|
| Should the XML topic guide be moved to a set of pages on
| pyxml.sourceforge.net? I'm really the only person left who can
| update the topic guide, and having the Web pages accessible through
| CVS would mean more people could keep them up to date.
I'm definitely for this. There are many times when I know I would
have updated things on the pages if I'd had access.
--Lars M.
From chris@rpgarchive.com Mon Oct 16 15:58:29 2000
From: chris@rpgarchive.com (chris davis)
Date: Mon, 16 Oct 2000 09:58:29 -0500
Subject: [XML-SIG] the faster way to get a dom.
Message-ID: <39EB1795.395C41D@rpgarchive.com>
I am wondering what the fastest way (as in speed of processing) to get a
DOM is. Below is the way I've been doing it, but lately I've had to deal
with very large XML documents, and I am wondering if there is a way to
improve speed.

    from xml.dom import ext
    from xml.dom.ext.reader import Sax

    def parseXml(s, ownerDocument=None):
        "parse and return doc"
        doc = Sax.FromXml(s, ownerDocument)
        # strip ignorable whitespace from the tree
        ext.StripXml(doc)
        return doc
From uche.ogbuji@fourthought.com Mon Oct 16 16:19:13 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 16 Oct 2000 09:19:13 -0600
Subject: [XML-SIG] How to proceed
In-Reply-To: Message from Lars Marius Garshol
of "16 Oct 2000 16:22:51 +0200."
Message-ID: <200010161519.JAA02069@localhost.localdomain>
> * uche ogbuji
> |
> | This has my vote, easily.
>
> What has? I was asking where you (and the others) think development
> should happen, in the XML-SIG, as separate projects on SourceForge or
> privately (as has been done so far).
Ah, I hadn't enough sleep when I responded.
> All your email told me was that you have some opinion about xmlproc,
> but I haven't a clue what it was. :-)
I meant that I would much prefer to see development on xmlproc. In answer to
your real question, though, I think they might as well all go on Sourceforge
since it will give others a chance to pitch in. (I would have considered
"others pitching in" a remote contingency until recently when Martin stepped
in and pretty much saved the XML-SIG).
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Mon Oct 16 16:24:38 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Mon, 16 Oct 2000 09:24:38 -0600
Subject: [XML-SIG] the faster way to get a dom.
In-Reply-To: Message from chris davis
of "Mon, 16 Oct 2000 09:58:29 CDT." <39EB1795.395C41D@rpgarchive.com>
Message-ID: <200010161524.JAA02091@localhost.localdomain>
> I am wondering what the fastest way (as in speed of processing) to get a
> DOM is. Below is the way I've been doing it, but lately I've had to deal
> with very large XML documents, and I am wondering if there is a way to
> improve speed.
>
>     from xml.dom.ext.reader import Sax
>
>     def parseXml(s, ownerDocument=None):
>         "parse and return doc"
>         doc = Sax.FromXml(s, ownerDocument)
>         ext.StripXml(doc)
>         return doc
There is a lot of overhead in 4DOM's SAX reader. We've cut some out, and we
wonder whether we'll soon be reaching the point of diminishing returns in
optimizing that.
Maybe it's time for a C-level DOM builder. We have one for cDomlette, a tiny
DOM written entirely in C (with Python interface, of course) which comes with
4Suite. It would take quite some effort to scale it up to the full 4DOM,
though.
Are the large documents such that a subset of the DOM would suffice for your
use? If so, have a look at cDomlette in 4Suite.
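To get a feel for how much a smaller tree API can save, one could time two builders on the same input. The sketch below does not use 4DOM or cDomlette (neither is assumed to be installed); it compares xml.dom.minidom with xml.etree.ElementTree from today's standard library, which is not a DOM and did not exist when this was written, purely as a stand-in for "full-ish DOM versus reduced tree". The helper names and the synthetic document are made up.

```python
import time
import xml.etree.ElementTree as ET
from xml.dom import minidom


def make_document(n_items):
    """Build a synthetic XML string with n_items child elements."""
    body = "".join("<item n='%d'>value %d</item>" % (i, i) for i in range(n_items))
    return "<root>%s</root>" % body


def time_build(build, xml_text):
    """Return (seconds, result) for one call of build(xml_text)."""
    start = time.perf_counter()
    result = build(xml_text)
    return time.perf_counter() - start, result


text = make_document(20000)
dom_seconds, dom = time_build(minidom.parseString, text)
et_seconds, tree = time_build(ET.fromstring, text)

# Both builders should see the same number of elements.
dom_count = len(dom.getElementsByTagName("item"))
et_count = len(tree.findall("item"))
print("minidom: %.3fs, ElementTree: %.3fs" % (dom_seconds, et_seconds))
```

The absolute numbers depend on machine and Python version, so the sketch prints them rather than asserting which builder wins.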
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Mike.Olson@fourthought.com Mon Oct 16 17:54:47 2000
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Mon, 16 Oct 2000 10:54:47 -0600
Subject: [RIL] Re: [XML-SIG] Re: [4suite] ANN: 4Suite 0.9.1
References: <39E5F1DE.14311B@fourthought.com> <200010160839.KAA01656@loewis.home.cs.tu-berlin.de>
Message-ID: <39EB32D7.10BB8074@FourThought.com>
"Martin v. Loewis" wrote:
>
> > Is it time to reopen this discussion?
>
> Certainly, yes.
>
> > The Python XML-SIG really "owns" the "xml" Python package namespace and
> > we don't intrude on it without its permission. DOM moved there for
> > obvious reasons (it's now the official full DOM of the SIG). Earlier
> > this year the decision was made to move xslt and xpath there as part of
> > initial moves towards incorporating them into the xml-sig package but I
> > think concerns over their cross-platform compatibility are holding up
> > outright adoption.
>
> Could somebody please summarize what these concerns were? I
> understand to fully build 4XPath from source, you need BisonGen,
> bison, flex, SWIG, ... what else? This is indeed an impressive list of
> prerequisites, but I can't see anything inherently platform-dependent
> in it.
I think we have addressed most of these concerns in the latest releases
of 4Suite. We now check in all of the generated files, so all you need
is PyXML and a C compiler.
I think the only concern left is that not all of 4Suite should be
included in the "xml" package. 4ODS for sure does not belong
there... The rest, RDF, XPointer, XLink, I think all fit.
Mike
>
> Does anybody think it is funny that you need EBNF parsers in XML tools?-)
>
> Regards,
> Martin
> _______________________________________________
> RIL mailing list
> RIL@lists.fourthought.com
> http://lists.fourthought.com/mailman/listinfo/ril
--
Mike Olson Principal Consultant
mike.olson@fourthought.com (303)583-9900 x 102
Fourthought, Inc. http://Fourthought.com
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Nicolas.Chauvat@logilab.fr Mon Oct 16 17:59:18 2000
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Mon, 16 Oct 2000 18:59:18 +0200 (CEST)
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To:
Message-ID:
On 16 Oct 2000, Lars Marius Garshol wrote:
> * Andrew Kuchling
> |
> | Should the XML topic guide be moved to a set of pages on
> | pyxml.sourceforge.net? I'm really the only person left who can
> | update the topic guide, and having the Web pages accessible through
> | CVS would mean more people could keep them up to date.
>
> I'm definitely for this. There are many times when I know I would
> have updated things on the pages if I'd had access.
Lots of python projects would benefit from sourceforge-like tools.
Sourceforge itself is open source. What about having python.org or
pythonlabs/beopen host a sourceforge like system as does
www.bioinformatics.org for projects related to bioinformatics ?
That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-)
Opinions ?
--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
From rsalz@caveosystems.com Mon Oct 16 19:19:54 2000
From: rsalz@caveosystems.com (Rich Salz)
Date: Mon, 16 Oct 2000 14:19:54 -0400
Subject: [XML-SIG] PyXML home page on SF
References:
Message-ID: <39EB46CA.9D27CC70@caveosystems.com>
There's a lot more to running a service than just compiling the
software.
> Lots of python projects would benefit from sourceforge-like tools.
> Sourceforge itself is open source. What about having python.org or
> pythonlabs/beopen host a sourceforge like system as does
> www.bioinformatics.org for projects related to bioinformatics ?
>
> That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-)
I suggest just use the free SF service.
From gstein@lyra.org Mon Oct 16 20:08:27 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 16 Oct 2000 12:08:27 -0700
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: <20001016100837.B9235@kronos.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Mon, Oct 16, 2000 at 10:08:37AM -0400
References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> <20001016100837.B9235@kronos.cnri.reston.va.us>
Message-ID: <20001016120826.A347@lyra.org>
On Mon, Oct 16, 2000 at 10:08:37AM -0400, Andrew Kuchling wrote:
> On Mon, Oct 16, 2000 at 01:18:16PM +0200, Martin von Loewis wrote:
> >I've added a new page on http://pyxml.sourceforge.net/, and made this
> >the project home page. Sometimes, people had run into the page, and
> >got what still is in http://pyxml.sourceforge.net/index.php.
>
> Should the XML topic guide be moved to a set of pages on
> pyxml.sourceforge.net? I'm really the only person left who can update
> the topic guide, and having the Web pages accessible through CVS would
> mean more people could keep them up to date.
>
> This would require 2 steps: 1) check the pages into CVS, along with
> the required scripts, and 2) set up a redirect from
> www.python.org/topics/xml/ to pyxml.sourceforge.net.
+1 !!
--
Greg Stein, http://www.lyra.org/
From gstein@lyra.org Mon Oct 16 22:28:57 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 16 Oct 2000 14:28:57 -0700
Subject: [XML-SIG] How to proceed
In-Reply-To: ; from larsga@garshol.priv.no on Mon, Oct 16, 2000 at 04:22:51PM +0200
References: <200010161408.IAA01441@localhost.localdomain>
Message-ID: <20001016142857.F25097@lyra.org>
On Mon, Oct 16, 2000 at 04:22:51PM +0200, Lars Marius Garshol wrote:
>
> * uche ogbuji
> |
> | This has my vote, easily.
>
> What has? I was asking where you (and the others) think development
> should happen, in the XML-SIG, as separate projects on SourceForge or
> privately (as has been done so far).
It would be nice to have xmlproc bundled as part of PyXML, which means the
source should be included with the rest (on SF, in the PyXML project).
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
From gstein@lyra.org Mon Oct 16 22:39:49 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 16 Oct 2000 14:39:49 -0700
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: <39EB46CA.9D27CC70@caveosystems.com>; from rsalz@caveosystems.com on Mon, Oct 16, 2000 at 02:19:54PM -0400
References: <39EB46CA.9D27CC70@caveosystems.com>
Message-ID: <20001016143949.G25097@lyra.org>
Agreed. There is bandwidth, administration, backups, etc.
I see *very* little benefit to avoiding SourceForge and starting a new one.
Cheers,
-g
On Mon, Oct 16, 2000 at 02:19:54PM -0400, Rich Salz wrote:
> There's a lot more to running a service than just compiling the
> software.
>
> > Lots of python projects would benefit from sourceforge-like tools.
> > Sourceforge itself is open source. What about having python.org or
> > pythonlabs/beopen host a sourceforge like system as does
> > www.bioinformatics.org for projects related to bioinformatics ?
> >
> > That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-)
>
> I suggest just use the free SF service.
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://www.python.org/mailman/listinfo/xml-sig
--
Greg Stein, http://www.lyra.org/
From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:00:29 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 17 Oct 2000 01:00:29 +0200
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: <20001016100837.B9235@kronos.cnri.reston.va.us> (message from
Andrew Kuchling on Mon, 16 Oct 2000 10:08:37 -0400)
References: <200010161118.NAA10370@pandora.informatik.hu-berlin.de> <20001016100837.B9235@kronos.cnri.reston.va.us>
Message-ID: <200010162300.BAA00783@loewis.home.cs.tu-berlin.de>
> Should the XML topic guide be moved to a set of pages on
> pyxml.sourceforge.net? I'm really the only person left who can
> update the topic guide, and having the Web pages accessible through
> CVS would mean more people could keep them up to date.
If you'd be willing to perform the updates, I'd be fine if they stay
on python.org. It is certainly the case that they are more accessible
on SF. I'm not sure what the future of python.org is - if there is a
chance that it gets as open as SF, it would probably be better if such
stuff stays on python.org. I'm a bit worried how long python.org stays
in its current state, though.
> This would require 2 steps: 1) check the pages into CVS, along with
> the required scripts, and 2) set up a redirect from
> www.python.org/topics/xml/ to pyxml.sourceforge.net.
I'm not really sure how to set up CVS-controlled pages on SF, but I
guess others have done that, so I could find out. I certainly agree
that having such a procedure is a prerequisite for moving any
contents. The redirect could happen once the content has moved.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:08:06 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 17 Oct 2000 01:08:06 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: (message from Lars Marius
Garshol on 16 Oct 2000 10:35:59 +0200)
References:
Message-ID: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de>
> xmlproc
> This needs to be updated to XML 1.0 2nd ed, extended with Unicode
> support and a SAX 2.0 driver (I have 95% of one ready) and also
> improved in various ways.
It's probably in the interest of PyXML users to get updates to xmlproc
together with PyXML updates, instead of collecting things from various
sources - so yes, I'd like to see your continuing support for this
parser in PyXML.
> saxlib
> I plan for this package to contain lots of SAX 2.0-related
> utilities, like DOM2SAX walkers, XInclude and XBase filters, more
> advanced parser instantiation tools, more drivers etc.
It sounds like a good idea to offer more functionality in
saxlib. However, we have to be careful that PyXML continues to be a
strict superset of Python 2.0. To achieve that, I'd like to see the
2.0-provided functionality be split-off before adding more
stuff. Then, it should be possible to use saxlib in a line-by-line
identical form with 2.0 (e.g. by having xml.sax.saxlib20), and still
provide extra functionality in xml.sax.saxlib.
I'd also like to hear opinions on *where* this functionality should be
located - if it is not clearly specific to SAX, xml.sax may not be the
right place.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:18:17 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 17 Oct 2000 01:18:17 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: <200010161519.JAA02069@localhost.localdomain>
(uche.ogbuji@fourthought.com)
References: <200010161519.JAA02069@localhost.localdomain>
Message-ID: <200010162318.BAA00906@loewis.home.cs.tu-berlin.de>
> I meant that I would much prefer to see development on xmlproc. In
> answer to your real question, though, I think they might as well all
> go on Sourceforge since it will give others a chance to pitch in.
I'm all in favour of a bazaar-style operation (release early, release
often). I'm willing to help as I can to allow Lars to release his
stuff through PyXML. If he thinks parts of it are not ready for
consumption by the typical PyXML user, then having different SF
projects might provide the right balance between releasing early and
committing to specific API too early (not that PyXML will guarantee
stable API in all modules - Python 2.0 is there for stable documented
API).
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 00:24:47 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 17 Oct 2000 01:24:47 +0200
Subject: [XML-SIG] the faster way to get a dom.
In-Reply-To: <200010161524.JAA02091@localhost.localdomain>
(uche.ogbuji@fourthought.com)
References: <200010161524.JAA02091@localhost.localdomain>
Message-ID: <200010162324.BAA00966@loewis.home.cs.tu-berlin.de>
> Are the large documents such that a subset of the DOM would suffice for your
> use?
On another note, does it make a difference to use a different parser?
I don't know whether sgmlop is sophisticated and compatible enough for
the ext.reader functions - I'd be interested to learn whether it makes
any difference, though.
It appears that you can't tell FromXml and friends what parser to use;
if you have PyXML 0.6.1, you can influence choice of parser by setting
the PY_SAX_PARSER environment variable (use
xml.sax.saxexts.make_parser() to see whether it really gives you a
sgmlop driver).
Again, it'd be quite interesting to learn about your findings; if you
think something should work but doesn't, we'd like to know as well.
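The "check which driver you really got" step can be sketched as follows. This is not PyXML's saxexts; it assumes the xml.sax package from the Python standard library, whose make_parser() accepts a list of driver module names to try first and which, as far as I know, also consults PY_SAX_PARSER when the package is imported. pick_parser is a made-up helper name, and "no.such.driver" is a deliberately bogus entry.

```python
import xml.sax


def pick_parser(preferred=()):
    """Create a SAX parser, trying the `preferred` driver modules first.

    Returns the parser and the name of the module that supplied it.
    """
    parser = xml.sax.make_parser(list(preferred))
    return parser, type(parser).__module__


# "no.such.driver" cannot be imported, so make_parser skips it and moves on
# to the next candidate in the list.
parser, driver = pick_parser(["no.such.driver", "xml.sax.expatreader"])
print(driver)
```

Printing the module name is a cheap way to confirm that the preferred driver was actually picked up rather than silently replaced by a fallback.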
Regards,
Martin
From Nicolas.Chauvat@logilab.fr Tue Oct 17 11:18:41 2000
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Tue, 17 Oct 2000 12:18:41 +0200 (CEST)
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: <200010162335.BAA01015@loewis.home.cs.tu-berlin.de>
Message-ID:
[maintaining a pythonforge.org would take time and resources]
Yes. But I understand a commercial company (BeOpen) has taken over python
development. I think it is the kind of free services they could give
back to the python community.
> > That would be pyxml.pythonforge.org, distutils.pythonforge.org, etc. :-)
>
> I'd personally hope that python.org becomes accessible in that
> way. You could probably have all of the current content there, and
> then also have python.org/projects/pyxml; python.org/users/someone;
> xml.python.org (and whatever other gimmicks they offer).
>
> One advantage of taking things off SF is that responsiveness of that
> system was really bad. That seems to have improved recently;
My concern is not about responsiveness as much as distribution (as in
Internet is a distributed system). SourceForge is a great service. Good.
Now are we to host every single open source project on SourceForge ? If we
do so, the day SourceForge closes or changes its policy, or whatever,
every single open source project will be halted or maybe discontinued.
The people at SourceForge know their job: they provide a good service and
the tools (code+doc) to implement that same service at other places.
Why wouldn't a community as big and active as python's put up resources in
common to offer such a useful service, but dedicated to python
development ? There used to be a python.starship.net, maintained by
volunteers, that is now hosted by BeOpen. Why not take the next step ?
> if anybody is to host a similar server, they need to be aware that it
> is probably hard to compete with SF in terms of provided services. For
> example, I trust that SF has a reasonable backup strategy - they
> simply can't risk a disaster. Anybody hosting a server for just a few
> projects would not get the same sort of trust from me.
That's why I think we shouldn't look for someone able to host a server for
a few projects but for some company(ies) able to put up the resources for
"python projects" and volunteers that help them.
That would also make it easier for people to look for python resources:
code in development would be at something.python.org and 3rd party
software at www.vex.net/parnassus. But I would agree that's a weak
argument as long as www.python.org continues to be well-maintained with
no broken links and stays the central hub for python information.
I'm sure several people from PythonLabs are on this list. What is their
opinion ?
--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
From larsga@garshol.priv.no Tue Oct 17 12:41:47 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 Oct 2000 13:41:47 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: <200010162318.BAA00906@loewis.home.cs.tu-berlin.de>
References: <200010161519.JAA02069@localhost.localdomain> <200010162318.BAA00906@loewis.home.cs.tu-berlin.de>
Message-ID:
* Martin v. Loewis
|
| I'm all in favour of a bazaar-style operation (release early,
| release often). I'm willing to help as I can to allow Lars to
| release his stuff through PyXML. If he thinks parts of it are not
| ready for consumption by the typical PyXML user, then having
| different SF projects might provide the right balance between
| releasing early and committing to specific API too early (not that
| PyXML will guarantee stable API in all modules - Python 2.0 is there
| for stable documented API).
In general, I think xmlproc fits in well with the XML-SIG stuff.
saxlib (as I now envision it) would probably also fit in there.
dtddoc and rsskit are XML applications rather than XML infrastructure,
and so are substantially different from the other two. They are also
much less mature as ideas and I will probably not develop these as
actively as the other two.
The only reason I can see for not including xmlproc is that I would
like to be able to basically develop it the way I want, and add
whatever features I like and at my own speed. If you think that would
work just as well under the XML-SIG umbrella then I think we can do
that.
--Lars M.
From larsga@garshol.priv.no Tue Oct 17 12:46:01 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 17 Oct 2000 13:46:01 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de>
References: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de>
Message-ID:
* Lars Marius Garshol
|
| saxlib
| I plan for this package to contain lots of SAX 2.0-related
| utilities, like DOM2SAX walkers, XInclude and XBase filters, more
| advanced parser instantiation tools, more drivers etc.
* Martin v. Loewis
|
| It sounds like a good idea to offer more functionality in
| saxlib. However, we have to be careful that PyXML continues to be a
| strict superset of Python 2.0.
My idea for saxlib is that it should be a toolkit with SAX 2.0-related
add-ons. I didn't really intend for it to contain SAX 2.0 itself, just
useful drivers, filters and similar kinds of utilities.
| To achieve that, I'd like to see the 2.0-provided functionality be
| split-off before adding more stuff.
Agreed.
| Then, it should be possible to use saxlib in a line-by-line
| identical form with 2.0 (e.g. by having xml.sax.saxlib20), and still
| provide extra functionality in xml.sax.saxlib.
Well, whether this belongs in the xml.sax package or not is unclear.
It's not part of SAX as such, just utilities built on top of SAX.
| I'd also like to hear opinions on *where* this functionality should
| be located - if it is not clearly specific to SAX, xml.sax may not
| be the right place.
It would be clearly SAX-specific. However, much of it would also be
usable with the DOM as well. For example, you might use the filters
to transparently perform XInclude processing as the DOM tree is built.
xml.saxlib is probably a better location for it. Or xmlplus.saxlib.
Or whatever.
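The point about filters being usable while a DOM tree is built can be sketched with the standard library. This is not the planned saxlib and it does not do XInclude; a much simpler filter (dropping whitespace-only text) stands in for it. DropBlankText, MiniDomBuilder and build_dom are hypothetical names invented for the example; XMLFilterBase and minidom are the real stdlib pieces.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class DropBlankText(XMLFilterBase):
    """Hypothetical filter: swallow whitespace-only character events."""

    def characters(self, content):
        if content.strip():
            super().characters(content)


class MiniDomBuilder(ContentHandler):
    """Hypothetical handler: build a minidom tree from SAX events."""

    def startDocument(self):
        self.document = minidom.Document()
        self._stack = [self.document]

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self._stack[-1].appendChild(element)
        self._stack.append(element)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        self._stack[-1].appendChild(self.document.createTextNode(content))


def build_dom(xml_text):
    """Parse xml_text through the filter and return the resulting Document."""
    builder = MiniDomBuilder()
    reader = DropBlankText(xml.sax.make_parser())  # parser -> filter -> builder
    reader.setContentHandler(builder)
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return builder.document


doc = build_dom("<a>\n  <b>hi</b>\n</a>")
print(doc.documentElement.toxml())
```

The builder never learns that a filter sat in front of it, which is what makes this kind of utility reusable from DOM code as well as from plain SAX code.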
--Lars M.
From Juergen Hermann"
Message-ID:
On 17 Oct 2000 13:41:47 +0200, Lars Marius Garshol wrote:
>dtddoc and rsskit are XML applications rather than XML infrastructure,
>and so are substantially different from the other two. They are also
>much less mature as ideas and I will probably not develop these as
>actively as the other two.
I think they should not necessarily go into the xml module, but I _DO_
think they fit under the PyXML umbrella. We currently have two top-level
modules in the CVS tree, "html" and "xml".
We could either open a top-level module for each tool/application, or
create a "tools" module and locate them there. IMHO that is easier than to
have a separate SF project for each of those smaller tools that are mostly
maintained by the XML-SIG people anyway.
Ciao, Jürgen
--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/
From GuyM@eurodatasystems.com Tue Oct 17 13:22:21 2000
From: GuyM@eurodatasystems.com (Guy Murphy)
Date: Tue, 17 Oct 2000 13:22:21 +0100
Subject: [XML-SIG] Xalan and Xerces...
Message-ID:
Hiya.
Been lurking for quite a while now.
I understand that the fourthought offering might well be a good one, but I
am a little confused as to what the fourthought offering might bring to
Python that Xalan and Xerces would not, and why the fourthought offering has
been blessed by the SIG for pyXML. [genuine pondering, not rhetorical]
Windows developers are in the fortunate position of having a reference
parser, MSXML to develop for. Any product aimed at the Windows platform can
reasonably stipulate a requirement for MSXML.
Xalan and Xerces I would assert are the most likely candidates to become the
cross platform (or indeed Unix focussed with Windows availability)
reflection of MSXML. A stable, well documented, widely distributed XML
parser and XSL processing engine. Its use by Apache ensures a reasonable
degree of development resource.
Also, given that C++ and Java versions of Xalan and Xerces are available,
this would have seemed, to me at least, a perfect fit for both Python and
JPython.
Why has fourthought's offering been chosen over Xalan and Xerces? [again
genuine question, not rhetorical]
It seems to me that a precious opportunity to become *the* language of
choice for cross-platform XML development is being lost by the Python
community. Python's SAX support is good, but its DOM support to date is
less than "industrial strength", and doesn't look as if it will be for some
time yet.
If Python had production quality XML/XSL support and a core Apache module (I
realise there are two or more such modules existing, but again IMHO they are
not well focused by the community, and of unverified / unproven strength)
then Python could capitalise on a cross-platform web-development role.
In an ideal world a Python DOM/XPath/XSLT wrapper that could mask either a
Xalan/Xerces or MSXML core, with an automatic switch dependent upon platform
and availability might start to qualify for the term "full XML support".
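The "automatic switch dependent upon platform and availability" could, in its simplest form, be a loop that tries candidate backends in order of preference. The sketch below is only that loop, not a wrapper API. The first two module names are hypothetical placeholders (no such Python bindings are assumed to exist); xml.dom.minidom is the only real entry and serves as the fallback.

```python
import importlib


def load_first_available(candidates):
    """Import and return (name, module) for the first importable candidate."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # backend not installed here; try the next one
    raise ImportError("no backend available from: %s" % ", ".join(candidates))


# Hypothetical binding names first; only xml.dom.minidom is known to exist.
BACKENDS = [
    "hypothetical_msxml_binding",
    "hypothetical_xerces_binding",
    "xml.dom.minidom",
]
backend_name, backend = load_first_available(BACKENDS)
print(backend_name)
```

The hard part, a common interface that hides the differences between the backends, is not addressed here.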
****Moving slightly but not wholly out of the list's scope****
My own personal view is that such a Web development niche focused upon ease
of XML development is essential for Python's long-term viability as a
development language (as opposed to a spare wrench in the toolbox). This
niche is as much about developers' perception of Python as it is about its
actual ability, and takes time to build up.
I would like to have developers think of Python and XML in the same way as
they think of Perl and regular expressions.
I've been playing with .NET and this evening will be playing with Python.NET
(with the .NET Xml library), and if MS ever does manage to get the port
they're after out of Corel then Python like many others risks having itself
assimilated unless it has a real strong niche offering.
Looking forward to my re-education as to why Python has moved this way,
Best Regards,
Guy J Murphy
guym@eurodatasystems.com
From rsalz@caveosystems.com Tue Oct 17 14:45:24 2000
From: rsalz@caveosystems.com (Rich Salz)
Date: Tue, 17 Oct 2000 09:45:24 -0400
Subject: [XML-SIG] Xalan and Xerces...
References:
Message-ID: <39EC57F4.C0060601@caveosystems.com>
> Also given that C++ and Java versions of Xalan and Xerces are available,
> this would have to me at least seemed a perfect fit for Python and JPython
> both.
I don't know that I'd call them two different versions, but rather I'd
say that the Apache group has two different implementations. The APIs
look kinda similar, but they're not really that alike, even though they
keep saying things like "port the Java serializer classes." :)
And if there are two, why can't there be a third? It's WAAY too early
to declare victory for one side or the other.
The Xalan/Xerces folks have to spend a fair amount of time dealing with
lots of issues like delete vs delete[], garbage collection (er, sorry,
lazy evaluations when you call Terminate() :), wchar_t/XmlChar, threads
and other platform issues that are really secondary. Because of that, I
believe that PyXML and friends will soon catch up and surpass the Apache
C++ XML efforts.
> Why has fourthought's offering been chosen over Xalan and Xerces? [again
> genuine question, not rhetorical]
I don't think that this has happened. Feel free to write extension code
that wraps the C++ stuff into Python classes. (I'd recommend swig,
www.swig.org). You can probably get xml.xalan and xml.xerces as the
package names without any problem.
> In an ideal world a Python DOM/XPath/XSLT wrapper that could mask either a
> Xalan/Xerces or MSXML core, with an automatic switch dependent upon platform
Last I looked the APIs weren't all that similar -- different enough
that some of those wrappers would be pretty hairy.
> I would like to have developers think of Python and XML in the same way as
> they think of Perl and regular expressions.
Many of us already are.
/r$
From uche.ogbuji@fourthought.com Tue Oct 17 16:18:28 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 17 Oct 2000 09:18:28 -0600
Subject: [XML-SIG] the faster way to get a dom.
In-Reply-To: Message from "Martin v. Loewis"
of "Tue, 17 Oct 2000 01:24:47 +0200." <200010162324.BAA00966@loewis.home.cs.tu-berlin.de>
Message-ID: <200010171518.JAA05611@localhost.localdomain>
> > Are the large documents such that a subset of the DOM would suffice for your
> > use?
>
> On another note, does it make a difference to use a different parser?
> I don't know whether sgmlop is sophisticated and compatible enough for
> the ext.reader functions - I'd be interested to learn whether it makes
> any difference, though.
>
> It appears that you can't tell FromXml and friends what parser to use;
> if you have PyXML 0.6.1, you can influence choice of parser by setting
> the PY_SAX_PARSER environment variable (use
> xml.sax.saxexts.make_parser() to see whether it really gives you a
> sgmlop driver).
Right now you can only specify parser by hacking FromXml or influencing
saxlib's choice of parser as you say. It might be useful to write some
low-level 4DOM readers that don't have to go through SAX. For instance, using
sgmlop's low-level interface should be _many_ times faster than expat via
saxlib.
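As a rough illustration of steering the choice of SAX parser, here is a minimal sketch. It assumes a present-day Python 3 standard library rather than PyXML 0.6.1: there, `xml.sax.make_parser()` accepts a list of driver module names to try first, and `xml.sax` also reads the `PY_SAX_PARSER` environment variable at import time. The first driver name below is hypothetical and stands in for an sgmlop driver that is not installed.

```python
from xml.sax import make_parser

# Drivers are tried in order; names that cannot be imported are skipped.
# "no_such_sgmlop_driver" is a made-up name standing in for a missing driver.
preferred = ["no_such_sgmlop_driver", "xml.sax.expatreader"]
parser = make_parser(preferred)

# Report which driver module actually produced the parser.
driver = type(parser).__module__
print(driver)
```

Checking the driver module this way is one way to confirm that the parser you asked for is really the one in use.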
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Tue Oct 17 16:21:01 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 17 Oct 2000 09:21:01 -0600
Subject: [XML-SIG] How to proceed
In-Reply-To: Message from Lars Marius Garshol
of "17 Oct 2000 13:41:47 +0200."
Message-ID: <200010171521.JAA05626@localhost.localdomain>
> The only reason I can see for not including xmlproc is that I would
> like to be able to basically develop it the way I want, and add
> whatever features I like and at my own speed. If you think that would
> work just as well under the XML-SIG umbrella then I think we can do
> that.
You could probably have it both ways if you're not averse to having to merge
in changes between your own work and contributions by other SIG members now
and then.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Tue Oct 17 16:40:50 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 17 Oct 2000 09:40:50 -0600
Subject: [XML-SIG] Xalan and Xerces...
In-Reply-To: Message from Guy Murphy
of "Tue, 17 Oct 2000 13:22:21 BST."
Message-ID: <200010171540.JAA05660@localhost.localdomain>
As a Fourthought principal, I've avoided comment thus far, but...
> It seems to me that a precious opportunity to become *the* language of
> choice for cross-platform XML development is being lost by the Python
> community. Python's SAX support is good, but its DOM support to date is
> less than "industrial strength", and doesn't look as if it will be for some
> time yet.
Odd statement, that. As Lars discovered with his cross-platform DOM
test-suite (and reported to the www-dom list), 4DOM is one of the most
compliant DOM implementations available for any platform. Why do you think
it's less than "industrial-strength"?
> If Python had production quality XML/XSL support and a core Apache module (I
> realise there are two or more such modules existing, but again IMHO they are
> not well focused by the community, and of unverified / unproven strength)
> then Python could capitalise on a cross-platform web-development role.
For core Apache modules, see mod_snake. It's great stuff. We use it in the
soon-to-be-released 4Suite Server.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 22:51:09 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 17 Oct 2000 23:51:09 +0200
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: (message from
Nicolas Chauvat on Tue, 17 Oct 2000 12:18:41 +0200 (CEST))
References:
Message-ID: <200010172151.XAA00871@loewis.home.cs.tu-berlin.de>
> Yes. But I understand a commercial company (BeOpen) has taken over python
> development. I think it is the kind of free services they could give
> back to the python community.
I don't feel in a position to demand anything like that... In fact, in
free software, nobody normally has the right to demand anything. It
happens that BeOpen/Pythonlabs makes Python available on roughly the
same terms as it was always available - so I don't even see why they
need to give anything else to anybody...
> Now are we to host every single open source project on SourceForge ?
> If we do so, the day SourceForge closes or changes its policy, or
> whatever, every single open source project will be halted or maybe
> discontinued.
Not at all. In the worst case, people would have to change the host
name in their CVS sandboxes; and you might lose your bug data base -
but that would already be a horror scenario. More likely, I'd think
that there would be advance warning about policy changes, to give
people enough time to migrate somewhere else should they feel the
need.
> Why wouldn't a community as big and active as python's put up resources in
> common to offer such a useful service, but dedicated to python
> development ? There used to be a python.starship.net, maintained by
> volunteers, that is now hosted by BeOpen. Why not take the next step ?
Lack of volunteers, perhaps?
> That's why I think we shouldn't look for someone able to host a
> server for a few projects but for some company(ies) able to put up
> the resources for "python projects" and volunteers that help them.
If you think you can demand such things from people, just go ahead and
do so. I don't feel the need for that. There are plenty of
alternatives already: Cygnus/RedHat operates sources.redhat.com (aka
sourceware.cygnus.com). They are also willing to host projects.
> That would also make it easier for people to look for python
> resources: code in development would be at something.python.org and
> 3rd party software at www.vex.net/parnassus.
That is something that is needed - something like CPAN. But we are the
wrong SIG for that :-)
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 22:58:25 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 17 Oct 2000 23:58:25 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: (message from Lars Marius
Garshol on 17 Oct 2000 13:41:47 +0200)
References: <200010161519.JAA02069@localhost.localdomain> <200010162318.BAA00906@loewis.home.cs.tu-berlin.de>
Message-ID: <200010172158.XAA00916@loewis.home.cs.tu-berlin.de>
> The only reason I can see for not including xmlproc is that I would
> like to be able to basically develop it the way I want, and add
> whatever features I like and at my own speed. If you think that would
> work just as well under the XML-SIG umbrella then I think we can do
> that.
Yes, certainly. As long as you continue to maintain it, I assume you
will also respond to people who complain that something broke. With
different people contributing experimental bleeding-edge code, I would
expect many releases to have glitches here and there - if we all agree
to work towards a "stable" release from time to time, so much the better.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Tue Oct 17 23:03:13 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 18 Oct 2000 00:03:13 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: (message from Lars Marius
Garshol on 17 Oct 2000 13:46:01 +0200)
References: <200010162308.BAA00830@loewis.home.cs.tu-berlin.de>
Message-ID: <200010172203.AAA00965@loewis.home.cs.tu-berlin.de>
> My idea for saxlib is that it should be a toolkit with SAX 2.0-related
> add-ons. I didn't really intend for it to contain SAX 2.0 itself, just
> useful drivers, filters and similar kinds of utilities.
And I see that I was just confusing it with saxutils, sorry - there is
no need to synchronize saxlib with Python 2.0 [as that has no
xml.sax.saxlib].
For xml.sax.saxlib, there is then only the backwards compatibility
concern with SAX1 - we probably have to support the SAX1 classes in
saxlib as long as people have SAX1 applications.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Wed Oct 18 00:22:45 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 18 Oct 2000 01:22:45 +0200
Subject: [XML-SIG] Xalan and Xerces...
In-Reply-To: (message from
Guy Murphy on Tue, 17 Oct 2000 13:22:21 +0100)
References:
Message-ID: <200010172322.BAA01705@loewis.home.cs.tu-berlin.de>
> I understand that the fourthought offering might well be a good one, but I
> am a little confused as to what the fourthought offering might bring to
> Python that Xalan and Xerces would not, and why the fourthought offering has
> been blessed by the SIG for pyXML. [genuine pondering, not rhetorical]
I haven't used Xalan or Xerces - but how exactly would you integrate
them into PyXML? I think that is technically not feasible, at least not in
a 99% pure Python approach.
In addition, PyXML supports a number of parsers - validating ones,
fast ones, and super fast ones - I'd see no point in adding another
parser, unless it provides features not found in any of the existing
parsers.
> Windows developers are in the fortunate position of having a
> reference parser, MSXML to develop for. Any product aimed at the
> Windows platform can reasonably stipulate a requirement for MSXML.
Python, with Python 2, is also in the fortunate position of having a
reference parser - xml.parsers.expat. With PyXML, you get xmlproc and
sgmlop in addition to that.
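For readers who have not used the reference parser directly, a minimal sketch follows. It assumes a current Python 3 interpreter, where `xml.parsers.expat` is still part of the standard library; the sample document and the handler name are made up for illustration.

```python
import xml.parsers.expat

names = []

def start_element(name, attrs):
    # expat calls this once per start tag, with the attributes as a dict.
    names.append(name)

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
# The second argument marks this as the final chunk of input.
parser.Parse("<doc><a/><b x='1'/></doc>", True)
print(names)
```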
> Xalan and Xerces I would assert are the most likely candidates to become the
> cross platform (or indeed Unix focussed with Windows availability)
> reflection of MSXML.
Nah, can't be :-) Python 2 (shipping today) already provides
cross-platform XML parsing - for Python. There is nothing wrong with
Xerces providing the same thing for Java - although I'd prefer a
parser running in compiled code any time.
> A stable, well documented, widely distributed XML parser and XSL
> processing engine.
For the PyXML parsers, I think pretty much the same can be said.
> Also given that C++ and Java versions of Xalan and Xerces are available,
> this would have to me at least seemed a perfect fit for Python and JPython
> both.
I can't really comment on the quality of the C++ version of Xerces - I
can't imagine it is completely compatible with the Java version,
though. Even if it was, arranging the same *Python* interface to both
might be a challenge.
> Why has fourthought's offering been chosen over Xalan and Xerces?
> [again genuine question, not rhetorical]
Please understand that PyXML is *not* a Fourthought offering. They
have provided the DOM implementation, and they will provide the XSLT
implementation - the parsers come from many other sources.
Being confronted with Xerces for the first time, I took the
opportunity to port their SAXCount example to PyXML, which took me
half an hour (plus minus five minutes), including installing Xerces.
On my system (AMD K6, 350MHz, JDK 1.3.0beta-b07) I got the following
results:
Xerces with no options:
data/personal.xml: 903 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
Xerces with -w (i.e. parse the file once, then measure time for second run)
data/personal.xml: 85 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
PyXML 0.6.1, expat as the parser:
data/personal.xml: 0.0128449s (37 elems, 12 attrs,0 spaces, 268 chars)
First, you'll notice that Python beats Java by an order of magnitude
even in the "fast" java case. I'm not really surprised - expat is a
fast parser, and it is written in C.
Next, you'll notice that expat does not report ignorableWhitespace;
instead, the spaces are reported as character data. I'm not sure which
one is right here (or whether both are acceptable) - both parsers
operate in a non-validating mode. Would somebody care to clarify?
The difference in number of attributes apparently comes from Xerces
passing the default value for an implied attribute from the DTD,
whereas expat doesn't.
See below for the source of that ported example.
> If Python had production quality XML/XSL support and a core Apache
> module (I realise there are two or more such modules existing, but
> again IMHO they are not well focused by the community, and of
> unverified / unproven strength) then Python could capitalise on a
> cross-platform web-development role.
I think Python does capitalise on a cross-platform web-development
role. However, if you think more needs to be done - just go ahead and
do it :-)
> In an ideal world a Python DOM/XPath/XSLT wrapper that could mask
> either a Xalan/Xerces or MSXML core, with an automatic switch
> dependent upon platform and availability might start to qualify for
> the term "full XML support".
I can imagine using MSXML when that is available, and Xerces when it
is available (i.e. in JPython). That should be as simple to support as
adding SAX drivers. However, that is not strictly necessary - Python
has "full XML support" right now.
> My own personal view is that such a Web development niche focused upon ease
> of XML development is essential for Python's long term viability as a
> development language (as opposed to a spare wrench in the toolbox).
That is a little bit too much of marketing speak for me. I will
continue to use Python as long as it is useful for me - regardless of
others considering it viable for something or not.
> I would like to have developers think of Python and XML in the same
> way as they think of Perl and regular expressions.
I don't think spreading FUD that Python currently does not support XML
helps with that, though...
Regards,
Martin
# Example adapted from Xerces' sax.SAXCount
from xml.sax import ContentHandler, make_parser
from xml.sax.handler import feature_namespaces
from time import time

setValidation = 0
setNameSpaces = 1
setSchemaSupport = 1
warmup = 0

class SAXCount(ContentHandler):
    def startDocument(self):
        if warmup:return
        self.elems = 0
        self.attrs = 0
        self.chars = 0
        self.spaces = 0

    def startElementNS(self,name,qname,attrs):
        if warmup:return
        self.elems += 1
        self.attrs += len(attrs)

    def characters(self,chars):
        if warmup:return
        self.chars += len(chars)

    def ignorableWhitespace(self,chars):
        if warmup:return
        self.spaces += len(chars)

    def printResults(self, uri, time):
        print "%s: %gs" % (uri, time),
        print "(%(elems)d elems, %(attrs)d attrs,"\
              "%(spaces)d spaces, %(chars)d chars)" %\
              vars(self)

def printit(uri):
    global warmup
    counter = SAXCount()
    parser = make_parser()
    parser.setContentHandler(counter)
    # not setting error handler
    # parser.setFeature(feature_validation, setValidation)
    parser.setFeature(feature_namespaces, setNameSpaces)
    # parser.setFeature(feature_schema, setSchema)
    parser.parse(uri)
    if warmup:
        parser.parse(uri)
        parser.reset()
        warmup = 0
    start = time()
    parser.parse(uri)
    counter.printResults(uri,time()-start)

if __name__=='__main__':
    # todo: argument processing
    import sys,getopt
    opts, args = getopt.getopt(sys.argv[1:], "w")
    for opt,val in opts:
        if opt == '-w':
            warmup = 1
    printit(args[0])
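The script above targets the Python of its day. For anyone wanting to try the same counting idea now, here is a hedged sketch for Python 3. It is not a faithful port: the in-memory `SAMPLE` document and the `count()` helper are assumptions made for illustration, and it uses `time.perf_counter()` where the original used `time()`.

```python
import time
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

class SAXCount(ContentHandler):
    """Count elements, attributes, characters and ignorable whitespace."""
    def __init__(self):
        super().__init__()
        self.elems = self.attrs = self.chars = self.spaces = 0

    def startElementNS(self, name, qname, attrs):
        self.elems += 1
        self.attrs += len(attrs)

    def characters(self, content):
        self.chars += len(content)

    def ignorableWhitespace(self, whitespace):
        self.spaces += len(whitespace)

def count(data):
    """Parse data with namespace processing on; return (counter, seconds)."""
    counter = SAXCount()
    parser = xml.sax.make_parser()
    parser.setContentHandler(counter)
    parser.setFeature(feature_namespaces, True)
    start = time.perf_counter()
    parser.feed(data)
    parser.close()
    return counter, time.perf_counter() - start

# Made-up sample input, not the personal.xml file from the Xerces distribution.
SAMPLE = '<personnel><person id="a1" role="dev"><name>Ann</name></person></personnel>'
counter, elapsed = count(SAMPLE)
print("%gs (%d elems, %d attrs, %d spaces, %d chars)"
      % (elapsed, counter.elems, counter.attrs, counter.spaces, counter.chars))
```

Timings from a one-line document say nothing about parser speed; the sketch only shows where the counters and the clock sit.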
From martin@loewis.home.cs.tu-berlin.de Wed Oct 18 00:27:20 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 18 Oct 2000 01:27:20 +0200
Subject: [XML-SIG] How to proceed
In-Reply-To: <200010171521.JAA05626@localhost.localdomain>
(uche.ogbuji@fourthought.com)
References: <200010171521.JAA05626@localhost.localdomain>
Message-ID: <200010172327.BAA01750@loewis.home.cs.tu-berlin.de>
> You could probably have it both ways if you're not averse to having to merge
> in changes between your own work and contributions by other SIG members now
> and then.
Alternatively, we could make use of CVS branches if there are features
that may take time to get in a usable shape, and that would break
existing code even during that time.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Wed Oct 18 00:26:01 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 18 Oct 2000 01:26:01 +0200
Subject: [XML-SIG] the faster way to get a dom.
In-Reply-To: <200010171518.JAA05611@localhost.localdomain>
(uche.ogbuji@fourthought.com)
References: <200010171518.JAA05611@localhost.localdomain>
Message-ID: <200010172326.BAA01735@loewis.home.cs.tu-berlin.de>
> Right now you can only specify parser by hacking FromXml or
> influencing saxlib's choice of parser as you say. It might be
> useful to write some low-level 4DOM readers that don't have to go
> through SAX. For instance, using sgmlop's low-level interface
> should be _many_ times faster than expat via saxlib.
As a starting point, I'd try using sgmlop through the sgmlop SAX
driver. To achieve that, it would be simplest if the From* methods
took a parser= keyword argument which would allow the caller to
specify a pre-fabricated parser object. This is a small effort and
might reveal whether sgmlop is suitable for building DOM trees or not
(and perhaps fix it if it is not).
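To make the parser= idea concrete, here is a small sketch. It does not use 4DOM's From* functions; it assumes Python 3's standard `xml.dom.minidom`, whose `parseString()` happens to accept an optional `parser` argument of this kind. The wrapper name `dom_from_string` is hypothetical.

```python
from xml.dom import minidom
from xml.sax import make_parser

def dom_from_string(text, parser=None):
    """Build a DOM tree, optionally with a caller-supplied SAX parser.

    Hypothetical wrapper: it simply forwards to minidom.parseString.
    """
    return minidom.parseString(text, parser=parser)

# The caller prepares the parser object and passes it in.
sax_parser = make_parser()
doc = dom_from_string("<doc><item>one</item><item>two</item></doc>",
                      parser=sax_parser)
items = doc.getElementsByTagName("item")
print(doc.documentElement.tagName, len(items))
```

Because the caller owns the parser object, trying a different SAX driver only changes the line that creates it.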
Regards,
Martin
From uche.ogbuji@fourthought.com Wed Oct 18 00:43:42 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 17 Oct 2000 17:43:42 -0600
Subject: [XML-SIG] Xalan and Xerces...
In-Reply-To: Message from "Martin v. Loewis"
of "Wed, 18 Oct 2000 01:22:45 +0200." <200010172322.BAA01705@loewis.home.cs.tu-berlin.de>
Message-ID: <200010172343.RAA06974@localhost.localdomain>
> Being confronted with Xerces for the first time, I took the
> opportunity to port their SAXCount example to PyXML, which took me
> half an hour (plus minus five minutes), including installing Xerces.
>
> On my system (AMD K6, 350MHz, JDK 1.3.0beta-b07) I got the following
> results:
>
> Xerces with no options:
> data/personal.xml: 903 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
> Xerces with -w (i.e. parse the file once, then measure time for second run)
> data/personal.xml: 85 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
> PyXML 0.6.1, expat as the parser:
> data/personal.xml: 0.0128449s (37 elems, 12 attrs,0 spaces, 268 chars)
Good stats to have on hand. Thanks.
> First, you'll notice that Python beats Java by an order of magnitude
> even in the "fast" java case. I'm not really surprised - expat is a
> fast parser, and it is written in C.
>
> Next, you'll notice that expat does not report ignorableWhitespace;
> instead, the spaces are reported as character data. I'm not sure which
> one is right here (or whether both are acceptable) - both parsers
> operate in a non-validating mode. Somebody cares to clarify.
There is really no such thing as ignorable whitespace in non-validating mode.
According to XML 1.0, white-space can only be ignored when it occurs where there
is no corresponding #PCDATA in the content model from the DTD. Since the DTD
is not used in non-validating mode, the parser _cannot_ make assumptions that
it's ignorable.
So in this case expat is right and Xerces is wrong.
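The expat behaviour described here can be observed with a short sketch. It assumes Python 3's standard `xml.sax` with its expat driver; the sample document and the class name are made up. Even though the internal DTD subset declares element-only content for `doc`, the whitespace arrives through `characters()`.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Made-up document: <doc> has element-only content according to its DTD.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (item)>
  <!ELEMENT item (#PCDATA)>
]>
<doc>
  <item>x</item>
</doc>"""

class WhitespaceCounter(ContentHandler):
    def __init__(self):
        super().__init__()
        self.chars = 0
        self.spaces = 0

    def characters(self, content):
        self.chars += len(content)

    def ignorableWhitespace(self, whitespace):
        self.spaces += len(whitespace)

counter = WhitespaceCounter()
xml.sax.parseString(DOC, counter)
print(counter.chars, counter.spaces)
```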
> The difference in number of attributes apparently comes from Xerces
> passing the default value for an implied attribute from the DTD,
> whereas expat doesn't.
Since expat is strictly non-validating, this is quite valid.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From uche.ogbuji@fourthought.com Wed Oct 18 00:48:46 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 17 Oct 2000 17:48:46 -0600
Subject: [XML-SIG] the faster way to get a dom.
In-Reply-To: Message from "Martin v. Loewis"
of "Wed, 18 Oct 2000 01:26:01 +0200." <200010172326.BAA01735@loewis.home.cs.tu-berlin.de>
Message-ID: <200010172348.RAA07014@localhost.localdomain>
> > Right now you can only specify parser by hacking FromXml or
> > influencing saxlib's choice of parser as you say. It might be
> > useful to write some low-level 4DOM readers that don't have to go
> > through SAX. For instance, using sgmlop's low-level interface
> > should be _many_ times faster than expat via saxlib.
>
> As a starting point, I'd try using sgmlop through the sgmlop SAX
> driver. To achieve that, it would be simplest if the From* methods
> took a parser= keyword argument which would allow the caller to
> specify a pre-fabricated parser object. This is a small effort and
> might reveal whether sgmlop is suitable for building DOM trees or not
> (and perhaps fix it if it is not).
Right-O.
I'll add this and post some bench-marks so we can think it over.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Nicolas.Chauvat@logilab.fr Wed Oct 18 09:35:02 2000
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Wed, 18 Oct 2000 10:35:02 +0200 (CEST)
Subject: [XML-SIG] PyXML home page on SF
In-Reply-To: <200010172151.XAA00871@loewis.home.cs.tu-berlin.de>
Message-ID:
On Tue, 17 Oct 2000, Martin v. Loewis wrote:
[about my suggestion concerning a SourceForge dedicated to python]
That was only a suggestion, not a demand as someone stated. I would love
to set up a pythonforge myself if I had the resources, unfortunately I
don't... maybe in a few months? Time will tell. For now I'll just go
back to hacking and keep quiet.
Cheers, :)
--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
From martin@loewis.home.cs.tu-berlin.de Thu Oct 19 07:06:43 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 19 Oct 2000 08:06:43 +0200
Subject: [XML-SIG] XML topic guide in CVS
Message-ID: <200010190606.IAA01575@loewis.home.cs.tu-berlin.de>
I've extracted what I believe is the content of http://www.python.org
as the www module of the PyXML CVS (i.e. a root of
cvs.pyxml.sourceforge.net:/cvsroot/pyxml, and a repository of www). If
you know any file that's missing, please let me know (or add it
yourself).
The ht2html generated files are *not* part of the repository. Instead,
I put ht2html into the www module as well, and intend to regenerate
the .html files on every commit. Unfortunately, write access to the
WWW pages is currently denied on SF, so I cannot test the
procedure. If you want to have a glance at what it should do, please
look into the files commitprog and loginfo of the CVSROOT module.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Thu Oct 19 07:22:08 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 19 Oct 2000 08:22:08 +0200
Subject: [XML-SIG] XML topic guide in CVS
In-Reply-To: <200010190606.IAA01575@loewis.home.cs.tu-berlin.de>
(martin@loewis.home.cs.tu-berlin.de)
References: <200010190606.IAA01575@loewis.home.cs.tu-berlin.de>
Message-ID: <200010190622.IAA01737@loewis.home.cs.tu-berlin.de>
> I've extracted what I believe is the content of http://www.python.org
Oops: http://www.python.org/topics/xml
Martin
From ibarg@as.arizona.edu Thu Oct 19 18:23:57 2000
From: ibarg@as.arizona.edu (Irene Barg)
Date: Thu, 19 Oct 2000 10:23:57 -0700
Subject: [XML-SIG] PxXML-0.6.1 html_builder?
Message-ID: <39EF2E2D.E08BC9D7@as.arizona.edu>
XML-SIG,
I installed PyXML-0.6.1 on RedHat Linux 6.1, Python 1.5.2 and
noticed that neither 'builder' nor 'html_builder' exists. The
'PyXML-0.6.1/xml/dom/ChangeLog' says that 'Builder is now
deprecated ...'. Yet, there is still an example in the dom/demo
called 'html2html' that imports 'html_builder', and 'builder.py'
imports 'builder'. Are there examples of 4DOM modules that
replace these?
Thanks,
--irene
------------------------------------------------------------------
Irene Barg Email: ibarg@as.arizona.edu
Steward Observatory Phone: 520-621-2602
933 N. Cherry Ave.
University of Arizona FAX: 520-621-1891
Tucson, AZ 85721 http://nickel.as.arizona.edu/~barg
------------------------------------------------------------------
From uche.ogbuji@fourthought.com Fri Oct 20 03:56:15 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Thu, 19 Oct 2000 20:56:15 -0600
Subject: [XML-SIG] PxXML-0.6.1 html_builder?
In-Reply-To: Message from Irene Barg
of "Thu, 19 Oct 2000 10:23:57 PDT." <39EF2E2D.E08BC9D7@as.arizona.edu>
Message-ID: <200010200256.UAA10726@localhost.localdomain>
For an example of creating a DOM from an html file, see
dom/demos/dom_from_html_file.py
For an example of creating HTML dynamically, see dom/demos/generate_html1.py
Good luck.
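For readers without the demo files at hand, the following sketch shows the general shape of generating HTML through DOM calls. It is not the generate_html1.py demo and does not use 4DOM; it assumes Python 3's standard `xml.dom.minidom`, and the element names and text are arbitrary.

```python
from xml.dom.minidom import getDOMImplementation

# Create an empty document whose root element is <html>.
impl = getDOMImplementation()
doc = impl.createDocument(None, "html", None)
html = doc.documentElement

body = doc.createElement("body")
html.appendChild(body)

# Build a paragraph with an attribute and a text child.
para = doc.createElement("p")
para.setAttribute("class", "greeting")
para.appendChild(doc.createTextNode("Hello, DOM"))
body.appendChild(para)

markup = html.toxml()
print(markup)
```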
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From per@sbc.su.se Fri Oct 20 10:02:08 2000
From: per@sbc.su.se (Per Kraulis)
Date: Fri, 20 Oct 2000 11:02:08 +0200
Subject: [XML-SIG] error in test_sax.py, and fix
Message-ID: <39F00A10.1C510662@sbc.su.se>
This is a multi-part message in MIME format.
--------------78802F8B16C4F64AA1F9652C
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Hi,
Having installed PyXML-0.6.1 under Python 1.5.2 (Linux RedHat 6.2), I
had some problems running the PyXML-0.6.1/test/regrtest.py script. I
have isolated one particular problem, and fixed it (I think) by
rearranging the order of some statements in the script, which seemed to
be erroneous. I attach the fixed script; do a comparison with the
original to see what I did.
Original error output:
per@sandman $ python test_sax.py
Traceback (innermost last):
File "test_sax.py", line 283, in ?
xml_test_out = open(findfile("test.xml.out")).read()
IOError: [Errno 2] No such file or directory: 'test.xml.out'
Cheerio, and thanks for the hard work,
Per Kraulis
--
Per J. Kraulis, Ph.D. per@sbc.su.se
Stockholm Bioinformatics Center (SBC) http://www.sbc.su.se/~per
Dept. Biochemistry, Stockholm University phone +46 (0)8 - 674 78 17
SE-106 91 Stockholm, SWEDEN fax +46 (0)8 - 15 80 57
--------------78802F8B16C4F64AA1F9652C
Content-Type: text/plain; charset=iso-8859-1;
name="test_sax.py"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline;
filename="test_sax.py"
# regression test for SAX 2.0
# $Id: test_sax.py,v 1.3 2000/10/07 18:30:11 loewis Exp $

from xml.sax import make_parser, ContentHandler, \
                    SAXException, SAXReaderNotAvailable, SAXParseException
try:
    make_parser()
except SAXReaderNotAvailable:
    # don't try to test this module if we cannot create a parser
    raise ImportError("no XML parsers available")
from xml.sax.saxutils import XMLGenerator, escape, XMLFilterBase
from xml.sax.expatreader import create_parser
from xml.sax.xmlreader import InputSource, AttributesImpl, AttributesNSImpl
from cStringIO import StringIO
from test.test_support import verbose, TestFailed, findfile

# ===== Utilities

tests = 0
fails = 0

def confirm(outcome, name):
    global tests, fails

    tests = tests + 1
    if outcome:
        print "Passed", name
    else:
        print "Failed", name
        fails = fails + 1

# ===========================================================================
#
# saxutils tests
#
# ===========================================================================
# ===== escape

def test_escape_basic():
    return escape("Donald Duck & Co") == "Donald Duck &amp; Co"

def test_escape_all():
    return escape("<Donald Duck & Co>") == "&lt;Donald Duck &amp; Co&gt;"

def test_escape_extra():
    return escape("Hei på deg", {"å" : "&aring;"}) == "Hei p&aring; deg"

def test_make_parser():
    try:
        # Creating a parser should succeed - it should fall back
        # to the expatreader
        p = make_parser(['xml.parsers.no_such_parser'])
    except:
        return 0
    else:
        return p

# ===== XMLGenerator

start = '<?xml version="1.0" encoding="iso-8859-1"?>\n'

def test_xmlgen_basic():
    result = StringIO()
    gen = XMLGenerator(result)
    gen.startDocument()
    gen.startElement("doc", {})
    gen.endElement("doc")
    gen.endDocument()

    return result.getvalue() == start + "<doc></doc>"

def test_xmlgen_content():
    result = StringIO()
    gen = XMLGenerator(result)

    gen.startDocument()
    gen.startElement("doc", {})
    gen.characters("huhei")
    gen.endElement("doc")
    gen.endDocument()

    return result.getvalue() == start + "<doc>huhei</doc>"

def test_xmlgen_pi():
    result = StringIO()
    gen = XMLGenerator(result)

    gen.startDocument()
    gen.processingInstruction("test", "data")
    gen.startElement("doc", {})
    gen.endElement("doc")
    gen.endDocument()

    return result.getvalue() == start + "<?test data?><doc></doc>"

def test_xmlgen_content_escape():
    result = StringIO()
    gen = XMLGenerator(result)

    gen.startDocument()
    gen.startElement("doc", {})
    gen.characters("<huhei&")
    gen.endElement("doc")
    gen.endDocument()

    return result.getvalue() == start + "<doc>&lt;huhei&amp;</doc>"

def test_xmlgen_ignorable():
    result = StringIO()
    gen = XMLGenerator(result)

    gen.startDocument()
    gen.startElement("doc", {})
    gen.ignorableWhitespace(" ")
    gen.endElement("doc")
    gen.endDocument()

    return result.getvalue() == start + "<doc> </doc>"

ns_uri = "http://www.python.org/xml-ns/saxtest/"

def test_xmlgen_ns():
    result = StringIO()
    gen = XMLGenerator(result)

    gen.startDocument()
    gen.startPrefixMapping("ns1", ns_uri)
    gen.startElementNS((ns_uri, "doc"), "ns1:doc", {})
    # add an unqualified name
    gen.startElementNS((None, "udoc"), None, {})
    gen.endElementNS((None, "udoc"), None)
    gen.endElementNS((ns_uri, "doc"), "ns1:doc")
    gen.endPrefixMapping("ns1")
    gen.endDocument()

    return result.getvalue() == start + \
           ('<ns1:doc xmlns:ns1="%s"><udoc></udoc></ns1:doc>' %
                                         ns_uri)

# ===== XMLFilterBase

def test_filter_basic():
    result = StringIO()
    gen = XMLGenerator(result)
    filter = XMLFilterBase()
    filter.setContentHandler(gen)

    filter.startDocument()
    filter.startElement("doc", {})
    filter.characters("content")
    filter.ignorableWhitespace(" ")
    filter.endElement("doc")
    filter.endDocument()

    return result.getvalue() == start + "<doc>content </doc>"

# ===========================================================================
#
# expatreader tests
#
# ===========================================================================
# ===== DTDHandler support

class TestDTDHandler:

    def __init__(self):
        self._notations = []
        self._entities = []

    def notationDecl(self, name, publicId, systemId):
        self._notations.append((name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self._entities.append((name, publicId, systemId, ndata))

def test_expat_dtdhandler():
    parser = create_parser()
    handler = TestDTDHandler()
    parser.setDTDHandler(handler)

    parser.feed('<!DOCTYPE doc [\n')
    parser.feed('  <!ENTITY img SYSTEM "expat.gif" NDATA GIF>\n')
    parser.feed('  <!NOTATION GIF PUBLIC "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN">\n')
    parser.feed(']>\n')
    parser.feed('<doc></doc>')
    parser.close()

    return handler._notations == [("GIF", "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN", None)] and \
           handler._entities == [("img", None, "expat.gif", "GIF")]

# ===== EntityResolver support
class TestEntityResolver:
def resolveEntity(self, publicId, systemId):
inpsrc = InputSource()
inpsrc.setByteStream(StringIO(""))
return inpsrc
def test_expat_entityresolver():
parser = create_parser()
parser.setEntityResolver(TestEntityResolver())
result = StringIO()
parser.setContentHandler(XMLGenerator(result))
parser.feed('\n')
parser.feed(']>\n')
parser.feed('&test;')
parser.close()
return result.getvalue() == start + ""
# ===== Attributes support
class AttrGatherer(ContentHandler):
def startElement(self, name, attrs):
self._attrs = attrs
def startElementNS(self, name, qname, attrs):
self._attrs = attrs
def test_expat_attrs_empty():
parser = create_parser()
gather = AttrGatherer()
parser.setContentHandler(gather)
parser.feed("")
parser.close()
return verify_empty_attrs(gather._attrs)
def test_expat_attrs_wattr():
parser = create_parser()
gather = AttrGatherer()
parser.setContentHandler(gather)
parser.feed("")
parser.close()
return verify_attrs_wattr(gather._attrs)
def test_expat_nsattrs_empty():
parser = create_parser(1)
gather = AttrGatherer()
parser.setContentHandler(gather)
parser.feed("")
parser.close()
return verify_empty_nsattrs(gather._attrs)
def test_expat_nsattrs_wattr():
parser = create_parser(1)
gather = AttrGatherer()
parser.setContentHandler(gather)
parser.feed("" % ns_uri)
parser.close()
attrs = gather._attrs
return attrs.getLength() == 1 and \
attrs.getNames() == [(ns_uri, "attr")] and \
attrs.getQNames() == [] and \
len(attrs) == 1 and \
attrs.has_key((ns_uri, "attr")) and \
attrs.keys() == [(ns_uri, "attr")] and \
attrs.get((ns_uri, "attr")) == "val" and \
attrs.get((ns_uri, "attr"), 25) == "val" and \
attrs.items() == [((ns_uri, "attr"), "val")] and \
attrs.values() == ["val"] and \
attrs.getValue((ns_uri, "attr")) == "val" and \
attrs[(ns_uri, "attr")] == "val"
# ===== InputSource support
def test_expat_inpsource_filename():
parser = create_parser()
result = StringIO()
xmlgen = XMLGenerator(result)
parser.setContentHandler(xmlgen)
parser.parse(findfile("test.xml"))
return result.getvalue() == xml_test_out
def test_expat_inpsource_sysid():
parser = create_parser()
result = StringIO()
xmlgen = XMLGenerator(result)
parser.setContentHandler(xmlgen)
parser.parse(InputSource(findfile("test.xml")))
return result.getvalue() == xml_test_out
def test_expat_inpsource_stream():
parser = create_parser()
result = StringIO()
xmlgen = XMLGenerator(result)
parser.setContentHandler(xmlgen)
inpsrc = InputSource()
inpsrc.setByteStream(open(findfile("test.xml")))
parser.parse(inpsrc)
return result.getvalue() == xml_test_out
# ===========================================================================
#
# error reporting
#
# ===========================================================================
def test_expat_inpsource_location():
parser = create_parser()
parser.setContentHandler(ContentHandler()) # do nothing
source = InputSource()
source.setByteStream(StringIO("")) #ill-formed
name = "a file name"
source.setSystemId(name)
try:
parser.parse(source)
except SAXException, e:
return e.getSystemId() == name
def test_expat_incomplete():
parser = create_parser()
parser.setContentHandler(ContentHandler()) # do nothing
try:
parser.parse(StringIO(""))
except SAXParseException:
return 1 # ok, error found
else:
return 0
# ===========================================================================
#
# xmlreader tests
#
# ===========================================================================
# ===== AttributesImpl
def verify_empty_attrs(attrs):
try:
attrs.getValue("attr")
gvk = 0
except KeyError:
gvk = 1
try:
attrs.getValueByQName("attr")
gvqk = 0
except KeyError:
gvqk = 1
try:
attrs.getNameByQName("attr")
gnqk = 0
except KeyError:
gnqk = 1
try:
attrs.getQNameByName("attr")
gqnk = 0
except KeyError:
gqnk = 1
try:
attrs["attr"]
gik = 0
except KeyError:
gik = 1
return attrs.getLength() == 0 and \
attrs.getNames() == [] and \
attrs.getQNames() == [] and \
len(attrs) == 0 and \
not attrs.has_key("attr") and \
attrs.keys() == [] and \
attrs.get("attrs") == None and \
attrs.get("attrs", 25) == 25 and \
attrs.items() == [] and \
attrs.values() == [] and \
gvk and gvqk and gnqk and gik and gqnk
def verify_attrs_wattr(attrs):
return attrs.getLength() == 1 and \
attrs.getNames() == ["attr"] and \
attrs.getQNames() == ["attr"] and \
len(attrs) == 1 and \
attrs.has_key("attr") and \
attrs.keys() == ["attr"] and \
attrs.get("attr") == "val" and \
attrs.get("attr", 25) == "val" and \
attrs.items() == [("attr", "val")] and \
attrs.values() == ["val"] and \
attrs.getValue("attr") == "val" and \
attrs.getValueByQName("attr") == "val" and \
attrs.getNameByQName("attr") == "attr" and \
attrs["attr"] == "val" and \
attrs.getQNameByName("attr") == "attr"
def test_attrs_empty():
return verify_empty_attrs(AttributesImpl({}))
def test_attrs_wattr():
return verify_attrs_wattr(AttributesImpl({"attr" : "val"}))
# ===== AttributesImpl
def verify_empty_nsattrs(attrs):
try:
attrs.getValue((ns_uri, "attr"))
gvk = 0
except KeyError:
gvk = 1
try:
attrs.getValueByQName("ns:attr")
gvqk = 0
except KeyError:
gvqk = 1
try:
attrs.getNameByQName("ns:attr")
gnqk = 0
except KeyError:
gnqk = 1
try:
attrs.getQNameByName((ns_uri, "attr"))
gqnk = 0
except KeyError:
gqnk = 1
try:
attrs[(ns_uri, "attr")]
gik = 0
except KeyError:
gik = 1
return attrs.getLength() == 0 and \
attrs.getNames() == [] and \
attrs.getQNames() == [] and \
len(attrs) == 0 and \
not attrs.has_key((ns_uri, "attr")) and \
attrs.keys() == [] and \
attrs.get((ns_uri, "attr")) == None and \
attrs.get((ns_uri, "attr"), 25) == 25 and \
attrs.items() == [] and \
attrs.values() == [] and \
gvk and gvqk and gnqk and gik and gqnk
def test_nsattrs_empty():
return verify_empty_nsattrs(AttributesNSImpl({}, {}))
def test_nsattrs_wattr():
attrs = AttributesNSImpl({(ns_uri, "attr") : "val"},
{(ns_uri, "attr") : "ns:attr"})
return attrs.getLength() == 1 and \
attrs.getNames() == [(ns_uri, "attr")] and \
attrs.getQNames() == ["ns:attr"] and \
len(attrs) == 1 and \
attrs.has_key((ns_uri, "attr")) and \
attrs.keys() == [(ns_uri, "attr")] and \
attrs.get((ns_uri, "attr")) == "val" and \
attrs.get((ns_uri, "attr"), 25) == "val" and \
attrs.items() == [((ns_uri, "attr"), "val")] and \
attrs.values() == ["val"] and \
attrs.getValue((ns_uri, "attr")) == "val" and \
attrs.getValueByQName("ns:attr") == "val" and \
attrs.getNameByQName("ns:attr") == (ns_uri, "attr") and \
attrs[(ns_uri, "attr")] == "val" and \
attrs.getQNameByName((ns_uri, "attr")) == "ns:attr"
# ===== Main program
def make_test_output():
parser = create_parser()
result = StringIO()
xmlgen = XMLGenerator(result)
parser.setContentHandler(xmlgen)
parser.parse(findfile("test.xml"))
outf = open(findfile("test.xml.out"), "w")
outf.write(result.getvalue())
outf.close()
make_test_output()
xml_test_out = open(findfile("test.xml.out")).read()
items = locals().items()
items.sort()
for (name, value) in items:
if name[ : 5] == "test_":
confirm(value(), name)
print "%d tests, %d failures" % (tests, fails)
if fails != 0:
raise TestFailed, "%d of %d tests failed" % (fails, tests)
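The XMLGenerator tests above drive the generator by hand and compare the serialized text. A minimal sketch of the same idea against the xml.sax.saxutils module in a current Python 3 standard library (an assumption; the attachment targets the PyXML 0.6.1 API) is shown here. The helper name generate_escaped_doc is hypothetical.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def generate_escaped_doc(text):
    """Drive XMLGenerator by hand and return the serialized document."""
    result = StringIO()
    gen = XMLGenerator(result)
    gen.startDocument()
    gen.startElement("doc", {})
    gen.characters(text)  # markup characters are escaped on output
    gen.endElement("doc")
    gen.endDocument()
    return result.getvalue()


output = generate_escaped_doc("<huhei&")
# The XML declaration comes first, so only the tail is checked here.
print(output.endswith("<doc>&lt;huhei&amp;</doc>"))
```

Checking only the end of the string avoids depending on the exact XML declaration the generator writes.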
From andorxor@gmx.de Fri Oct 20 13:06:51 2000
From: andorxor@gmx.de (Stephan Tolksdorf)
Date: Fri, 20 Oct 2000 14:06:51 +0200
Subject: [XML-SIG] normalize() for minidom
Message-ID: <11011989820.20001020140651@email.com>
Hello,
I'd like to see the normalize() method of DOM2's Node interface
(in DOM1, normalize() was part of the Element interface) included
in minidom.
You will find my attempt at an implementation at the end of this mail.
--- documentation ---
This is the description from the candidate recommendation of dom2:
normalize (introduced in DOM Level 2)
Puts all Text nodes in the full depth of the sub-tree underneath this
Node, including attribute nodes, into a "normal" form where only
structure (e.g., elements, comments, processing instructions, CDATA
sections, and entity references) separates Text nodes, i.e., there
are neither adjacent Text nodes nor empty Text nodes. This can be
used to ensure that the DOM view of a document is the same as if
it were saved and re-loaded, and is useful when operations (such
as XPointer lookups) that depend on a particular document tree
structure are to be used. Note: In cases where the document
contains CDATASections, the normalize operation alone may not be
sufficient, since XPointers do not differentiate between Text
nodes and CDATASection nodes.
No Parameters
No Return Value
No Exceptions
--- implementation ---
class Node:
    (...)
    def normalize(self):
        """Joins adjacent Text nodes and deletes empty Text nodes
        in the full depth of the sub-tree underneath this Node.
        """
        i = 0
        while i < len(self.childNodes):
            cn = self.childNodes[i]
            if cn.nodeType == Node.TEXT_NODE:
                i += 1
                # join adjacent Text nodes
                while i < len(self.childNodes) and self.childNodes[i].nodeType == Node.TEXT_NODE:
                    cn.nodeValue = cn.data = cn.data + self.childNodes[i].data
                    del(self.childNodes[i])
                # delete empty nodes
                if cn.nodeValue == "":
                    i -= 1
                    del(self.childNodes[i])
                continue
            elif cn.nodeType == Node.ELEMENT_NODE: cn.normalize()
            i += 1
------
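The effect described in the DOM2 text can be seen with a small tree. This is a sketch that assumes the xml.dom.minidom shipped with a current Python 3, which provides a normalize() method; it is not the code posted above. The helper build_fragmented_doc and the element names are made up for illustration.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def build_fragmented_doc():
    """Build a tree with adjacent and empty Text nodes."""
    doc = minidom.Document()
    root = doc.createElement("doc")
    doc.appendChild(root)
    for piece in ("Hello", "", ", ", "world"):
        root.appendChild(doc.createTextNode(piece))
    child = doc.createElement("child")
    root.appendChild(child)
    child.appendChild(doc.createTextNode(""))  # empty Text node in a subtree
    return doc


doc = build_fragmented_doc()
root = doc.documentElement
before = len(root.childNodes)   # four Text nodes plus one Element
root.normalize()
node_types = [n.nodeType for n in root.childNodes]
joined_text = root.firstChild.data
child_count = len(root.lastChild.childNodes)
print(before, node_types, joined_text, child_count)
```

After the call the four Text nodes have collapsed into one, and the empty Text node under the child element is gone, which is the "same as if saved and re-loaded" view the recommendation describes.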
Best regards,
Stephan Tolksdorf
From martin@loewis.home.cs.tu-berlin.de Fri Oct 20 17:54:21 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 20 Oct 2000 18:54:21 +0200
Subject: [XML-SIG] error in test_sax.py, and fix
In-Reply-To: <39F00A10.1C510662@sbc.su.se> (message from Per Kraulis on Fri,
20 Oct 2000 11:02:08 +0200)
References: <39F00A10.1C510662@sbc.su.se>
Message-ID: <200010201654.SAA00763@loewis.home.cs.tu-berlin.de>
> Having installed PyXML-0.6.1 under Python 1.5.2 (Linux RedHat 6.2), I
> had some problems running the PyXML-0.6.1/test/regrtest.py script. I
> have isolated one particular problem, and fixed it (I think) by
> rearranging the order of some statements in the script, which seemed to
> be erroneous. I attach the fixed script; do a comparison with the
> original to see what I did.
Thanks for your report and your patch. Since test.xml.out is the
expected output, creating it on each run reduces the strength of the
test. Instead, that file should have been distributed as a source
file. It is already in the CVS, only the MANIFEST.in failed to
mention it. I've corrected that, so 0.6.2 should include that file.
Thanks again,
Martin
From martin@loewis.home.cs.tu-berlin.de Fri Oct 20 18:21:29 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 20 Oct 2000 19:21:29 +0200
Subject: [XML-SIG] normalize() for minidom
In-Reply-To: <11011989820.20001020140651@email.com> (message from Stephan
Tolksdorf on Fri, 20 Oct 2000 14:06:51 +0200)
References: <11011989820.20001020140651@email.com>
Message-ID: <200010201721.TAA00922@loewis.home.cs.tu-berlin.de>
> I'd like to have the normalize() method of DOM2's node interface
> (DOM1's normalize() was in the element interface) in minidom
> included.
Looks fine to me. I've massaged it a little (mostly to restore 1.5.2
compatibility), and committed it into PyXML, to appear as part of 0.6.2.
Regards,
Martin
From larsga@garshol.priv.no Tue Oct 24 12:50:21 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Oct 2000 13:50:21 +0200
Subject: [XML-SIG] Re: How to proceed
Message-ID:
* Lars Marius Garshol
|
| The only reason I can see for not including xmlproc is that I would
| like to be able to basically develop it the way I want, and add
| whatever features I like and at my own speed. If you think that
| would work just as well under the XML-SIG umbrella then I think we
| can do that.
* Martin v. Loewis
|
| Yes, certainly. As long as you continue to maintain it, I assume you
| will also respond to people who complain that something broke.
Of course.
| With different people contributing experimental bleeding-edge code,
| I would expect many releases to have glitches here and there - if we
| all agree to work towards a "stable" release from time to time, then
| the better.
Then I think we've settled that. Once I replace my monitor I will
commit the xmlproc regression tests to the PyXML CVS tree and start
developing it there.
The question of rsskit and dtddoc still remains open, though.
--Lars M.
From larsga@garshol.priv.no Tue Oct 24 12:50:55 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Oct 2000 13:50:55 +0200
Subject: [XML-SIG] Re: How to proceed
Message-ID:
* Lars Marius Garshol
|
| My idea for saxlib is that it should be a toolkit with SAX
| 2.0-related add-ons. I didn't really intend for it to contain SAX
| 2.0 itself, just useful drivers, filters and similar kinds of
| utilities.
* Martin v. Loewis
|
| And I see that I was just confusing it with saxutils, sorry - there
| is no need to synchronize saxlib with Python 2.0 [as that has no
| xml.sax.saxlib].
I think we should stop calling it saxlib, since that just seems to
confuse people. I'm thinking that it should not be a part of SAX 2.0,
but that it should be a set of useful utilities and also some add-ons.
(The LexicalHandler and DeclHandler will be in there, for example.)
My list of the contents, as I currently envision it is:
- extensions
- LexicalHandler
- DeclHandler
- extra drivers
- JPython driver
- SP driver
- xmllib/sgmlop driver
- RXP driver
(xmlproc driver will be part of xmlproc: xml.parsers.xmlproc.drv_xmlproc)
- utilities
- ErrorPrinter, ErrorRaiser, ErrorRecorder
- EventTracer
- XBaseFilter, XIncludeFilter
- ParserManager
- DOM2SAX event generator
- CanonicalXMLGenerator
- DTDValidatingFilter, SchemaValidatingFilter
So maybe saxkit or saxpack would be a better name.
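To make the "utilities" entry concrete, here is a rough sketch of what an ErrorRecorder-style helper could look like. It is written against the xml.sax module in a current Python 3 standard library, which is an assumption; the class body and attribute names are hypothetical and are not taken from Lars's code.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class ErrorRecorder(ErrorHandler):
    """Collect parse problems instead of printing or raising them."""

    def __init__(self):
        self.warnings = []
        self.errors = []
        self.fatal_errors = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        self.errors.append(exception)

    def fatalError(self, exception):
        # Not re-raising lets the caller inspect the problem afterwards.
        self.fatal_errors.append(exception)


recorder = ErrorRecorder()
# Ill-formed input: <unclosed> is never closed.
xml.sax.parseString(b"<doc><unclosed></doc>", ContentHandler(), recorder)

clean = ErrorRecorder()
xml.sax.parseString(b"<doc/>", ContentHandler(), clean)
print(len(recorder.fatal_errors) >= 1, len(clean.fatal_errors))
```

An ErrorRaiser would raise the exception in each callback instead, and an ErrorPrinter would write it to a stream; the three differ only in what the callbacks do.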
| For xml.sax.saxlib, there is then only the backwards compatibility
| concern with SAX1 - we probably have to support the SAX1 classes in
| saxlib as long as people have SAX1 applications.
I think we should use a different module, possibly even a different
package. Would xml.saxkit.* work? Or xml.saxexts.*? Or xml.saxpack.*?
Comments would be very welcome, as I have some of this stuff already,
and would like to be able to place it where it belongs.
--Lars M.
From Juergen Hermann"
Message-ID:
On 24 Oct 2000 13:50:21 +0200, Lars Marius Garshol wrote:
>The question of rsskit and dtddoc still remains open, though.
I prefer one of two options:
* make both a part of the PyXML project, and put them into the
repository as siblings(!) of the "xml" directory.
* make them their own project(s) on SourceForge (PyXMLTools?).
Ciao, Jürgen
--
Jürgen Hermann, Developer (jhe@webde-ag.de)
WEB.DE AG, http://webde-ag.de/
From fdrake@acm.org Tue Oct 24 14:43:13 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 24 Oct 2000 09:43:13 -0400 (EDT)
Subject: [XML-SIG] Re: How to proceed
In-Reply-To:
References:
Message-ID: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com>
Lars Marius Garshol writes:
> I think we should use a different module, possibly even a different
> package. Would xml.saxkit.* work? Or xml.saxexts.*? Or xml.saxpack.*?
How about xml.saxtools.*?
-Fred
--
Fred L. Drake, Jr.
PythonLabs Team Member
From larsga@garshol.priv.no Tue Oct 24 14:54:13 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Oct 2000 15:54:13 +0200
Subject: [XML-SIG] Re: How to proceed
In-Reply-To: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com>
References: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com>
Message-ID:
* Lars Marius Garshol
|
| I think we should use a different module, possibly even a different
| package. Would xml.saxkit.* work? Or xml.saxexts.*? Or xml.saxpack.*?
* Fred L. Drake, Jr.
|
| How about xml.saxtools.*?
I like that better than my own suggestions, since it makes it clearer
what the package actually is. Unless anyone protests or proposes a
better name, that's the one I would like to use.
Does this belong in the XML-SIG package? I think it does, but it
would be nice to have feedback.
--Lars M.
From fdrake@acm.org Tue Oct 24 14:58:14 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 24 Oct 2000 09:58:14 -0400 (EDT)
Subject: [XML-SIG] Re: How to proceed
In-Reply-To:
References:
<14837.37361.478208.27446@cj42289-a.reston1.va.home.com>
Message-ID: <14837.38262.68627.415392@cj42289-a.reston1.va.home.com>
Lars Marius Garshol writes:
> Does this belong in the XML-SIG package? I think it does, but it
> would be nice to have feedback.
I think so. I think of PyXML as a fairly substantial package that
provides all sorts of general XML-related support that is useful for a
wide range of applications, but might not be included in the "core"
support packaged with Python. Andrew described it as an "omnibus"
package in the early days of PyXML, and I think that's a good
description. The only thing I'd like to see changed in that regard is
that development may be better off within PyXML rather than using
separate CVS repositories for components (xmlproc, 4DOM, etc.), so
that changes get broader testing a little earlier. Now that the CVS
is on SourceForge that makes more sense than when it was on Andrew's
machine.
-Fred
--
Fred L. Drake, Jr.
PythonLabs Team Member
From larsga@garshol.priv.no Tue Oct 24 15:23:56 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Oct 2000 16:23:56 +0200
Subject: [XML-SIG] Re: How to proceed
In-Reply-To: <14837.38262.68627.415392@cj42289-a.reston1.va.home.com>
References: <14837.37361.478208.27446@cj42289-a.reston1.va.home.com> <14837.38262.68627.415392@cj42289-a.reston1.va.home.com>
Message-ID:
* Lars Marius Garshol
|
| Does this belong in the XML-SIG package? I think it does, but it
| would be nice to have feedback.
* Fred L. Drake, Jr.
|
| I think so. I think of PyXML as a fairly substantial package that
| provides all sorts of general XML-related support that is useful for
| a wide range of applications, but might not be included in the
| "core" support packaged with Python. Andrew described it as an
| "omnibus" package in the early days of PyXML, and I think that's a
| good description.
I agree, this is how I think of it as well. So it looks like the SAX
extras package will be called saxtools, have the package name
xml.saxtools.* and live in the XML-SIG package.
| The only thing I'd like to see changed in that regard is that
| development may be better off within PyXML rather than using
| separate CVS repositories for components (xmlproc, 4DOM, etc.), so
| that changes get broader testing a little earlier.
I agree. My plan for the future is to maintain xmlproc, javadom and
saxtools in the PyXML CVS tree.
However, this still leaves the question of the two XML applications
open. Do rsskit and dtddoc belong in the XML-SIG package? I don't
think they do, but there have been requests to me from people who
would like to see them there.
So I have to decide between using the PyXML package, separate SF
projects or hosting development myself.
--Lars M.
From uche.ogbuji@fourthought.com Tue Oct 24 15:55:21 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Tue, 24 Oct 2000 08:55:21 -0600
Subject: [XML-SIG] Re: How to proceed
In-Reply-To: Message from Lars Marius Garshol
of "24 Oct 2000 16:23:56 +0200."
Message-ID: <200010241455.IAA32150@localhost.localdomain>
> | The only thing I'd like to see changed in that regard is that
> | development may be better off within PyXML rather than using
> | separate CVS repositories for components (xmlproc, 4DOM, etc.), so
> | that changes get broader testing a little earlier.
>
> I agree. My plan for the future is to maintain xmlproc, javadom and
> saxtools in the PyXML CVS tree.
>
> However, this still leaves the question of the two XML applications
> open. Do rsskit and dtddoc belong in the XML-SIG package? I don't
> think they do, but there have been requests to me from people who
> would like to see them there.
>
> So I have to decide between using the PyXML package, separate SF
> projects or hosting development myself.
I don't know enough about rsskit and dtddoc to specifically conclude where you
should host them, but if there is general interest in the packages, I'd say
"put them in PyXML." Anything from Lars is sure to be of high enough quality
to avoid any question of shovelling poor code into the package.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
From Alexandre.Fayolle@logilab.fr Tue Oct 24 16:54:00 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 24 Oct 2000 17:54:00 +0200 (CEST)
Subject: [XML-SIG] xmlproc DTD api bug
Message-ID:
I'm using PyXML 0.5.5.1 on Python 1.5.2. I know these are old versions (we
are planning to move further real soon now), so maybe the problem has
already been fixed. Please excuse me if this is the case.
If I have an element with an ANY content model, and use
elt.get_valid_elements(elt.get_start_state()), I get a TypeError,
because the content model is None for this element:
File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py",
line 300, in get_valid_elements
return self.content_model[state].keys()
TypeError: unsubscriptable object
I think xmlproc should test for this case, and return the list of all the
elements known in the DTD (using dtd.get_elements())
Cheers
--
Alexandre Fayolle
http://www.logilab.com
LOGILAB, Paris (France).
From larsga@garshol.priv.no Tue Oct 24 17:26:28 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 24 Oct 2000 18:26:28 +0200
Subject: [XML-SIG] xmlproc DTD api bug
In-Reply-To:
References:
Message-ID:
* Alexandre Fayolle
|
| I'm using PyXML 0.5.5.1 on Python 1.5.2. I know these are old
| versions (we are planning to move further real soon now), so maybe
| the problem has already been fixed. Please excuse me if this is the
| case.
It has been fixed, but not in the PyXML package. The fix will appear
there when I move xmlproc development to the PyXML CVS tree.
| If I have an element with an ANY content model, and use
| elt.get_valid_elements(elt.get_start_state()), I get a Type Error,
| because the content model is None for this element:
| File "/usr/lib/python1.5/site-packages/xml/parsers/xmlproc/xmldtd.py",
| line 300, in get_valid_elements
| return self.content_model[state].keys()
| TypeError: unsubscriptable object
|
| I think xmlproc should test for this case,
I agree, and the version in my CVS tree does. It might be that
version 0.70 also does; I'm not sure.
| and return the list of all the elements known in the DTD (using
| dtd.get_elements())
In principle I agree that this would be the correct solution.
However, the element doesn't have a reference to the DTD, so it
doesn't have this information. So my current code returns '[]'
instead.
The problem is that if the element is to have a reference to the DTD
we have a cycle and in 1.5.2 that means that we must have either a
.unlink() method on the DTD or memory leaks (and quite often also
both).
If you want to fix the immediate problem, add this method to the
ElementTypeAny class in xmldtd.py:
    def get_valid_elements(self, state):
        return []
Thank you for reporting this problem!
--Lars M.
From Alexandre.Fayolle@logilab.fr Tue Oct 24 17:48:25 2000
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Tue, 24 Oct 2000 18:48:25 +0200 (CEST)
Subject: [XML-SIG] xmlproc DTD api bug
In-Reply-To:
Message-ID:
On 24 Oct 2000, Lars Marius Garshol wrote:
> If you want to fix the immediate problem, add this method to the
> ElementTypeAny class in xmldtd.py:
>
>     def get_valid_elements(self, state):
>         return []
Since the involved code is to be distributed, I prefer avoiding patching
dependencies, so I set up a workaround in the calling code. This
workaround may last for some time since the caller knows of the dtd, and
is able to call get_elements() by itself.
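A caller-side workaround of this kind might look like the sketch below. The method names get_valid_elements(), get_start_state() and get_elements() come from the thread; the stub classes and the helper valid_elements() are hypothetical stand-ins so that the example runs without xmlproc installed.

```python
class ElementTypeAnyStub:
    """Hypothetical stand-in for an xmlproc element type declared ANY."""

    content_model = None

    def get_start_state(self):
        return 0

    def get_valid_elements(self, state):
        # Mirrors the reported failure: subscripting None raises TypeError.
        return self.content_model[state].keys()


class DTDStub:
    """Hypothetical stand-in for the DTD object the caller already holds."""

    def __init__(self, names):
        self._names = list(names)

    def get_elements(self):
        return list(self._names)


def valid_elements(dtd, elt):
    """Return the elements allowed at the start of elt.

    Falls back to every element known to the DTD when the content
    model is ANY and the lookup fails.
    """
    try:
        return list(elt.get_valid_elements(elt.get_start_state()))
    except TypeError:
        return dtd.get_elements()


allowed = valid_elements(DTDStub(["quotation", "author"]), ElementTypeAnyStub())
print(allowed)
```

Because the fallback lives in the caller, it keeps working whether the installed xmlproc raises TypeError or returns an empty list is handled separately by the application.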
Thanks for the quick answer.
--
Alexandre Fayolle
http://www.logilab.com
LOGILAB, Paris (France).
From Joakim.Hove@phys.ntnu.no Tue Oct 24 20:34:12 2000
From: Joakim.Hove@phys.ntnu.no (Joakim Hove)
Date: 24 Oct 2000 21:34:12 +0200
Subject: [XML-SIG] Quotes example PyXML-0.6.1 seems to ignore DTD?
Message-ID:
/-----------------------------------------------------------------
| Please excuse me if this mail has appeared on the list previously;
| I had some problems sending it initially.
\-----------------------------------------------------------------
Hello,
I have just installed the PyXML-0.6.1 distribution. This is actually
an attempt to teach myself _both_ about Python _and_ XML - ideally one
should probably concentrate on one new thing at a time.
Anyway, I am experimenting with the qtfmt.py program in demos/quotes/.
This program is supplied with a sample XML file and an accompanying
DTD file.
bash% head -2 sample.xml
As we can see the sample.xml file should be validated with respect to
(??) the DTD file "quotations.dtd". Now if I rename quotations.dtd to
something else, I was expecting to trigger a run-time error of some
kind, as the DTD-file specified in the XML is no longer to be found,
however no error occurs, and the output from the qtfmt.py program is
unchanged.
I have not been able to confirm whether the parser used in qtfmt.py is
validating. The relevant part of the qtfmt.py file is:
# Enforce the use of the Expat parser, because the code needs to be
# sure that the output will be UTF-8 encoded.
p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")
but if this is indeed a non-validating parser - then it is somewhat
misleading to ship the quotations.dtd file - which is actually not
used.
If anyone could clear up these misunderstandings I would be most
grateful.
-- Joakim Hove
--
=== Joakim Hove www.phys.ntnu.no/~hove/ ======================
# Institutt for fysikk (735) 93637 / E3-166 | Skøyensgate 10D #
# N - 7491 Trondheim hove@phys.ntnu.no | N - 7030 Trondheim #
================================================ 73 93 31 68 ========
From akuchlin@mems-exchange.org Tue Oct 24 20:52:51 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 24 Oct 2000 15:52:51 -0400
Subject: [XML-SIG] Quotes example PyXML-0.6.1 seems to ignore DTD?
In-Reply-To: ; from Joakim.Hove@phys.ntnu.no on Tue, Oct 24, 2000 at 09:34:12PM +0200
References:
Message-ID: <20001024155251.A15058@kronos.cnri.reston.va.us>
On Tue, Oct 24, 2000 at 09:34:12PM +0200, Joakim Hove wrote:
> # Enforce the use of the Expat parser, because the code needs to be
> # sure that the output will be UTF-8 encoded.
> p=saxexts.XMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")
>but if this is indeed a non-validating parser - then it is somewhat
>misleading to ship the quotations.dtd file - which is actually not
>used.
Correct, Expat is a non-validating parser. But enforcing the use of
Expat is a hack because at the time only Expat would provide UTF-8
output. Probably that hack is no longer necessary with Python 2.0,
since it could just feed a Unicode string to xmlproc, which is a
validating parser, and then convert to the desired output encoding.
I'll add it to my stack of things to do.
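The behaviour Joakim saw can be reproduced with a short sketch. It assumes the expat-based xml.sax driver in a current Python 3 standard library with its default settings; the document contents, the DTD file name and the helper element_names() are made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementNames(ContentHandler):
    """Record the name of every element that is opened."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(document):
    handler = ElementNames()
    xml.sax.parseString(document, handler)
    return handler.names


# The internal DTD says <doc> must be empty, yet the document nests a child.
invalid_doc = b"<!DOCTYPE doc [<!ELEMENT doc EMPTY>]><doc><extra/></doc>"
# The external DTD named here is assumed not to exist.
missing_dtd_doc = b'<!DOCTYPE doc SYSTEM "no-such-file.dtd"><doc/>'

print(element_names(invalid_doc))
print(element_names(missing_dtd_doc))
```

Both documents parse without an error because a non-validating parser only checks well-formedness; neither the content model nor the presence of the external DTD is enforced.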
--amk
From akuchlin@mems-exchange.org Wed Oct 25 03:40:38 2000
From: akuchlin@mems-exchange.org (A.M. Kuchling)
Date: Tue, 24 Oct 2000 22:40:38 -0400
Subject: [XML-SIG] Determining output encoding of a SAX parser
Message-ID: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com>
Is there any way to determine the encoding of the output from a SAX1
parser driver? It's clear if the callbacks are being passed Unicode
strings, but with 8-bit strings you have no way of knowing if they're
in Latin1 or UTF-8 or anything (unless you know what parser you're
using).
Given that SAX2 does seem to support this with
XMLReader.{get,set}Encoding(), is this worth fixing in SAX1?
--amk
From fdrake@acm.org Wed Oct 25 03:33:07 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 24 Oct 2000 22:33:07 -0400 (EDT)
Subject: [XML-SIG] Determining output encoding of a SAX parser
In-Reply-To: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com>
References: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com>
Message-ID: <14838.18019.793018.508809@cj42289-a.reston1.va.home.com>
A.M. Kuchling writes:
> Given that SAX2 does seem to support this with
> XMLReader.{get,set}Encoding(), is this worth fixing in SAX1?
I thought we'd decided to drop SAX 1 support. Perhaps I'm
mis-remembering?
-Fred
--
Fred L. Drake, Jr.
PythonLabs Team Member
From martin@loewis.home.cs.tu-berlin.de Wed Oct 25 07:17:11 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 25 Oct 2000 08:17:11 +0200
Subject: [XML-SIG] Re: How to proceed
In-Reply-To: (message from Lars Marius
Garshol on 24 Oct 2000 13:50:21 +0200)
References:
Message-ID: <200010250617.IAA00842@loewis.home.cs.tu-berlin.de>
> The question of rsskit and dtddoc still remains open, though.
I don't know what this is, who would use it, or how well it works for
the purposes it is designed for, so I can't really comment. All I can
say is that if you consider it free software (in the speech sense), and
related to PyXML, and if its presence doesn't break anything, then I
certainly won't object to ship it together with PyXML.
Regards,
Martin
From martin@loewis.home.cs.tu-berlin.de Wed Oct 25 07:25:43 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 25 Oct 2000 08:25:43 +0200
Subject: [XML-SIG] Determining output encoding of a SAX parser
In-Reply-To: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com>
(amk@mira.erols.com)
References: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com>
Message-ID: <200010250625.IAA00904@loewis.home.cs.tu-berlin.de>
> Is there any way to determine the encoding of the output from a SAX1
> parser driver? It's clear if the callbacks are being passed Unicode
> strings, but with 8-bit strings you have no way of knowing if they're
> in Latin1 or UTF-8 or anything (unless you know what parser you're
> using).
>
> Given that SAX2 does seem to support this with
> XMLReader.{get,set}Encoding(), is this worth fixing in SAX1?
I don't think it is worth fixing anything in SAX1, unless documented
functionality is clearly broken.
From Python 1.6 on, I'd expect drivers to produce Unicode objects in
most cases (although only expat currently does), in which case the
encoding of the input would be irrelevant. Please note that the
{get,set}Encoding() is on the InputSource, not on the XMLReader. I
don't know whether the reader is supposed to invoke setEncoding on the
source once it sees an encoding= attribute.
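For reference, this is roughly how an application can state the encoding on the InputSource itself. The sketch assumes the xml.sax module in a current Python 3 standard library, whose expat driver passes the source's encoding to the parser; other drivers may ignore it. The helper parse_with_declared_encoding() and the sample document are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Gather all character data delivered by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_declared_encoding(raw_bytes, encoding):
    """Parse bytes whose encoding is supplied by the application."""
    source = InputSource()
    source.setByteStream(io.BytesIO(raw_bytes))
    source.setEncoding(encoding)      # set on the source, not on the reader
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return source.getEncoding(), "".join(handler.chunks)


# No XML declaration, so the bytes alone do not say which encoding is used.
raw = "<doc>àéèù</doc>".encode("iso-8859-1")
encoding, text = parse_with_declared_encoding(raw, "iso-8859-1")
print(encoding, text)
```

Here the application, not the reader, calls setEncoding(); whether a reader should also call it after seeing an encoding declaration is the open question raised above.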
Regards,
Martin
From larsga@garshol.priv.no Wed Oct 25 10:39:18 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 25 Oct 2000 11:39:18 +0200
Subject: [XML-SIG] Determining output encoding of a SAX parser
In-Reply-To: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com>
References: <200010250240.WAA00739@207-172-112-224.s224.tnt4.ann.va.dialup.rcn.com>
Message-ID:
* A. M. Kuchling
|
| Is there any way to determine the encoding of the output from a SAX1
| parser driver?
No, there is not. You simply get 8-bit strings with no semantics
attached.
| Given that SAX2 does seem to support this with
| XMLReader.{get,set}Encoding(),
There is no XMLReader.{get,set}Encoding() in Python or Java SAX 2.0.
There are methods with these names on InputSource, but that is
something completely different.
| is this worth fixing in SAX1?
No, I don't think it is. SAX 1.0 is obsolete now, and we should all
move on to SAX 2.0. In SAX 2.0, the goal is to have all drivers (or
at least as close to all as possible) emit Unicode strings.
--Lars M.
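A minimal sketch of the SAX2 arrangement discussed in this thread, assuming the xml.sax package as shipped with current Python 3 (the thread predates it). The encoding accessors live on InputSource, and the handler receives Unicode strings regardless of the input bytes. The names Collect and parse_bytes are illustrative only.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class Collect(ContentHandler):
    """Hypothetical handler that gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # content arrives as a Unicode string, whatever the input encoding
        self.chunks.append(content)


def parse_bytes(data, encoding=None):
    source = InputSource()
    source.setByteStream(io.BytesIO(data))
    if encoding is not None:
        # the setter is on the InputSource, not on the reader
        source.setEncoding(encoding)
    handler = Collect()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks), source.getEncoding()


declared = '<?xml version="1.0" encoding="iso-8859-1"?><try>àéèù</try>'
text, enc = parse_bytes(declared.encode("latin-1"))
text2, enc2 = parse_bytes("<try>é</try>".encode("latin-1"), "iso-8859-1")
```

In this sketch the reader does not write the declared encoding back to the source, so getEncoding() only reports what the caller set.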
From Nicolas.Chauvat@logilab.fr Wed Oct 25 15:45:00 2000
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Wed, 25 Oct 2000 16:45:00 +0200 (CEST)
Subject: [XML-SIG] Character encodings and expat
Message-ID:
Hi,
It looks like expat refuses the alias "latin1" for the encoding
"ISO-8859-1" as it returns a fatalError that raises a SaxException when
using
Sax2.FromXml('àéèù</try>')
The XML spec says that parsers *may* recognize aliases defined by IANA, so
I wouldn't call it a bug. Did I miss a parameter to set up somewhere to
get expat to recognize "latin1" ?
--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
From martin@loewis.home.cs.tu-berlin.de Thu Oct 26 00:12:28 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 26 Oct 2000 01:12:28 +0200
Subject: [XML-SIG] Character encodings and expat
In-Reply-To: (message from
Nicolas Chauvat on Wed, 25 Oct 2000 16:45:00 +0200 (CEST))
References:
Message-ID: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de>
> It looks like expat refuses the alias "latin1" for the encoding
> "ISO-8859-1" as it returns a fatalError that raises a SaxException when
> using
>
> Sax2.FromXml('àéèù')
>
> The XML spec says that parsers *may* recognize aliases defined by IANA, so
> I wouldn't call it a bug. Did I miss a parameter to set up somewhere to
> get expat to recognize "latin1" ?
Once xmlproc is capable of producing Unicode, it will certainly
understand all encodings that the Python 2.0 encoding machinery knows
of; that includes "latin1".
We should also strive for teaching expat to use the Python encoding
machinery, but that may be more difficult. Any volunteers?
If you *just* want it to recognize "latin1", you should extend
xmltok/xmltok.c:getEncodingIndex.
Regards,
Martin
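As a rough illustration of what "the Python encoding machinery knows" means here, the sketch below asks codecs.lookup to resolve an alias to its canonical codec name. The helper canonical_encoding is a made-up name for this example, not part of PyXML or expat.

```python
import codecs


def canonical_encoding(name):
    """Return Python's canonical codec name for an alias, or None."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# "latin1" and "ISO-8859-1" resolve to the same codec
same_codec = canonical_encoding("latin1") == canonical_encoding("ISO-8859-1")
unknown = canonical_encoding("no-such-encoding-xyz")
```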
From larsga@garshol.priv.no Fri Oct 27 10:05:46 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 27 Oct 2000 11:05:46 +0200
Subject: [XML-SIG] Character encodings and expat
In-Reply-To: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de>
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de>
Message-ID:
* Martin v. Loewis
|
| Once xmlproc is capable of producing Unicode, it will certainly
| understand all encodings that the Python 2.0 encoding machinery knows
| of; that includes "latin1".
Yup. I plan to teach xmlproc the IANA registry, so that this should
not be a problem with xmlproc.
However, it is a problem that Python does not support any of the Far
East encodings yet. Does anyone know if there are any plans to change
that?
| We should also strive for teaching expat to use the Python encoding
| machinery, but that may be more difficult. Any volunteers?
I don't think it's really all that difficult. It should be possible
to use the Python codec system to produce utf-16, and then you feed
this to expat and fix the encoding as "utf-16" in the call to
ParserCreate.
The only possible stumbling block is when expat discovers an XML
declaration that says something other than "utf-16"...
--Lars M.
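A sketch of the idea in this message, assuming the pyexpat bindings in current Python 3: decode the input with a Python codec, re-encode it as UTF-16, and pass "utf-16" to ParserCreate. The Python documentation states that an encoding given to ParserCreate overrides the document's implicit or explicit encoding, which is what addresses the stumbling block mentioned above. The function name parse_via_utf16 is hypothetical.

```python
import xml.parsers.expat


def parse_via_utf16(raw, source_encoding):
    # Step 1: let the Python codec machinery decode the bytes
    text = raw.decode(source_encoding)
    chunks = []
    # Step 2: fix the parser's encoding, overriding any XML declaration
    parser = xml.parsers.expat.ParserCreate("utf-16")
    parser.CharacterDataHandler = chunks.append
    # Step 3: feed expat the UTF-16 form (Python adds a byte order mark)
    parser.Parse(text.encode("utf-16"), True)
    return "".join(chunks)


raw = '<?xml version="1.0" encoding="latin1"?><try>àéèù</try>'.encode("latin-1")
result = parse_via_utf16(raw, "latin1")
```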
From mal@lemburg.com Fri Oct 27 11:07:37 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 27 Oct 2000 12:07:37 +0200
Subject: [XML-SIG] Character encodings and expat
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de>
Message-ID: <39F953E9.493A0DE1@lemburg.com>
Lars Marius Garshol wrote:
>
> * Martin v. Loewis
> |
> | Once xmlproc is capable of producing Unicode, it will certainly
> | understand all encodings that the Python 2.0 encoding machinery knows
> | of; that includes "latin1".
>
> Yup. I plan to teach xmlproc the IANA registry, so that this should
> not be a problem with xmlproc.
You might want to have a look at the code in encodings/aliases.py
It includes the aliasing "database" which the encodings package uses
to map encoding names to codec names.
If not all IANA names are included in this list, it would be
a good idea to add them...
> However, it is a problem that Python does not support any of the Far
> East encodings yet. Does anyone know if there are any plans to change
> that?
Tamito KAJIYAMA has written a few Asian codecs. These are not
high-performance, but fairly complete and also a great
example of how a codecs package can be written. More about this
on the i18n-sig mailing list.
> | We should also strive for teaching expat to use the Python encoding
> | machinery, but that may be more difficult. Any volunteers?
>
> I don't think it's really all that difficult. It should be possible
> to use the Python codec system to produce utf-16, and then you feed
> this to expat and fix the encoding as "utf-16" in the call to
> ParserCreate.
>
> The only possible stumbling block is when expat discovers an XML
> declaration that says something other than "utf-16"...
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
From larsga@garshol.priv.no Fri Oct 27 11:24:09 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 27 Oct 2000 12:24:09 +0200
Subject: [XML-SIG] Character encodings and expat
In-Reply-To: <39F953E9.493A0DE1@lemburg.com>
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <39F953E9.493A0DE1@lemburg.com>
Message-ID:
* Lars Marius Garshol
|
| Yup. I plan to teach xmlproc the IANA registry, so that this should
| not be a problem with xmlproc.
* mal@lemburg.com
|
| You might want to have a look at the code in encodings/aliases.py
| It includes the aliasing "database" which the encodings package uses
| to map encoding names to codec names.
|
| If not all IANA names are included in this list, it would be
| a good idea to add them...
It would indeed. :) I intended to submit a patch if I found any to be
missing.
* Lars Marius Garshol
|
| However, it is a problem that Python does not support any of the Far
| East encodings yet. Does anyone know if there are any plans to change
| that?
* mal@lemburg.com
|
| Tamito KAJIYAMA has written a few Asian codecs. These are not
| high-performance, but fairly complete and also a great example of
| how a codecs package can be written.
That's only Shift-JIS and EUC-JP, though. Is there any concerted
effort afoot to make a more complete set? At the very least,
ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
| More about this on the i18n-sig mailing list.
Well, if only a single response is required I would prefer to get that
here.
--Lars M.
From andorxor@gmx.de Fri Oct 27 18:40:52 2000
From: andorxor@gmx.de (Stephan Tolksdorf)
Date: Fri, 27 Oct 2000 19:40:52 +0200
Subject: [XML-SIG] Improvement pulldom and minidom
Message-ID: <315158556.20001027194052@email.com>
Hello,
I would like to have the two methods hasAttribute and hasAttributeNS
of DOM2's Element in minidom included.
They are very easy to implement and rather useful:
----------
class Element:
    (...)
    def hasAttribute(self, name):
        return self._attrs.has_key(name)
    def hasAttributeNS(self, namespaceURI, localName):
        return self._attrsNS.has_key((namespaceURI, localName))
----------
Additionally I propose to replace
l. 220 in pulldom.py (def parse):
    if type(stream_or_string) is type(""):
with:
    if type(stream_or_string) in [type(""), type(u'')]:
as is done in saxutils.prepare_input_source, so that Unicode filenames
can be passed directly to the function.
Best regards,
Stephan Tolksdorf
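For reference, a short usage sketch of the two proposed methods, assuming the minidom that ships with current Python 3, where both are available. The sample document and the namespace URI urn:example are invented for the illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<item xmlns:x="urn:example" id="1" x:kind="demo"/>'
)
item = doc.documentElement

# Plain lookup by qualified attribute name
has_id = item.hasAttribute("id")
# Namespace-aware lookup by (namespaceURI, localName)
has_kind = item.hasAttributeNS("urn:example", "kind")
has_missing = item.hasAttribute("missing")
```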
From fdrake@acm.org Fri Oct 27 18:39:57 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 27 Oct 2000 13:39:57 -0400 (EDT)
Subject: [XML-SIG] Improvement pulldom and minidom
In-Reply-To: <315158556.20001027194052@email.com>
References: <315158556.20001027194052@email.com>
Message-ID: <14841.48621.686427.513012@cj42289-a.reston1.va.home.com>
Stephan Tolksdorf writes:
> I would like to have the two methods hasAttribute and hasAttributeNS
> of DOM2's Element in minidom included.
These are good suggestions. I'd like Paul Prescod to take a look at
this to make a final determination since he wrote that code.
Thanks!
-Fred
--
Fred L. Drake, Jr.
PythonLabs Team Member
From Nicolas.Chauvat@logilab.fr Fri Oct 27 21:36:58 2000
From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat)
Date: Fri, 27 Oct 2000 22:36:58 +0200 (CEST)
Subject: [XML-SIG] (4DOM) Misleading error message in StyleSheetReader
Message-ID:
Hi there,
As stated in the subject, there is a misleading error message in
xml.xslt.StyleSheetReader.py/StyleSheetGenerator.__initialize():
if the xmlns:xsl="URI" of an xsl:transform node is not equal to
the XSL_NAMESPACE defined in xml.xslt.__init__.py, it will raise an
Error.STYLESHEET_MISSING_VERSION error and complain about a missing
version attribute.
I'd say it's the wrong diagnostic. I suppose it should check first that the
URI referred to by xmlns:xsl is the same as XSL_NAMESPACE, but I'm not sure
how things work in there and I'm too tired to submit a patch today.
--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)
From martin@loewis.home.cs.tu-berlin.de Fri Oct 27 22:24:56 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 27 Oct 2000 23:24:56 +0200
Subject: [XML-SIG] Character encodings and expat
In-Reply-To: (message from Lars Marius
Garshol on 27 Oct 2000 11:05:46 +0200)
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de>
Message-ID: <200010272124.XAA00854@loewis.home.cs.tu-berlin.de>
> Yup. I plan to teach xmlproc the IANA registry, so that this should
> not be a problem with xmlproc.
With due respect, I hope this is not the way that it is done. Instead,
I think codecs.lookup should know the IANA registry. It may be that
this information comes with PyXML only for now, but it should be
available to all Python applications. E.g. xml/__init__.py could
do
codecs.register(iana_lookup)
where iana_lookup simply maps encodings to the "normalized" form.
I agree with MAL that this should eventually end-up in Python proper.
In any case, knowing the official aliases should not be restricted to
xmlproc.
> However, it is a problem that Python does not support any of the Far
> East encodings yet. Does anyone know if there are any plans to change
> that?
Again, I'd see no problem including Tamito Kajiyama's code in PyXML,
if he wants us to ship it - or we could recommend JapaneseCodecs as a
valuable addition to PyXML; this package also uses the distutils, so
it is quite easy to install.
[using Python codecs in expat]
> I don't think it's really all that difficult.
[...]
> The only possible stumbling block is when expat discovers an XML
> declaration that says something other than "utf-16"...
Wouldn't that be the normal case where encodings other than UTF-8
become interesting? I'd assume that most XML documents which don't use
UTF-8 do declare the encoding in the XML declaration, instead of
relying on some higher-level protocol to correctly transmit encoding
information.
So I'd rather see an approach where expat itself finds out eventually
what the encoding is, and then goes to the application (i.e. the
Python SAX driver) and asks to convert the input.
Regards,
Martin
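A minimal sketch of the codecs.register(iana_lookup) idea, assuming current Python 3. The alias table here holds a single invented entry (x_demo_latin1) purely for illustration; a real table would be generated from the IANA registry. The search function hands back the codec that Python already has under its own name.

```python
import codecs

# Hypothetical alias table; the entry below is invented for this example
_EXTRA_ALIASES = {"x_demo_latin1": "iso8859-1"}


def iana_lookup(name):
    """Codec search function: map an extra alias to an existing codec."""
    key = name.lower().replace("-", "_")
    target = _EXTRA_ALIASES.get(key)
    if target is None:
        return None  # let other search functions have a go
    return codecs.lookup(target)


codecs.register(iana_lookup)

decoded = b"\xe9".decode("x-demo-latin1")
```

Because the registration is process-wide, every application in the interpreter sees the alias, which is the point made in this message.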
From martin@loewis.home.cs.tu-berlin.de Fri Oct 27 22:47:42 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 27 Oct 2000 23:47:42 +0200
Subject: [XML-SIG] Character encodings and expat
In-Reply-To: (message from Lars Marius
Garshol on 27 Oct 2000 12:24:09 +0200)
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <39F953E9.493A0DE1@lemburg.com>
Message-ID: <200010272147.XAA00953@loewis.home.cs.tu-berlin.de>
> That's only Shift-JIS and EUC-JP, though. Is there any concerted
> effort afoot to make a more complete set? At the very least,
> ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
I'd hope that somebody exposes the operating system's converters to
Python. For example, on Linux and Solaris, the iconv library offers a
wide variety of codecs (at least in its gconv form), which are also
highly performant. On W2k, a huge set of converters is available,
just waiting to be exposed to Python.
I'm always concerned by the fact that every package seems to come with
its own set of conversion tables, instead of relying on other people
to do a good job (and report bugs if they don't). Tcl has such tables,
Java does, X11 has some, ICU has more - I really can't see the reason
to reimplement them all again in Python.
> | More about this on the i18n-sig mailing list.
>
> Well, if only a single response is required I would prefer to get that
> here.
This is free software. You never get away with a single response only
:-)
Regards,
Martin
From mal@lemburg.com Sat Oct 28 14:54:19 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 28 Oct 2000 15:54:19 +0200
Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <200010272124.XAA00854@loewis.home.cs.tu-berlin.de>
Message-ID: <39FADA8B.8D5FE731@lemburg.com>
"Martin v. Loewis" wrote:
>
> > Yup. I plan to teach xmlproc the IANA registry, so that this should
> > not be a problem with xmlproc.
>
> With due respect, I hope this is not the way that it is done. Instead,
> I think codecs.lookup should know the IANA registry. It may be that
> this information comes with PyXML only for now, but it should be
> available to all Python applications. E.g. xml/__init__.py could
> do
>
> codecs.register(iana_lookup)
>
> where iana_lookup simply maps encodings to the "normalized" form.
That would be another option (this codec search function design
turns out to be far more useful than originally thought ;-)...
> I agree with MAL that this should eventually end-up in Python proper.
> In any case, knowing the official aliases should not be restricted to
> xmlproc.
Right. Python's encodings package should know at least about all
common aliases used for the provided codecs.
Do you have a pointer to a list of IANA aliases ?
> > However, it is a problem that Python does not support any of the Far
> > East encodings yet. Does anyone know if there are any plans to change
> > that?
>
> Again, I'd see no problem including Tamito Kajiyama's code in PyXML,
> if he wants us to ship it - or we could recommend JapaneseCodecs as a
> valuable addition to PyXML; this package also uses the distutils, so
> it is quite easy to install.
I think it should be distributed as a separate package: the codecs
are useful in a lot of contexts -- not only XML.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
From mal@lemburg.com Sat Oct 28 15:09:13 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 28 Oct 2000 16:09:13 +0200
Subject: [XML-SIG] Character encodings and expat
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <39F953E9.493A0DE1@lemburg.com> <200010272147.XAA00953@loewis.home.cs.tu-berlin.de>
Message-ID: <39FADE09.D257A7DF@lemburg.com>
"Martin v. Loewis" wrote:
>
> > That's only Shift-JIS and EUC-JP, though. Is there any concerted
> > effort afoot to make a more complete set? At the very least,
> > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
>
> I'd hope that somebody exposes the operating system's converters to
> Python. For example, on Linux and Solaris, the iconv library offers a
> wide variety of codecs (at least in its gconv form), which are also
> highly performant. On W2k, a huge set of converters is available,
> just waiting to be exposed to Python.
>
> I'm always concerned by the fact that every package seems to come with
> its own set of conversion tables, instead of relying on other people
> to do a good job (and report bugs if they don't). Tcl has such tables,
> Java does, X11 has some, ICU has more - I really can't see the reason
> to reimplement them all again in Python.
Sure would be nice... the only problem I see is that the
different codecs for the Asian scripts will most probably
behave differently, e.g. there are many issues with private
code point areas in Unicode and the various Asian encodings.
It would still be nice to have different codec packages around
though -- even if they all implement the same converters, e.g.
AsianCharmapCodecs, NativeWin32Codecs, NativeCLibCodecs, etc.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
From frank63@ms5.hinet.net Sun Oct 29 00:46:50 2000
From: frank63@ms5.hinet.net (Frank J.S. Chen)
Date: Sat, 28 Oct 2000 23:46:50 -0000
Subject: [XML-SIG] Character encodings and expat
Message-ID: <200010281546.XAA15848@ms5.hinet.net>
> >
> > > That's only Shift-JIS and EUC-JP, though. Is there any concerted
> > > effort afoot to make a more complete set? At the very least,
> > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
> >
>
> Sure would be nice... the only problem I see is that the
> different codecs for the Asian scripts will most probably
> behave differently, e.g. there are many issues with private
> code point areas in Unicode and the various Asian encodings.
For now, all CJK Unicode characters reside in the Basic Multilingual
Plane (Plane 0).
There seems to be no need to consider the surrogate area or private use
area right now.
What we need is indeed a transcoding interface to convert different locales
to UTF-8/UTF-16 and then back.
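One way to picture the transcoding interface asked for here, assuming the codecs bundled with current Python 3 (which include Big5): decode to Unicode, then encode to the target. The helper name transcode is illustrative.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert bytes between encodings by going through Unicode."""
    return data.decode(source_encoding).encode(target_encoding)


big5_bytes = "中文".encode("big5")
utf8_bytes = transcode(big5_bytes, "big5", "utf-8")
back = transcode(utf8_bytes, "utf-8", "big5")
```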
From martin@loewis.home.cs.tu-berlin.de Sat Oct 28 21:34:45 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 28 Oct 2000 22:34:45 +0200
Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat
In-Reply-To: <39FADA8B.8D5FE731@lemburg.com> (mal@lemburg.com)
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <200010272124.XAA00854@loewis.home.cs.tu-berlin.de> <39FADA8B.8D5FE731@lemburg.com>
Message-ID: <200010282034.WAA00771@loewis.home.cs.tu-berlin.de>
> Do you have a pointer to a list of IANA aliases ?
It's at http://www.isi.edu/in-notes/iana/assignments/character-sets
Regards,
Martin
From andy@reportlab.com Sun Oct 29 07:14:01 2000
From: andy@reportlab.com (Andy Robinson)
Date: Sun, 29 Oct 2000 07:14:01 -0000
Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat
In-Reply-To: <200010272147.XAA00953@loewis.home.cs.tu-berlin.de>
Message-ID:
> -----Original Message-----
> From: i18n-sig-admin@python.org [mailto:i18n-sig-admin@python.org]On
> Behalf Of Martin v. Loewis
> Sent: 27 October 2000 22:48
> To: larsga@garshol.priv.no
> Cc: i18n-sig@python.org; xml-sig@python.org
> Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat
>
>
> > That's only Shift-JIS and EUC-JP, though. Is there any concerted
> > effort afoot to make a more complete set? At the very least,
> > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
>
That was the intention, but I admit we have run out of steam
somewhat. Tamito Kajiyama is the only person to have made
a really big contribution. I was hoping to, but that hope
was on the basis of a large customer project needing
this stuff which got cancelled, and running a startup
is taking so much time that I won't manage much until
ReportLab gets a customer who needs to reencode data.
When that happens, we'll have to do it, and fast.
As an aside, we're doing the work to allow use of
Adobe's Asian Font Packs in reportlab at the moment,
and they use the native encodings. So once that
comes out, we'll be under a lot of pressure to do it.
I am very hopeful of the first half of next year if no
one else has done the work already.
In the meantime, frankly, not enough people need
it badly enough and nobody but Tamito has had a go.
Volunteers welcome!
> I'm always concerned by the fact that every package seems to come with
> its own set of conversion tables, instead of relying on other people
> to do a good job (and report bugs if they don't). Tcl has such tables,
> Java does, X11 has some, ICU has more - I really can't see the reason
> to reimplement them all again in Python.
I don't use Tcl, Java or X11 and don't know what ICU
is, but I do use Python on several platforms and would
want to know that the encodings library worked
identically on all platforms - i.e. if there are bugs
in the codecs, they are consistent and can be fixed
consistently. I think this issue was pretty much settled
in MAL's original i18n proposal. However, no sane person
retypes mapping tables; if we built something Pythonic
we'd hopefully do it by extracting data from two different
sources, building our own tables and checking they got
identical results. With compression into a Zip file
and careful use of diff-like techniques (all the obscure
Asian codecs go like 'take this base encoding and add
these extra code points'), I believe a good codec
database could be quite small.
- Andy Robinson
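A small sketch of the "base encoding plus extra code points" technique, with the cross-check against a second source. ISO-8859-15 is used as a stand-in because it differs from ISO-8859-1 in only eight positions; the Asian codecs mentioned in this message would need far larger tables. The second source here is simply the codec bundled with current Python 3, and apply_delta is a made-up helper name.

```python
# Base table: byte value -> character, taken from Latin-1
BASE = bytes(range(256)).decode("latin-1")

# The eight positions where ISO-8859-15 differs from ISO-8859-1
DELTA = {
    0xA4: "\u20ac", 0xA6: "\u0160", 0xA8: "\u0161", 0xB4: "\u017d",
    0xB8: "\u017e", 0xBC: "\u0152", 0xBD: "\u0153", 0xBE: "\u0178",
}


def apply_delta(base, delta):
    """Build a derived decoding table from a base table and overrides."""
    chars = list(base)
    for byte, char in delta.items():
        chars[byte] = char
    return "".join(chars)


derived = apply_delta(BASE, DELTA)
# Independent source for comparison: Python's own iso8859-15 codec
reference = bytes(range(256)).decode("iso8859-15")
tables_agree = derived == reference
```

Storing only the delta keeps the derived table to a handful of entries, which is the size argument made in this message.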
From martin@loewis.home.cs.tu-berlin.de Sun Oct 29 16:46:06 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 29 Oct 2000 17:46:06 +0100
Subject: [XML-SIG] Improvement pulldom and minidom
In-Reply-To: <315158556.20001027194052@email.com> (message from Stephan
Tolksdorf on Fri, 27 Oct 2000 19:40:52 +0200)
References: <315158556.20001027194052@email.com>
Message-ID: <200010291646.RAA00933@loewis.home.cs.tu-berlin.de>
> I would like to have the two methods hasAttribute and hasAttributeNS
> of DOM2's Element in minidom included.
[...]
> Additionally I propose to replace
These patches look good to me, so I have installed them. Please try to
produce context or unified diffs the next time (e.g. like the ones I
have attached below), either using diff or cvs diff.
Regards,
Martin
Index: minidom.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/minidom.py,v
retrieving revision 1.6
diff -u -r1.6 minidom.py
--- minidom.py 2000/10/20 17:19:59 1.6
+++ minidom.py 2000/10/29 16:33:45
@@ -346,6 +346,12 @@
         node.unlink()
         del self._attrs[node.name]
         del self._attrsNS[(node.namespaceURI, node.localName)]
+
+    def hasAttribute(self, name):
+        return self._attrs.has_key(name)
+
+    def hasAttributeNS(self, namespaceURI, localName):
+        return self._attrsNS.has_key((namespaceURI, localName))
     def getElementsByTagName(self, name):
         return _getElementsByTagNameHelper(self, name, [])
Index: pulldom.py
===================================================================
RCS file: /cvsroot/pyxml/xml/xml/dom/pulldom.py,v
retrieving revision 1.5
diff -u -r1.5 pulldom.py
--- pulldom.py 2000/10/20 16:59:42 1.5
+++ pulldom.py 2000/10/29 16:42:20
@@ -1,6 +1,12 @@
 import minidom
 import xml.sax,xml.sax.handler
+import types
+try:
+    _StringTypes = [types.StringType, types.UnicodeType]
+except AttributeError:
+    _StringTypes = [types.StringType]
+
 START_ELEMENT = "START_ELEMENT"
 END_ELEMENT = "END_ELEMENT"
 COMMENT = "COMMENT"
@@ -217,7 +223,7 @@
 default_bufsize = (2 ** 14) - 20
 def parse(stream_or_string, parser=None, bufsize=default_bufsize):
-    if type(stream_or_string) is type(""):
+    if type(stream_or_string) in _StringTypes:
         stream = open(stream_or_string)
     else:
         stream = stream_or_string
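For comparison, a sketch of the same "filename or stream" dispatch in current Python 3, where a single str type covers what StringType and UnicodeType covered in this patch. The helper open_source is hypothetical; pulldom.parse in the standard library performs an equivalent check internally.

```python
import io
from xml.dom import pulldom


def open_source(stream_or_string):
    """Open a filename, or pass an already-open stream through."""
    if isinstance(stream_or_string, str):
        return open(stream_or_string, "rb")
    return stream_or_string


stream = io.BytesIO(b"<doc/>")
same = open_source(stream)

# pulldom.parse accepts either form; here it is given a stream
events = [event for event, node in pulldom.parse(io.BytesIO(b"<doc/>"))]
```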
From uche.ogbuji@fourthought.com Sun Oct 29 19:16:53 2000
From: uche.ogbuji@fourthought.com (uche.ogbuji@fourthought.com)
Date: Sun, 29 Oct 2000 12:16:53 -0700
Subject: [XML-SIG] Error in pyexpat docs
Message-ID: <200010291916.MAA16420@localhost.localdomain>
An excerpt from the Python 2.0 docs for pyexpat:
"""
ParseFile (file)
Parse XML data reading from the object file. file only needs to provide
the read(nbytes) method, returning the empty string when
there's no more data.
[snip]
The following attributes contain values relating to the most recent error
encountered by an xmlparser object, and will only have correct
values once a call to Parse() or ParseFile() has raised a
xml.parsers.expat.error exception.
ErrorByteIndex
Byte index at which an error occurred.
[etc.]
"""
The wrong name, "xml.parsers.expat", is the first indicator that there
might be a problem, yet I took the docs at their word and wrapped the call
to ParseFile in a blanket try/except, only to find that no exception of any
sort is ever raised by ParseFile.
It turns out that ParseFile actually returns 0 on error, returning 1 otherwise.
The first matter is that the code and the docs need to be reconciled.
However, I would _much_ prefer that things were as in the docs. I
think ParseFile should raise an exception rather than return an error flag.
Interestingly enough, this is the same argument I had with a colleague just
last week.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
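For readers on current Python 3, a sketch of the exception-based behaviour argued for here, which is how xml.parsers.expat behaves there as far as I know: ParseFile raises ExpatError on malformed input, and the exception carries the error position. The helper try_parse is an illustrative name, and this says nothing about what the 2.0-era pyexpat did.

```python
import io
import xml.parsers.expat


def try_parse(data):
    """Return (ok, line, column) for a bytes document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.ParseFile(io.BytesIO(data))
    except xml.parsers.expat.ExpatError as err:
        # lineno and offset describe where the error was detected
        return (False, err.lineno, err.offset)
    return (True, None, None)


good = try_parse(b"<a></a>")
bad = try_parse(b"<a><b></a>")
```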
From mal@lemburg.com Mon Oct 30 08:52:58 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 30 Oct 2000 09:52:58 +0100
Subject: [XML-SIG] Character encodings and expat
References: <200010281546.XAA15848@ms5.hinet.net>
Message-ID: <39FD36EA.E307382B@lemburg.com>
"Frank J.S. Chen" wrote:
>
> > >
> > > > That's only Shift-JIS and EUC-JP, though. Is there any concerted
> > > > effort afoot to make a more complete set? At the very least,
> > > > ISO 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
> > >
> >
> > Sure would be nice... the only problem I see is that the
> > different codecs for the Asian scripts will most probably
> > behave differently, e.g. there are many issues with private
> > code point areas in Unicode and the various Asian encodings.
>
> For now, all CJK Unicode characters reside in Basic Multilingual
> Plane(Plane 0).
> It seems no need to consider surrogate area or private use area right now.
But there is a private use area in the BMP as well... and if you
plan to write round-trip safe codecs for corporate character sets,
then you'll have to use these to make the transfer safe.
> What we need is indeed a transcoding interface to convert different locales
> to UTF-8/UTF-16 and then back.
I'm not sure I understand you here: there are quite a few codecs
available in the std Python lib which are readily usable and
the locale.py module has a database of many default encodings
for the various locales.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
From martin@loewis.home.cs.tu-berlin.de Mon Oct 30 22:38:47 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 30 Oct 2000 23:38:47 +0100
Subject: [XML-SIG] Character encodings and expat
In-Reply-To: <39FD36EA.E307382B@lemburg.com> (mal@lemburg.com)
References: <200010281546.XAA15848@ms5.hinet.net> <39FD36EA.E307382B@lemburg.com>
Message-ID: <200010302238.XAA00854@loewis.home.cs.tu-berlin.de>
> But there is a private use area in the BMP as well... and if you
> plan to write round-trip safe codecs for corporate character sets,
> then you'll have to use these to make the transfer safe.
Well, you can't make round-trip encoding safe for them - that is the
very nature of the private use area. If you convert set A to Unicode,
using the private map, then convert to set B, and back from there, you
likely lose.
If there are "official" mappings between some corporate's character
set and Unicode, then I'd expect all converters that support the
corporate character set also to treat the private use area in the same
way.
If there are no official mappings published by the corporation, then
you are better off using the platform converters on the corporation's
operating system. Those will definitely get the private use area
right; the ones provided by Python in a cross-platform cross-vendor
way might not.
Regards,
Martin
From mal@lemburg.com Mon Oct 30 22:57:10 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 30 Oct 2000 23:57:10 +0100
Subject: [XML-SIG] Character encodings and expat
References: <200010281546.XAA15848@ms5.hinet.net> <39FD36EA.E307382B@lemburg.com> <200010302238.XAA00854@loewis.home.cs.tu-berlin.de>
Message-ID: <39FDFCC6.D9599CE2@lemburg.com>
"Martin v. Loewis" wrote:
>
> > But there is a private use area in the BMP as well... and if you
> > plan to write round-trip safe codecs for corporate character sets,
> > then you'll have to use these to make the transfer safe.
>
> Well, you can't make round-trip encoding safe for them - that is the
> very nature of the private use area. If you convert set A to Unicode,
> using the private map, then convert to set B, and back from there, you
> likely lose.
True. With "round trip" I meant encoding A -> Unicode -> encoding A.
This is often needed in order to do processing on the data and
should be a 1-1 mapping if possible.
> If there are "official" mappings between some corporate's character
> set and Unicode, then I'd expect all converters that support the
> corporate character set also to treat the private use area in the same
> way.
>
> If there are no official mappings published by the corporation, then
> you are better off using the platform converters on the corporation's
> operating system. Those will definitely get the private use area
> right; the ones provided by Python in a cross-platform cross-vendor
> way might not.
Right.
Perhaps the codecs should warn about these conversions by applying
error handling to them (raise exceptions, ignore, replace, etc.) ?!
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
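A sketch of the two points in this message, assuming the str/bytes codec API of current Python 3: a check that encoding A -> Unicode -> encoding A reproduces the input, and the standard error-handling modes (strict, ignore, replace) applied to a conversion that cannot be done cleanly. Both helper names are hypothetical.

```python
def round_trips(raw, encoding):
    """True if decoding and re-encoding reproduces the original bytes."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeError:
        return False


def decode_with_policy(raw, encoding, errors="strict"):
    """Decode with an explicit error policy: strict, ignore or replace."""
    return raw.decode(encoding, errors)


latin_ok = round_trips("àéèù".encode("latin-1"), "latin-1")
ascii_fails = round_trips(b"\xff", "ascii")
replaced = decode_with_policy(b"a\xffb", "ascii", "replace")
ignored = decode_with_policy(b"a\xffb", "ascii", "ignore")
```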
From larsga@garshol.priv.no Tue Oct 31 10:56:25 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 31 Oct 2000 11:56:25 +0100
Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat
In-Reply-To: <39FADA8B.8D5FE731@lemburg.com>
References: <200010252312.BAA01255@loewis.home.cs.tu-berlin.de> <200010272124.XAA00854@loewis.home.cs.tu-berlin.de> <39FADA8B.8D5FE731@lemburg.com>
Message-ID:
* Martin von Loewis
|
| Again, I'd see no problem including Tamito Kajiyama's code in PyXML,
| if he wants us to ship it - or we could recommend JapaneseCodecs as a
| valuable addition to PyXML; this package also uses the distutils, so
| it is quite easy to install.
* mal@lemburg.com
|
| I think it should be distributed as a separate package: the codecs
| are useful in a lot of contexts -- not only XML.
Agreed. Anyone who wants the codecs at all will want them regardless
of whether they want the XML package or not.
--Lars M.
From larsga@garshol.priv.no Tue Oct 31 11:01:24 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 31 Oct 2000 12:01:24 +0100
Subject: [I18n-sig] Re: [XML-SIG] Character encodings and expat
In-Reply-To:
References:
Message-ID:
* Lars Marius Garshol
|
| That's only Shift-JIS and EUC-JP, though. Is there any concerted
| effort afoot to make a more complete set? At the very least, ISO
| 2022-JP, Big5, VISCII, GB-2312 and EUC-KR should be implemented.
* Andy Robinson
|
| That was the intention, but I admit we have run out of steam
| somewhat. Tamito Kajiyama is the only person to have made a really
| big contribution. [...] Volunteers welcome!
Then I may have a go at it if I can find the time. I've written
codecs for all these in C++ over the past few weeks, so it should be a
simple job to redo it for Python. (It was for a closed-source
project, so it can unfortunately not be reused directly.)
| However, no sane person retypes mapping tables; if we built
| something Pythonic we'd hopefully do it by extracting data from two
| different sources, building our own tables and checking they got
| identical results.
www.unicode.org provides mapping tables that are really easy to parse
with a Python script in order to build tables.
| With compression into a Zip file and careful use of diff-like
| techniques (all the obscure Asian codecs go like 'take this base
| encoding and add these extra code points'), I believe a good codec
| database could be quite small.
My binary collection of conversion tables for ISO 8859 1->15,
Windows-12xx, koi8-r, VISCII, Shift-JIS, EUC-JP, ISO 2022-JP, Big5,
EUC-KR and GB-2312 is about 90k.
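To illustrate the "base encoding plus extra code points" idea quoted above, here is a rough sketch in current Python. It is not how any existing codec database stores its tables; the helper names are hypothetical, and the tables are taken from the codecs that ship with Python. ISO 8859-15 is stored as a delta against ISO 8859-1.

```python
def byte_table(encoding):
    """Map each byte value 0-255 to the code point it decodes to."""
    return {b: ord(bytes([b]).decode(encoding)) for b in range(256)}


def make_delta(base, derived):
    """Keep only the entries where the derived table differs from the base."""
    return {b: cp for b, cp in derived.items() if base.get(b) != cp}


def apply_delta(base, delta):
    """Rebuild the derived table from the base table and the delta."""
    table = dict(base)
    table.update(delta)
    return table


latin1 = byte_table("iso8859-1")
latin9 = byte_table("iso8859-15")
delta = make_delta(latin1, latin9)

# Only the positions that differ need to be stored.
print(len(delta), "of", len(latin9), "entries differ")
```

This sketch handles only added or changed entries. A table that removes entries from its base would need an explicit list of deletions as well.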
--Lars M.
From richard@iopen.co.nz Tue Oct 31 12:07:34 2000
From: richard@iopen.co.nz (richard@iopen.co.nz)
Date: Wed, 1 Nov 2000 01:07:34 +1300 (NZDT)
Subject: [XML-SIG] Multiple top nodes
Message-ID:
Greetings -
Recently I installed PyXML 0.6.1, having previously been using 0.5.2. I
had an XML parser written with 0.5.2 which took an XML document with
multiple top nodes and created DOM from it (ie. it created a
DocumentFragment that wasn't well formed XML).
eg. a file of xml snippets -
---
---
to do this in 0.5.2 I had a pretty simple fragment of code such as -
---
parser=saxexts.make_parser()
dh=SaxBuilder()
dh.buildFragment()
parser.setDocumentHandler(dh)
parser.parse(snippetfile)
---
which worked great. I'm having exactly 0 luck getting the same sort of
thing to work with 0.6.1. The key in this case was using a DocumentHandler
that stuffed the DOM into a DocumentFragment, not a Document (which must
be well formed).
Anyway, to get to the point, could someone please give me a pointer to do
the same with 0.6.1? I'm sure that it is as simple (or at least close) in
0.6.1, but I'm probably missing the obvious.
Regards,
Richard Waid
Network/Software Engineer
iOpen Technologies Ltd.
From larsga@garshol.priv.no Tue Oct 31 12:34:14 2000
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 31 Oct 2000 13:34:14 +0100
Subject: [XML-SIG] Multiple top nodes
In-Reply-To:
References:
Message-ID:
* richard@iopen.co.nz
|
| [...] which worked great. I'm having exactly 0 luck getting the same
| sort of thing to work with 0.6.1. The key in this case was using a
| DocumentHandler that stuffed the DOM into a DocumentFragment, not a
| Document (which must be well formed).
Could you provide us with the exact error message you get?
I have a suspicion that your problem might be related to your trying
to parse a non-wellformed XML document. :-)
--Lars M.
From martin@loewis.home.cs.tu-berlin.de Tue Oct 31 20:09:00 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 31 Oct 2000 21:09:00 +0100
Subject: [XML-SIG] Moving towards 0.6.2
Message-ID: <200010312009.VAA01413@loewis.home.cs.tu-berlin.de>
I'm going to release PyXML 0.6.2 later this week or next week. If you
have any patches that you'd like in that release, please let me know,
or commit them yourself.
Regards,
Martin
From akuchlin@mems-exchange.org Tue Oct 31 20:20:07 2000
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 31 Oct 2000 15:20:07 -0500
Subject: [XML-SIG] Moving towards 0.6.2
In-Reply-To: <200010312009.VAA01413@loewis.home.cs.tu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Tue, Oct 31, 2000 at 09:09:00PM +0100
References: <200010312009.VAA01413@loewis.home.cs.tu-berlin.de>
Message-ID: <20001031152007.A12433@kronos.cnri.reston.va.us>
On Tue, Oct 31, 2000 at 09:09:00PM +0100, Martin v. Loewis wrote:
>I'm going to release PyXML 0.6.2 later this week or next week. If you
>have any patches that you'd like in that release, please let me know,
>or commit them yourself.
Resyncing with the current version (or the CVS tree?) of 4DOM would be
a good idea, since I think some bugs have been fixed that are still
present. (For example, the incorrect XML declaration in
xml/dom/ext/Printer.py, reported by Jennifer Wells a while back.)
Uche, Mike: any suggestions about this?
--amk
From richard@iopen.co.nz Tue Oct 31 20:39:14 2000
From: richard@iopen.co.nz (richard@iopen.co.nz)
Date: Wed, 1 Nov 2000 09:39:14 +1300 (NZDT)
Subject: [XML-SIG] Multiple top nodes
In-Reply-To:
Message-ID:
On 31 Oct 2000, Lars Marius Garshol wrote:
> * richard@iopen.co.nz
> |
> | [...] which worked great. I'm having exactly 0 luck getting the same
> | sort of thing to work with 0.6.1. The key in this case was using a
> | DocumentHandler that stuffed the DOM into a DocumentFragment, not a
> | Document (which must be well formed).
>
> Could you provide us with the exact error message you get?
>
> I have a suspicion that your problem might be related to your trying
> to parse a non-wellformed XML document. :-)
Thanks for the reply. A quick example -
With a file called some.xml with contents -
---
---
and using FromXmlFile to parse some.xml into a DOM -
---
Python 2.0 (#1, Oct 16 2000, 18:10:03)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from xml.dom.ext.reader.Sax2 import FromXmlFile
>>> xml_dom=FromXmlFile('out.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/python-2.0/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 276, in FromXmlFile
    rv = FromXmlStream(fp, ownerDocument, validate, keepAllWs, catName, saxHandlerClass)
  File "/usr/local/python-2.0/lib/python2.0/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 256, in FromXmlStream
    parser.parseFile(stream)
  File "/usr/local/python-2.0/lib/python2.0/site-packages/_xmlplus/sax/drivers/drv_pyexpat.py", line 68, in parseFile
    if self.parser.Parse(buf, 0) != 1:
xml.parsers.expat.error: junk after document element: line 6, column 0
>>>
---
Which I would have expected, if it was trying to parse into a Document. My
reading of -
http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/level-one-core.html#ID-B63ED1A3
suggests that if I was to parse into a DocumentFragment (which is what I
did with SaxBuilder in my original example), the document should be able
to have more than one top-level node.
A little more detail might help -- I'm looking to have a number of
different programs writing 'snippets' of XML into different files, which
then get read by a different program, parsed into DocumentFragment, and
then written back out to a file which will be Well Formed.
I could just read the files in blindly, manually put root tags around the
lot, and write them out again, but I may want to manipulate the DOM before
I write it out, and I'd like to check that the fragments are individually
well formed (well, as well formed as a document could be with multiple top
nodes).
Many thanks,
Richard Waid
Network/Software Engineer
iOpen Technologies Ltd.
From martin@loewis.home.cs.tu-berlin.de Tue Oct 31 21:17:10 2000
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 31 Oct 2000 22:17:10 +0100
Subject: [XML-SIG] Multiple top nodes
In-Reply-To:
(richard@iopen.co.nz)
References:
Message-ID: <200010312117.WAA01998@loewis.home.cs.tu-berlin.de>
> Which I would have expected, if it was trying to parse into a Document.
Well, you are trying to parse a document. At least this is what
FromXml* assumes.
It *will* assume that you want to create a DocumentFragment if you
pass it the ownerDocument parameter, e.g.
from xml.dom import implementation
from xml.dom.ext.reader import Sax2
Sax=Sax2
frag="""
"""
doc = implementation.createDocument(None,None,None)
print Sax.FromXml(frag,ownerDocument=doc)
However, it still will use a SAX parser to parse the fragment, and SAX
does not support parsing fragments. Specifically, the expat parser
will complain about the ill-formedness of the document.
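The expat behaviour can be reproduced without PyXML. The sketch below uses the xml.parsers.expat module from the current Python standard library, not the drv_pyexpat driver from the traceback; the helper name is made up for illustration.

```python
import xml.parsers.expat


def well_formedness_error(text):
    """Return expat's error message for text, or None if it parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


single_root = well_formedness_error("<a/>")
two_roots = well_formedness_error("<a/>\n<b/>")
print(single_root)
print(two_roots)
```

A second top-level element is rejected as soon as expat sees it, so no handler ever receives events for it.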
If you absolutely need to make this work, then you can use the sgmlop
driver, which performs less error checking. Unfortunately, FromXml*
does not support an application-provided driver, so your only solution
at the moment is to set the environment variable PY_SAX_PARSER to
xml.sax.drivers.drv_sgmlop. With that setting, my program above
creates a DocumentFragment.
Please note that this is abusing defects in sgmlop, which should
detect errors in the document more reliably. I'm surprised this worked
with PyXML 0.5.
Regards,
Martin
From richard@iopen.co.nz Tue Oct 31 21:43:52 2000
From: richard@iopen.co.nz (richard@iopen.co.nz)
Date: Wed, 1 Nov 2000 10:43:52 +1300 (NZDT)
Subject: [XML-SIG] Multiple top nodes
In-Reply-To: <200010312117.WAA01998@loewis.home.cs.tu-berlin.de>
Message-ID:
On Tue, 31 Oct 2000, Martin v. Loewis wrote:
> However, it still will use a SAX parser to parse the fragment, and SAX
> does not support parsing fragments. Specifically, the expat parser
> will complain about the ill-formedness of the document.
>
> If you absolutely need to make this work, then you can use the sgmlop
> driver, which performs less error checking. Unfortunately, FromXml*
> does not support an application-provided driver, so your only solution
> at the moment is to set the environment variable PY_SAX_PARSER to
> xml.sax.drivers.drv_sgmlop. With that setting, my program above
> creates a DocumentFragment.
>
> Please note that this is abusing defects in sgmlop, which should
> detect errors in the document more reliably. I'm surprised this worked
> with PyXML 0.5.
Thanks for the swift reply.
Now that I follow the reasoning, I'm surprised it worked in 0.5.5 too. I'll
rework the problem, perhaps using a modified file object to surround the
fragment with an arbitrary root tag, then extract the fragment from the
DOM. Not as tidy as I'd hoped, but I take heed from your warning that
doing it the other way would be abusing sgmlop. I'd hate for someone to go
and 'fix' it on me :)
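A rough sketch of that wrapping approach, using xml.dom.minidom from the current Python standard library in place of the 4DOM reader discussed in this thread. The function name and the wrapper element name are arbitrary choices. It assumes the snippets carry no XML declaration or DOCTYPE, since neither may appear inside an element.

```python
from xml.dom import minidom


def parse_fragment(text, wrapper="wrapper"):
    """Parse multiple top-level nodes by wrapping them in a throwaway root.

    Returns (document, fragment). Ill-formed input raises
    xml.parsers.expat.ExpatError, so each snippet is still checked.
    """
    doc = minidom.parseString("<%s>%s</%s>" % (wrapper, text, wrapper))
    frag = doc.createDocumentFragment()
    root = doc.documentElement
    # appendChild detaches a node from its old parent, so this moves
    # every child of the wrapper element into the fragment.
    while root.firstChild is not None:
        frag.appendChild(root.firstChild)
    return doc, frag


snippets = "<item id='1'/>\n<item id='2'>text</item>\n"
doc, frag = parse_fragment(snippets)
names = [n.nodeName for n in frag.childNodes
         if n.nodeType == n.ELEMENT_NODE]
print(names)
```

The fragment can be manipulated through the usual DOM calls and later appended under a real root element before the result is written out.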
Many thanks,
Richard Waid
Network/Software Engineer
iOpen Technologies Ltd.