From martin at v.loewis.de Fri Nov 2 09:08:38 2007
From: martin at v.loewis.de (Martin v. Löwis)
Date: Fri, 02 Nov 2007 09:08:38 +0100
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com>
Message-ID: <472ADB06.3090907@v.loewis.de>
> Do you have a version of PyXML that works with python version 2.5? If
> not when do you expect it to be available?
PyXML is currently unmaintained, so there likely won't be any more file
releases of it.
Regards,
Martin
From stefan_ml at behnel.de Fri Nov 2 10:01:34 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 02 Nov 2007 10:01:34 +0100
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <472ADB06.3090907@v.loewis.de>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com>
<472ADB06.3090907@v.loewis.de>
Message-ID: <472AE76E.8060305@behnel.de>
Martin v. Löwis wrote:
>> Do you have a version of PyXML that works with python version 2.5? If
>> not when do you expect it to be available?
>
> PyXML is currently unmaintained, so there likely won't be any more file
> releases of it.
BTW, who's responsible for updating the XML-SIG page that the Python homepage
links to behind its prominent "XML" link? I would like to have it updated to
reflect the 'recent' developments regarding ElementTree and lxml, and also
tools like Amara and others. What that site describes is pretty far from what
XML looks like in Python today, and it doesn't help anyone (especially not
newbies) if we keep up appearances here. Recent posts on this list and on
c.l.py show that people who want to solve XML problems in Python bump into
minidom and SAX and then report on the list about their problems with it. Then
people (especially I) tell them to try ElementTree or lxml, and they come back
happily reporting their success and how much easier it became.
I think there is loads of space for optimisation here.
Stefan
From martin at v.loewis.de Fri Nov 2 10:14:18 2007
From: martin at v.loewis.de (Martin v. Löwis)
Date: Fri, 02 Nov 2007 10:14:18 +0100
Subject: [XML-SIG] PyXML for py 2.5
In-Reply-To: <472AE76E.8060305@behnel.de>
References: <8B473CE55AB5B34FAAE37EF3BE19EE949E09E6@esebe113.NOE.Nokia.com>
<472ADB06.3090907@v.loewis.de> <472AE76E.8060305@behnel.de>
Message-ID: <472AEA6A.9040102@v.loewis.de>
> BTW, who's responsible for updating the XML-SIG page that the Python homepage
> links to behind its prominent "XML" link?
In short: anybody who volunteers.
Regards,
Martin
From martin at v.loewis.de Tue Nov 6 22:06:57 2007
From: martin at v.loewis.de (Martin v. Löwis)
Date: Tue, 06 Nov 2007 22:06:57 +0100
Subject: [XML-SIG] PyXML setup.py install - MSVC Compile Errors (error
LNK2019: unresolved external symbol __imp__)
In-Reply-To: <47218DB8.8020402@behnel.de>
References: <17632efd0710251310n5a4a12d4ufcb3258448aaf5bc@mail.gmail.com>
<47218DB8.8020402@behnel.de>
Message-ID: <4730D771.40001@v.loewis.de>
> If you are not tied to PyXML by some external constraint, you might want to
> use ElementTree or lxml instead, which are easy to use and actively maintained
> (as opposed to PyXML).
As another alternative, you might try the XML libraries that come with
Python itself, which share already a lot of code with PyXML (in fact,
most of the code you would need a compiler for).
Regards,
Martin
From alexander.girman at gmail.com Thu Nov 8 23:07:36 2007
From: alexander.girman at gmail.com (Alexander Girman)
Date: Thu, 8 Nov 2007 17:07:36 -0500
Subject: [XML-SIG] ZSI Namespace problems
Message-ID: <11efd4830711081407w3f6d4d74w3fee9f9a00553155@mail.gmail.com>
Dear List,
I'm trying to build a client to consume a SOAP webservice, and am
having the damnedest time getting ZSI to pass along namespace
information to its generated SOAP message. The code is illustrated
below, along with the tracefile output for the SOAP request. Please
note that the <process> and <input> tags are unqualified, despite
being explicitly qualified in the generating code. The code otherwise
works (I can cut and paste the below query, add the namespaces by
hand, and send it to the webservice via curl and get the right
response). So the question is: why won't it augment the appropriate
tags with the declared namespaces? Any insight would be appreciated,
as I've been toiling over this for what seems like forever T_T...
def call_web_service(payload):
    from ZSI.client import Binding
    from ZSI import TC
    import sys

    url = 'https://www.example.com/'
    n = 'aeg:do-process'
    b = Binding(url=url, ns=n, tracefile=sys.stdout, nsdict={'ns': n})

    # Simple container for the single "input" field of the request.
    class process:
        def __init__(self, query):
            self.input = query

    process.typecode = TC.Struct(process, [TC.String("ns:input")], "ns:process")
    # Pass the payload through; the original referenced an undefined "query".
    return b.RPC(url, 'ns:process', process(payload), nsdict={'ns': n})
Produces:
_________________________________ Thu Nov 8 16:48:20 2007 REQUEST:
__PAYLOAD__
_________________________________ Thu Nov 8 16:48:21 2007 RESPONSE:
500
Internal Server Error
From jza at openoffice.org Sun Nov 11 10:08:28 2007
From: jza at openoffice.org (Alexandro Colorado)
Date: Sun, 11 Nov 2007 03:08:28 -0600
Subject: [XML-SIG] How to parse an XML in SAX
Message-ID:
Hi, I want to parse an XML document using SAX, but my big issue is the
whitespace that gets reported. I want to know how to ignore it efficiently.
I know there are some DocumentHandlers, and one specifically for ignoring
whitespace, but I still end up with a bunch of invisible nodes like \t or
\n.
Does anyone have a tutorial on how to handle SAX for this kind of parsing?
--
Alexandro Colorado
Help the Tabasco Relief efforts:
http://rootcoffee.blogspot.com/2007/11/race-to-save-mexico-flood-victims.html
From stefan_ml at behnel.de Sun Nov 11 16:16:13 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 11 Nov 2007 16:16:13 +0100
Subject: [XML-SIG] How to parse an XML in SAX
In-Reply-To:
References:
Message-ID: <47371CBD.4010003@behnel.de>
Alexandro Colorado wrote:
> Hi I want to parse an XML using sax
Any reason why you would want to do that?
> but my big issue are the WhiteSpaces
> when they get reported. I want to know how to efficiently ignore them. I
> know there are some DocumentHandlers and one specific for ignore
> Whitespace but I still come up with a bunch of invisible nodes like \t or
> \n.
>
> Anyone have a tutorial on how to handle SAX for this kind of parsing?
Consider using cElementTree's iterparse() instead.
http://effbot.org/zone/element-iterparse.htm
It's also available in lxml.etree.
Stefan
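For illustration, the iterparse() approach might look like the sketch below. It assumes a modern Python, where cElementTree has been folded into xml.etree.ElementTree; the sample document, its element names, and the iter_images() helper are hypothetical, not taken from the thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; its indentation produces whitespace-only text.
SAMPLE = b"""<gallery>
\t<img>
\t\t<name>sunset</name>
\t\t<title>Sunset over the bay</title>
\t</img>
</gallery>"""


def iter_images(source):
    """Yield (name, title) for each <img>, ignoring layout whitespace."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "img":
            # findtext() returns the child's text; strip() drops stray \t and \n.
            name = (elem.findtext("name") or "").strip()
            title = (elem.findtext("title") or "").strip()
            yield name, title
            elem.clear()  # release the subtree once it has been handled


images = list(iter_images(io.BytesIO(SAMPLE)))
print(images)
```

Because the code only looks at the elements it cares about, the whitespace between tags never has to be filtered explicitly.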
From stefan_ml at behnel.de Mon Nov 12 12:06:56 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 12 Nov 2007 12:06:56 +0100
Subject: [XML-SIG] How to parse an XML in SAX
In-Reply-To:
References:
<47371CBD.4010003@behnel.de>
<4737CAAD.7000005@behnel.de>
Message-ID: <473833D0.6020102@behnel.de>
[going back to the list]
Alexandro Colorado wrote:
> On Sun, 11 Nov 2007 21:38:21 -0600, Stefan Behnel
> wrote:
>
>> The tool I actually mentioned, cElementTree, should also work just
>> fine on
>> 2.3. Note also that ElementTree (without the 'c') is pure Python, so it
>> doesn't require you to compile anything.
>
> Thanks for selling me on ElementTree; however, I can't use it because the
> version of the Python distribution that is being shipped doesn't have
> ElementTree, so in this particular situation I can only use the
> standard libraries.
I'm not sure I understand this. You are writing Python code, right? Why can't
you just add another Python source file? (such as ElementTree.py)
Stefan
> Now going back to SAX, is there a way I can escape the non-printable
> characters, and how exactly do they get in there in the first place? SAX is a
> very quick parser from what I've read. I have found this tutorial
> on Python and SAX:
>
> http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/
>
> I have moved on to read other tutorials to see if they can address this
> current issue. I am interested in this parsing specifically to find a way
> of escaping or 'passing' the printout of special characters:
>
> def endElement(self,name):
>     if (name == "img") :
>         print "%8s %s" % (self.name, self.title)
>         self.name = self.title = "" # just for safety
>     if (name == "title") :
>         pass
>
> Not sure what %8s and %s compared to escaping the /t or /n.
From jza at openoffice.org Mon Nov 12 17:06:06 2007
From: jza at openoffice.org (Alexandro Colorado)
Date: Mon, 12 Nov 2007 10:06:06 -0600
Subject: [XML-SIG] How to parse an XML in SAX
In-Reply-To: <473833D0.6020102@behnel.de>
References:
<47371CBD.4010003@behnel.de>
<4737CAAD.7000005@behnel.de>
<473833D0.6020102@behnel.de>
Message-ID:
On Mon, 12 Nov 2007 05:06:56 -0600, Stefan Behnel
wrote:
> [going back to the list]
>
> Alexandro Colorado wrote:
>> On Sun, 11 Nov 2007 21:38:21 -0600, Stefan Behnel
>> wrote:
>>
>>> The tool I actually mentioned, cElementTree, should also work just
>>> fine on
>>> 2.3. Note also that ElementTree (without the 'c') is pure Python, so it
>>> doesn't require you to compile anything.
>>
>> Thanks for selling me on ElementTree; however, I can't use it because the
>> version of the Python distribution that is being shipped doesn't have
>> ElementTree, so in this particular situation I can only use the
>> standard libraries.
>
> I'm not sure I understand this. You are writing Python code, right? Why
> can't
> you just add another Python source file? (such as ElementTree.py)
>
> Stefan
Well, first off, will it be backward compatible with 2.3?
How can I include it on the fly without modifying the base install?
What's wrong with SAX? Aside from this whitespace issue, I already have the
parsing I want. Plus, SAX is not just a Python thing; I might need SAX for
other languages.
--
Alexandro Colorado
Help the Tabasco Relief efforts:
http://rootcoffee.blogspot.com/2007/11/race-to-save-mexico-flood-victims.html
From stefan_ml at behnel.de Mon Nov 12 17:43:08 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 12 Nov 2007 17:43:08 +0100
Subject: [XML-SIG] How to parse an XML in SAX
In-Reply-To:
References: <47371CBD.4010003@behnel.de> <4737CAAD.7000005@behnel.de> <473833D0.6020102@behnel.de>
Message-ID: <4738829C.3000007@behnel.de>
Alexandro Colorado wrote:
> On Mon, 12 Nov 2007 05:06:56 -0600, Stefan Behnel
> wrote:
>> Alexandro Colorado wrote:
>>> Thanks for selling me on ElementTree; however, I can't use it because the
>>> version of the Python distribution that is being shipped doesn't have
>>> ElementTree, so in this particular situation I can only use the
>>> standard libraries.
>> I'm not sure I understand this. You are writing Python code, right? Why
>> can't
>> you just add another Python source file? (such as ElementTree.py)
>
> Well first of, will it be backward compatible with 2.3?
AFAIR, it works on Python 1.5.2 and later.
> How can I include it on the fly without modifying the base install?
By copying the file next to your own code?
> What's wrong with SAX, aside from this whitespace issue I already have the
> parsing I want. Plus SAX is not just a python thing I might need SAX for
> other languages.
No problem, go ahead. Since you already have an implementation, you probably
have solved enough problems already, you'll solve the remaining ones also.
> SAX is a very quick parser from what I've read.
SAX is not a parser. It uses a parser in the background to generate SAX parse
events (which IMHO are pretty ugly to work with, but that's what you wanted).
> is there a way I can escape the non-printable characters
Tried repr() ? (or "%r" for what it's worth...)
Stefan
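For readers who stay with SAX, one common way to deal with the whitespace is sketched below, using only the standard library and Python 3 syntax. The handler buffers the pieces passed to characters() and strips them when the element ends; the sample document, the ImgHandler class, and its field names are hypothetical. The final loop uses "%r", as suggested above, to make any remaining \t or \n visible.

```python
import xml.sax

# Hypothetical sample document with tabs and newlines between the tags.
SAMPLE = b"""<gallery>
\t<img>
\t\t<name>sunset</name>
\t\t<title>Sunset over the bay</title>
\t</img>
</gallery>"""


class ImgHandler(xml.sax.ContentHandler):
    """Collect the text of <name> and <title> inside each <img>."""

    def __init__(self):
        super().__init__()
        self.buffer = []    # pieces of text seen since the last tag
        self.current = {}   # fields of the <img> currently being read
        self.images = []    # finished (name, title) pairs

    def startElement(self, name, attrs):
        self.buffer = []    # discard whitespace seen before this tag

    def characters(self, content):
        # Called for real text and for the whitespace between tags,
        # possibly several times per text node.
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        self.buffer = []
        if name in ("name", "title"):
            self.current[name] = text
        elif name == "img":
            self.images.append(
                (self.current.get("name", ""), self.current.get("title", ""))
            )
            self.current = {}


handler = ImgHandler()
xml.sax.parseString(SAMPLE, handler)
for name, title in handler.images:
    print("%r %r" % (name, title))  # %r shows any leftover \t or \n
```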
From chris at simplistix.co.uk Tue Nov 27 11:48:44 2007
From: chris at simplistix.co.uk (Chris Withers)
Date: Tue, 27 Nov 2007 10:48:44 +0000
Subject: [XML-SIG] problem with elementtree 1.2.6
In-Reply-To: <368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com>
References: <474B6D14.6010404@simplistix.co.uk>
<368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com>
Message-ID: <474BF60C.1050801@simplistix.co.uk>
Fredrik Lundh wrote:
>> Sorry if this should go to a list, I couldn't find one...
>> (please send me that way if there is one...)
>
> python-list/comp.lang.python or xml-sig are good choices.
OK, let's go with xml-sig :)
>> I've bumped into an annoying problem, which I actually think is a
>> problem with expat:
>>
>> >>> from xml.parsers import expat
>> >>> parser = expat.ParserCreate()
>> >>> def handle(data): print repr(data)
>> ...
>> >>> parser.CharacterDataHandler = handle
>> >>> parser.Parse('&lt;node/&gt;',0)
>> u'<'
>> u'node/'
>> u'>'
>> 1
>>
>> Now, why is expat unquoting those two entities?
>
> in an XML file, the characters < and & *must* be escaped (either as
> entity references or character references) when appearing in normal
> text:
Yes indeed.
> the following entities are predefined: &amp; (&) &lt; (<) &gt; (>)
> &quot; (") &apos; (').
Okay, so in the above, if I really mean <, the xml should be:
'</>'
Seems a little clunky, but okay...
I guess this was causing me problems as I'm working on a bug in Twiddler
(http://www.simplistix.co.uk/software/python/twiddler)
where quoted html was ending up unquoted after processing:
>>> from twiddler import Twiddler
>>> t = Twiddler('<b>')
>>> t.render()
u''
Now, I see how you fixed this in ElementTree by re-escaping all the
predefined entities (out of interest, why is the function called
_escape_cdata rather than _escape_data?) but I can't do that because I
want users to be able to insert chunks of html and choose whether or not
they are escaped:
>>> t = Twiddler('')
escaping:
>>> t['something'].replace('')
>>> t.render()
u'<b>'
no escaping:
>>> t['something'].replace('',filters=())
>>> t.render()
u''
I guess in my use of ElementTree, I need to make sure character data is
re-escaped at the tree building stage?
> other names give an error unless they've been
> explicitly defined.
So I see:
>>> from xml.parsers import expat
>>> parser = expat.ParserCreate()
>>> parser.Parse('&foo;',0)
Traceback (most recent call last):
File "", line 1, in ?
xml.parsers.expat.ExpatError: undefined entity: line 1, column 5
But why does calling UseForeignDTD suddenly make everything ok?
>>> parser = expat.ParserCreate()
>>> parser.UseForeignDTD()
>>> parser.Parse('&foo;',0)
1
What extra hooks get called as a result of calling UseForeignDTD?
cheers,
Chris
--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
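The behaviour under discussion can be reproduced with a short sketch in Python 3 syntax. The <doc> wrapper element is an assumption added so that the input is a complete document. The parser hands the handler the unescaped characters, and code that writes them back out as markup has to escape them again, here with xml.sax.saxutils.escape().

```python
from xml.parsers import expat
from xml.sax.saxutils import escape

chunks = []
parser = expat.ParserCreate()
parser.CharacterDataHandler = chunks.append

# <doc> is a hypothetical wrapper element; the text inside it is escaped.
parser.Parse("<doc>&lt;node/&gt;</doc>", True)

text = "".join(chunks)   # expat may deliver the text in several pieces
escaped = escape(text)   # re-escape before writing the text back as XML

print(repr(text))
print(repr(escaped))
```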
From stefan_ml at behnel.de Tue Nov 27 14:59:31 2007
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 27 Nov 2007 14:59:31 +0100
Subject: [XML-SIG] problem with elementtree 1.2.6
In-Reply-To: <474BF60C.1050801@simplistix.co.uk>
References: <474B6D14.6010404@simplistix.co.uk> <368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com>
<474BF60C.1050801@simplistix.co.uk>
Message-ID: <474C22C3.8020101@behnel.de>
Chris Withers wrote:
>> the following entities are predefined: &amp; (&) &lt; (<) &gt; (>)
>> &quot; (") &apos; (').
>
> Okay, so in the above, if I really mean <, the xml should be:
> '</>'
>
> Seems a little clunky, but okay...
That's how escaping works, be it in XML, encodings, compression, whatever.
> I guess this was causing me problems as I'm working on a bug in Twiddler
> (http://www.simplistix.co.uk/software/python/twiddler)
> where quoted html was ending up unquoted after processing:
>
> >>> from twiddler import Twiddler
> >>> t = Twiddler('<b>')
> >>> t.render()
> u''
If render() is supposed to serialise a correct HTML or XML tag structure then
this is a bug.
> Now, I see how you fixed this in ElementTree by re-escaping all the
> predefined entities (out of interest, why is the function called
> _escape_cdata rather than _escape_data?)
You can read the SGML spec regarding CDATA.
> but I can't do that because I
> want users to be able to insert chunks of html and choose whether or not
> they are escaped:
>
> >>> t = Twiddler('')
>
> escaping:
>
> >>> t['something'].replace('')
What an odd API.
> >>> t.render()
> u'<b>'
I guess that's the expected behaviour.
> no escaping:
>
> >>> t['something'].replace('',filters=())
> >>> t.render()
> u''
I consider it bad practice to write serialised HTML into an HTML template. It
prevents the templating system from seeing the complete tag structure, which
allows you to output broken HTML without noticing. And there's enough broken
HTML out there already.
Doesn't Twiddler provide a way to insert a tag tree fragment rather than a
serialised tag string?
> What extra hooks get called as a result of calling UseForeignDTD?
Have you tried reading the docs or the source?
Stefan
From chris at simplistix.co.uk Wed Nov 28 22:46:04 2007
From: chris at simplistix.co.uk (Chris Withers)
Date: Wed, 28 Nov 2007 21:46:04 +0000
Subject: [XML-SIG] problem with elementtree 1.2.6
In-Reply-To: <474C22C3.8020101@behnel.de>
References: <474B6D14.6010404@simplistix.co.uk> <368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com>
<474BF60C.1050801@simplistix.co.uk> <474C22C3.8020101@behnel.de>
Message-ID: <474DE19C.5080002@simplistix.co.uk>
Stefan Behnel wrote:
> Chris Withers wrote:
>>> the following entities are predefined: &amp; (&) &lt; (<) &gt; (>)
>>> &quot; (") &apos; (').
>> Okay, so in the above, if I really mean <, the xml should be:
>> '</>'
>>
>> Seems a little clunky, but okay...
>
> That's how escaping works, be it in XML, encodings, compression, whatever.
Well yes and no. I'd expect escaping to work such that whatever we're
dealing with can be round tripped, ie: parsed, serialized, parsed
again, etc.
>> I guess this was causing me problems as I'm working on a bug in Twiddler
>> (http://www.simplistix.co.uk/software/python/twiddler)
>> where quoted html was ending up unquoted after processing:
>>
>> >>> from twiddler import Twiddler
>> >>> t = Twiddler('<b>')
>> >>> t.render()
>> u''
>
> If render() is supposed to serialise a correct HTML or XML tag structure then
> this is a bug.
Indeed, although the bug turned out to be in the tree builder used as
part of the parsing process.
>> Now, I see how you fixed this in ElementTree by re-escaping all the
>> predefined entities (out of interest, why is the function called
>> _escape_cdata rather than _escape_data?)
>
> You can read the SGML spec regarding CDATA.
Not sure what that's supposed to mean. CDATA for me means stuff inside a
<![CDATA[...]]> section. _escape_cdata is used for everything inside any
tag that isn't another tag.
>> but I can't do that because I
>> want users to be able to insert chunks of html and choose whether or not
>> they are escaped:
>>
>> >>> t = Twiddler('')
>>
>> escaping:
>>
>> >>> t['something'].replace('')
>
> What an odd API.
It actually works pretty well and might make more sense in context; have
a look at the presentation on it:
http://www.simplistix.co.uk/presentations/templating_06/templating_06.pdf
>> no escaping:
>>
>> >>> t['something'].replace('',filters=())
>> >>> t.render()
>> u''
>
> I consider it bad practice to write serialised HTML into an HTML template.
I and many others do not ;-) When writing content into an html template,
that content often comes from other sources that spit out lumps of html.
Being able to insert them without escaping is a common use case.
> It
> prevents the templating system from seeing the complete tag structure, which
> allows you to output broken HTML without noticing.
That's true, sometimes. That inserted lump may have come from a process
which can only spit out perfect html fragments, in which case you're
fine, or it may come from user input, in which case you're doomed but
will likely have happy customers ;-)
> Doesn't Twiddler provide a way to insert a tag tree fragment rather than a
> serialised tag string?
Yep, sure, that's what the clone method is for...
>> What extra hooks get called as a result of calling UseForeignDTD?
>
> Have you tried reading the docs or the source?
Docs yes, source no. I don't read C anymore :-(
Little help?
cheers,
Chris
--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
From fredrik at pythonware.com Thu Nov 29 00:33:08 2007
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 29 Nov 2007 00:33:08 +0100
Subject: [XML-SIG] problem with elementtree 1.2.6
In-Reply-To: <474DE19C.5080002@simplistix.co.uk>
References: <474B6D14.6010404@simplistix.co.uk> <368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com> <474BF60C.1050801@simplistix.co.uk>
<474C22C3.8020101@behnel.de> <474DE19C.5080002@simplistix.co.uk>
Message-ID:
Chris Withers wrote:
>> That's how escaping works, be it in XML, encodings, compression, whatever.
>
> Well yes and no. I'd expect escaping to work such that whatever we're
> dealing with can be round tripped, ie: parsed, serialized, parsed
> again, etc.
that's exactly how it works in ET, of course. you put Python strings in
the tree, the ET parsers and serializers take care of the rest.
elem = ET.Element("tag")
elem.text = value # ASCII or Unicode string
... write to disk ...
... read it back ...
assert elem.text == value
>> You can read the SGML spec regarding CDATA.
>
> Not sure what that's supposed to mean. CDATA for me means stuff inside a
> <![CDATA[...]]> section. _escape_cdata is used for everything inside any
> tag that isn't another tag.
cdata is character data; see
http://www.w3.org/TR/html401/types.html#h-6.2
that's not the same thing as a "CDATA section" (which is just one of
several ways to store character data in an XML file). how things are
stored doesn't matter; that's just a serialization detail:
http://www.w3.org/TR/xml-infoset/#omitted
What is not in the Information Set
6. Whether characters are represented by character references.
19. The boundaries of CDATA marked sections.
...
> I and many others do not ;-) When writing content into an html template,
> that content often comes from other sources that spit out lumps of html.
> Being able to insert them without escaping is a common use case.
HTML might be similar to XML, but an XML parser cannot parse HTML, so
you cannot insert HTML fragments into an XML document without either
escaping it, or pre-processing it to make sure it's well-formed.
if you want to insert literal XML fragments in an ET tree, use the XML
factory function:
fragment = "..."
elem.append(ET.XML(fragment))
if you want to embed HTML fragments in an ET tree, use ElementTidy or
ElementSoup (or equivalent) to turn the fragment into properly nested
and properly namespaced XHTML.
if you want to do unstructured string handling, use a template library
or Python strings. don't use an XML library if you don't want to work
with XML.
> That's true, sometimes. That inserted lump may have come from a process
> which can only spit out perfect html fragments, in which case you're
> fine, or it may come from user input, in which case you're doomed but
> will likely have happy customers ;-)
the hackers will be happy, at least:
http://en.wikipedia.org/wiki/Cross_site_scripting
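The round trip described above can be checked with a small sketch against the standard-library xml.etree.ElementTree in Python 3 syntax. The sample string is made up for illustration. The serializer escapes the markup characters on the way out and the parser restores them on the way in.

```python
import xml.etree.ElementTree as ET

# Made-up text containing every character that must be escaped in content.
value = "if a < b & b > c: use <b>"

elem = ET.Element("tag")
elem.text = value                     # plain Python string, no manual escaping

serialized = ET.tostring(elem)        # bytes, with &lt; &amp; &gt; inserted
restored = ET.fromstring(serialized)  # parse it back into an Element

print(serialized)
print(restored.text == value)
```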
From pzs at dcs.gla.ac.uk Thu Nov 29 15:56:49 2007
From: pzs at dcs.gla.ac.uk (Peter Saffrey)
Date: Thu, 29 Nov 2007 14:56:49 -0000
Subject: [XML-SIG] Problems with PyXML Mac OS 10.5 install
Message-ID:
I'm attempting to install PyXML on a Mac OS X Leopard laptop so that I can use the xpath libraries. I've downloaded 0.8.4 and run "python setup.py build" and "python setup.py install". If I do import xml.xpath, I get "no module xpath".
It can load the xml module OK, but I presume that this is simply the old version. It seems to me that it's simply not being installed: if I use Spotlight to find xpath, it's only found in the local directory, not where it should be installed.
I've also tried the hacky approach of copying the built xml directory into the place where the old one is, but then I get the "cannot import name boolean" error.
Peter
From chris at simplistix.co.uk Fri Nov 30 00:30:59 2007
From: chris at simplistix.co.uk (Chris Withers)
Date: Thu, 29 Nov 2007 23:30:59 +0000
Subject: [XML-SIG] problem with elementtree 1.2.6
In-Reply-To:
References: <474B6D14.6010404@simplistix.co.uk> <368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com> <474BF60C.1050801@simplistix.co.uk> <474C22C3.8020101@behnel.de>
<474DE19C.5080002@simplistix.co.uk>
Message-ID: <474F4BB3.3020009@simplistix.co.uk>
Fredrik Lundh wrote:
> Chris Withers wrote:
>
>>> That's how escaping works, be it in XML, encodings, compression, whatever.
>> Well yes and no. I'd expect escaping to work such that whatever we're
>> dealing with can be round tripped, ie: parsed, serialized, parsed
>> again, etc.
>
> that's exactly how it works in ET, of course.
I didn't say it didn't ;-)
> cdata is character data; see
>
> http://www.w3.org/TR/html401/types.html#h-6.2
>
> that's not the same thing as a "CDATA section" (which is just one of
> several ways to store character data in an XML file).
Ug. How confusing :-(
> how things are
> stored doesn't matter; that's just a serialization detail:
>
> http://www.w3.org/TR/xml-infoset/#omitted
>
> What is not in the Information Set
>
> 6. Whether characters are represented by character references.
> 19. The boundaries of CDATA marked sections.
> ...
I'm not sure I follow what you're trying to say...
>> I and many others do not ;-) When writing content into an html template,
>> that content often comes from other sources that spit out lumps of html.
>> Being able to insert them without escaping is a common use case.
>
> HTML might be similar to XML, but an XML parser cannot parse HTML, so
> you cannot insert HTML fragments into an XML document without either
> escaping it, or pre-processing it to make sure it's well-formed.
What about xhtml?
> if you want to embed HTML fragments in an ET tree, use ElementTidy or
> ElementSoup (or equivalent) to turn the fragment into properly nested
> and properly namespaced XHTML.
Fair enough...
> if you want to do unstructured string handling, use a template library
I'm using/building a templating library, it just happens that ET is an
implementation detail of that template library ;-)
>> That's true, sometimes. That inserted lump may have come from a process
>> which can only spit out perfect html fragments, in which case you're
>> fine, or it may come from user input, in which case you're doomed but
>> will likely have happy customers ;-)
>
> the hackers will be happy, at least:
>
> http://en.wikipedia.org/wiki/Cross_site_scripting
user -> content author in this case.
Since they usually own and run the system to which they're adding
content, a much more effective attack would just be to turn the box off :-P
cheers,
Chris
--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
From martin at v.loewis.de Fri Nov 30 08:49:00 2007
From: martin at v.loewis.de (Martin v. Löwis)
Date: Fri, 30 Nov 2007 08:49:00 +0100
Subject: [XML-SIG] problem with elementtree 1.2.6
In-Reply-To: <474BF60C.1050801@simplistix.co.uk>
References: <474B6D14.6010404@simplistix.co.uk> <368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com>
<474BF60C.1050801@simplistix.co.uk>
Message-ID: <474FC06C.5070401@v.loewis.de>
> What extra hooks get called as a result of calling UseForeignDTD?
Expat will invoke the ExternalEntityRefHandler with both pubid and
sysid set to None, if there is no DOCTYPE declaration in the document.
Regards,
Martin
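The visible effect of UseForeignDTD() can be sketched as follows, in Python 3 syntax. The document string and the parse() helper are hypothetical. The sketch only shows that the undefined entity stops being a fatal error once the parser is told an external DTD may exist; it installs no ExternalEntityRefHandler, so it does not demonstrate the handler call itself.

```python
from xml.parsers import expat

# Hypothetical document: no DOCTYPE, and &foo; is not defined anywhere.
DOC = "<doc>&foo;</doc>"


def parse(use_foreign_dtd):
    """Return Parse()'s result, or the error message if parsing fails."""
    parser = expat.ParserCreate()
    if use_foreign_dtd:
        parser.UseForeignDTD(True)  # has to be called before parsing starts
    try:
        return parser.Parse(DOC, True)
    except expat.ExpatError as exc:
        return str(exc)


plain = parse(False)
foreign = parse(True)
print(plain)
print(foreign)
```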
From martin at v.loewis.de Fri Nov 30 09:05:23 2007
From: martin at v.loewis.de (Martin v. Löwis)
Date: Fri, 30 Nov 2007 09:05:23 +0100
Subject: [XML-SIG] problem with elementtree 1.2.6
In-Reply-To: <474F4BB3.3020009@simplistix.co.uk>
References: <474B6D14.6010404@simplistix.co.uk> <368a5cd50711262255l25a828cax8bcb546ed390467a@mail.gmail.com> <474BF60C.1050801@simplistix.co.uk> <474C22C3.8020101@behnel.de> <474DE19C.5080002@simplistix.co.uk>
<474F4BB3.3020009@simplistix.co.uk>
Message-ID: <474FC443.6030302@v.loewis.de>
>> What is not in the Information Set
>>
>> 6. Whether characters are represented by character references.
>> 19. The boundaries of CDATA marked sections.
>> ...
>
> I'm not sure I follow what you're trying to say...
That it is irrelevant in XML whether the less-than character is
represented as &lt; or &#60; or <![CDATA[<]]>.
So if some XML library chooses to represent &#60; as &lt; you should
not be surprised.
It's not clear to me (perhaps because I missed the start of this
discussion) what the actual problem *is* that you are trying to
resolve.
>>> I and many others do not ;-) When writing content into an html template,
>>> that content often comes from other sources that spit out lumps of html.
>>> Being able to insert them without escaping is a common use case.
>> HTML might be similar to XML, but an XML parser cannot parse HTML, so
>> you cannot insert HTML fragments into an XML document without either
>> escaping it, or pre-processing it to make sure it's well-formed.
>
> What about xhtml?
It should be possible to insert XHTML fragments into XHTML documents,
in selected positions, assuming an appropriate definition of "to insert".
For ET (and any other tree-oriented XML implementation), replacing
text with serialized XHTML in the tree is not an appropriate
implementation of "to insert", as that will just insert less-than
characters, not markup. To insert markup (in particular, tags,
i.e. elements), you need to insert Element objects into the tree.
Regards,
Martin
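The difference between the two meanings of "to insert" can be seen in a minimal sketch using the standard-library ElementTree in Python 3 syntax. The fragment and element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

fragment = "<b>bold</b>"  # made-up, well-formed XHTML fragment

# Assigning the string as text inserts less-than characters, not markup.
as_text = ET.Element("p")
as_text.text = fragment

# Parsing the fragment first and appending the Element inserts real markup.
as_markup = ET.Element("p")
as_markup.append(ET.XML(fragment))

text_out = ET.tostring(as_text, encoding="unicode")
markup_out = ET.tostring(as_markup, encoding="unicode")
print(text_out)
print(markup_out)
```

ET.XML() raises a parse error for input that is not well-formed, so a broken fragment is caught before it reaches the tree.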