[lxml-dev] saxify only produces events for *NS handlers

Hi, could you please add a comment to the sax.txt that saxify _always_ produces SAX events for the *NS handlers? I just stumbled over this parsing an html file and using saxify to extract some content and had to look up the code in sax.py. I know the examples in sax.txt use *NS, however it's not clear that there won't be any calls ever to the non-NS functions. Andreas

Hi Andreas, Andreas Pakulat wrote:
could you please add a comment to the sax.txt that saxify _always_ produces SAX events for the *NS handlers? I just stumbled over this parsing an html file and using saxify to extract some content and had to look up the code in sax.py.
I know the examples in sax.txt use *NS, however it's not clear that there won't be any calls ever to the non-NS functions.
I added a note on that. Maybe we should support namespace-unaware parsing also, though, through the normal parser feature flag. Stefan

On 09.06.06 19:39:04, Stefan Behnel wrote:
Andreas Pakulat wrote:
could you please add a comment to the sax.txt that saxify _always_ produces SAX events for the *NS handlers? I just stumbled over this parsing an html file and using saxify to extract some content and had to look up the code in sax.py.
I know the examples in sax.txt use *NS, however it's not clear that there won't be any calls ever to the non-NS functions.
I added a note on that. Maybe we should support namespace-unaware parsing also, though, through the normal parser feature flag.
Well it's not about parsing, it's about saxifying. In ElementTreeProducer._recursive_saxify you have the calls to content_handler.startElementNS "hardcoded". You could probably check whether the current element actually has any namespaces associated, but on the other hand that complicates the code for no specific reason. I'm fine with using startElementNS, it's just that it wasn't clearly said that that's the only way. Andreas
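
For a handler that only implements the non-NS methods, one possible workaround is a thin adapter that forwards the *NS events to startElement/endElement. The following is a minimal sketch, not part of lxml: the names NSToPlainAdapter, Recorder and plain_events are made up for illustration. To stay self-contained it drives the adapter with the standard library parser with namespace processing switched on, which calls the same *NS handler methods that saxify does.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.xmlreader import AttributesImpl


class NSToPlainAdapter(ContentHandler):
    """Hypothetical adapter: forwards *NS events to a non-NS handler."""

    def __init__(self, target):
        super().__init__()
        self.target = target  # any handler with startElement/endElement

    def startElementNS(self, name, qname, attrs):
        _uri, localname = name
        # Drop the namespace URIs and keep only local attribute names.
        plain = {local: value for (_ns, local), value in attrs.items()}
        self.target.startElement(localname, AttributesImpl(plain))

    def endElementNS(self, name, qname):
        self.target.endElement(name[1])

    def characters(self, content):
        self.target.characters(content)


class Recorder(ContentHandler):
    """Plain (non-NS) handler that records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


def plain_events(xml_bytes):
    """Parse with namespace processing on and return the forwarded events."""
    recorder = Recorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # only *NS events are emitted
    parser.setContentHandler(NSToPlainAdapter(recorder))
    parser.parse(io.BytesIO(xml_bytes))
    return recorder.events
```

With lxml, the same adapter instance would be handed to lxml.sax.saxify(tree, adapter) instead of a parser. Note that the adapter throws away namespace information, so it is only reasonable for documents (such as plain HTML) where local names are unambiguous.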

Hi Andreas, Andreas Pakulat wrote:
On 09.06.06 19:39:04, Stefan Behnel wrote:
Andreas Pakulat wrote:
could you please add a comment to the sax.txt that saxify _always_ produces SAX events for the *NS handlers? I just stumbled over this parsing an html file and using saxify to extract some content and had to look up the code in sax.py.
I know the examples in sax.txt use *NS, however it's not clear that there won't be any calls ever to the non-NS functions. I added a note on that. Maybe we should support namespace-unaware parsing also, though, through the normal parser feature flag.
Well it's not about parsing, it's about saxifying.
Which, from the point of view of a content handler, is exactly the same.
In ElementTreeProducer._recursive_saxify you have the calls to content_handler.startElementNS "hardcoded". You could probably check whether the current element actually has any namespaces associated, but on the other hand that complicates the code for no specific reason.
And it would also be wrong. According to the (admittedly Java-centered) spec:
""" The http://xml.org/sax/features/namespaces feature controls general Namespace processing. When this feature is true (the default), any applicable Namespace URIs and localNames (for elements in namespaces) must be available through the startElement and endElement callbacks in the ContentHandler interface, and through the various methods in the Attributes interface, and start/endPrefixMapping events must be reported. For elements and attributes outside of namespaces, the associated namespace URIs will be empty strings and the qName parameter is guaranteed to be provided as a non-empty string. """
http://www.saxproject.org/namespaces.html
Note that this feature is supposed to be on by default in SAX2.
Stefan
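
The effect of the namespaces feature on which callbacks fire can be observed with the standard library's xml.sax. This is a rough sketch for illustration only: CallLogger, logged_calls and DOC are made-up names, and the flag is set explicitly in both cases rather than relying on any default.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class CallLogger(ContentHandler):
    """Records which start-element callback the parser invokes."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def startElement(self, name, attrs):
        self.calls.append(("startElement", name))

    def startElementNS(self, name, qname, attrs):
        self.calls.append(("startElementNS", name))


def logged_calls(xml_bytes, namespaces):
    """Parse xml_bytes with the namespaces feature on or off."""
    handler = CallLogger()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.calls


# Hypothetical sample document: one namespaced and two plain elements.
DOC = b'<root xmlns:x="urn:example"><x:item/><plain/></root>'

# Feature off: only startElement fires, with raw qualified names.
print(logged_calls(DOC, False))
# Feature on: only startElementNS fires, even for elements without a namespace.
print(logged_calls(DOC, True))
```

With the feature on, the element outside any namespace is still reported through startElementNS, which matches what saxify does for every element.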

Stefan Behnel wrote:
Andreas Pakulat wrote:
could you please add a comment to the sax.txt that saxify _always_ produces SAX events for the *NS handlers? I just stumbled over this parsing an html file and using saxify to extract some content and had to look up the code in sax.py.
In ElementTreeProducer._recursive_saxify you have the calls to content_handler.startElementNS "hardcoded". You could probably check whether the current element actually has any namespaces associated, but on the other hand that complicates the code for no specific reason.
And it would also be wrong. According to the (admittedly Java-centered) spec:
""" The http://xml.org/sax/features/namespaces feature controls general Namespace processing. When this feature is true (the default), any applicable Namespace URIs and localNames (for elements in namespaces) must be available through the startElement and endElement callbacks in the ContentHandler interface, and through the various methods in the Attributes interface, and start/endPrefixMapping events must be reported. For elements and attributes outside of namespaces, the associated namespace URIs will be empty strings and the qName parameter is guaranteed to be provided as a non-empty string. """
http://www.saxproject.org/namespaces.html
Note that this feature is supposed to be on by default in SAX2.
And here's the relevant snippet from the official Python XML documentation:
"""
if namespace processing is turned on, you would have to write startElementNS() and endElementNS() methods that looked like this:

    def startElementNS(self, (uri, localname), qname, attrs): ...
    def endElementNS(self, (uri, localname), qname): ...

The first argument is a 2-tuple containing the URI and the name of the element within that namespace. qname is a string containing the original qualified name of the element, such as "xlink:a", and attrs is a dictionary of attributes. The keys of this dictionary will be (URI, attribute_name) pairs. If no namespace is specified for an element or attribute, the URI will given given as None.
"""
Feel free to replace the first "given" in the last sentence by "be".
http://pyxml.sourceforge.net/topics/howto/node15.html
Stefan
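
The tuple-parameter syntax in the quoted signatures is Python 2 only. In Python 3 the 2-tuple has to be unpacked inside the method body. Below is a small sketch of that, using the standard library parser as the event source; NameCollector and collect_names are illustrative names, and qname is simply ignored here.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Collects element names and attribute keys as (URI, name) pairs."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.attribute_keys = []

    def startElementNS(self, name, qname, attrs):
        uri, localname = name  # unpack the 2-tuple in the body
        self.elements.append((uri, localname))
        # Attribute keys are (URI, attribute_name) pairs as well.
        self.attribute_keys.extend(attrs.keys())


def collect_names(xml_bytes):
    """Return (elements, attribute_keys) seen while parsing xml_bytes."""
    handler = NameCollector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.elements, handler.attribute_keys
```

For an element or attribute without a namespace, the URI part of the pair comes back as None, so a handler can test `uri is None` to tell the two cases apart.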

Stefan Behnel wrote:
Hi Andreas,
Andreas Pakulat wrote:
could you please add a comment to the sax.txt that saxify _always_ produces SAX events for the *NS handlers? I just stumbled over this parsing an html file and using saxify to extract some content and had to look up the code in sax.py.
I know the examples in sax.txt use *NS, however it's not clear that there won't be any calls ever to the non-NS functions.
I added a note on that. Maybe we should support namespace-unaware parsing also, though, through the normal parser feature flag.
Hm, I don't think we should support namespace-unaware parsing. It's just too many complexities while all modern XML uses namespaces. I consider the non-namespace SAX events to be deprecated, myself. Regards, Martijn
Participants (3): Andreas Pakulat, Martijn Faassen, Stefan Behnel