[lxml-dev] Compact RelaxNG Validation
Hello,

Does the lxml validation support the compact form of RelaxNG Schema language?

Thanks,
Len

--
Leonard J. Reder
Jet Propulsion Laboratory
Mars Science Laboratory Project
Flight Software Applications & Data Product Management, Section 316D
Email: reder@jpl.nasa.gov
Phone (Voice): 818-354-3639
Mail Address: Mail Stop: 171-113, 4800 Oak Grove Dr., Pasadena, CA 91109
Leonard J. Reder wrote:
> Does the lxml validation support the compact form of RelaxNG Schema language?
No, but that's been on the wish list for a while. There is a patch for libxml2 that supports it and has been waiting for inclusion for ages. Once libxml2 supports it, we can see if we can also support it in lxml (this obviously requires a backwards-compatible implementation, as lxml must still compile against older libxml2 versions).

The other solution would be to add a separate (Python) implementation to lxml, but I am not aware of a spec-compliant Python implementation. There are two partial implementations, but they currently fail to handle a large number of non-trivial RNC schemas, so there is not much use in integrating them.

Any help is obviously appreciated. It might already help to keep asking on the libxml2 mailing list.

Stefan
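For reference, schemas in the regular XML syntax of RelaxNG are validated through the `etree.RelaxNG` class in lxml. The following is only a minimal sketch: the schema and the two sample documents are made up for illustration and do not come from this thread.

```python
from io import BytesIO

from lxml import etree

# Illustrative schema in RelaxNG XML syntax: <a> containing any number of <b>.
RNG_SCHEMA = b"""\
<element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="b">
      <text/>
    </element>
  </zeroOrMore>
</element>
"""

# Parse the schema document and compile it into a validator.
relaxng = etree.RelaxNG(etree.parse(BytesIO(RNG_SCHEMA)))

valid_doc = etree.fromstring(b"<a><b>text</b></a>")
invalid_doc = etree.fromstring(b"<a><c/></a>")

# validate() returns a boolean; details end up in relaxng.error_log.
is_valid = relaxng.validate(valid_doc)
is_invalid = relaxng.validate(invalid_doc)
```

A compact-syntax (RNC) schema would first have to be converted into this XML form before it can be loaded this way.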
Leonard J. Reder wrote:
> Does the lxml validation support the compact form of RelaxNG Schema language?
A possible (though not portable) way would be to pipe RNC through trang:

http://www.thaiopensource.com/relaxng/trang.html

It's written in Java, but there are GCJ'ed Linux binaries available.

Stefan
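The trang route could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: it expects a `trang` executable on the PATH, it relies on trang picking the input and output formats from the file extensions, and the helper name `relaxng_from_rnc` and its `trang_cmd` parameter are invented here for illustration.

```python
import os
import subprocess
import tempfile

from lxml import etree


def relaxng_from_rnc(rnc_path, trang_cmd=("trang",)):
    """Convert an RNC schema to RNG with an external tool, return a validator.

    trang_cmd is the command prefix used for the conversion. It is a
    parameter only so that a wrapper script or a different install
    location can be substituted (an assumption of this sketch).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        rng_path = os.path.join(tmpdir, "schema.rng")
        # Invoked as: trang <input.rnc> <output.rng>
        subprocess.run([*trang_cmd, rnc_path, rng_path], check=True)
        # The schema is compiled here, before the temporary directory is removed.
        return etree.RelaxNG(etree.parse(rng_path))
```

With trang installed, `relaxng_from_rnc("schema.rnc").validate(doc)` would then behave like any other lxml RelaxNG validation. As noted above, the approach is not portable, since it depends on an external Java tool.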
Hello,

I'm using lxml primarily for validation of XML documents and requests of UPnP devices. Since many vendors are going to make their devices DLNA compliant, some additional XML elements appear in the XML docs. I would have to pay for the DLNA specs, so I have no other choice than deleting these elements in advance and validating the XML afterwards. Is there an easy way to do this with lxml? Am I missing something?

Thanks,
Michael
Hi,

first of all: please don't respond to posts from a different thread when you want to start a new one. Mail readers will sort the e-mail into the wrong thread and confuse people.

micxer wrote:
> I'm using lxml primarily for validation of XML documents and requests of UPnP devices. Since many vendors are going to make their devices DLNA compliant, some additional XML elements appear in the XML docs. I would have to pay for the DLNA specs, so I have no other choice than deleting these elements in advance and validating the XML afterwards. Is there an easy way to do this with lxml? Am I missing something?
Not sure what your problem is exactly. Are these "additional elements" in a specific namespace? That would make it easy to remove them:

    # iterate over a snapshot so that removals cannot disturb the traversal
    for el in list(root.iter("{http://the/namespace}*")):
        parent = el.getparent()
        if parent is not None:  # not the root element
            parent.remove(el)

Or are they in other namespaces than the main one?

    MAIN_NS = "{http://the/namespace}"
    for el in list(root.iter("*")):
        if not el.tag.startswith(MAIN_NS):
            parent = el.getparent()
            if parent is not None:  # not the root element
                parent.remove(el)

Similarly, if you have a set of tag names that must be kept or removed, you can iterate over all elements and check the tag names against the set.

Does that solve your problem?

Stefan
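The last suggestion, checking tag names against a set, might look like the sketch below. The helper name `remove_elements_not_in`, the two namespace URIs and the sample document are hypothetical and only serve as an illustration.

```python
from lxml import etree

# Hypothetical namespaces standing in for the main and the vendor vocabulary.
MAIN = "{http://example.org/main}"
VENDOR = "{http://example.org/vendor}"


def remove_elements_not_in(root, allowed_tags):
    """Remove every non-root element whose tag is not in allowed_tags."""
    # Snapshot the traversal first so that removals cannot disturb it.
    for el in list(root.iter("*")):
        parent = el.getparent()
        if parent is not None and el.tag not in allowed_tags:
            # Note: remove() also drops the tail text of the element.
            parent.remove(el)
    return root


root = etree.fromstring(
    '<device xmlns="http://example.org/main" xmlns:v="http://example.org/vendor">'
    "<name>player</name><v:extra>1</v:extra><model>x</model>"
    "</device>"
)
remove_elements_not_in(root, {MAIN + "device", MAIN + "name", MAIN + "model"})
remaining = [el.tag for el in root.iter("*")]
```

Keeping an allow-list rather than a deny-list means that unknown vendor extensions are dropped without having to be enumerated, which seems to fit the case where the extension spec is not available.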
Hi,

Stefan Behnel wrote:
> Hi,
>
> first of all: please don't respond to posts from a different thread when you want to start a new one. Mail readers will sort the e-mail into the wrong thread and confuse people.
Sorry about that. I thought I removed everything from the old post but I forgot about the headers. And sorry for the late reply. I just found your message in the Junk folder.
> micxer wrote:
>> I'm using lxml primarily for validation of XML documents and requests of UPnP devices. Since many vendors are going to make their devices DLNA compliant, some additional XML elements appear in the XML docs. I would have to pay for the DLNA specs, so I have no other choice than deleting these elements in advance and validating the XML afterwards. Is there an easy way to do this with lxml? Am I missing something?
>
> Not sure what your problem is exactly. Are these "additional elements" in a specific namespace? That would make it easy to remove them:
>
>     # iterate over a snapshot so that removals cannot disturb the traversal
>     for el in list(root.iter("{http://the/namespace}*")):
>         parent = el.getparent()
>         if parent is not None:  # not the root element
>             parent.remove(el)
>
> Or are they in other namespaces than the main one?
>
>     MAIN_NS = "{http://the/namespace}"
>     for el in list(root.iter("*")):
>         if not el.tag.startswith(MAIN_NS):
>             parent = el.getparent()
>             if parent is not None:  # not the root element
>                 parent.remove(el)
>
> Similarly, if you have a set of tag names that must be kept or removed, you can iterate over all elements and check the tag names against the set.
That's exactly the problem I have. I already thought about this manual approach, but I also assumed there must be an easier way like telling the parser to ignore any unknown tag or any tag that's not listed in the schema.
> Does that solve your problem?
Absolutely, Thanks :-)
> Stefan
Michael
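On the wish for something shorter than the manual loop: current lxml versions also provide a helper, `etree.strip_elements()`, that deletes all elements matching the given tag names, including namespace wildcards. It does not consult a schema, so it still has to be told what to drop. A minimal sketch, in which the namespace URIs and the sample document are hypothetical:

```python
from lxml import etree

root = etree.fromstring(
    '<device xmlns="http://example.org/main" xmlns:v="http://example.org/vendor">'
    "<name>player</name><v:extra><v:nested/></v:extra><model>x</model>"
    "</device>"
)

# Delete every element in the (hypothetical) vendor namespace, subtrees included.
# with_tail=False asks lxml to keep text that follows a removed element.
etree.strip_elements(root, "{http://example.org/vendor}*", with_tail=False)

remaining = [el.tag for el in root.iter("*")]
```

The root element itself is never removed by this helper, which matches the "not the root element" check in the loops above.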
Participants (3):
- Leonard J. Reder
- micxer
- Stefan Behnel