[lxml-dev] parsing DTDs - listing of valid elements

Hi,

I'm trying to get the elements in a DTD. Since these internals are not exported in the Python interface of lxml.etree, I am trying to write a Cython extension to do so, as previously suggested on this mailing list (see link below).

http://codespeak.net/pipermail/lxml-dev/2009-January/004298.html

To quote the message, "all you'd really need is the internal _c_dtd field of the DTD class, which you could cimport". I'm wondering exactly how I am supposed to do that (my attempts so far are described below). It would also be nice to know whether the last attempt to do so was successful. Thanks; any help would be appreciated.

Here is what I've tried so far (on Python 2.5.4, Cython 0.11.2, Windows):

The DTD class is not declared in etreepublic.pxd, so I can't just "cimport etreepublic". The actual DTD class definition is in dtd.pxi, as stated in the message. But I can't just "include 'dtd.pxi'" because it inherits from the _Validator class in lxml.etree.pyx. And I can't "cimport lxml.etree" because there is no file lxml.etree.pxd. I tried writing a lxml.etree.pxd file to circumvent these barriers (which was thoroughly confusing because _Validator contains an _ErrorLog, which made me search through several other files...), but even when I got the entire thing to compile, it failed to load in Python:
I have attached my lxml.etree.pxd in case I made any mistakes, in the event that this method can be made to work.

-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay
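For readers who only need the list of declared element names and would rather avoid a Cython extension, here is a minimal sketch that swaps in Python's standard-library expat parser instead of lxml (a different technique from the one discussed in this thread). It assumes the DTD text is self-contained: the DTD is embedded as the internal subset of a throwaway document, and expat's ElementDeclHandler is called once per <!ELEMENT> declaration. The function name dtd_element_names and the root name "_" are hypothetical, and DTDs that rely on external parameter entities may not be handled.

```python
import re
from xml.parsers import expat

# An external DTD file may start with a text declaration (<?xml ... ?>),
# which is not allowed inside an internal subset, so it is stripped first.
_TEXT_DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def dtd_element_names(dtd_text):
    """Return the names of all <!ELEMENT> declarations, in document order.

    Sketch only: the DTD is wrapped as the internal subset of a throwaway
    document so that expat reports each element declaration.
    """
    names = []
    parser = expat.ParserCreate()
    # Called once per <!ELEMENT name model>; the content model is ignored here.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    body = _TEXT_DECL.sub("", dtd_text)
    # "_" is a hypothetical root name; expat does not validate, so it
    # never has to be declared in the DTD.
    parser.Parse("<!DOCTYPE _ [\n" + body + "\n]><_/>", True)
    return names


if __name__ == "__main__":
    sample = """<!-- a comment -->
<!ELEMENT book (title, chapter+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT chapter (#PCDATA)>
<!ATTLIST book id ID #IMPLIED>"""
    print(dtd_element_names(sample))  # ['book', 'title', 'chapter']
```

Letting expat do the parsing, rather than scanning with regular expressions, means comments and attribute-list declarations are skipped correctly.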

Please ignore my previous message; I solved my own problem by finding an XML schema for what I need to do. Sorry for the noise.

On Tue, Jun 30, 2009 at 3:04 PM, Elliott Slaughter <elliottslaughter@gmail.com> wrote:

Hi, Elliott Slaughter wrote:
Please ignore my previous message; I solved my own problem by finding an XML schema for what I need to do.
Note that you can always use trang to convert a DTD to an XML Schema.

http://www.thaiopensource.com/relaxng/trang.html

If all you need is a list of allowed elements, the required logic to extract that from the schema shouldn't be too hard to figure out. Although I wonder if RelaxNG wouldn't be easier to work on.

Stefan
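As a rough sketch of the extraction described above, assuming the schema (for example one produced by trang) is available as text, Python's standard xml.etree.ElementTree can collect the named element declarations from either a W3C XML Schema or a RELAX NG grammar in XML syntax. The function name schema_element_names is hypothetical, and only declarations carrying a name attribute are reported; references (xs:element/@ref), wildcards and RELAX NG name classes are skipped.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of W3C XML Schema and RELAX NG (XML syntax).
XSD_NS = "http://www.w3.org/2001/XMLSchema"
RNG_NS = "http://relaxng.org/ns/structure/1.0"


def schema_element_names(schema_text):
    """Return the sorted names of elements declared in an XSD or RELAX NG schema.

    Sketch only: collects xs:element/@name (XSD) and element/@name (RELAX NG).
    """
    root = ET.fromstring(schema_text)
    names = set()
    for ns in (XSD_NS, RNG_NS):
        # iter() walks the whole tree, so nested (local) declarations count too.
        for decl in root.iter("{%s}element" % ns):
            name = decl.get("name")
            if name is not None:  # skip ref="..." declarations
                names.add(name)
    return sorted(names)


if __name__ == "__main__":
    xsd = ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
           '<xs:element name="book"><xs:complexType><xs:sequence>'
           '<xs:element ref="title"/></xs:sequence></xs:complexType></xs:element>'
           '<xs:element name="title" type="xs:string"/></xs:schema>')
    print(schema_element_names(xsd))  # ['book', 'title']
```

The same function covers both schema languages because the two differ here only in the namespace of their element-declaration elements.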

Hi, Elliott Slaughter wrote:
True. So your only chance is to write one yourself. And yes, it needs to be called "lxml.etree.pxd".
All you should really need is this:

    cimport tree

    cdef class _Validator:
        cdef object _error_log

    cdef class DTD(_Validator):
        cdef tree.xmlDtd* _c_dtd

Cython needs to know the exact /layout/ of the classes that you use (at least if they are not exported as C header files), but it doesn't need to know the exact class types of attributes. "object" will do just fine if you don't care.

I know that this is harder than necessary (thanks for bringing this up, BTW), but that's just because DTD isn't an 'officially' C-exported type, just like all other schema types.

Stefan
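To illustrate why a declaration that only gets the layout right is enough, the sketch below uses ctypes as an analogy. It is not Cython and ignores anything else Cython may place in the real object structs. What matters for cimporting a class is that each field sits at the same offset: declaring _error_log as a generic object pointer instead of its precise type leaves _c_dtd where it was, while leaving the field out entirely shifts it. All struct names here (PyObjectHead, ErrorLog, DTDExact, DTDLoose, DTDMissingField) are hypothetical stand-ins.

```python
import ctypes


class PyObjectHead(ctypes.Structure):
    # Hypothetical stand-in for PyObject_HEAD (refcount + type pointer).
    _fields_ = [("ob_refcnt", ctypes.c_ssize_t), ("ob_type", ctypes.c_void_p)]


class ErrorLog(ctypes.Structure):
    # Hypothetical stand-in for the struct of the real _ErrorLog object.
    _fields_ = [("head", PyObjectHead)]


class DTDExact(ctypes.Structure):
    # _error_log declared with its precise type (pointer to ErrorLog) ...
    _fields_ = [
        ("head", PyObjectHead),
        ("_error_log", ctypes.POINTER(ErrorLog)),
        ("_c_dtd", ctypes.c_void_p),
    ]


class DTDLoose(ctypes.Structure):
    # ... versus a generic object pointer, like "cdef object _error_log".
    _fields_ = [
        ("head", PyObjectHead),
        ("_error_log", ctypes.c_void_p),
        ("_c_dtd", ctypes.c_void_p),
    ]


class DTDMissingField(ctypes.Structure):
    # Omitting _error_log entirely moves _c_dtd to a different offset.
    _fields_ = [("head", PyObjectHead), ("_c_dtd", ctypes.c_void_p)]


def same_layout(a, b):
    """True if both structs have the same size and identical field offsets."""
    if ctypes.sizeof(a) != ctypes.sizeof(b):
        return False
    return all(getattr(a, f).offset == getattr(b, f).offset for f, _ in a._fields_)


if __name__ == "__main__":
    print(same_layout(DTDExact, DTDLoose))  # True
    print(DTDExact._c_dtd.offset == DTDMissingField._c_dtd.offset)  # False
```

Both attribute declarations are pointer-sized, so the precise pointer type does not change where _c_dtd is found; only the presence and order of the fields do.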

participants (2)
- Elliott Slaughter
- Stefan Behnel