[lxml-dev] lxml 2.0alpha1 released
Hi all,

I'm proudly announcing the first alpha release of lxml 2.0. It features a major cleanup, both behind the scenes and at the surface, that improves the XML tool integration and makes the API clearer and more consistent in many places. The major new addition, however, is the lxml.html package, a new toolkit for HTML handling.

The web site for the pre-2.0 series is online at

http://codespeak.net/lxml/dev/

The "what's new" page describes the major changes:

http://codespeak.net/lxml/dev/lxml2.html

and the ChangeLog below has a more detailed list.

As this is an alpha release, not everything is stable yet, both in terms of crashes and of the API. There will be a small number of alpha releases to make the advancements publicly available before the beta releases focus on improving stability. I warmly invite everyone to contribute to the final release by discussing the API changes and the new features on the mailing list. There is always room for improvement!

There is currently a known problem with Microsoft's compilers, so Windows builds may not become available for 2.0alpha1. The next alpha will hopefully come with prebuilt binaries for that platform. Building with the more standards-compliant MinGW compilers should work.

Note that working on the code now requires Cython (version 0.9.6.5), an enhanced fork of Pyrex. lxml therefore no longer ships with a copy of Pyrex or Cython but, as usual, building from the distribution sources does not require Cython. Cython can be installed with "easy_install Cython" or from here:

http://www.cython.org/

I hope that lxml 2.0 will become a straight continuation of the success story that lxml 1.x already was.

Have fun,

Stefan


2.0alpha1 (2007-09-02)

Features added

* Reimplemented objectify.E for better performance and improved integration with objectify. Provides extended type support based on registered PyTypes.
* XSLT objects now support deep copying
* New makeSubElement() C-API function that allows creating a new subelement directly with text, tail and attributes
* XPath extension functions can now access the current context node (context.context_node) and use a context dictionary (context.eval_context) from the context provided in their first parameter
* HTML tag soup parser based on BeautifulSoup in lxml.html.ElementSoup
* New module lxml.doctestcompare by Ian Bicking for writing simplified doctests based on XML/HTML output. Use it by importing lxml.usedoctest or lxml.html.usedoctest from within a doctest.
* New module lxml.cssselect by Ian Bicking for selecting Elements with CSS selectors
* New package lxml.html written by Ian Bicking for advanced HTML treatment
* Namespace class setup is now local to the ElementNamespaceClassLookup instance and no longer global
* Schematron validation (incomplete in libxml2)
* Additional stringify argument to objectify.PyType() takes a conversion function to strings to support setting text values from arbitrary types
* Entity support through an Entity factory and element classes. XML parsers now have a resolve_entities keyword argument that can be set to False to keep entities in the document.
* column field on error log entries to accompany the line field
* Error-specific messages in XPath parsing and evaluation
  NOTE: for evaluation errors, you will now get an XPathEvalError instead of an XPathSyntaxError. To catch both, you can catch XPathError.
* The regular expression functions in XPath now support passing a node-set instead of a string
* Extended type annotation in objectify: new xsiannotate() function
* EXSLT RegExp support in standard XPath (not only XSLT)

Bugs fixed

* lxml.etree did not check tag/attribute names
* The XML parser did not report undefined entities as errors
* The text in exceptions raised by XML parsers, validators and XPath evaluators now reports the first error that occurred instead of the last
* Passing '' as XPath namespace prefix did not raise an error
* Thread safety in XPath evaluators

Other changes

* objectify.PyType for None is now called "NoneType"
* el.getiterator() renamed to el.iter(), following ElementTree 1.3; the original name is still available as an alias
* In the public C-API, findOrBuildNodeNs() was replaced by the more generic findOrBuildNodeNsPrefix()
* Major refactoring in XPath/XSLT extension function code
* Network access in parsers disabled by default
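The NOTE about XPath errors relies on exception inheritance: if both XPathSyntaxError and XPathEvalError derive from XPathError, a single "except XPathError" clause handles parse and evaluation failures alike. The sketch below shows that pattern in a self-contained way. The three classes are local stand-ins named after the lxml.etree exceptions (with lxml installed you would import the real ones instead), and evaluate() is a hypothetical toy evaluator that only exists to raise each kind of error.

```python
# Local stand-ins named after lxml.etree's exception classes; with lxml
# installed you would import the real classes from lxml.etree instead.
class XPathError(Exception):
    """Common base class for XPath failures."""


class XPathSyntaxError(XPathError):
    """Raised when an expression cannot be parsed."""


class XPathEvalError(XPathError):
    """Raised when a parsable expression fails during evaluation."""


def evaluate(expression):
    """Hypothetical toy evaluator that raises one error of each kind."""
    if expression.count("(") != expression.count(")"):
        raise XPathSyntaxError("unbalanced parentheses: %r" % expression)
    if expression.startswith("undefined-function"):
        raise XPathEvalError("unregistered function: %r" % expression)
    return "ok"


def safe_evaluate(expression):
    """Return the result, or the name of the XPath error that was caught."""
    try:
        return evaluate(expression)
    except XPathError as exc:  # one clause catches both subclasses
        return type(exc).__name__


for expr in ("count(//a)", "count(//a", "undefined-function()"):
    print(expr, "->", safe_evaluate(expr))
```

Catching the base class keeps calling code unchanged when an error moves from one subclass to the other, which is exactly the kind of change the NOTE describes.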
Frédéric Mantegazza wrote:
On Sunday, 2 September 2007 18:25, Stefan Behnel wrote:
* XSLT objects now support deep copying
Good ;o)
... although that's such a recent feature that I wouldn't bet my household on it. Since you had code that stumbled over the lack of that feature, could you give it some more testing so that we can see if it works? Especially in the threaded case? Thanks, Stefan
On Monday, 3 September 2007 09:55, Stefan Behnel wrote:
Frédéric Mantegazza wrote:
On Sunday, 2 September 2007 18:25, Stefan Behnel wrote:
* XSLT objects now support deep copying
Good ;o)
... although that's such a recent feature that I wouldn't bet my household on it. Since you had code that stumbled over the lack of that feature, could you give it some more testing so that we can see if it works? Especially in the threaded case?
Ok, I will do some testing.

-- Frédéric
Hi,
* Extended type annotation in objectify: new xsiannotate() function
I propose renaming the existing annotate() function to pyannotate() and adding a public interface annotate() to the internal _annotate(), so you can xsi-typify and py-typify in one step. I also think it would be better to change the defaults of the "ignore_old" keyword args of the annotation functions to False, to avoid:
>>> root = E.root(E.i(23), E.s("12"), E.sub())
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    i = 23 [IntElement]
      * py:pytype = 'int'
    s = '12' [StringElement]
      * py:pytype = 'str'
    sub = '' [StringElement]
>>> objectify.annotate(root)
>>> print objectify.dump(root)
root = None [ObjectifiedElement]
    i = 23 [IntElement]
      * py:pytype = 'int'
    s = 12 [IntElement]
      * py:pytype = 'int'
    sub = '' [StringElement]
where you lose the "str" type information of root.s. I think the current default is a bit counter-intuitive. What do you say?

Holger
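To make the effect of the default concrete, here is a small pure-Python model of the type-guessing step. It is not lxml's implementation: guess_pytype() and annotate_value() are hypothetical helpers, and the guessing only distinguishes int, float and str. With ignore_old=True the stored 'str' hint is discarded and the text "12" is re-guessed as an int, which is the loss shown in the dump above; with ignore_old=False the existing hint wins.

```python
def guess_pytype(text):
    """Guess a type name from element text (simplified: int, float or str)."""
    for name, convert in (("int", int), ("float", float)):
        try:
            convert(text)
            return name
        except ValueError:
            pass
    return "str"


def annotate_value(text, old_pytype=None, ignore_old=True):
    """Return the py:pytype a simplified annotator would write.

    If ignore_old is False and the element already carries a type hint,
    that hint is kept; otherwise the type is guessed from the text alone.
    """
    if not ignore_old and old_pytype is not None:
        return old_pytype
    return guess_pytype(text)


# root.s holds the text "12" but was created as a string ('str' hint).
print(annotate_value("12", old_pytype="str", ignore_old=True))   # int
print(annotate_value("12", old_pytype="str", ignore_old=False))  # str
```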
Hi,
I propose renaming the existing annotate() function to pyannotate() and adding a public interface annotate() to the internal _annotate(), so you can xsi-typify and py-typify in one step. I also think it would be better to change the defaults of the "ignore_old" keyword args of the annotation functions to False, to avoid:
Attached is a patch that does just that, with tests. I tweaked the defaults of the new annotate() function, which can now py:pytype- and xsi:type-annotate in one step, so that it behaves just like the former annotate() (at least it passes all the existing unittests, which I did not alter). The new pyannotate() and xsiannotate() use different defaults, as suggested. There's an additional keyword arg keep_tree that lets you preserve existing TREE attribute values, if switched on.

Holger
jholg@gmx.de wrote:
I propose renaming the existing annotate() function to pyannotate() and adding a public interface annotate() to the internal _annotate(), so you can xsi-typify and py-typify in one step.
Sure, that's fine.
I also think it would be better to change the defaults of the "ignore_old" keyword args of the annotation functions to False
Definitely ok for the new ones. Maybe for annotate() also, I'm not sure yet.
Attached is patch that does just that, with tests.
I quirked the defaults for the new annotate() function that can now py:pytype and xsi:type-annotate in one step so that it behaves just like the former annotate() (at least it passes all the existing unittests which I did not alter) The new pyannotate() and xsiannotate() use different defaults, as suggested.
I still have to look through the patch a bit more, but I generally like the intention, except:
There's an additional keyword arg keep_tree that lets you preserve existing TREE attribute values, if switched on.
No way. :) It doesn't match the existing "ignore_*" parameters, and the default is to /remove/ the tree annotation when what we want is to /create/ annotations.

Taking one step back: what was the reason again why we started using the TREE annotation at all? I mean, it doesn't have any advantage, and it currently looks like it's getting in the way. Is there a reason that should keep us from just dropping it completely (minus backwards compatibility)? Honestly, it's not used, and it's even faster to check for children than it is to look up the attribute...

Stefan
There's an additional keyword arg keep_tree that lets you preserve existing TREE attribute values, if switched on.
No way. :)
It doesn't match the existing "ignore_*" parameters and the default is to /remove/ the tree annotation when what we want is to /create/ annotations.
Hm, then maybe pyannotate() should rather not default to removing TREE attributes?
Taking one step back: what was the reason again why we started using TREE annotation at all? I mean, it doesn't have any advantage and it currently looks like it's getting in the way. Is there a reason that should keep us from just dropping it? completely? (minus backwards compatibility?)
I mean, honestly, it's not used and it's even faster to check for children than it is to look up the attribute...
It's there to allow leaf elements to be ObjectifiedElements rather than ObjectifiedDataElements. The rules are easy for all other use cases:

- the root has no parent element -> ObjectifiedElement
- any other element with children -> ObjectifiedElement

Things get difficult if you assign leaf elements and actually instantiate the Python proxy objects. If no TREE attributes get used, these will end up being "default empty elements", usually string elements.

Also, once serialized, leaf elements cannot be recognized as ObjectifiedElements without the help of the TREE attribute. That's the main reason I propose the keep_tree functionality: to make ObjectifiedElement leaves survive a creation-serialization-parse cycle.

Holger
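The lookup rules listed above can be summarized in a few lines. The function below is a hypothetical pure-Python model, not objectify's code: it picks a class name from whether the element has a parent, whether it has children, and its py:pytype annotation. The value "TREE" for the TREE annotation and "StringElement" as the fallback for unannotated leaves are assumptions made for the illustration.

```python
OBJECTIFIED = "ObjectifiedElement"
TREE = "TREE"  # assumed value of the py:pytype TREE annotation

# Hypothetical mapping from py:pytype names to data element classes.
DATA_CLASSES = {"int": "IntElement", "float": "FloatElement", "str": "StringElement"}


def lookup_class(has_parent, has_children, pytype=None):
    """Model of choosing a proxy class for an element (simplified)."""
    if not has_parent:    # the root is always structural
        return OBJECTIFIED
    if has_children:      # any element with children is structural
        return OBJECTIFIED
    if pytype == TREE:    # a leaf explicitly marked as structural
        return OBJECTIFIED
    # Other leaves are data elements; unannotated ones fall back to strings.
    return DATA_CLASSES.get(pytype, "StringElement")


# A structural leaf stays structural only while its TREE annotation survives.
print(lookup_class(has_parent=True, has_children=False, pytype=TREE))
print(lookup_class(has_parent=True, has_children=False))
```

Only the third rule depends on the annotation, which is why a leaf loses its ObjectifiedElement status once the TREE attribute is dropped before serialization.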
Hi Holger, jholg@gmx.de wrote:
Things get difficult if you assign leaf elements and actually instantiate the python proxy objects. If no TREE attributes get used, these will end up being "default empty elements", usually string elements.
Also, once having been serialized, there is no way that leaf elements can be recognized as ObjectifiedElements without the help of the TREE attribute. That's the main reason I propose the keep_tree functionality, to make ObjectifiedElement-leaves survive a creation-serialization-parse cycle.
I think we should do this:

    if old_pytypename == TREE_PYTYPE:
        if cetree.findChild(c_node, 0) is NULL:
            pytype = TREE_PYTYPE
    else:
        # check old type

Do you still think we need the keep_tree then?

Stefan
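In plain Python, the proposed rule reads roughly as follows. This is a hypothetical model rather than the actual annotation loop: keep_or_check() stands in for the relevant branch, check_old_type is a caller-supplied placeholder for the existing "check old type" step, and a return value of None means that this step keeps no old annotation.

```python
TREE = "TREE"  # assumed name of the TREE pytype


def keep_or_check(old_pytypename, has_children, check_old_type):
    """Decide which old pytype to keep for an element being annotated.

    A TREE annotation is kept only on leaves, because elements with
    children are recognized as structural without it. Any other old
    annotation is handed to check_old_type() for validation.
    """
    if old_pytypename == TREE:
        if not has_children:
            return TREE
        return None  # structural anyway, so no annotation is needed
    return check_old_type(old_pytypename)


def accept(name):
    """Trivial placeholder validator that accepts every old type name."""
    return name


print(keep_or_check(TREE, has_children=False, check_old_type=accept))   # TREE
print(keep_or_check(TREE, has_children=True, check_old_type=accept))    # None
print(keep_or_check("int", has_children=False, check_old_type=accept))  # int
```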
Hello Stefan,
attribute. That's the main reason I propose the keep_tree functionality, to make ObjectifiedElement-leaves survive a creation-serialization-parse cycle.
I think we should do this:
    if old_pytypename == TREE_PYTYPE:
        if cetree.findChild(c_node, 0) is NULL:
            pytype = TREE_PYTYPE
    else:
        # check old type
Do you still think we need the keep_tree then?
You really don't like it, do you ;-)? I'd say this should work and remove the need for keep_tree, though. Sidenote: So I thought maybe we should revise the use of TREE in objectify in general, but one has to be very careful. You really want to have it e.g. in objectify.Element():
>>> o = objectify.Element("structural")
>>> e = etree.Element("structural")
>>> type(o), type(e)
(<type 'objectify.ObjectifiedElement'>, <type 'objectify.ObjectifiedElement'>)
>>> root.o = o
>>> root.e = e
>>> # Now type lookup cannot rely on parent == None ...
>>> type(root.o), type(root.e)
(<type 'objectify.ObjectifiedElement'>, <type 'objectify.StringElement'>)
Holger
Hi Holger, jholg@gmx.de wrote:
I think we should do this:
    if old_pytypename == TREE_PYTYPE:
        if cetree.findChild(c_node, 0) is NULL:
            pytype = TREE_PYTYPE
    else:
        # check old type
Do you still think we need the keep_tree then?
You really don't like it, do you ;-)? I'd say this should work and remove the need for keep_tree, though.
Ok. I also added the tests from your patch now. Obvious question then: anything still missing from what your last patch did?
Sidenote: So I thought maybe we should revise the use of TREE in objectify in general, but one has to be very careful. You really want to have it e.g. in objectify.Element():
I think we should, and we should restrict its use to a minimum. If you want, you can take a look at it. I don't feel like touching working code at the moment. :)
>>> o = objectify.Element("structural")
>>> e = etree.Element("structural")
>>> type(o), type(e)
(<type 'objectify.ObjectifiedElement'>, <type 'objectify.ObjectifiedElement'>)
Whatever. I don't want any code to rely on *that*. :) (but I can see what you're getting at)
>>> root.o = o
>>> root.e = e
>>> # Now type lookup cannot rely on parent == None ...
>>> type(root.o), type(root.e)
(<type 'objectify.ObjectifiedElement'>, <type 'objectify.StringElement'>)
I'm not (any longer :) questioning the TREE type in general. I just think we should not write annotations where we know we will not need them. Stefan
Ok. I also added the tests from your patch now.
Obvious question then: anything still missing from what your last patch did?
I'll take a look.
you can take a look at it. I don't feel like touching working code at the moment. :)
I already peeked, and there are really not many places where TREE is used. I'd say it is needed wherever it is currently used, but I'll take a closer look.

Holger
Hi,

the attached patch

- enhances the annotation tests to check the TREE-attribute survival for leaf TREE elements in annotate/pyannotate/xsiannotate
- fixes a bug in DataElement that did not set py:pytype correctly when invoked with unicode string args, and adds some tests for this.

I renamed _get_pytypename() to _pytypename() (internal) and __get_pytypename() to pytypename() (public), so DataElement() now uses _pytypename() rather than _typename().

Holger

Btw, I'm getting core dumps in the schematron tests:

685/802 ( 85.4%): test_schematron_invalid_schema_empty (...hematron.ETreeSchematronTestCase)Segmentation Fault (core dumped)

#0 0xff0b3218 in strlen () from /usr/lib/libc.so.1
#1 0xff106530 in _doprnt () from /usr/lib/libc.so.1
#2 0xff108730 in vsnprintf () from /usr/lib/libc.so.1
#3 0xfe2b7874 in __xmlRaiseError () from /apps/prod/lib/libxml2.so.2
#4 0xfe45fd5c in xmlSchematronPErr () from /apps/prod/lib/libxml2.so.2
#5 0xfe462d24 in xmlSchematronParse () from /apps/prod/lib/libxml2.so.2
#6 0xfe60f080 in __pyx_f_5etree_10Schematron___init__ (__pyx_v_self=0x829a10, __pyx_args=0x8872c8, __pyx_kwds=0x109b04) at src/lxml/etree.c:4131
#7 0x58504 in type_call (type=0xfe665988, args=0x829d30, kwds=0x89d4b0) at Objects/typeobject.c:443
#8 0x260c4 in PyObject_Call (func=0x829a10, arg=0x829d30, kw=0x89d4b0) at Objects/abstract.c:1802
jholg@gmx.de wrote:
attached patch - enhances the annotation tests to check the TREE-attribute survival for leaf-TREE-elements in annotate/pyannotate/xsiannotate
Ok.
- fixes a bug in DataElement that did not set py:pytype correctly when invoked with unicode string args and adds some tests for this.
You should just commit this kind of fixes instead of sending them to the list.
I renamed _get_pytypename() to _pytypename() (internal) and __get_pytypename() to pytypename() (public), so DataElement() now uses _pytypename() rather than _typename().
Any reason there *is* a pytypename() function? It doesn't seem to be used.
Btw I'm getting core dumps in the schematron tests:
685/802 ( 85.4%): test_schematron_invalid_schema_empty (...hematron.ETreeSchematronTestCase)Segmentation Fault (core dumped)
[...]
I don't get those with any of the supported libxml2 versions. Which one do you use? Have you seen those with the trunk before, or is it just now?

Stefan
Hi Stefan,
- fixes a bug in DataElement that did not set py:pytype correctly when invoked with unicode string args and adds some tests for this.
You should just commit this kind of fixes instead of sending them to the list.
Ok.
I renamed _get_pytypename() to _pytypename() (internal) and __get_pytypename() to pytypename() (public), so DataElement() now uses _pytypename() rather than _typename().
Any reason there *is* a pytypename() function? It doesn't seem to be used.
I figured it's nice to have it usable from outside objectify if you need to use explicit pytype names, so you don't have to reimplement the str/unicode distinction everywhere.
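A pytypename() helper of this kind mainly has to fold all string types into one name. The sketch below is a hypothetical Python 3 version, not objectify's code: Python 3 has no separate unicode type, so bytes plays the role of the second string type here, and every other object falls back to the name of its type. That fallback yields "NoneType" for None, matching the renamed PyType in the ChangeLog.

```python
def pytypename(obj):
    """Return a pytype-style name for obj (hypothetical sketch).

    All string types map to "str", so callers do not need to
    distinguish between text and byte strings themselves.
    """
    if isinstance(obj, (str, bytes)):
        return "str"
    return type(obj).__name__


for value in ["text", b"bytes", 42, 1.5, True, None]:
    print(repr(value), "->", pytypename(value))
```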
Btw I'm getting core dumps in the schematron tests:
685/802 ( 85.4%): test_schematron_invalid_schema_empty (...hematron.ETreeSchematronTestCase)Segmentation Fault (core dumped)
[...]
I don't get those, with none of the supported libxml2 versions. What's the one you use? Have you seen those with the trunk before or is it just now?
No, I've not seen such problems on the trunk before. I had to upgrade to the latest Cython to build this time.

This is the setup:

TESTED VERSION: 2.0.alpha2-46719
    Python: (2, 4, 4, 'final', 0)
    lxml.etree: (2, 0, -198, 46719)
    libxml used: (2, 6, 27)
    libxml compiled: (2, 6, 27)
    libxslt used: (1, 1, 20)
    libxslt compiled: (1, 1, 20)

Holger
No, I've not seen such problems on the trunk before. I had to upgrade to latest cython to build this time.
Latest Cython *Release* (0.9.6.6), that is, to be exact.

Holger
jholg@gmx.de wrote:
Any reason there *is* a pytypename() function? It doesn't seem to be used.
I figured it's nice to have it usable from outside objectify if you need to use explicit pytype names, so you don't have to reimplement the str/unicode distinction everywhere.
Ok, why not. I added it.
Btw I'm getting core dumps in the schematron tests:
685/802 ( 85.4%): test_schematron_invalid_schema_empty (...hematron.ETreeSchematronTestCase)Segmentation Fault (core dumped)
[...]
I don't get those, with none of the supported libxml2 versions. What's the one you use? Have you seen those with the trunk before or is it just now?
No, I've not seen such problems on the trunk before. I had to upgrade to latest cython to build this time.
This is the setup:
TESTED VERSION: 2.0.alpha2-46719
    Python: (2, 4, 4, 'final', 0)
    lxml.etree: (2, 0, -198, 46719)
    libxml used: (2, 6, 27)
    libxml compiled: (2, 6, 27)
    libxslt used: (1, 1, 20)
    libxslt compiled: (1, 1, 20)
Schematron uses XPath a lot, so I wouldn't be surprised if this was related to the XPath bug in libxml2 2.6.27. Is there any chance you could switch to 2.6.28 or later? Note that lxml.etree (trunk) now emits a warning if you use XPath on 2.6.27, as we can't really work around it. It happens when you get certain errors in the XPath evaluation, as in the case above. Stefan
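A version guard of this kind can be sketched with the standard warnings module. The helper warn_if_buggy_libxml2() is hypothetical and is not how lxml.etree implements its warning; it takes a version tuple in the format that lxml.etree reports as LIBXML_VERSION, for example (2, 6, 27).

```python
import warnings

# libxml2 release with the XPath error-handling bug discussed above.
BUGGY_XPATH_VERSION = (2, 6, 27)


def warn_if_buggy_libxml2(version):
    """Emit a RuntimeWarning when running against the affected libxml2.

    version is a tuple such as (2, 6, 27). Returns True if a warning
    was issued, False otherwise.
    """
    if tuple(version[:3]) == BUGGY_XPATH_VERSION:
        warnings.warn(
            "libxml2 %d.%d.%d has a known XPath bug; please upgrade to "
            "2.6.28 or later" % BUGGY_XPATH_VERSION,
            RuntimeWarning,
        )
        return True
    return False


# Usage with lxml installed (not required for this sketch):
#   from lxml import etree
#   warn_if_buggy_libxml2(etree.LIBXML_VERSION)
```

Comparing only the first three components keeps the check independent of any extra fields a version tuple might carry.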
Hi Stefan,
Btw I'm getting core dumps in the schematron tests: [...] Schematron uses XPath a lot, so I wouldn't be surprised if this was related to the XPath bug in libxml2 2.6.27. Is there any chance you could switch to 2.6.28 or later? Note that lxml.etree (trunk) now emits a warning if you use XPath on 2.6.27, as we can't really work around it. It happens when you get certain errors in the XPath evaluation, as in the case above.
I'll try out the latest libxml2; I had also noticed the warning.

Holger
Hi,
Schematron uses XPath a lot, so I wouldn't be surprised if this was related to the XPath bug in libxml2 2.6.27. Is there any chance you could switch to 2.6.28 or later? Note that lxml.etree (trunk) now emits a warning if you use XPath on 2.6.27, as we can't really work around it. It happens when you get certain errors in the XPath evaluation, as in the case above.
I'll try out the latest libxml2, I had also noted the warning.
Unfortunately, using the latest & greatest libxml2/libxslt (2.6.30/1.1.22) doesn't solve the problem for me.

Btw, I won't be near a Solaris box for the next week and probably won't be reachable by mail, so unfortunately I will only be able to provide more info after that.

Have a nice week, everybody!
Holger

Here's what I see:

Something strange (a Cython bug?):

#6 0xfe60fee0 in __pyx_f_5etree_10Schematron___init__ (__pyx_v_self=0x8c7c50, __pyx_args=0x887700, __pyx_kwds=0x109b04) at src/lxml/etree.c:4905

But line 4905 of etree.c is nowhere near __pyx_f_5etree_10Schematron___init__:

etree.c:
========
[...]
70188
70189 static int __pyx_f_5etree_10Schematron___init__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
70190 static int __pyx_f_5etree_10Schematron___init__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
70191   PyObject *__pyx_v_etree = 0;
70192   PyObject *__pyx_v_file = 0;
[...]

Test & backtrace
================

/apps/pydev/bin/python2.4 setup.py build_ext -i
Building with Cython.
Building lxml version 2.0.alpha2-46776
running build_ext
/apps/pydev/bin/python2.4 test.py -p -v
TESTED VERSION: 2.0.alpha2-46776
    Python: (2, 4, 4, 'final', 0)
    lxml.etree: (2, 0, -198, 46776)
    libxml used: (2, 6, 30)
    libxml compiled: (2, 6, 30)
    libxslt used: (1, 1, 22)
    libxslt compiled: (1, 1, 22)
111/810 ( 13.7%): Doctest: validation.txt
/Total line 1: Sum is not 100%.
/Total line 1: Sum is not 100%.
671/810 ( 82.8%): Doctest: validation.txt
/Total line 1: Sum is not 100%.
/Total line 1: Sum is not 100%.
690/810 ( 85.2%): test_schematron (lxml.tests.test_schematron.ETreeSchematronTestCase)/AAA line 1: There is an extra element
693/810 ( 85.6%): test_schematron_invalid_schema_empty (...schematron.ETreeSchematronTestCase)make: *** [test_inplace] Segmentation Fault (core dumped)
2 lb54320@adevp02 .../lxml $ gdb python2.4 -c core
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.6"...
Core was generated by `/apps/pydev/bin/python2.4 test.py -p -v'.
Program terminated with signal 9, Killed.
Reading symbols from /usr/lib/libresolv.so.2...done.
Reading symbols from /usr/lib/libsocket.so.1...done.
Reading symbols from /usr/lib/libnsl.so.1...done.
Reading symbols from /usr/lib/librt.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/lib/libpthread.so.1...done.
Reading symbols from /usr/lib/libm.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libmp.so.2...done.
Reading symbols from /usr/lib/libaio.so.1...done.
Reading symbols from /usr/platform/SUNW,Sun-Fire-V440/lib/libc_psr.so.1...done.
Reading symbols from /usr/lib/libthread.so.1...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/time.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/itertools.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/_curses.so...done.
Reading symbols from /apps/prod/lib/libncurses.so.5...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/collections.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/strop.so...done.
Reading symbols from /data/pydev/hjoukl/LXML/lxml/src/lxml/etree.so...done.
Reading symbols from /apps/pydev/lib/libxslt.so.1...done.
Reading symbols from /apps/pydev/lib/libexslt.so.0...done.
Reading symbols from /apps/pydev/lib/libxml2.so.2...done.
Reading symbols from /apps/prod/lib/libz.so...done.
Reading symbols from /apps/prod//lib/libiconv.so.2...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/_bisect.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/_heapq.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/cStringIO.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/math.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/binascii.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/_random.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/fcntl.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/_socket.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/_ssl.so...done.
Reading symbols from /apps/local/lib/libssl.so.0.9.6...done.
Reading symbols from /apps/local/lib/libcrypto.so.0.9.6...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/operator.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/struct.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/md5.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/sha.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/datetime.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/zlib.so...done.
Reading symbols from /data/pydev/hjoukl/LXML/lxml/src/lxml/objectify.so...done.
Reading symbols from /data/pydev/hjoukl/LXML/lxml/src/lxml/pyclasslookup.so...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/_locale.so...done.
Reading symbols from /apps/local/lib/libintl.so.1...done.
Reading symbols from /apps/pydev/lib/python2.4/lib-dynload/readline.so...done.
Reading symbols from /apps/prod/lib/libreadline.so...done.
#0 0xff0b3218 in strlen () from /usr/lib/libc.so.1
(gdb) bt
#0 0xff0b3218 in strlen () from /usr/lib/libc.so.1
#1 0xff106530 in _doprnt () from /usr/lib/libc.so.1
#2 0xff108730 in vsnprintf () from /usr/lib/libc.so.1
#3 0xfe2b7afc in __xmlRaiseError () from /apps/pydev/lib/libxml2.so.2
#4 0xfe461e2c in xmlSchematronPErr () from /apps/pydev/lib/libxml2.so.2
#5 0xfe4648b4 in xmlSchematronParse () from /apps/pydev/lib/libxml2.so.2
#6 0xfe60fee0 in __pyx_f_5etree_10Schematron___init__ (__pyx_v_self=0x8c7c50, __pyx_args=0x887700, __pyx_kwds=0x109b04) at src/lxml/etree.c:4905
#7 0x58504 in type_call (type=0xfe666d80, args=0x824f30, kwds=0x89c810) at Objects/typeobject.c:443
#8 0x260c4 in PyObject_Call (func=0x8c7c50, arg=0x824f30, kw=0x89c810) at Objects/abstract.c:1802
#9 0x88f4c in ext_do_call (func=0xfe666d80, pp_stack=0xffbed5ec, flags=3, na=-1, nk=0) at Python/ceval.c:3848
#10 0x85af8 in PyEval_EvalFrame (f=0x1803d0) at Python/ceval.c:2214
#11 0x86eb8 in PyEval_EvalCodeEx (co=0x1be460, globals=0x0, locals=0x1803d0, args=0x88496c, argcount=4, kws=0x88497c, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2752
#12 0x88888 in update_keyword_args (orig_kwdict=0x0, nk=-4270016, pp_stack=0x4, func=0x4) at Python/ceval.c:3676
#13 0x886b0 in call_function (pp_stack=0xffbed840, oparg=4) at Python/ceval.c:3597
#14 0x85a00 in PyEval_EvalFrame (f=0x884818) at Python/ceval.c:2186
#15 0x887fc in fast_function (func=0x6d4770, pp_stack=0x77ef28, n=1, na=1, nk=1240904) at Python/ceval.c:3654
#16 0x886b0 in call_function (pp_stack=0xffbeda08, oparg=1) at Python/ceval.c:3597
#17 0x85a00 in PyEval_EvalFrame (f=0x77edc8) at Python/ceval.c:2186
#18 0x86eb8 in PyEval_EvalCodeEx (co=0x1be260, globals=0x0, locals=0x77edc8, args=0x98b834, argcount=2, kws=0x2f4370, kwcount=0, defs=0x1c067c, defcount=1, closure=0x0) at Python/ceval.c:2752
#19 0xdadd4 in PyFunction_GetCode (op=0x1c8a30) at Objects/funcobject.c:66
#20 0x260c4 in PyObject_Call (func=0x1c8a30, arg=0x98b828, kw=0x8c1150) at Objects/abstract.c:1802
#21 0x88f4c in ext_do_call (func=0x1c8a30, pp_stack=0xffbedca4, flags=3, na=-1, nk=0) at Python/ceval.c:3848
#22 0x85af8 in PyEval_EvalFrame (f=0x48eb78) at Python/ceval.c:2214
#23 0x86eb8 in PyEval_EvalCodeEx (co=0x1be2a0, globals=0x0, locals=0x48eb78, args=0x973f3c, argcount=2, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2752
#24 0xdadd4 in PyFunction_GetCode (op=0x1c8a70) at Objects/funcobject.c:66
#25 0x260c4 in PyObject_Call (func=0x1c8a70, arg=0x973f30, kw=0x0) at Objects/abstract.c:1802
#26 0x2e30c in instancemethod_descr_get (meth=0x1, obj=0x973f30, cls=0x0) at Objects/classobject.c:2539
#27 0x260c4 in PyObject_Call (func=0x1c8a70, arg=0x973f30, kw=0x0) at Objects/abstract.c:1802
#28 0x638b8 in slot_tp_call (self=0x6ced10, args=0x25e470, kwds=0x0) at Objects/typeobject.c:4549
#29 0x260c4 in PyObject_Call (func=0x6ced10, arg=0x25e470, kw=0x0) at Objects/abstract.c:1802
#30 0x8a8e0 in do_call (func=0x6ced10, pp_stack=0xffbee3a8, na=-1, nk=2483312)
Hi,
Schematron uses XPath a lot, so I wouldn't be surprised if this was related to the XPath bug in libxml2 2.6.27. Is there any chance you could switch to [...] Unfortunately, using the latest & greatest libxml2/libxslt (2.6.30/1.1.22) doesn't solve the problem for me.
I'm trying to get some sensible information but have real problems with debugging, as I'm seeing line number information that is just plain wrong, though compiling with debugging on and everything, the likes of:

    (gdb) info source
    Current source file is src/lxml/etree.c
    Compilation directory is /home/lb54320/pydev/LXML/lxml/
    Located in /home/lb54320/pydev/LXML/lxml/src/lxml/etree.c
    Contains 90795 lines.
    Source language is c.
    Compiled with stabs debugging format.
    (gdb) b etree.c:70850
    No line 70850 in file "src/lxml/etree.c".
    (gdb)

No idea what I'm doing wrong here at the moment. So the info on the crash does not get much better than that backtrace at the moment:

    Program received signal SIGSEGV, Segmentation fault.
    0xff0b3218 in strlen () from /usr/lib/libc.so.1
    (gdb) bt
    #0 0xff0b3218 in strlen () from /usr/lib/libc.so.1
    #1 0xff106530 in _doprnt () from /usr/lib/libc.so.1
    #2 0xff108730 in vsnprintf () from /usr/lib/libc.so.1
    #3 0xfe23df04 in __xmlRaiseError () from /apps/pydev/debug/dmalloc/lib//libxml2.so.2
    #4 0xfe3e717c in xmlSchematronPErr () from /apps/pydev/debug/dmalloc/lib//libxml2.so.2
    #5 0xfe3e9878 in xmlSchematronParse () from /apps/pydev/debug/dmalloc/lib//libxml2.so.2
    #6 0xfe68dfdc in __pyx_f_5etree_10Schematron___init__ (__pyx_v_self=0x1b30f0, __pyx_args=0x1db670, __pyx_kwds=0x0) at src/lxml/etree.c:5663

What I can see, though, is that using the same schematron schema with xmllint does not crash:

    0 $ cat invalid_empty.xst
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" />
    0 $ python2.4 -i -c 'from lxml import etree; print etree.LIBXML_VERSION; schema = etree.Schematron(etree.parse("invalid_empty.xst"))'
    (2, 6, 30)
    Segmentation Fault (core dumped)

whereas

    $ /apps/pydev/bin/xmllint --schematron invalid_empty.xst foo.xml --version
    /apps/pydev/bin/xmllint: using libxml version 20630
       compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib
    invalid_empty.xst:1: element schema: Schemas parser error : The schematron document 'invalid_empty.xst' has no pattern
    Schematron schema invalid_empty.xst failed to compile
    <?xml version="1.0"?>
    <root/>

Holger
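A segfault in a C extension takes the whole interpreter (and any `-i` session) down with it. The following is a minimal sketch, not from the thread, for rerunning the reproducer above in a child process so that the crash is detected and reported instead. It assumes a POSIX system, where Python's subprocess reports "killed by signal N" as return code -N. run_isolated and REPRO are hypothetical names, and REPRO is simply a Python 3 transcription of Holger's one-liner.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60.0):
    """Run `code` in a fresh interpreter and report how it ended.

    Returns (sig, proc): sig is the signal that killed the child (for
    example signal.SIGSEGV) or None if it exited on its own; proc is the
    CompletedProcess with the captured stdout/stderr. On POSIX, subprocess
    reports "killed by signal N" as the negative return code -N.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    sig = signal.Signals(-proc.returncode) if proc.returncode < 0 else None
    return sig, proc


# Python 3 transcription of the reproducer from the report; needs lxml and
# invalid_empty.xst in the current directory.
REPRO = (
    "from lxml import etree\n"
    "print(etree.LIBXML_VERSION)\n"
    "etree.Schematron(etree.parse('invalid_empty.xst'))\n"
)

if __name__ == "__main__":
    sig, proc = run_isolated(REPRO)
    if sig is not None:
        print("child killed by", sig.name)
    elif proc.returncode != 0:
        # Any Python-level failure lands here, including a missing lxml.
        print("Python-level error:\n" + proc.stderr)
    else:
        print("no crash:", proc.stdout.strip())
```

A non-zero return code without a signal means the child raised a normal Python exception. This is the outcome you would hope for once Schematron reports the missing pattern as an error rather than crashing.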
jholg@gmx.de wrote:
Schematron uses XPath a lot, so I wouldn't be surprised if this was related to the XPath bug in libxml2 2.6.27. Is there any chance you could switch to [...]

Unfortunately, using the latest & greatest libxml2/libxslt (2.6.30/1.1.22) doesn't solve the problem for me.
I'm trying to get some sensible information but have real problems with debugging, as I'm seeing line number information that is just plain wrong, though compiling with debugging on and everything, the likes of:
[...]
Never seen that before. I assume you did a clean build before that? Maybe gdb doesn't get along with the source line references in the comments of the generated C file?
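One possible explanation is offered here only as a hypothesis, since it was not raised in the thread. The binary was "Compiled with stabs debugging format", and stabs records source line numbers in a 16-bit field. If so, lines above 65535 in the 90795-line etree.c cannot be represented and would wrap around. The sketch below only does that arithmetic, and the helper names are hypothetical. If the hypothesis holds, the etree.c:5663 frame in the backtrace could equally point at line 71199, and rebuilding with DWARF debug info (with GCC, for example, -gdwarf-2) would be worth a try.

```python
# Hypothesis sketch: if a debug format stores line numbers in an unsigned
# 16-bit field, only lines 0..65535 are representable and larger line
# numbers wrap around modulo 2**16.
FIELD_BITS = 16
MODULUS = 1 << FIELD_BITS  # 65536


def stored_line(true_line):
    """Line number as it would survive truncation to FIELD_BITS bits."""
    return true_line % MODULUS


def possible_true_lines(reported_line, total_lines):
    """All lines in a file of total_lines lines that truncate to reported_line."""
    return list(range(reported_line, total_lines + 1, MODULUS))


print(stored_line(70850))                # 5314
print(possible_true_lines(5663, 90795))  # [5663, 71199]
```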
So the info on the crash does not get much better than that backtrace at the moment:
[...]
What I can see, though, is that using the same schematron schema with xmllint does not crash: [...]
xmllint has a different error reporting setup, which might make the difference. Anyway, error reporting in Schematron is pretty basic, and I remember working around that at the time. I'll have to take a deeper look into it when I find the time.

Stefan
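Until the crash itself is fixed, one defensive option is to reject Schematron documents that xmllint would refuse anyway for having no pattern, before they ever reach the C code. This is a sketch only and was not proposed in the thread. has_pattern is a hypothetical helper that uses only the standard library's xml.etree.ElementTree and assumes the ISO Schematron namespace from the report. It checks this single condition and is no substitute for real schema compilation.

```python
import xml.etree.ElementTree as ET

# Namespace used by the schema in the report (ISO Schematron).
SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"


def has_pattern(schema_xml):
    """True if the root is a Schematron <schema> with at least one <pattern> child."""
    root = ET.fromstring(schema_xml)
    if root.tag != "{%s}schema" % SCHEMATRON_NS:
        return False
    return root.find("{%s}pattern" % SCHEMATRON_NS) is not None


EMPTY = '<schema xmlns="http://purl.oclc.org/dsdl/schematron" />'
MINIMAL = (
    '<schema xmlns="http://purl.oclc.org/dsdl/schematron">'
    '<pattern><rule context="/"><assert test="root">no root</assert></rule></pattern>'
    '</schema>'
)

print(has_pattern(EMPTY))    # False: the document that crashed lxml
print(has_pattern(MINIMAL))  # True
```

With lxml, you would then call etree.Schematron(...) only when the check passes and report the schema as invalid otherwise.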
participants (3)

- Frédéric Mantegazza
- jholg@gmx.de
- Stefan Behnel