[lxml-dev] lxml 1.3 coming up

Hi all, I finally found some time to work on lxml 1.3 again. Since the current trunk will become 2.0 (though not that soon :), I created a new branch "lxml-1.3" that will become the 1.3 release. I worked through the trunk commits to see which ones to copy over, found quite a few and rejected some bigger ones. I'd be glad if others could give this branch a final try before the release, see if it compiles (e.g. on MS Windows), if it contains the long awaited bug fixes, etc. I'm planning the release for the end of the week. http://codespeak.net/svn/lxml/branch/lxml-1.3/ Note that I tried to avoid behavioural changes, so some 'fixes' might have been left out for that reason. If unsure, it might be worth asking. Have fun, Stefan

On 6/12/07, Stefan Behnel <stefan_ml@behnel.de> wrote:
Builds fine here. There's only one test failure:
C:\src\lxml-build\lxml-1.3>python test.py
TESTED VERSION:
    Python: (2, 4, 4, 'final', 0)
    lxml.etree: (1, 3, 0, 44201)
    libxml used: (2, 6, 28)
    libxml compiled: (2, 6, 28)
    libxslt used: (1, 1, 19)
    libxslt compiled: (1, 1, 19)
======================================================================
FAIL: test_module_HTML_unicode (lxml.tests.test_htmlparser.HtmlParserTestCaseBase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Python24\lib\unittest.py", line 260, in run
    testMethod()
  File "C:\src\lxml-build\lxml-1.3\src\lxml\tests\test_htmlparser.py", line 33, in test_module_HTML_unicode
    self.uhtml_str)
  File "c:\Python24\lib\unittest.py", line 333, in failUnlessEqual
    raise self.failureException, \
AssertionError: u'<html><head><title>test \xc3\x83\xc2\xa1\xef\xa3\x92</title></head><body><h1>page \xc3\x83\xc2\xa1\xef\xa3\x92 title</h1></body></html>' != u'<html><head><title>test \xc3\xa1\uf8d2</title></head><body><h1>page \xc3\xa1\uf8d2 title</h1></body></html>'
----------------------------------------------------------------------
Ran 727 tests in 2.564s
FAILED (failures=1)
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

Oh, btw, maybe it's time to change the setuptools require() from 0.5 to 0.6c6? I had to install setuptools manually on a machine because it couldn't find 0.5 through the usual means. -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

Hi Stefan,
Note that I tried to avoid behavioural changes, so some 'fixes' might have been left out for that reason. If unsure, it might be worth asking.
I'm missing the deannotate/xsiannotate additions to objectify. Will they not be in 1.3? I was also hoping the xsd:-prefixing of xsi:types would make it into 1.3, although I admit this is a bit of a behavioural change...
Apart from that, compiles and tests fine for me on Solaris:
$ make test PYTHON=python2.4
python2.4 setup.py build_ext -i
Building lxml version 1.3-44219
running build_ext
python2.4 test.py -p -v
TESTED VERSION:
    Python: (2, 4, 4, 'final', 0)
    lxml.etree: (1, 3, 0, 44219)
    libxml used: (2, 6, 27)
    libxml compiled: (2, 6, 27)
    libxslt used: (1, 1, 20)
    libxslt compiled: (1, 1, 20)
542/542 (100.0%): Doctest: xpathxslt.txt
----------------------------------------------------------------------
Ran 542 tests in 1.562s
OK
PYTHONPATH=src python2.4 selftest.py
126 tests ok.
PYTHONPATH=src python2.4 selftest2.py
88 tests ok.
0 $
Holger

Hi Holger, jholg@gmx.de wrote:
I was thinking about that, but ...
I was also hoping the xsd:-prefixing of xsi:types would make it into 1.3, although I admit this is a bit of a behavioural change...
... I think the two should become available together. Maybe deannotate() makes sense on its own, but xsiannotate is new and would then change its behaviour in 2.0. So I wouldn't mind adding deannotate() to 1.3. BTW, you tend to use the trunk anyway, but I don't think there are many others who already use these features...
Apart from that, compiles and tests fine for me on Solaris:
Great, thanks. Stefan

Hi Stefan,
Not exactly true any more; usually I'm keeping to 1.2.1 now, as we finally have a first live application in production, with our framework now fully lxml.objectify-based. I'll have to put up some marketing write-up on what we are using lxml for one of these days... I've just been using trunk recently to check out those nice new features. If nobody uses the features apart from me anyway then I vote for v1.3-inclusion ;-) Seriously: Having deannotate() would be nice. Thanks, Holger

Hi, just realized that the extended xsi:type-support which registers more xsi:types in _registerPyTypes is also not in 1.3 currently. I think this isn't really a change in behaviour, but rather an extension and some fix (more appropriate mapping of xsi:types to python builtin types). +1 for inclusion in 1.3. Holger

Hi Holger, jholg@gmx.de wrote:
just realized that the extended xsi:type-support which registers more xsi:types in _registerPyTypes is also not in 1.3 currently.
Someone forgot the svn update? :) There isn't much missing, just the xsiannotate() function and the change in "xsi:" prefix handling. Stefan

Hi
Oops. My apologies.
Yeah. Please ignore my recent post, then...
Just updated, tests fine on Solaris:
TESTED VERSION:
    Python: (2, 4, 4, 'final', 0)
    lxml.etree: (1, 3, 0, 44240)
    libxml used: (2, 6, 27)
    libxml compiled: (2, 6, 27)
    libxslt used: (1, 1, 20)
    libxslt compiled: (1, 1, 20)
546/546 (100.0%): Doctest: xpathxslt.txt
----------------------------------------------------------------------
Ran 546 tests in 1.136s
OK
PYTHONPATH=src python2.4 selftest.py
126 tests ok.
PYTHONPATH=src python2.4 selftest2.py
88 tests ok.
Holger

Hi Stefan, would you object to ignoring prefixes in the xsi:type attribute value *lookup* in _lookupElementClass() for 1.3? That way, no internal xsd:-prefixing would happen for now, but 1.3 applications would be prepared to make use of prefixed xsi:type information, as will be produced by 2.0 apps. Much in the same direction, maybe already add 'xsd': <schema ns> to the default nsmap used in Element() and DataElement()? Have a nice weekend everybody, Holger
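[Editorial sketch] The prefix-ignoring lookup proposed above can be illustrated in plain Python. This is only a sketch of the idea, not the code in _lookupElementClass(): the registry SCHEMA_TYPES and the function lookup_pytype() are hypothetical names with made-up entries, standing in for whatever objectify keeps internally.

```python
# Hypothetical registry mapping unprefixed XSD type names to Python type
# names. The entries are illustrative, not objectify's real table.
SCHEMA_TYPES = {"int": "int", "long": "long", "string": "str", "boolean": "bool"}


def lookup_pytype(xsi_type):
    """Return the registered Python type name for an xsi:type value.

    The value is tried as given first. If that fails and the value has a
    prefix (the part before the first ':'), the prefix is dropped and the
    local name is tried. Unknown types yield None.
    """
    if xsi_type in SCHEMA_TYPES:
        return SCHEMA_TYPES[xsi_type]
    _, sep, local = xsi_type.partition(":")
    if sep:
        return SCHEMA_TYPES.get(local)
    return None
```

With such a lookup, a document carrying xsi:type="xsd:long" and one carrying xsi:type="long" would resolve to the same class, which is the forward compatibility asked for here.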

jholg@gmx.de wrote:
You're not gonna let go, are you? :) But I think you're right. I'll patch those two in. I'll need you to check that these things work, though. The test coverage in that area is not very great, since I had to leave out a couple of those big test cases as they wouldn't work without having lxml also generate the prefixes. As you're the one begging for inclusion, what about writing some more test cases that work with 1.3? (and preferably also with 2.0 ...) Stefan

Hi,
You're not gonna let go, are you? :)
Just found myself trying to imitate things I already know will be there in the future...
Gonna do that. My first try at running the tests produces a reproducible core, however:
$ make test PYTHON=python2.4
python2.4 setup.py build_ext -i
Building lxml version 1.3-44288
running build_ext
python2.4 test.py -p -v
TESTED VERSION:
    Python: (2, 4, 4, 'final', 0)
    lxml.etree: (1, 3, 0, 44288)
    libxml used: (2, 6, 27)
    libxml compiled: (2, 6, 27)
    libxslt used: (1, 1, 20)
    libxslt compiled: (1, 1, 20)
552/552 (100.0%): Doctest: xpathxslt.txt
----------------------------------------------------------------------
Ran 552 tests in 1.264s
OK
make: *** [test_inplace] Illegal Instruction (core dumped)
$ gdb python2.4 -c core
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.6"...
Core was generated by `python2.4 test.py -p -v'.
Program terminated with signal 9, Killed.
Reading symbols from /usr/lib/libresolv.so.2...done.
Reading symbols from /usr/lib/libsocket.so.1...done.
Reading symbols from /usr/lib/libnsl.so.1...done.
Reading symbols from /usr/lib/librt.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/lib/libpthread.so.1...done.
Reading symbols from /usr/lib/libm.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libmp.so.2...done.
Reading symbols from /usr/lib/libaio.so.1...done.
Reading symbols from /usr/platform/SUNW,Sun-Fire-V440/lib/libc_psr.so.1...done.
Reading symbols from /usr/lib/libthread.so.1...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/time.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/itertools.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_curses.so...done.
Reading symbols from /apps/prod/lib/libncurses.so.5...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/strop.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_bisect.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_heapq.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/cStringIO.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/math.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/binascii.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_random.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/fcntl.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/operator.so...done.
Reading symbols from /data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/etree.so...done.
Reading symbols from /apps/prod/lib/libxslt.so.1...done.
Reading symbols from /apps/prod/lib/libexslt.so.0...done.
Reading symbols from /apps/prod/lib/libxml2.so.2...done.
Reading symbols from /apps/prod/lib/libz.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/struct.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/zlib.so...done.
Reading symbols from /data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/objectify.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_locale.so...done.
Reading symbols from /usr/lib/libintl.so.1...
warning: Lowest section in /usr/lib/libintl.so.1 is .hash at 0x74
done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/collections.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/readline.so...done.
Reading symbols from /apps/prod/lib/libreadline.so...done.
#0 0x50e070 in ?? ()
(gdb) where
#0 0x50e070 in ?? ()
#1 0x576b0 in tupletraverse (o=0x3943cc, visit=0xb6c28 <visit_decref>, arg=0x0) at Objects/tupleobject.c:443
#2 0xb62fc in collect (generation=2) at Modules/gcmodule.c:294
#3 0xb6a68 in PyGC_Collect () at Modules/gcmodule.c:1196
#4 0xadf28 in Py_Finalize () at Python/pythonrun.c:353
#5 0xafefc in Py_Exit (sts=0) at Python/pythonrun.c:1593
#6 0xaeb70 in handle_system_exit () at Python/pythonrun.c:1050
#7 0xaeb9c in PyErr_PrintEx (set_sys_last_vars=1) at Python/pythonrun.c:1060
#8 0xafea8 in PyErr_Print () at Python/pythonrun.c:974
#9 0xae5f8 in PyRun_SimpleFileExFlags (fp=0xffffffff, filename=0xffbef8b6 "test.py", closeit=1, flags=0xffbef5d8) at Python/pythonrun.c:873
#10 0xaf8cc in PyRun_AnyFileExFlags (fp=0x12a938, filename=0xffbef8b6 "test.py", closeit=1, flags=0xffbef5d8) at Python/pythonrun.c:673
#11 0x1e9ec in Py_Main (argc=4, argv=0xffbef75c) at Modules/main.c:493
#12 0x1dfe4 in main (argc=4, argv=0xffbef75c) at ./Modules/python.c:23
(gdb)
As the tests themselves seem to run successfully, does this hint at some cleanup/destructor problem?
Holger

jholg@gmx.de wrote:
Yes, I've seen that before and it's reproducible if it occurs. So far, I have no idea where this might come from. The "tupletraverse" makes it look like a GC crash when collecting Python (!) references, but I really don't see where this might be triggered in lxml. All problems we had so far were related to libxml2 data being double freed and stuff. I'd be happy about any idea how to investigate this any deeper. Stefan

Hi,
For what it's worth, I'm seeing a strange dependency on the *arguments* passed to test.py: No args: Works ============== 0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44288) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) ---------------------------------------------------------------------- Ran 552 tests in 0.978s OK 0 lb54320@adevp02 .../lxml-1.3 $ -p only: Works ============== 0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -p TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44288) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) 552/552 (100.0%) ---------------------------------------------------------------------- Ran 552 tests in 1.052s OK 0 lb54320@adevp02 .../lxml-1.3 $ -v only: Works ============== 0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -v TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44288) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ 
---------------------------------------------------------------------- Ran 552 tests in 1.016s OK 0 lb54320@adevp02 .../lxml-1.3 $ -vv: Core Dump ============== 0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -vv TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44288) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) test_attribute_based_lookup (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok test_custom_lookup (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok test_custom_lookup_ns_fallback (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok test_default_class_lookup (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok [...] test_xslt_shortcut (lxml.tests.test_xslt.ETreeXSLTTestCase) ... ok test_xslt_unicode (lxml.tests.test_xslt.ETreeXSLTTestCase) ... ok test_xslt_utf8 (lxml.tests.test_xslt.ETreeXSLTTestCase) ... ok Doctest: extensions.txt ... ok Doctest: xpathxslt.txt ... ok ---------------------------------------------------------------------- Ran 552 tests in 1.058s OK Illegal Instruction (core dumped) 132 lb54320@adevp02 .../lxml-1.3 $ -p -v: Core Dump ================ 132 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -p -v TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44288) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) 552/552 (100.0%): Doctest: xpathxslt.txt ---------------------------------------------------------------------- Ran 552 tests in 1.089s OK Illegal Instruction (core dumped) 132 lb54320@adevp02 .../lxml-1.3 $ Holger -- GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS. Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail
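[Editorial sketch] The argument-by-argument comparison above could be scripted so that the exit status of each variant is recorded in one go. The helper run_variants() below is a hypothetical name, not part of lxml's test tooling, and it assumes a Python new enough to have subprocess.run() (the thread itself used Python 2.4).

```python
import subprocess
import sys


def run_variants(base_cmd, variants):
    """Run base_cmd once per argument tuple and collect the exit status.

    On POSIX, subprocess reports a negative status when the child was
    terminated by a signal; a shell shows the same event as 128 + signal.
    """
    results = {}
    for args in variants:
        proc = subprocess.run(base_cmd + list(args),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        # Label each run by its arguments so the table is easy to read.
        results[" ".join(args) or "(no args)"] = proc.returncode
    return results
```

For the runs in this message one might call run_variants([sys.executable, "test.py"], [(), ("-p",), ("-v",), ("-vv",), ("-p", "-v")]) with PYTHONPATH=src set. The shell status 132 shown in the prompt is consistent with 128 + 4, i.e. SIGILL, matching "Illegal Instruction".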

Hi,
I propose changing DataElement() like this to come even closer to future behaviour, without internal auto-prefixing; this is actually just taken from trunk, with the internal prefix addition left out:
Index: src/lxml/objectify.pyx
===================================================================
--- src/lxml/objectify.pyx (revision 44327)
+++ src/lxml/objectify.pyx (working copy)
@@ -1664,6 +1664,7 @@
     if the type can be identified. If '_pytype' or '_xsi' are among the
     keyword arguments, they will be used instead.
     """
+    cdef python.PyObject* dict_result
     if nsmap is None:
         nsmap = _DEFAULT_NSMAP
     if attrib is not None:
@@ -1671,12 +1672,19 @@
         attrib.update(_attributes)
         _attributes = attrib
     if _xsi is not None:
+        if ':' in _xsi:
+            prefix, name = _xsi.split(':', 1)
+            ns = nsmap.get(prefix)
+            if ns != XML_SCHEMA_NS:
+                raise ValueError, "XSD types require the XSD namespace"
         python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_TYPE_ATTR, _xsi)
     if _pytype is None:
-        # allow for s.o. using unregistered or even wrong xsi:type names
-        pytype_lookup = _SCHEMA_TYPE_DICT.get(_xsi)
-        if pytype_lookup is not None:
-            _pytype = pytype_lookup.name
+        # allow using unregistered or even wrong xsi:type names
+        dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, _xsi)
+        if dict_result is NULL:
+            dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, name)
+        if dict_result is not NULL:
+            _pytype = (<PyType>dict_result).name
     if python._isString(_value):
         strval = _value
This way, you can use type prefixes in DataElement if you wish, with the lookup being able to use the information:
Note how the <prefix> element gets identified as pytype "long" (as opposed to pytype "int"). It's easy to also remove the check for a given ns-prefix pointing to XML_SCHEMA_NS but I think it's compatible enough with 1.2.1 behaviour. Patch file attached. Holger
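[Editorial sketch] For readers who do not want to read the Pyrex diff, the control flow of the proposed change can be paraphrased in plain Python. This is a paraphrase under assumptions, not the patch itself: resolve_xsi() is a hypothetical function name, and SCHEMA_TYPE_DICT here is a small stand-in dictionary rather than objectify's internal registry of PyType objects.

```python
XML_SCHEMA_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical stand-in for the internal registry (type name -> pytype name).
SCHEMA_TYPE_DICT = {"int": "int", "long": "long", "string": "str"}


def resolve_xsi(xsi, nsmap):
    """Validate a possibly prefixed xsi:type value and look up its pytype.

    Raises ValueError if a prefix is present but is not bound to the XML
    Schema namespace in nsmap. Returns the pytype name, or None when the
    type name is not registered (unregistered names are tolerated).
    """
    name = xsi
    if ":" in xsi:
        prefix, name = xsi.split(":", 1)
        if nsmap.get(prefix) != XML_SCHEMA_NS:
            raise ValueError("XSD types require the XSD namespace")
    # Try the value exactly as given, then fall back to the local name.
    pytype = SCHEMA_TYPE_DICT.get(xsi)
    if pytype is None:
        pytype = SCHEMA_TYPE_DICT.get(name)
    return pytype
```

Under these assumptions, resolve_xsi("xsd:long", {"xsd": XML_SCHEMA_NS}) and resolve_xsi("long", {}) both give "long", while a prefix bound to any other namespace is rejected.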

Hi, some first tests for nsmap arguments to Element() / DataElement() and for prefixed xsi:type attribute values. Holger

Hi, another small proposed change:
Index: src/lxml/objectify.pyx
===================================================================
--- src/lxml/objectify.pyx (revision 44363)
+++ src/lxml/objectify.pyx (working copy)
@@ -759,7 +759,7 @@
         return self.__nonzero__()
 
 def __checkBool(s):
-    if s != 'true' and s != 'false':
+    if s != 'true' and s != 'false' and s != '1' and s != '0':
         raise ValueError
 
 cdef object _strValueOf(obj):
Otherwise, annotate(<elt>, ignore_old=False) will revert a correctly pre-annotated boolean element with a text value of "0" or "1", which XML Schema datatypes allow, to py:pytype="int". It still does this for ignore_old=True, which is the default. Holger
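[Editorial sketch] XML Schema defines exactly four lexical forms for xsd:boolean: "true", "false", "1" and "0". A standalone Python version of the widened check might look as follows; check_bool() and parse_bool() are illustrative names, not the functions in objectify.pyx.

```python
XSD_BOOLEAN_LITERALS = ("true", "false", "1", "0")


def check_bool(s):
    """Raise ValueError unless s is one of the four xsd:boolean literals."""
    if s not in XSD_BOOLEAN_LITERALS:
        raise ValueError("not an xsd:boolean literal: %r" % (s,))


def parse_bool(s):
    """Map an xsd:boolean literal to a Python bool."""
    check_bool(s)
    return s in ("true", "1")
```

The check is case-sensitive on purpose, since "TRUE" or "True" are not valid xsd:boolean literals.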

Hi, attached some additional testing for the current 1.3 DataElement behaviour, testing unprefixed, correctly-schema-ns-prefixed and erroneously-prefixed _xsi arguments. Holger

jholg@gmx.de wrote:
attached some additional testing for the current 1.3 DataElement behaviour, testing unprefixed, correctly-schema-ns-prefixed and erroneously-prefixed _xsi arguments.
Say, would you actually be interested in an SVN account for lxml? That would allow you to work more directly on objectify. Stefan

Hi,
Of course I'm interested :) Just brief me on your lxml development policies, to keep the very high quality up. And I think I've found another thing that could be changed: DataElement() does not treat None arguments the way normal direct assignment does, setting xsi:nil="true". Instead it returns this:
I'll look into this further tomorrow. Regards, Holger

Hi,
Find attached a patch that:
- changes the above to apply xsi:nil="true" for None value arguments
- lets DataElement() gracefully handle ObjectifiedDataElement arguments, keeping their attributes intact, if not overridden by the DataElement() args. This also reuses existing xsi:type or py:pytype information, unless _pytype and/or _xsi are provided as parameters to DataElement(). Previously, DataElement() cut off all attributes if given an ObjectifiedDataElement instance.
- type-checks the _value against the given type hint:
Tests are included for the described behaviour. Additionally, I've revamped some of the tests I provided earlier and split them up: more but smaller test methods now. Please try it out; if any of the DataElement changes are not ok I can also send only the split-up tests, of course.
Btw.: Lately I'm always getting
IOError: Error reading file '/data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/tests/test_xinclude.xml': failed to load external entity "/data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/tests/test_xinclude.xml"
when running the tests, due to some missing XML file.
Holger
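[Editorial sketch] The first item of the patch description, marking None values with xsi:nil="true", can be shown with the standard library's ElementTree. This only illustrates the resulting markup; make_data_element() is a hypothetical helper and does not reproduce DataElement()'s type handling.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def make_data_element(tag, value):
    """Build an element whose text is str(value), or xsi:nil for None."""
    elem = ET.Element(tag)
    if value is None:
        # Clark notation "{namespace}local" names a namespaced attribute.
        elem.set("{%s}nil" % XSI_NS, "true")
    else:
        elem.text = str(value)
    return elem
```

An element built from None then carries the nil attribute and no text, while any other value ends up as the element's text with no extra attributes.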

Hi, just noticed that this seems to have been lost entirely:
Or am I missing something? Holger

Hi Holger, jholg@gmx.de wrote:
Find attached a patch that:
- changes the above to apply xsi:nil="true" for None value arguments
Ok.
Ok.
Ok.
Tests are included for the described behaviour.
Cool, thanks.
Additionally, I've revamped some of the tests I provided earlier and split them up: More but smaller test methods now.
That's even better. :)
I moved an XML file to a subdirectory to also test relative references in a base directory. But it should be fixed in the 1.3 release... Stefan

Hi,
Another thing on my 1.3 wish list: The usage of the _DEFAULT_NSMAP in the Element() and DataElement() factories. Also the lazier handling of _xsi user arguments for DataElement(), allowing unregistered (or even plain wrong) xsi type names without raising an exception. Hmm, if this stuff is painful to separate from the QName-awareness for xsi:type attributes (xsd-prefixing), then I am in favour of putting it all into 1.3 rather than keeping it back for 2.0. After all, this is more of a bugfix: xsi:type handling is broken in a way without the ns-prefixes. Holger

On Tuesday 12 June 2007 13:50, Stefan Behnel wrote:
Did sourceline make it? :-) Right now z3c.rml depends on lxml trunk, since the lxml 1.3 beta release does not contain the sourceline feature. It would be nice to depend on a release. Regards, Stephan -- Stephan Richter CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student) Web2k - Web Software Design, Development and Training

On 6/12/07, Stefan Behnel <stefan_ml@behnel.de> wrote:
Builds fine here. There's only one test failure: C:\src\lxml-build\lxml-1.3>python test.py TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44201) libxml used: (2, 6, 28) libxml compiled: (2, 6, 28) libxslt used: (1, 1, 19) libxslt compiled: (1, 1, 19) ====================================================================== FAIL: test_module_HTML_unicode (lxml.tests.test_htmlparser.HtmlParserTestCaseBas e) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\Python24\lib\unittest.py", line 260, in run testMethod() File "C:\src\lxml-build\lxml-1.3\src\lxml\tests\test_htmlparser.py", line 33, in test_module_HTML_unicode self.uhtml_str) File "c:\Python24\lib\unittest.py", line 333, in failUnlessEqual raise self.failureException, \ AssertionError: u'<html><head><title>test \xc3\x83\xc2\xa1\xef\xa3\x92</title></ head><body><h1>page \xc3\x83\xc2\xa1\xef\xa3\x92 title</h1></body></html>' != u' <html><head><title>test \xc3\xa1\uf8d2</title></head><body><h1>page \xc3\xa1\uf8 d2 title</h1></body></html>' ---------------------------------------------------------------------- Ran 727 tests in 2.564s FAILED (failures=1) -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

Oh, btw, maybe it's time to change the setuptools require() from 0.5 to 0.6c6? I had to install setuptools manually on a machine because it couldn't find 0.5 through the usual means. -- Sidnei da Silva Enfold Systems http://enfoldsystems.com Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214

Hi Stephan,
Note that I tried to avoid behavioural changes, so some 'fixes' might have been left out for that reason. If unsure, it might be worth asking.
I'm missing the deannotate/xsiannotate additions to objectify. Will they not be in 1.3? I was also hoping the xsd:-prefixing of xsi:types would make it into 1.3, although I admit this is a bit of a behavioural change... Apart from that, compiles and tests fine for me on Solaris: $ make test PYTHON=python2.4 python2.4 setup.py build_ext -i Building lxml version 1.3-44219 running build_ext python2.4 test.py -p -v TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44219) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) 542/542 (100.0%): Doctest: xpathxslt.txt ---------------------------------------------------------------------- Ran 542 tests in 1.562s OK PYTHONPATH=src python2.4 selftest.py 126 tests ok. PYTHONPATH=src python2.4 selftest2.py 88 tests ok. 0 $ Holger -- Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten Browser-Versionen downloaden: http://www.gmx.net/de/go/browser

Hi Holger, jholg@gmx.de wrote:
I was thinking about that, but ...
I was also hoping the xsd:-prefixing of xsi:types would make it into 1.3, although I admit this is a bit of a behavioural change...
... I think the two should become available together. Maybe deannotate() makes sense on its own, but xsiannotate is new and would then change its behaviour in 2.0. So I wouldn't mind adding deannotate() to 1.3. BTW, you tend to use the trunk anyway, but I don't think there are many others who already use these features...
Apart from that, compiles and tests fine for me on Solaris:
Great, thanks. Stefan

Hi Stefan,
Not exactly true any more, usually I'm keeping to 1.2.1 now as we finally have a first live application in production, with our framework now fully lxml.objectify-based. I'll have to put up some marketing write-up on what we are using lxml for some of these days... I've just been using trunk recently to check out those nice new features. If nobody uses the features apart from me anyway then I vote for v1.3-inclusion ;-) Seriously: Having deannotate() would be nice. Thanks, Holger -- GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS. Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail

Hi, just realized that the extended xsi:type-support which registers more xsi:types in _registerPyTypes is also not in 1.3 currently. I think this isn't really a change in behaviour, but rather an extension and some fix (more appropriate mapping of xsi:types to python builtin types). +1 for inclusion in 1.3. Holger
-- Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten Browser-Versionen downloaden: http://www.gmx.net/de/go/browser

Hi Holger, jholg@gmx.de wrote:
just realized that the extended xsi:type-support which registers more xsi:types in _registerPyTypes is also not in 1.3 currently.
Someone forgot the svn update? :) There isn't much missing, just the xsiannotate() function and the change in "xsi:" prefix handling. Stefan

Hi
Oops. My apologies.
Yeah. Please ignore my recent post, then... Just updated, tests fine on Solaris: TESTED VERSION: Python: (2, 4, 4, 'final', 0) lxml.etree: (1, 3, 0, 44240) libxml used: (2, 6, 27) libxml compiled: (2, 6, 27) libxslt used: (1, 1, 20) libxslt compiled: (1, 1, 20) 546/546 (100.0%): Doctest: xpathxslt.txt ---------------------------------------------------------------------- Ran 546 tests in 1.136s OK PYTHONPATH=src python2.4 selftest.py 126 tests ok. PYTHONPATH=src python2.4 selftest2.py 88 tests ok. Holger -- Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen! Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer

Hi Stefan, would you object to adding the ignorance of prefixes in the xsi:type attribute value *lookup* in _lookupElementClass() for 1.3? That way, no internal xsd:-prefixing happened for now, but 1.3 applications would be prepared to make use prefixed xsi:type information, as will be produced by 2.0-apps. Much in the same direction, maybe already add 'xsd': <schema ns> to the default nsmap used in Element() and DataElment()? Have a nice weekend everybody, Holger -- GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS. Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail

jholg@gmx.de wrote:
You're not gonna let go, are you? :) But I think you're right. I'll patch those two in. I'll need you to check that these things work, though. The test coverage in that area is not very great, since I had to leave out a couple of those big test cases as they wouldn't work without having lxml also generate the prefixes. As you're the one bagging for inclusion, what about writing some more test cases that work with 1.3? (and preferably also with 2.0 ...) Stefan

Hi,
You're not gonna let go, are you? :)
Just found myself trying to imitate things I already know they will be there in the future...
Gonna do that. My first attempt at running the tests produces a reproducible core dump, however:

$ make test PYTHON=python2.4
python2.4 setup.py build_ext -i
Building lxml version 1.3-44288
running build_ext
python2.4 test.py -p -v
TESTED VERSION:
Python: (2, 4, 4, 'final', 0)
lxml.etree: (1, 3, 0, 44288)
libxml used: (2, 6, 27)
libxml compiled: (2, 6, 27)
libxslt used: (1, 1, 20)
libxslt compiled: (1, 1, 20)
552/552 (100.0%): Doctest: xpathxslt.txt
----------------------------------------------------------------------
Ran 552 tests in 1.264s

OK
make: *** [test_inplace] Illegal Instruction (core dumped)

$ gdb python2.4 -c core
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.6"...
Core was generated by `python2.4 test.py -p -v'.
Program terminated with signal 9, Killed.
Reading symbols from /usr/lib/libresolv.so.2...done.
Reading symbols from /usr/lib/libsocket.so.1...done.
Reading symbols from /usr/lib/libnsl.so.1...done.
Reading symbols from /usr/lib/librt.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/lib/libpthread.so.1...done.
Reading symbols from /usr/lib/libm.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libmp.so.2...done.
Reading symbols from /usr/lib/libaio.so.1...done.
Reading symbols from /usr/platform/SUNW,Sun-Fire-V440/lib/libc_psr.so.1...done.
Reading symbols from /usr/lib/libthread.so.1...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/time.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/itertools.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_curses.so...done.
Reading symbols from /apps/prod/lib/libncurses.so.5...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/strop.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_bisect.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_heapq.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/cStringIO.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/math.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/binascii.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_random.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/fcntl.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/operator.so...done.
Reading symbols from /data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/etree.so...done.
Reading symbols from /apps/prod/lib/libxslt.so.1...done.
Reading symbols from /apps/prod/lib/libexslt.so.0...done.
Reading symbols from /apps/prod/lib/libxml2.so.2...done.
Reading symbols from /apps/prod/lib/libz.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/struct.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/zlib.so...done.
Reading symbols from /data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/objectify.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/_locale.so...done.
Reading symbols from /usr/lib/libintl.so.1...
warning: Lowest section in /usr/lib/libintl.so.1 is .hash at 0x74
done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/collections.so...done.
Reading symbols from /apps/prod/lib/python2.4/lib-dynload/readline.so...done.
Reading symbols from /apps/prod/lib/libreadline.so...done.
#0 0x50e070 in ?? ()
(gdb) where
#0 0x50e070 in ?? ()
#1 0x576b0 in tupletraverse (o=0x3943cc, visit=0xb6c28 <visit_decref>, arg=0x0) at Objects/tupleobject.c:443
#2 0xb62fc in collect (generation=2) at Modules/gcmodule.c:294
#3 0xb6a68 in PyGC_Collect () at Modules/gcmodule.c:1196
#4 0xadf28 in Py_Finalize () at Python/pythonrun.c:353
#5 0xafefc in Py_Exit (sts=0) at Python/pythonrun.c:1593
#6 0xaeb70 in handle_system_exit () at Python/pythonrun.c:1050
#7 0xaeb9c in PyErr_PrintEx (set_sys_last_vars=1) at Python/pythonrun.c:1060
#8 0xafea8 in PyErr_Print () at Python/pythonrun.c:974
#9 0xae5f8 in PyRun_SimpleFileExFlags (fp=0xffffffff, filename=0xffbef8b6 "test.py", closeit=1, flags=0xffbef5d8) at Python/pythonrun.c:873
#10 0xaf8cc in PyRun_AnyFileExFlags (fp=0x12a938, filename=0xffbef8b6 "test.py", closeit=1, flags=0xffbef5d8) at Python/pythonrun.c:673
#11 0x1e9ec in Py_Main (argc=4, argv=0xffbef75c) at Modules/main.c:493
#12 0x1dfe4 in main (argc=4, argv=0xffbef75c) at ./Modules/python.c:23
(gdb)

As the tests themselves seem to run successfully, does this hint at some cleanup/destructor problem?

Holger

jholg@gmx.de wrote:
Yes, I've seen that before and it's reproducible if it occurs. So far, I have no idea where this might come from. The "tupletraverse" makes it look like a GC crash when collecting Python (!) references, but I really don't see where this might be triggered in lxml. All problems we had so far were related to libxml2 data being double freed and stuff. I'd be happy about any idea how to investigate this any further.

Stefan

Hi,
For what it's worth, I'm seeing a strange dependency on the *arguments* passed to test.py:

No args: Works
==============
0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py
TESTED VERSION:
Python: (2, 4, 4, 'final', 0)
lxml.etree: (1, 3, 0, 44288)
libxml used: (2, 6, 27)
libxml compiled: (2, 6, 27)
libxslt used: (1, 1, 20)
libxslt compiled: (1, 1, 20)
----------------------------------------------------------------------
Ran 552 tests in 0.978s

OK
0 lb54320@adevp02 .../lxml-1.3 $

-p only: Works
==============
0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -p
TESTED VERSION:
Python: (2, 4, 4, 'final', 0)
lxml.etree: (1, 3, 0, 44288)
libxml used: (2, 6, 27)
libxml compiled: (2, 6, 27)
libxslt used: (1, 1, 20)
libxslt compiled: (1, 1, 20)
552/552 (100.0%)
----------------------------------------------------------------------
Ran 552 tests in 1.052s

OK
0 lb54320@adevp02 .../lxml-1.3 $

-v only: Works
==============
0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -v
TESTED VERSION:
Python: (2, 4, 4, 'final', 0)
lxml.etree: (1, 3, 0, 44288)
libxml used: (2, 6, 27)
libxml compiled: (2, 6, 27)
libxslt used: (1, 1, 20)
libxslt compiled: (1, 1, 20)
...................................................................... [...]
----------------------------------------------------------------------
Ran 552 tests in 1.016s

OK
0 lb54320@adevp02 .../lxml-1.3 $

-vv: Core Dump
==============
0 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -vv
TESTED VERSION:
Python: (2, 4, 4, 'final', 0)
lxml.etree: (1, 3, 0, 44288)
libxml used: (2, 6, 27)
libxml compiled: (2, 6, 27)
libxslt used: (1, 1, 20)
libxslt compiled: (1, 1, 20)
test_attribute_based_lookup (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok
test_custom_lookup (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok
test_custom_lookup_ns_fallback (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok
test_default_class_lookup (lxml.tests.test_classlookup.ClassLookupTestCase) ... ok
[...]
test_xslt_shortcut (lxml.tests.test_xslt.ETreeXSLTTestCase) ... ok
test_xslt_unicode (lxml.tests.test_xslt.ETreeXSLTTestCase) ... ok
test_xslt_utf8 (lxml.tests.test_xslt.ETreeXSLTTestCase) ... ok
Doctest: extensions.txt ... ok
Doctest: xpathxslt.txt ... ok
----------------------------------------------------------------------
Ran 552 tests in 1.058s

OK
Illegal Instruction (core dumped)
132 lb54320@adevp02 .../lxml-1.3 $

-p -v: Core Dump
================
132 lb54320@adevp02 .../lxml-1.3 $ PYTHONPATH=src python2.4 test.py -p -v
TESTED VERSION:
Python: (2, 4, 4, 'final', 0)
lxml.etree: (1, 3, 0, 44288)
libxml used: (2, 6, 27)
libxml compiled: (2, 6, 27)
libxslt used: (1, 1, 20)
libxslt compiled: (1, 1, 20)
552/552 (100.0%): Doctest: xpathxslt.txt
----------------------------------------------------------------------
Ran 552 tests in 1.089s

OK
Illegal Instruction (core dumped)
132 lb54320@adevp02 .../lxml-1.3 $

Holger
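A side note on reading the transcript: the number in front of each shell prompt looks like the exit status of the previous command (an assumption based on the prompt layout; the message does not say so). Under the usual shell convention, a status above 128 means the process was killed by signal number `status - 128`, so 132 would correspond to signal 4, which is SIGILL on common Unix systems and matches the "Illegal Instruction" message. A minimal sketch, with `describe_exit_status` being a hypothetical helper name:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (hypothetical helper).

    Shells conventionally report 128 + N when a process was killed
    by signal N; anything else is treated as a plain exit code here.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % status


print(describe_exit_status(0))
print(describe_exit_status(132))
```

Signal names are platform dependent, so the name printed for 132 may differ on systems that number their signals differently.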

Hi,
I propose changing DataElement() like this to come even closer to future behaviour, without internal auto-prefixing; this is actually just taken from trunk, leaving out the internal prefix addition:

Index: src/lxml/objectify.pyx
===================================================================
--- src/lxml/objectify.pyx (revision 44327)
+++ src/lxml/objectify.pyx (working copy)
@@ -1664,6 +1664,7 @@
     if the type can be identified. If '_pytype' or '_xsi' are among the
     keyword arguments, they will be used instead.
     """
+    cdef python.PyObject* dict_result
     if nsmap is None:
         nsmap = _DEFAULT_NSMAP
     if attrib is not None:
@@ -1671,12 +1672,19 @@
             attrib.update(_attributes)
         _attributes = attrib
     if _xsi is not None:
+        if ':' in _xsi:
+            prefix, name = _xsi.split(':', 1)
+            ns = nsmap.get(prefix)
+            if ns != XML_SCHEMA_NS:
+                raise ValueError, "XSD types require the XSD namespace"
         python.PyDict_SetItem(_attributes, XML_SCHEMA_INSTANCE_TYPE_ATTR, _xsi)
         if _pytype is None:
-            # allow for s.o. using unregistered or even wrong xsi:type names
-            pytype_lookup = _SCHEMA_TYPE_DICT.get(_xsi)
-            if pytype_lookup is not None:
-                _pytype = pytype_lookup.name
+            # allow using unregistered or even wrong xsi:type names
+            dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, _xsi)
+            if dict_result is NULL:
+                dict_result = python.PyDict_GetItem(_SCHEMA_TYPE_DICT, name)
+            if dict_result is not NULL:
+                _pytype = (<PyType>dict_result).name

     if python._isString(_value):
         strval = _value

This way, you can use type prefixes in DataElement if you wish, with the lookup being able to use the information:
Note how the <prefix> element gets identified as pytype "long" (as opposed to pytype "int"). It's easy to also remove the check for a given ns-prefix pointing to XML_SCHEMA_NS, but I think it's compatible enough with 1.2.1 behaviour. Patch file attached.

Holger
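The lookup order proposed in this message can be sketched in plain Python. This is only an illustration under stated assumptions: `SCHEMA_TYPE_DICT` and `pytype_for_xsi` are hypothetical stand-ins, the table contents are made up, and none of this is lxml's actual implementation.

```python
# Hypothetical stand-ins for the names used in the patch; the table
# contents are invented for illustration and are not lxml's registry.
XML_SCHEMA_NS = "http://www.w3.org/2001/XMLSchema"
SCHEMA_TYPE_DICT = {"integer": "int", "long": "long", "string": "str"}


def pytype_for_xsi(xsi, nsmap):
    """Return the registered pytype name for an xsi:type value, or None.

    A prefixed value such as 'xsd:long' is only accepted if its prefix
    maps to the XML Schema namespace in nsmap.
    """
    name = xsi
    if ":" in xsi:
        prefix, name = xsi.split(":", 1)
        if nsmap.get(prefix) != XML_SCHEMA_NS:
            raise ValueError("XSD types require the XSD namespace")
    # Try the value exactly as given, then fall back to the local name.
    pytype = SCHEMA_TYPE_DICT.get(xsi)
    if pytype is None:
        pytype = SCHEMA_TYPE_DICT.get(name)
    # Unregistered or wrong type names simply yield None, no exception.
    return pytype


print(pytype_for_xsi("xsd:long", {"xsd": XML_SCHEMA_NS}))
print(pytype_for_xsi("no-such-type", {}))
```

The fallback to the local name is what would let a prefixed type name reach an entry registered without a prefix.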

Hi, some first tests for nsmap arguments to Element() / DataElement() and for prefixed xsi:type attribute values.

Holger

Hi, another small proposed change:

Index: src/lxml/objectify.pyx
===================================================================
--- src/lxml/objectify.pyx (revision 44363)
+++ src/lxml/objectify.pyx (working copy)
@@ -759,7 +759,7 @@
         return self.__nonzero__()

 def __checkBool(s):
-    if s != 'true' and s != 'false':
+    if s != 'true' and s != 'false' and s != '1' and s != '0':
         raise ValueError

 cdef object _strValueOf(obj):

Otherwise, annotate(<elt>, ignore_old=False) will revert a correctly pre-annotated boolean element with a text value of "0" or "1", which XML Schema datatypes allow, to py:pytype="int". It still does this for ignore_old=True, which is the default.

Holger
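For reference, the check proposed in this message amounts to accepting all four lexical forms that XML Schema defines for xsd:boolean. A minimal stand-alone sketch in plain Python follows; `check_bool` and `parse_bool` are hypothetical names, not the objectify internals.

```python
# The four lexical forms XML Schema allows for xsd:boolean.
BOOL_LITERALS = ("true", "false", "1", "0")


def check_bool(text):
    """Raise ValueError unless text is a valid xsd:boolean literal."""
    if text not in BOOL_LITERALS:
        raise ValueError("not an xsd:boolean literal: %r" % (text,))


def parse_bool(text):
    """Convert a valid xsd:boolean literal to a Python bool."""
    check_bool(text)
    return text in ("true", "1")


print(parse_bool("1"))
print(parse_bool("false"))
```

With only 'true' and 'false' accepted, a text value of "0" or "1" would fail the boolean check and be treated as an integer instead, which is the behaviour the message describes.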

Hi, attached is some additional testing for the current 1.3 DataElement behaviour, testing unprefixed, correctly-schema-ns-prefixed and erroneously-prefixed _xsi arguments.

Holger

jholg@gmx.de wrote:
attached some additional testing for the current 1.3 DataElement behaviour, testing unprefixed, correctly-schema-ns-prefixed and erroneously-prefixed _xsi arguments.
Say, would you actually be interested in an SVN account for lxml? That would allow you to work more directly on objectify. Stefan

Hi,
Of course I'm interested :) Just brief me on your lxml development policies, to keep the very high quality up. And I think I've found another thing that could be changed: DataElement() does not treat None arguments the way normal direct assignment does, setting xsi:nil="true". Instead it returns this:
I'll look into this further tomorrow.

Regards, Holger
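The behaviour asked for here, marking a None value with xsi:nil="true", can be illustrated with the standard library's ElementTree. This is a sketch under assumptions: `make_data_element` is a hypothetical helper and does not reproduce what objectify's DataElement() does.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XSI_NIL = "{%s}nil" % XSI_NS


def make_data_element(tag, value):
    """Build an element for value (hypothetical helper, not objectify).

    A None value produces an empty element flagged xsi:nil="true";
    any other value is stored as the element's text.
    """
    element = ET.Element(tag)
    if value is None:
        element.set(XSI_NIL, "true")
    else:
        element.text = str(value)
    return element


nil_element = make_data_element("value", None)
print(nil_element.get(XSI_NIL))
print(nil_element.text)
```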

Hi,
Find attached a patch that:
- changes the above to apply xsi:nil="true" for None value arguments
- lets DataElement() gracefully handle ObjectifiedDataElement arguments, keeping their attributes intact unless overridden by the DataElement() args. This also reuses existing xsi:type or py:pytype information, unless _pytype and/or _xsi are provided as parameters to DataElement(). Previously, DataElement() cut off all attributes if given an ObjectifiedDataElement instance.
- type-checks the _value against the given type hint:
Tests are included for the described behaviour. Additionally, I've revamped some of the tests I provided earlier and split them up: more but smaller test methods now. Please try it out; if any of the DataElement changes are not ok, I can also send only the split-up tests, of course.

Btw.: I'm always getting

IOError: Error reading file '/data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/tests/test_xinclude.xml': failed to load external entity "/data/pydev/hjoukl/LXML/lxml-1.3/src/lxml/tests/test_xinclude.xml"

due to some missing xml file lately when running the tests.

Holger
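The idea of type-checking a value against a type hint can be sketched as follows. How the attached patch actually performs the check is not shown in this message, so everything here is an assumption: `TYPE_CHECKS` is a hypothetical registry and `check_value` a hypothetical helper.

```python
def _check_bool(text):
    """Accept only the xsd:boolean literals."""
    if text not in ("true", "false", "1", "0"):
        raise ValueError("not a boolean literal: %r" % (text,))


# Hypothetical registry: type hint name -> callable raising ValueError
# when the string form of a value does not fit the type.
TYPE_CHECKS = {
    "int": int,
    "float": float,
    "bool": _check_bool,
}


def check_value(value, type_hint):
    """Raise ValueError if value does not match type_hint.

    Unknown type hints are accepted without any check.
    """
    check = TYPE_CHECKS.get(type_hint)
    if check is not None:
        check(str(value))


check_value(5, "int")        # passes silently
check_value("3.5", "float")  # passes silently
```

A mismatch such as `check_value("abc", "int")` raises ValueError, because `int("abc")` does.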

Hi, just noticed that this seems to have been lost entirely:
Or am I missing something?

Holger

Hi Holger, jholg@gmx.de wrote:
Find attached a patch that:
- changes the above to apply xsi:nil="true" for None value arguments
Ok.
Ok.
Ok.
Tests are included for the described behaviour.
Cool, thanks.
Additionally, I've revamped some of the tests I provided earlier and split them up: More but smaller test methods now.
That's even better. :)
I moved an XML file to a subdirectory to also test relative references in a base directory. But it should be fixed in the 1.3 release... Stefan

Hi,
Another thing on my 1.3 wish list: the usage of the _DEFAULT_NSMAP in the Element() and DataElement() factories. Also the lazier handling of _xsi user arguments for DataElement(), allowing unregistered (or even plain wrong) xsi type names without raising an exception. Hmm, if this stuff is painful to separate from the QName-awareness for xsi:type attributes (xsd-prefixing), then I am in favour of putting it all into 1.3 rather than keeping it back for 2.0. After all, this is more of a bugfix: xsi:type handling is broken in a way without the ns-prefixes.

Holger

On Tuesday 12 June 2007 13:50, Stefan Behnel wrote:
Did sourceline make it? :-) Right now z3c.rml depends on lxml trunk, since the lxml 1.3 beta release does not contain the sourceline feature. It would be nice to depend on a release. Regards, Stephan -- Stephan Richter CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student) Web2k - Web Software Design, Development and Training
participants (5)
- jholg@gmx.de
- Martijn Faassen
- Sidnei da Silva
- Stefan Behnel
- Stephan Richter