[lxml-dev] lxml 1.1 problems with python 2.3

Hi there,

On Python 2.3 this segfaults with lxml 1.1 (it works with lxml 1.0):

    from lxml import etree
    etree.parse('sfadfdfd')

On Python 2.4 we get an error, as we should (the file sfadfdfd doesn't exist).

Additionally, the tests don't work anymore under Python 2.3. For lxml 1.1 some dependencies on Python 2.4's doctest module exist that don't work on Python 2.3, probably because we dropped the custom doctest that I added initially. For lxml 1.0 this is less bad, but there are still some dependencies on 'sorted()' and such in the tests.

I don't think we actually ever explicitly dropped support for Python 2.3. Perhaps we should for a particular version of lxml, but it'd be nice if we could track down this bug. It might indicate something wrong in Python 2.4 that just doesn't show up right away, I don't know.

Regards,
Martijn
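For reference, the non-crashing behaviour described above (an error instead of a segfault) can be sketched as follows. This is only an illustration under assumptions: the `parse_or_report` helper is hypothetical, and the fallback to the standard library's `xml.etree.ElementTree` is added purely so the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library under discussion
except ImportError:
    # Assumption: fall back to the stdlib ElementTree so the sketch still runs.
    import xml.etree.ElementTree as etree


def parse_or_report(path):
    """Return the parsed tree, or None if the file cannot be read."""
    try:
        return etree.parse(path)
    except (IOError, OSError) as exc:
        # A missing file should end up here rather than crash the interpreter.
        print("could not parse %r: %s" % (path, exc))
        return None


result = parse_or_report('sfadfdfd')
```

Catching both `IOError` and `OSError` keeps the sketch independent of which of the two names a given Python version uses for file errors.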

Martijn Faassen wrote:
Maybe we should restore the custom doctest and wire it in via a conditional import, e.g.::

    try:
        import doctest
    except ImportError:  # Python < 2.4
        from lxml.bbb import doctest

Tres.
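One caveat with a conditional import of this kind: Python 2.3 does ship a `doctest` module, only an older one, so a plain `import doctest` succeeds there and the `except ImportError` branch is never taken. A feature check is one possible alternative. The sketch below assumes that the presence of `DocFileSuite` (added to the standard library's doctest in Python 2.4) is a good enough marker; `pick_doctest` and `load_bundled_doctest` are hypothetical names, not part of lxml.

```python
import doctest as stdlib_doctest


def pick_doctest(candidate, load_fallback):
    """Return `candidate` if it looks like the 2.4+ doctest, else the fallback."""
    if hasattr(candidate, "DocFileSuite"):
        return candidate
    return load_fallback()


def load_bundled_doctest():
    # Hypothetical loader: in the thread this would import the bundled copy
    # (for example the proposed lxml.bbb.doctest).
    raise ImportError("no bundled doctest available in this sketch")


# On any Python >= 2.4 the standard module is selected and the loader is unused.
doctest = pick_doctest(stdlib_doctest, load_bundled_doctest)
```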

Stefan Behnel wrote:
I'm attaching a patch which "fixes" this, assuming that we put the non-standard-for-Python-2.3 doctest.py into a new 'lxml.bbb' package. It still segfaults, but the tests do import from the bbb module.

Tres.

Stefan Behnel wrote:
Did you forget to check in local_doctest.py? I can't make the trunk's tests work on Python 2.3. I tried copying over Python 2.4's doctest.py into src/lxml/local_doctests.py. Things then fail with what looks like a new, unrelated issue:

Traceback (most recent call last):
  File "test.py", line 591, in ?
    exitcode = main(sys.argv)
  File "test.py", line 554, in main
    test_cases = get_test_cases(test_files, cfg, tracer=tracer)
  File "test.py", line 254, in get_test_cases
    module = import_module(file, cfg, tracer=tracer)
  File "test.py", line 197, in import_module
    mod = __import__(modname)
  File "/home/faassen/working/lxml/lxml-trunk/src/lxml/tests/test_objectify.py", line 16, in ?
    from lxml import objectify
ImportError: /home/faassen/working/lxml/lxml-trunk/src/lxml/objectify.so: undefined symbol: previousElement

Everything works fine in Python 2.4, though, so this is rather mysterious.

Regards,
Martijn

Hi, Martijn Faassen wrote:
Did you forget to check in local_doctest.py?
No, it's right in src/, revision 35023. It's a copy of the one you put into revision 8449. Maybe it's just not found in the PYTHONPATH? (Well, test.py should do that for us, right?)
That's rather bizarre, previousElement is definitely a public function (i.e. defined in etree.so). I have no idea how that could be missing. Stefan

Hey Stefan, Stefan Behnel wrote:
Stupid of me not to see it earlier, but that's because it's trying to import from lxml.local_doctest and you added it as local_doctest. I fixed the import so that it no longer imports from the lxml namespace, and checked it in. That still leaves the next error.
It's consistently missing though in Python 2.3. Perhaps it accidentally gets turned off together with thread support? I did try to test this theory yesterday though on Python 2.4 by explicitly disabling tests, and that didn't help. Regards, Martijn

Hi Martijn, Martijn Faassen wrote:
Stupid of me not to see it earlier, but that's because it's trying to import from lxml.local_doctest and you added it as local_doctest.
Ah, stupid me then. :)
Ok, then, first thing to check: does "previousElement" turn up as a static function in the generated src/lxml/etree.h? Could you check what the preprocessor sees in objectify.c (gcc -E)? On my side (Py 2.5), it sees the following:

-----------------------
...
static xmlNode (*((*nextElement)(xmlNode (*))));
static xmlNode (*((*previousElement)(xmlNode (*))));
...
{"nextElement", &nextElement},
{"previousElement", &previousElement},
...
__pyx_v_next = nextElement;
...
__pyx_v_next = previousElement;
...
-----------------------

I'm showing both functions here, as both are used in objectify, but only the second seems to be missing according to your report. If this looks the same on your side, I'm really out of ideas.

Stefan

Stefan Behnel wrote: [snip]
The only references to previousElement (and nextElement) in etree.h are here:

extern DL_IMPORT(xmlNode) (*(nextElement(xmlNode (*))));
extern DL_IMPORT(xmlNode) (*(previousElement(xmlNode (*))));
Could you check what the preprocessor sees in objectify.c (gcc -E)?
Hm, I wasn't previously familiar with gcc -E. I tried running it against objectify.c but got a lot of missing includes for Python and libxml2 (which is odd as these things are in /usr/include). I'm not quite sure how you generate your output, but here's my reference to previousElement when I do gcc -E:

extern DL_IMPORT(xmlNode) (*(nextElement(xmlNode (*))));
extern DL_IMPORT(xmlNode) (*(previousElement(xmlNode (*))));
...
__pyx_v_next = nextElement;
...
__pyx_v_next = previousElement;
...

Hm, is it possible I'm using the wrong version of Pyrex? I have lxml's version installed for Python 2.4 but I guess I don't have that one for Python 2.3... Us having to maintain our own version of Pyrex rather sucks.

I just installed lxml's version of Pyrex, and now the tests start. We still get some failures, though. Most of them are because 'assertFalse' doesn't appear to exist. I added this to HelperTestCase and made those errors go away. There's also the use of operator.itemgetter, which was only introduced in Python 2.4. I hacked up a simplistic implementation too.

Now we're down to one failure in Python 2.3:

======================================================================
FAIL: test_findall (lxml.tests.test_objectify.ObjectifyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/faassen/working/lxml/src/lxml/tests/test_objectify.py", line 218, in test_findall
    root.getchildren()[:2])
  File "/usr/lib/python2.3/unittest.py", line 302, in failUnlessEqual
    raise self.failureException, \
AssertionError: [<Element b at b787f0cc>, ''] != [<Element b at b787f0cc>, '']

You'd think that this *should* be equal and thus succeed. Possibly some rich comparison feature that doesn't exist yet in Python 2.3?

Back to you, Stefan. :)

Regards,
Martijn
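The two compatibility shims mentioned above might look roughly like this. It is a minimal sketch, not the code that was actually checked in: the `itemgetter` stand-in only handles a single key, and the `assertFalse` fallback is defined only when the running `unittest.TestCase` lacks it. `HelperTestCase` is the name used in the thread; everything else is illustrative.

```python
import unittest


def itemgetter(index):
    """Simplistic stand-in for operator.itemgetter (single key only)."""
    def getter(obj):
        return obj[index]
    return getter


class HelperTestCase(unittest.TestCase):
    # Only add the helper when the running unittest does not provide it.
    if not hasattr(unittest.TestCase, "assertFalse"):
        def assertFalse(self, expr, msg=None):
            if expr:
                raise self.failureException(msg or "%r is not false" % (expr,))


pairs = [("b", 2), ("a", 1), ("c", 3)]
first_items = [itemgetter(0)(pair) for pair in pairs]
```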

Hi Martijn, Martijn Faassen wrote:
Hm, I wasn't previously familiar with gcc -E. I tried running it against objectify.c but got a lot of missing includes for Python and libxml2
You can use the same command line that distutils uses to compile the module, except for the "-c xxx.so" part.
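As a rough illustration of the idea, the preprocessor command line can be assembled with the include directory of the running Python. This is a sketch under assumptions: `preprocess_command` is a hypothetical helper, and `/usr/include/libxml2` is only a guess at where the libxml2 headers live on a typical Linux system. The command is built, not executed.

```python
import sysconfig


def preprocess_command(source, extra_include_dirs=()):
    """Build (but do not run) a `gcc -E` command line for `source`."""
    # Python's own header directory, as reported by the running interpreter.
    include_dirs = [sysconfig.get_paths()["include"]] + list(extra_include_dirs)
    cmd = ["gcc", "-E"]
    for directory in include_dirs:
        cmd.append("-I" + directory)
    cmd.append(source)
    return cmd


# Assumed libxml2 header location; adjust for the local system.
cmd = preprocess_command("src/lxml/objectify.c", ["/usr/include/libxml2"])
print(" ".join(cmd))
```

Passing the include directories explicitly is what avoids the "missing includes" errors when running the preprocessor by hand.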
Us having to maintain our own version of Pyrex rather sucks.
Sure, but it's currently not that easy to push things upstream back into Pyrex. Maybe Greg manages to get some work done over Christmas.
I just installed lxml's version of Pyrex, and now the tests start.
Ah, finally. :)
Or maybe just works differently. That was a bad test case anyway, as equality of objectified elements is not really well defined in general. It can be type specific, which might be the problem here already. I changed that to an identity test, so it should work now. Stefan
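The difference between an equality test and an identity test can be sketched with a toy class. `Node` and `same_objects` are hypothetical and unrelated to lxml's real element classes; the point is only that `==` depends on how a type defines comparison, while `is` does not.

```python
class Node(object):
    """Toy element whose equality is type specific (compares values)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Node) and self.value == other.value

    def __ne__(self, other):
        return not self.__eq__(other)


def same_objects(left, right):
    """True if both sequences hold the very same objects, in the same order."""
    return len(left) == len(right) and all(a is b for a, b in zip(left, right))


first, twin = Node("x"), Node("x")
equal_by_value = ([first] == [twin])          # relies on Node.__eq__
identical = same_objects([first], [twin])     # relies on object identity only
```

Here `equal_by_value` is true while `identical` is false, so an identity-based assertion does not depend on whatever comparison semantics the element type happens to implement.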

Hi Martijn, Martijn Faassen wrote:
That could be the same problem that Holger found when testing with Python 2.3 on Solaris. Maybe it's a bug in the thread handling of Python 2.3, which would make it a problem limited to lxml 1.1. Could you post a valgrind trace for this?
Most likely, yes.
For lxml 1.0 this is less bad, but there are still some dependencies on 'sorted()' and such in the tests.
Ah, sure, must have been me who added them. Maybe I should just install a 32bit Python 2.3 on my machine, to do my own tests before releasing...
I don't think we actually ever explicitly dropped support for Python 2.3. Perhaps we should for a particular version of lxml
Hmm, it would definitely be easiest to do that for lxml 1.1. ;)
I don't think so. At least in Holger's stack traces, the segfault occurred in pthreads when releasing the Python thread context, so it can't really be a problem in lxml itself. One way to work around this kind of problem would be to not release the thread context under 2.3. That should be simple to do, if we know the right places where we have to do this. Stefan

Hi all,

Just my -1 for dropping Python 2.3 support. In an enterprise production context you will just not switch tested versions quickly unless there is some real need to (new functionality you desperately want, or a dependency of a module you need) or you have time to spare (which doesn't happen that often). It's especially time-consuming when you depend on C or C++ extensions. As for the dependency of some module, it might well be the case that someone would rather choose not to use lxml than upgrade Python.

Stefan Behnel wrote:
One way to work around this kind of problem would be to not release the thread context under 2.3. That should be simple to do, if we know the right places where we have to do this.

Better imho, so 2.3 users can still depend on lxml.

Regards,
Holger

Hi, Holger Joukl wrote:
;) I just said it would be the *easiest* solution. I agree that it's worth keeping 2.3 compatibility as long as we can. There were no major changes in the C-API since that version that would prevent us from doing so.
I'll try to come up with a fix then. Maybe it's enough to somehow disable the thread context calls to make lxml run single-threaded under 2.3. I'll have to rely on someone else to test it, though. Stefan

Hi Holger, Martijn, Stefan Behnel wrote:
Ok, I committed this simple patch to the trunk that simply skips releasing and re-acquiring the thread contexts under Python 2.3. I tried switching it on under 2.5 and didn't find any problems in the tests, so please check if it works on your side with 2.3, too.

If this works as expected, this would also give us a straightforward way to compile lxml without threading by passing an option (--without-threading) to setup.py and switching on the code section below via a compiler define.

Stefan

Index: src/lxml/etree_defs.h
===================================================================
--- src/lxml/etree_defs.h (Revision 35078)
+++ src/lxml/etree_defs.h (Arbeitskopie)
@@ -16,6 +16,20 @@
 #endif
 #endif
 
+/* Threading can crash under Python 2.3 */
+#if PY_VERSION_HEX < 0x02040000
+#ifndef WITHOUT_THREADING
+  #define WITHOUT_THREADING
+#endif
+#endif
+
+#ifdef WITHOUT_THREADING
+  #define PyEval_SaveThread() (NULL)
+  #define PyEval_RestoreThread(state)
+  #define PyGILState_Ensure() (PyGILState_UNLOCKED)
+  #define PyGILState_Release(state)
+#endif
+
 /* libxml2 version specific setup */
 #include "libxml/xmlversion.h"
 #if LIBXML_VERSION < 20621
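The setup.py side of this idea could be sketched as below. Nothing here is taken from lxml's actual setup.py: `hexversion` and `threading_macros` are hypothetical helpers, and the only things borrowed from the thread are the `--without-threading` option name, the `WITHOUT_THREADING` define and the 2.4 version threshold. The `(name, None)` tuple is the form in which distutils-style `define_macros` lists express a define without a value.

```python
import sys


def hexversion(major, minor, micro=0):
    """Encode a version the way PY_VERSION_HEX does (level and serial left 0)."""
    return (major << 24) | (minor << 16) | (micro << 8)


def threading_macros(argv, running_hexversion=sys.hexversion):
    """Strip --without-threading from argv; return (define_macros, other args)."""
    args = [arg for arg in argv if arg != "--without-threading"]
    requested = len(args) != len(argv)
    # Mirror the header's rule: anything older than 2.4 builds without threading.
    too_old = running_hexversion < hexversion(2, 4)
    macros = [("WITHOUT_THREADING", None)] if (requested or too_old) else []
    return macros, args


macros, remaining = threading_macros(["build_ext", "--without-threading"])
```

The returned list could then be passed as `define_macros` when constructing the extension modules, with `remaining` handed on to the normal command-line processing.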

participants (4)
- Holger Joukl
- Martijn Faassen
- Stefan Behnel
- Tres Seaver