[lxml-dev] lxml crash
Hello all, I know that the code is wring, but this crash is happening on my FreeBSD 6.1 and Win XP systems, with lxml 0.9.2: [carcass@/home/howe] python Python 2.4.3 (#2, Apr 17 2006, 14:29:19) [GCC 3.4.4 [FreeBSD] 20050518] on freebsd6 Type "help", "copyright", "credits" or "license" for more information.
from lxml import etree a = etree.Element('a') a.append(None) Exception exceptions.TypeError: 'Argument must not be None.' in 'etree._raiseIfNone' ignored zsh: segmentation fault python
It should be more friendly by just displaying the message and not crashing, right ? By the way, is there any way to get the lxml version from code ? Some module '__version__' attribute ? Shouldn't it be there ? Something like: lxml.__version__ -- Best regards, Steve mailto:howe@carcass.dhs.org
Hi Steve, Steve Howe wrote:
I know that the code is wring, but this crash is happening on my FreeBSD 6.1 and Win XP systems, with lxml 0.9.2:
[carcass@/home/howe] python Python 2.4.3 (#2, Apr 17 2006, 14:29:19) [GCC 3.4.4 [FreeBSD] 20050518] on freebsd6 Type "help", "copyright", "credits" or "license" for more information.
from lxml import etree a = etree.Element('a') a.append(None) Exception exceptions.TypeError: 'Argument must not be None.' in 'etree._raiseIfNone' ignored zsh: segmentation fault python
It should be more friendly by just displaying the message and not crashing, right ?
Definitely. Actually, that's already fixed on the trunk due to a different change a while ago. I didn't even know this bug existed, otherwise I would have applied it to the 0.9 branch also... Another thing related to this: In most of its API functions, ElementTree raises an AssertionError on None, while lxml raises a TypeError. I'll change a couple of other places to make it consistent. That breaks ElementTree compatibility a bit more, but I think no one should rely on code raising an AssertionError when wrong argument types are passed...
By the way, is there any way to get the lxml version from code ? Some module '__version__' attribute ? Shouldn't it be there ? Something like:
lxml.__version__
Sure, good idea. Actually, lxml 1.0 will have even more. You can ask it for the versions of libxml2 and libxslt that it was compiled with and that it runs with. All versions are represented as int tuples so that you don't have to parse a string to find out if you're running version X.X or later. But I copied the lxml version string also to __version__, I think that's a sufficiently common place to look for it. Thanks, Stefan
Hi Steve, Steve Howe wrote:
Wednesday, May 17, 2006, 6:10:11 AM, you wrote:
In most of its API functions, ElementTree raises an AssertionError on None, while lxml raises a TypeError. I'll change a couple of other places to make it consistent. That breaks ElementTree compatibility a bit more, but I think no one should rely on code raising an AssertionError when wrong argument types are passed...
Something that could be done to keep compatibility with both models is using a derived exception such as (I know the name is terrible):
class LXMLInvalidArgument(TypeError, AssertionError): pass
I thought about that, too, since we already do that for SyntaxError. However, currently most of these errors are generated from Pyrex itself through the function signatures. And I don't see why we should change that only to provide a non-intuitive exception for a rare case. Note that we have to check the type anyway in most cases (type casts etc.), so we would double the checks for a not very beautiful result.
Or we could ask Fedrik if he intends to change it on ElementTree...
Sure, go, try it. He's working on ET 1.3, so that would be the right time to do it. I'm not very confident he'll like it, though...
Sure, good idea. Actually, lxml 1.0 will have even more. You can ask it for the versions of libxml2 and libxslt that it was compiled with and that it runs with. All versions are represented as int tuples so that you don't have to parse a string to find out if you're running version X.X or later.
One thing to add: I also appended the SVN revision number if available. That makes it possible to distinguish between 'official' versions from source distributions and those generated from an SVN checkout. So, when you compile from the trunk, setup.py will look for the ".svn" directory and build a version string like "0.9.2-27455".
But I copied the lxml version string also to __version__, I think that's a sufficiently common place to look for it.
It's actually in lxml.etree.__version__ now. Copying it to lxml.__version__ would require us to automatically import etree from lxml, which we should not do without reason. Stefan
I thought about that, too, since we already do that for SyntaxError. However, currently most of these errors are generated from Pyrex itself through the function signatures. And I don't see why we should change that only to provide a non-intuitive exception for a rare case. Note that we have to check the type anyway in most cases (type casts etc.), so we would double the checks for a not very beautiful result. Yes, it would not be very nice. Does it happen on other places ? Anyway,
Hello Stefan, Wednesday, May 17, 2006, 2:53:02 PM, you wrote: that should not be so important if it's documented. The best thing to happen would be a ElementTree change. I wonder what's the reason for raising an AssertionError instead of TypeError...
Sure, go, try it. He's working on ET 1.3, so that would be the right time to do it. I'm not very confident he'll like it, though... Me neither... :) He already didn't comment about the unpythonic str(element) behavior on ElementTree.
Sure, good idea. Actually, lxml 1.0 will have even more. You can ask it for the versions of libxml2 and libxslt that it was compiled with and that it runs with. All versions are represented as int tuples so that you don't have to parse a string to find out if you're running version X.X or later.
One thing to add: I also appended the SVN revision number if available. That makes it possible to distinguish between 'official' versions from source distributions and those generated from an SVN checkout. So, when you compile from the trunk, setup.py will look for the ".svn" directory and build a version string like "0.9.2-27455". Oh, great. Even better.
It's actually in lxml.etree.__version__ now. Copying it to lxml.__version__ would require us to automatically import etree from lxml, which we should not do without reason. I would think it's more intuitive to look for that on the topmost module (lxml only), but ElementTree also exports it (but named as VERSION on a submodule:)
from elementtree.ElementTree import * elementtree.ElementTree.VERSION '1.2.6'
Will you want to maintain compatibility with the VERSION attribute aswell ? -- Best regards, Steve mailto:howe@carcass.dhs.org
Steve Howe wrote:
I wonder what's the reason for raising an AssertionError instead of TypeError...
because that's what "assert" raises: http://pyref.infogami.com/assert
Me neither... :) He already didn't comment about the unpythonic str(element) behavior on ElementTree.
there's nothing "unpythonic" about str(element) in ET. an Element is an Element, not a part of an XML file. all XML-specific behaviour is provided by the ElementTree wrapper, and related helpers. </F>
Hello Fredrik, Thursday, May 18, 2006, 6:11:55 PM, you wrote:
I wonder what's the reason for raising an AssertionError instead of TypeError...
because that's what "assert" raises:
I know assert statements raise that, but no other Python function raises AssertionError on wrong parameters, right ? This is debug code. I mentioned an example where TypeError is raised instead of AssertionError. Is there a reason for not following the Python convention ? What other Python functions raise AssertionError instead of TypeError ? See:
float(None) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: float() argument must be a string or a number
And about str(Element): the Python manual says it should "Return a string containing a nicely printable representation of an object", but it is returning the same as repr(). But since the library is yours (and don't take me bad, I love its API), you do as you wish. -- Best regards, Steve mailto:howe@carcass.dhs.org
Steve Howe wrote:
I know assert statements raise that, but no other Python function raises AssertionError on wrong parameters, right ? This is debug code. I mentioned an example where TypeError is raised instead of AssertionError. Is there a reason for not following the Python convention ? What other Python functions raise AssertionError instead of TypeError ?
any function that uses assert, which is the most efficient way to add *optional* assertions to code written in *Python*. if you're writing user code that relies on catching assertion errors, you need to fix your code.
float(None) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: float() argument must be a string or a number
so? why should a Python implementation of a library have to suffer because some built-in function is written in C?
And about str(Element): the Python manual says it should "Return a string containing a nicely printable representation of an object", but it is returning the same as repr().
which is perfectly okay, and perfectly pythonic according to the same documentation (after all, str() maps to repr(), if you don't override things). it's in fact rather unpythonic to use str() for serialization of non- trivial objects. serialization should be explicit, not implicit. </F>
Hi Fredrik, Fredrik Lundh wrote:
Steve Howe wrote:
I know assert statements raise that, but no other Python function raises AssertionError on wrong parameters, right ? This is debug code. I mentioned an example where TypeError is raised instead of AssertionError. Is there a reason for not following the Python convention ? What other Python functions raise AssertionError instead of TypeError ?
any function that uses assert, which is the most efficient way to add *optional* assertions to code written in *Python*. [snip] why should a Python implementation of a library have to suffer because some built-in function is written in C?
Ok, so, if I understand that right, using AssertionError is an internal optimisation. It is only used to allow switching it off for performance reasons. (Note that you can't do that in C code.) So, for exactly the same reason, lxml will continue to raise TypeError for both None values and other invalid argument types. The test is done internally by Pyrex anyway and therefore the cheapest solution. According to your arguments, code that catches assertions is broken anyway, so this is not even an incompatibility. That's good news.
And about str(Element): the Python manual says it should "Return a string containing a nicely printable representation of an object", but it is returning the same as repr().
which is perfectly okay, and perfectly pythonic according to the same documentation (after all, str() maps to repr(), if you don't override things).
I also think that's ok. We already had this discussion and lxml follows ET here. Regards, Stefan
Stefan Behnel wrote:
Ok, so, if I understand that right, using AssertionError is an internal optimisation. It is only used to allow switching it off for performance reasons. (Note that you can't do that in C code.)
well, you can, but I don't think you need to (see below).
So, for exactly the same reason, lxml will continue to raise TypeError for both None values and other invalid argument types. The test is done internally by Pyrex anyway and therefore the cheapest solution.
that's perfectly okay. if not else, we could view lxml as a library that implements the ET interface with assertions disabled -- after all, in general, a piece of code that generates an assertion error if you use it incorrectly is free to do whatever it wants if you disable assertions.
According to your arguments, code that catches assertions is broken anyway, so this is not even an incompatibility.
exactly. (the only problem is doctest-based test suites, but I guess we have to live with that, at least until someone gets around to write an ET validator (or I get around to finish the one I started...)) </F>
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Stefan Behnel wrote:
Hi Fredrik,
Fredrik Lundh wrote:
Steve Howe wrote:
I know assert statements raise that, but no other Python function raises AssertionError on wrong parameters, right ? This is debug code. I mentioned an example where TypeError is raised instead of AssertionError. Is there a reason for not following the Python convention ? What other Python functions raise AssertionError instead of TypeError ?
any function that uses assert, which is the most efficient way to add *optional* assertions to code written in *Python*.
[snip]
why should a Python implementation of a library have to suffer because some built-in function is written in C?
Ok, so, if I understand that right, using AssertionError is an internal optimisation. It is only used to allow switching it off for performance reasons. (Note that you can't do that in C code.)
So, for exactly the same reason, lxml will continue to raise TypeError for both None values and other invalid argument types. The test is done internally by Pyrex anyway and therefore the cheapest solution.
According to your arguments, code that catches assertions is broken anyway, so this is not even an incompatibility.
That's good news.
I think the assertion was that catching AssertionError is broken, because the assertion failure is a *programming* error, not a *runtime* error. The TypeErrors raised from Pyrex probably fit the same way. More framework-y code sometimes ends up having to catch such errors, either for backwards compatibility or to allow for things like broken plugins. Tres. - -- =================================================================== Tres Seaver +1 202-558-7113 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEbe4l+gerLs4ltQ4RArEKAJ98L2UYoQ/lXXScXx2p3IwXSu5/aQCeNYLG RnMvKP5qOg/GXoPojvqcOPw= =4rgO -----END PGP SIGNATURE-----
Stefan Behnel wrote: [snip]
So, for exactly the same reason, lxml will continue to raise TypeError for both None values and other invalid argument types. The test is done internally by Pyrex anyway and therefore the cheapest solution.
This is why I chose to use TypeError instead of AssertionError originally in lxml - it was the simplest thing to do and came for free. Performance is an incidental happy side effect. :) Regards, Martijn
participants (5)
-
Fredrik Lundh
-
Martijn Faassen
-
Stefan Behnel
-
Steve Howe
-
Tres Seaver