Re: [Python-checkins] CVS: python/dist/src/Lib/xml/dom minidom.py,1.13,1.14

On Tue, Nov 21, 2000 at 02:02:24PM -0800, Fred L. Drake wrote:
Yay! This checkin may fix Bug #116677: "minidom:Node.appendChild() has wrong semantics". Does the patch check for illegal children (bug #116678)? --amk

Andrew Kuchling writes:
> Yay! This checkin may fix Bug #116677: "minidom:Node.appendChild() has wrong semantics".
Sorry; that would have taken just enough more that I didn't want to get into that today, but it shouldn't be hard. See my comments on the bug page (just added).
> Does the patch check for illegal children (bug #116678)?
No. There's still seriously little parameter checking in minidom, and I'm not sure I want to add that. One of the problems people had with PyDOM was the general weight of the code, and adding all the checks contributes to that (though I suspect the proxies in PyDOM were a far more serious contributor).
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Digital Creations

On Tue, Nov 21, 2000 at 05:16:52PM -0500, Fred L. Drake, Jr. wrote:
Indeed; I think the proxies really obfuscated the code. Some simple parameter checking, though, shouldn't add too much of a burden, and will protect users from common mistakes that will result in invalid trees. --amk

Andrew Kuchling wrote:
Those checks would slow down the original tree building unless we split the interface into "internal" methods that we use ourself and "external methods" that do the extra checking. Worth the effort and extra code complexity? Maybe...maybe not. Paul Prescod

"PP" == Paul Prescod <paulp@ActiveState.com> writes:
PP> Andrew Kuchling wrote:
PP> Those checks would slow down the original tree building unless
PP> we split the interface into "internal" methods that we use
PP> ourself and "external methods" that do the extra checking. Worth
PP> the effort and extra code complexity? Maybe...maybe not.
Could those checks be implemented as assertions? If so, people who care about speed can use "python -O".
Jeremy

Jeremy Hylton writes:
> Could those checks be implemented as assertions? If so, people who care about speed can use "python -O"
Yes, but it is not clear that the checks are expensive. Another issue is compliance with the spec -- DOM level 1 states that certain exceptions will be raised for various conditions, and using assertions to check those would mean that exceptions would *not* be raised in those cases.
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Digital Creations
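A minimal sketch of the compliance concern, assuming a hypothetical pair of check functions (neither is minidom's code). It relies on the `optimize` argument of the built-in `compile()` to mimic plain `python` (level 0) and `python -O` (level 1) in one process, and on the `HierarchyRequestErr` class that today's `xml.dom` package provides. The assertion version raises the wrong exception type by default and nothing at all when optimized; the explicit `raise` behaves the same in both modes.

```python
from xml.dom import HierarchyRequestErr

# Two hypothetical ways to reject an illegal child; neither is minidom's code.
SOURCE = '''
def check_with_assert(child_is_legal):
    assert child_is_legal, "illegal child"

def check_with_raise(child_is_legal):
    if not child_is_legal:
        raise HierarchyRequestErr("illegal child")
'''


def load(optimize):
    """Compile SOURCE the way plain python (0) or python -O (1) would."""
    namespace = {"HierarchyRequestErr": HierarchyRequestErr}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace


def outcome(func):
    """Name of the exception raised for an illegal child, if any."""
    try:
        func(False)
    except Exception as exc:
        return type(exc).__name__
    return "no exception"


# Run both checks under both compilation modes.
results = {
    (mode, name): outcome(load(level)[name])
    for mode, level in (("default", 0), ("-O", 1))
    for name in ("check_with_assert", "check_with_raise")
}

for key in sorted(results):
    print(key, "->", results[key])
```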

On Tue, Nov 21, 2000 at 05:26:14PM -0500, Jeremy Hylton wrote:
> Could those checks be implemented as assertions? If so, people who care about speed can use "python -O"
Making error checks optional and leaving it up to the user to avoid problems... did I get subscribed to the perl5-porters list by mistake? --amk

"AMK" == Andrew Kuchling <akuchlin@mems-exchange.org> writes:
AMK> On Tue, Nov 21, 2000 at 05:26:14PM -0500, Jeremy Hylton wrote:
>> Could those checks be implemented as assertions? If so, people who care about speed can use "python -O"
AMK> Making error checks optional and leaving it up to the user to
AMK> avoid problems... did I get subscribed to the perl5-porters
AMK> list by mistake?
Not quite what I asked about: Enabling error checks by default, but allowing users to turn them off in optimized mode. If the checks are expensive, which Fred subsequently said he wasn't sure about, this might not be unreasonable.
Perhaps I'm just odd when it comes to -O. I've never used it.
Jeremy

On Tue, Nov 21, 2000 at 05:26:14PM -0500, Jeremy Hylton wrote:
+1 ... that would be the way to do it. Cheers, -g -- Greg Stein, http://www.lyra.org/

[Andrew Kuchling]
[Paul Prescod]
[Jeremy Hylton]
> Could those checks be implemented as assertions? If so, people who care about speed can use "python -O"
[Greg Stein]
> +1 ... that would be the way to do it.
-1. User input is never trustworthy. Your and your users' software lives will be a lot happier if you stick to the rule that an assertion failure always (always!) announces a bug in the implementation -- assertion failure is never a user's fault. This makes assertions *possibly* suitable for Paul's hypothesized "internal methods", but never for checking that user-supplied arguments satisfy preconditions.
pinning-the-blame-is-90%-of-debugging-and-"assert"-should-pin-it-exactly-ly y'rs - tim
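One way to read Tim's rule, as a sketch only: the public method validates what the caller passed and raises an ordinary exception, while the private helper uses `assert` for a condition that only a bug in the class itself could violate. `TinyNode`, `append_child` and `_link` are hypothetical names invented for illustration; they are not minidom's API.

```python
class TinyNode:
    """Hypothetical stand-in for a DOM node; not minidom's actual class."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, child):
        """Public entry point: user input is checked with real exceptions."""
        if not isinstance(child, TinyNode):
            raise TypeError(
                "child must be a TinyNode, got %s" % type(child).__name__
            )
        if child is self:
            raise ValueError("a node cannot be its own child")
        # Detach from any previous parent so the child lives in one place only.
        if child.parent is not None:
            child.parent.children.remove(child)
            child.parent = None
        self._link(child)
        return child

    def _link(self, child):
        """Internal helper: callers inside this class guarantee the precondition."""
        # If this fires, the bug is in TinyNode itself, never in user code.
        assert child.parent is None, "internal error: child is still attached"
        child.parent = self
        self.children.append(child)
```

With this split, running under `-O` removes only the internal sanity check; the `TypeError` and `ValueError` that callers may depend on remain in place.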

Tim Peters wrote:
So you prefer

    if __debug__ and node.nodeType != ELEMENT_TYPE:
        raise TypeError

Unfortunately there's no way to turn that off at "compile time" so you always incur the __debug__ lookup cost. That would send us back to two versions of the methods. Maybe testing would indicate that the performance implications are minor. If so, I wouldn't mind having the type checks in there.
Paul Prescod

[Tim, objects to abusing assertions] [Paul Prescod]
I personally prefer

    if node.nodeType != ELEMENT_TYPE:
        raise TypeError

if that is in fact a correct test of whatever user-input precondition it is you're verifying. An assert would be appropriate if it were "impossible" for the test to fail.
Actually, there is:

    if __debug__:
        if node.nodeType != ELEMENT_TYPE:
            raise TypeError

Python produces no code for that block under -O (btw, this is the same mechanism that makes asserts vanish under -O: it's __debug__ that's magic, not asserts). As a user, though, I don't expect -O to turn off argument verification! Same as the Python implementation in these respects: public API functions *always* check their arguments, while some private API functions check only in Debug builds (and via the C library's assert() function, as it's a bug in the implementation if a private API is misused).
do-s-right-thing-ly y'rs - tim
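The effect of the `if __debug__:` idiom can be observed without restarting the interpreter, as a sketch: the built-in `compile()` accepts an `optimize` argument, where level 0 corresponds to a normal run and level 1 to `-O`. The `check` function and the `ELEMENT_TYPE` value below are hypothetical stand-ins for the snippet quoted in the thread.

```python
SOURCE = '''
ELEMENT_TYPE = 1

def check(node_type):
    if __debug__:
        if node_type != ELEMENT_TYPE:
            raise TypeError("expected an element")
    return "accepted"
'''


def build(optimize):
    """Return check() compiled at the given optimization level."""
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["check"]


check_debug = build(0)      # as under plain "python"
check_optimized = build(1)  # as under "python -O"

# A node type other than ELEMENT_TYPE is rejected only in the debug build.
try:
    debug_result = check_debug(3)
except TypeError:
    debug_result = "TypeError"

optimized_result = check_optimized(3)

print("default:", debug_result)
print("-O:     ", optimized_result)
```

The same bad argument is rejected in one build and silently accepted in the other, which is the behavior Tim says a user would not expect from argument verification.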

Tim Peters wrote:
As a user, I don't expect much argument verification from the Python library at all! C-level verification makes sense because the alternative is core dumps. That's not acceptable. For the rare Python-coded function that DOES do argument verification, I wouldn't have much of an expectation of the effect of "-O" on it because, like Jeremy, I hardly ever use -O. So maybe the argument is not worth having -- if nobody ever uses -O then we should always just balance safety and performance rather than expecting the user to choose one or the other. Paul Prescod

[Paul Prescod]
Why not? I don't see any real difference between a core dump and an uncaught & unexpected Python exception: in either case the program didn't get the job done, and left stuff in an unknown state. Nothing supernaturally evil about a core dump in that respect; if one is unacceptable, so is the other. In the good old days a core dump often crashed the OS too, but that's rare even on Windows now.
I don't care about argument verification except to the extent that it validates preconditions. If the preconditions for using a public function aren't stated, the docs are inadequate (what's the user supposed to do then? guess?). If the preconditions aren't verified, then a user error may lead to an incomprehensible error somewhere in the bowels. Most library functions have preconditions of the form "x is a sequence" or "y supports .sort()", and for such builtin types & operations laziness is usually tolerable because the specific exception raised by accident (in the absence of checking and the presence of a bad argument) says "not a sequence" or "can't be .sort()ed" more or less directly. If you've got fancier preconditions, an accidental exception likely makes no sense at all to the user.

Andrew's claim was "some simple parameter checking, though, shouldn't add too much of a burden, and will protect users from common mistakes that will result in invalid trees". I'm arguing both that validation is valuable to the user and that "-O" is a rotten way to turn it off (even if people *did* use it -- and I agree that few ever do). Without any validation first, though, an argument about how to turn it off -- or whether there's even a need to -- is at best premature. If Andrew is wrong (that user mistakes aren't common, or that simple checking couldn't protect users in a useful way), I haven't heard anyone say so.

if-not-the-rest-is-just-a-question-of-biting-the-bullet-ly y'rs - tim
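To illustrate the point about "fancier preconditions", here is a small sketch with a hypothetical `Item` tree class (not minidom). Without a check, a bad child is accepted silently and the failure surfaces later, in unrelated code, with a message about a missing attribute. With the check, the error appears at the call that made the mistake and names the violated precondition.

```python
class Item:
    """Hypothetical tree node used only to contrast the two failure modes."""

    def __init__(self, label):
        self.label = label
        self.children = []

    def add_unchecked(self, child):
        # No validation: anything at all can end up in the tree.
        self.children.append(child)

    def add_checked(self, child):
        # The precondition is verified where the user makes the mistake.
        if not isinstance(child, Item):
            raise TypeError(
                "child must be an Item, not %s" % type(child).__name__
            )
        self.children.append(child)

    def render(self):
        inner = "".join("(" + child.render() + ")" for child in self.children)
        return self.label + inner


# Unchecked: the bad call succeeds; the error shows up later, during render().
lazy_root = Item("root")
lazy_root.add_unchecked("oops")
try:
    lazy_root.render()
    late_error = None
except AttributeError as exc:
    late_error = type(exc).__name__

# Checked: the bad call itself fails, with a message about the argument.
strict_root = Item("root")
try:
    strict_root.add_checked("oops")
    early_error = None
except TypeError as exc:
    early_error = str(exc)
```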

Tim Peters wrote:
A core dump would kill Zope, PythonWin, Alice etc. An exception does not. To me, that's a big difference.

Also, PyObject type checks are extremely cheap in C code.

And once we put in the parameter checks the user will get an unexpected Python exception. Presumably they are not building faulty XML trees on purpose!

Anyhow, I am won over despite your unpersuasive argument. I note that minidom will not always give you an exception for a poorly formed tree. That means that the programmer may not find her error until the XML is "out of Python's hands." It should give an exception sooner or later but not never.

Paul Prescod

Paul Prescod writes:
I'd like to mention again that there's also a matter of compliance with the specification. The DOM level 1 recommendation includes specific documentation about the exceptions that are raised in certain conditions. Perhaps we should "map" these to more Pythonic exceptions, and perhaps not, but I think the exceptions should be raised when the API specification says they will be. This is an important aspect of compliance, and the XML community has demonstrated substantially more interest in standards compliance than the HTML community ever did; we should reap the benefits and not end up having Python discarded because the standard implementation isn't compliant.
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Digital Creations
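As a sketch of what "mapping" DOM exceptions onto more Pythonic ones could look like: the `xml.dom` package in today's standard library defines `DOMException` subclasses that carry the numeric code from the DOM recommendation. The `IllegalChildError` class below is a hypothetical illustration, not something the thread or the library defines; it inherits from both the DOM exception and `ValueError`, so either style of `except` clause catches it.

```python
import xml.dom


class IllegalChildError(xml.dom.HierarchyRequestErr, ValueError):
    """Hypothetical mapping: a DOM exception that is also a ValueError."""


def classify(exc):
    """Return the class name and the numeric DOM code carried by the exception."""
    return type(exc).__name__, exc.code


# Code written against the DOM specification catches the DOM exception...
try:
    raise IllegalChildError("Text nodes cannot have children")
except xml.dom.DOMException as exc:
    caught_as_dom = classify(exc)

# ...while code written in a generic Python style catches ValueError.
try:
    raise IllegalChildError("Text nodes cannot have children")
except ValueError as exc:
    caught_as_value_error = classify(exc)

print(caught_as_dom, caught_as_value_error)
```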

By definition, isn't minidom supposed to be cheap/easy/quick? The quick answer to a problem? If somebody wants an exception-conforming DOM implementation with all the bells and whistles, then they can go elsewhere. If minidom starts getting loaded up, then it has kind of defeated its purpose, no?
Cheers,
-g
On Tue, Nov 21, 2000 at 08:13:22PM -0800, Paul Prescod wrote:
--
Greg Stein, http://www.lyra.org/

On Tue, Nov 21, 2000 at 10:33:03PM -0800, Greg Stein wrote:
Checking for the correct children should be quite fast; in PyDOM it was basically the line "if newnode.type not in self._LEGAL_CHILDREN_TYPES: raise ...". I don't know about the other minidom bug report, but will try to look into it before too long.
--amk
    Some compilers allow a check during execution that subscripts do not exceed array dimensions. This is a help, but not sufficient. First, many programmers do not use such compilers because "They're not efficient." (Presumably, this means that it is vital to get the wrong answers quickly.)
    Kernighan and Plauger, in _The Elements of Programming Style_
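The one-line check described above might look roughly like this. It is a sketch under assumptions: `SketchNode`, `SketchElement`, `SketchText`, `append_child` and `legal_child_types` are hypothetical names chosen for illustration, and the table of legal children is deliberately abbreviated. Only the node-type constants and `HierarchyRequestErr` come from the `xml.dom` package as it exists today.

```python
from xml.dom import HierarchyRequestErr, Node


class SketchNode:
    """Hypothetical node base class; the names here are not minidom's."""

    node_type = None
    legal_child_types = ()  # by default a node accepts no children

    def __init__(self):
        self.children = []

    def append_child(self, child):
        # The whole check is one tuple membership test per call.
        if child.node_type not in self.legal_child_types:
            raise HierarchyRequestErr(
                "%s cannot contain node type %r"
                % (type(self).__name__, child.node_type)
            )
        self.children.append(child)
        return child


class SketchText(SketchNode):
    node_type = Node.TEXT_NODE


class SketchElement(SketchNode):
    node_type = Node.ELEMENT_NODE
    # Abbreviated for the sketch; a real table would list more node types.
    legal_child_types = (Node.ELEMENT_NODE, Node.TEXT_NODE)


element = SketchElement()
text = element.append_child(SketchText())  # accepted

try:
    text.append_child(SketchElement())  # a text node cannot have children
    rejected = False
except HierarchyRequestErr:
    rejected = True
```

Keeping the table as a class attribute means each node class declares its own legal children, and the per-call cost is a lookup in a short tuple.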

Greg Stein writes:
I've started a new discussion about this over in the XML SIG; if you're interested, please join in on that list. If you haven't been following it, you might want to check the archives starting with yesterday.
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Digital Creations

participants (6)
- Andrew Kuchling
- Fred L. Drake, Jr.
- Greg Stein
- Jeremy Hylton
- Paul Prescod
- Tim Peters