Patches: 1 for the price of 10.

Hello all,
per previous discussion,
http://mail.python.org/pipermail/python-dev/2004-October/049495.html
I'd like to push a trivial little patch to sgmllib (#1087808) on you gents, in exchange for my reviews & effort etc. on 10 other patches.
Without further ado:
No-brainers:
1055159 -- a docstring+docs update to CGIHTTPServer describing already- existing behavior. Recommend apply.
1037974 -- fix HTTP digest auth for broken servers, e.g. LiveJournal. Trivial code fix, should break nothing. Recommend apply + backport.
1028908 -- JJ Lee's updates to urllib2. Passes regr tests, by an original author of much of the code (I think). Recommend apply.
901480 -- patch to urllib2.parse_http_list (bug 735248). Works. Updated patch. Recommend apply + backport.
827559 -- SimpleHTTPServer redirects to 'dir/' when asked for 'dir'. This behavior mimics common behavior online and fixes a problem with relative URLs when clicking on files within 'dir'. Recommend apply.
810023 -- fixes off-by-one bug in urllib reporthook. regr tests & all. Good stuff. Recommend apply + backport.
893642 -- adds allow_none option to SimpleXMLRPCServer & associated classes. Doesn't change default behavior. Recommend apply.
755670 -- modify HTMLParser to accept clearly broken HTML. Recommend reject.
Slightly more complicated:
1067760 -- float-->long conversion on fileobj.seek calls, rather than float-->int. Permits larger floats (2.0**62) to match large int (2**62) arguments. rhettinger marked as "won't fix" in the original bug report; this seems like a clean solution, tho. Recommend apply.
755660 -- should HTMLParser fail on all bad input, or do best effort? I'd recommend more sweeping changes where must-fail situations are distinguished from fails-by-default situations. Alternatively take a stand and say "nein!" once and for all. (See my comment for more information.)
--
For no particularly good reason, all of these were tested against the current CVS HEAD rather than 2.4. All of them should be trivial to backport, although I think only a few are real problems worthy of the effort.
--
I'm kind of curious to see how this goes, I must admit ;). Please CC me on replies so I can listen in...
One comment to Martin: it clearly isn't worth the effort of reviewing 10 patches to push a patch the size of my sgmllib patch. On the other hand, it's nice to have a guarantee & it's an educational experience, that's for sure.
A 5:1 ratio might be more reasonable, since that in practice will mean 1 serious patch, 2 or 3 updates, and 1 drop-dead easy patch.
cheers, --titus

On Sun, 19 Dec 2004 02:13:24 -0800, Titus Brown titus@caltech.edu wrote:
Hello all,
per previous discussion,
http://mail.python.org/pipermail/python-dev/2004-October/049495.html
I'd like to push a trivial little patch to sgmllib (#1087808) on you gents, in exchange for my reviews & effort etc. on 10 other patches.
I got started on these this morning, will likely finish them tomorrow. It would be perverse to apply your patch last, wouldn't it?
Jeremy

Jeremy Hylton wrote:
I got started on these this morning, will likely finish them tomorrow. It would be perverse to apply your patch last, wouldn't it?
It turns out that Titus' patch might be more involved than he thought it would be.
In any case, the review itself is a highly appreciated contribution.
In the hope that this progress can trigger more contributions like this, I'll happily reduce the requirements to 5 reviews.
Regards, Martin

Martin v. Löwis wrote:
Jeremy Hylton wrote:
I got started on these this morning, will likely finish them tomorrow. It would be perverse to apply your patch last, wouldn't it?
It turns out that Titus' patch might be more involved than he thought it would be.
In any case, the review itself is a highly appreciated contribution.
In the hope that this progress can trigger more contributions like this, I'll happily reduce the requirements to 5 reviews.
And to help this along I am willing to officially toss my hat into the fray by being another person who will review a patch if someone does 5 reviews (sans anything that doesn't work on OS X, which leaves out Tkinter).
-Brett

-> Jeremy Hylton wrote: -> >I got started on these this morning, will likely finish them tomorrow. -> > It would be perverse to apply your patch last, wouldn't it? -> -> It turns out that Titus' patch might be more involved than he thought -> it would be.
*shrug* that's life ;). I stole my patch from the other HTMLParser & thought that would be sufficient; now I'll have to fix both!
-> In any case, the review itself is a highly appreciated contribution.
It was very educational; just wish I could remember to always submit context diffs! <sigh>
The only patch that I think deserves some actual discussion - here or on c.l.p, not sure which -- is patch 755660, which deals with HTMLParser.HTMLParser. The goal of the original submitter was to allow subclasses of HTMLParser deal with bad HTML in a more robust way; essentially this comes down to allowing returns from self.error() calls.
I have now come across the same problem in my work with PBP (pbp.berlios.de): it turns out that many Web pages (such as the SourceForge mailman admindb page...) contain errors that cause HTMLParser to raise an exception. It's simply not possible to reliably change this behavior within either htmllib.HTMLParser or HTMLParser.HTMLParser as they're currently written. This is a big problem for people basing packages on either HTMLParser class.
An additional problem is that both HTMLParser.HTMLParser and htmllib.HTMLParser are based on other classes that call self.error(), so those base classes would have to altered to fit the new behavior.
What I proposed doing in my comment on patch 755660 was changing HTMLParser.HTMLParser (and its base class markupbase, too) to call _fail() when a hard failure was called for, and otherwise to call error() and proceed parsing on an as-best-possible basis. This wouldn't change the *behavior* of the existing code, but would allow for it to be overridden when necessary.
Right now the error() call is undocumented and so it's probably ok to change what happens upon return. As is it can leave the parser in a borked state upon return, and that's the behavior I propose to fix.
I'd of course be willing to do the work & submit the more involved patch.
Your opinions are not only welcome but (as I understand it) necessary ;).
cheers, --titus

1067760 -- float-->long conversion on fileobj.seek calls, rather than float-->int. Permits larger floats (2.0**62) to match large int (2**62) arguments. rhettinger marked as "won't fix" in the original bug report; this seems like a clean solution, tho. Recommend apply.
Wouldn't this cause subtle errors when the float -> long conversion is no longer precise? Or is this a non issue because it could only happen when seeking on impossibly large files?

-> > 1067760 -- float-->long conversion on fileobj.seek calls, rather than -> > float-->int. Permits larger floats (2.0**62) to match large -> > int (2**62) arguments. rhettinger marked as "won't fix" in -> > the original bug report; this seems like a clean solution, -> > tho. Recommend apply. -> -> Wouldn't this cause subtle errors when the float -> long conversion is -> no longer precise? Or is this a non issue because it could only happen -> when seeking on impossibly large files?
When would the float --> long conversion not be precise? The docs say PyLong_FromDouble takes the integer part, so it's like doing a floor(), yes?
The current behavior is to convert directly from float to int, even though seek() can take longs as an argument. Thus 2.0**31 works, 2**31 works, 2**62 works, but 2.0**62 doesn't. This seems mildly inconsistent and prompted the guy to submit a bug report, and then a patch.
The patch (with a bit more code context) is:
-- if (!PyArg_ParseTuple(args, "O|i:seek", &offobj, &whence)) return NULL;
#if !defined(HAVE_LARGEFILE_SUPPORT) offset = PyInt_AsLong(offobj); #else + if(PyFloat_Check(offobj)) { + offobj = PyLong_FromDouble(PyFloat_AsDouble(offobj)); + } offset = PyLong_Check(offobj) ? PyLong_AsLongLong(offobj) : PyInt_AsLong(offobj); #endif --
so the issue only comes into play with large file support.
Heh, and I just noticed that PyLong_FromDouble returns a new reference, so this is a memory leak, I believe. Whoooops. I'll fix that.
cheers, --titus

Titus Brown titus@caltech.edu wrote:
-> > 1067760 -- float-->long conversion on fileobj.seek calls, rather than -> > float-->int. Permits larger floats (2.0**62) to match large -> > int (2**62) arguments. rhettinger marked as "won't fix" in -> > the original bug report; this seems like a clean solution, -> > tho. Recommend apply. -> -> Wouldn't this cause subtle errors when the float -> long conversion is -> no longer precise? Or is this a non issue because it could only happen -> when seeking on impossibly large files?
When would the float --> long conversion not be precise? The docs say PyLong_FromDouble takes the integer part, so it's like doing a floor(), yes?
Precision.
2.0**52
4503599627370496.0
2.0**52+1
4503599627370497.0
2.0**53
9007199254740992.0
2.0**53+1
9007199254740992.0
Anything above float(2**53-1) (one must be careful about the order of float conversions) are basically useless for offsets into files.
The current behavior is to convert directly from float to int, even though seek() can take longs as an argument. Thus 2.0**31 works, 2**31 works, 2**62 works, but 2.0**62 doesn't. This seems mildly inconsistent and prompted the guy to submit a bug report, and then a patch.
"works" and "doesn't work" aren't precise. "doesn't raise an exception" and "does raise an exception" are precise. Note the above float precision precision example about why even non-exceptions may not "work".
- Josiah

Titus Brown titus@caltech.edu writes:
-> > 1067760 -- float-->long conversion on fileobj.seek calls, rather than -> > float-->int. Permits larger floats (2.0**62) to match large -> > int (2**62) arguments. rhettinger marked as "won't fix" in -> > the original bug report; this seems like a clean solution, -> > tho. Recommend apply. -> -> Wouldn't this cause subtle errors when the float -> long conversion is -> no longer precise? Or is this a non issue because it could only happen -> when seeking on impossibly large files?
When would the float --> long conversion not be precise?
When the float is so large that it is the closest approximation to more than one integer? (i.e. larger than 2**53 for 754 doubles).
Cheers, mwh

On Dec 23, 2004, at 12:36 AM, Timothy Fitz wrote:
1067760 -- float-->long conversion on fileobj.seek calls, rather than float-->int. Permits larger floats (2.0**62) to match large int (2**62) arguments. rhettinger marked as "won't fix" in the original bug report; this seems like a clean solution, tho. Recommend apply.
Wouldn't this cause subtle errors when the float -> long conversion is no longer precise? Or is this a non issue because it could only happen when seeking on impossibly large files?
I think that Raymond marked as "won't fix" because automatic float -> integer conversion has been deprecated since Python 2.3.0 (if not earlier), for exactly the reason you state. It's dumb.
range(2.7)
__main__:1: DeprecationWarning: integer argument expected, got float [0, 1]
Apparently file.seek doesn't have this DeprecationWarning though.. Strange, that.
f.seek(3.6) f.tell()
3L
-bob

-> > Apparently file.seek doesn't have this DeprecationWarning though.. -> > Strange, that. -> > >>> f.seek(3.6) -> > >>> f.tell() -> > 3L -> -> That's a bug. Who'll fix it?
Added:
+ if (PyFloat_Check(offobj)) + PyErr_Warn(PyExc_DeprecationWarning, + "integer argument expected, got float" );
see attached diff. I also attached it to the patch at SourceForge, with a comment, just to keep the record straight.
The DeprecationWarning wasn't given because a manual PyObject --> int/long conversion was being done, rather than the usual conversion implicit in ParseTuple.
Regression tests run w/o problem on RH 9.0/i386. Now you get the correct behavior:
--
f = open('/dev/zero') f.seek(5) f.tell()
5L
f.seek(5.2)
__main__:1: DeprecationWarning: integer argument expected, got float
f.tell()
5L --
This seems like a reasonable resolution to patch #1067760, to me...
cheers, --titus

Titus Brown titus@caltech.edu writes:
-> > Apparently file.seek doesn't have this DeprecationWarning though.. -> > Strange, that. -> > >>> f.seek(3.6) -> > >>> f.tell() -> > 3L -> -> That's a bug. Who'll fix it?
Added:
if (PyFloat_Check(offobj))
PyErr_Warn(PyExc_DeprecationWarning,
"integer argument expected, got float" );
see attached diff. I also attached it to the patch at SourceForge, with a comment, just to keep the record straight.
You should check the return value of PyErr_Warn in case the user passed -Werror on the command line.
This seems like a reasonable resolution to patch #1067760, to me...
I guess I could look at that -- but when I'm not on dialup at my parents' house...
Cheers, mwh
participants (9)
-
"Martin v. Löwis"
-
Bob Ippolito
-
Brett C.
-
Guido van Rossum
-
Jeremy Hylton
-
Josiah Carlson
-
Michael Hudson
-
Timothy Fitz
-
Titus Brown