Passing actual read size to urllib reporthook
Patch #849407 proposes to change the meaning of the urllib reporthook so that it takes the amount of data read, instead of the block size, as its second argument.

While this is a behavior change (and even for explicitly-documented behavior), I still propose to apply the change:

- in many cases, the number of bytes read will be equal to the block size, so no change should occur
- the signature (number of parameters) does not change, so applications shouldn't crash because of that change
- applications that do use the parameter to estimate total download time now get a better chance to estimate since they learn about short reads.

What do you think?

Regards,
Martin
On Sun, Nov 19, 2006 at 11:58:37AM +0100, "Martin v. Löwis" wrote:
- the signature (number of parameters) does not change, so applications shouldn't crash because of that change
I am slightly worried about the change in semantics.
- applications that do use the parameter to estimate total download time now get a better chance to estimate since they learn about short reads.
+1

Oleg.
--
Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.
Martin v. Löwis wrote:
While this is a behavior change (and even for explicitly-documented behavior), I still propose to apply the change:

- in many cases, the number of bytes read will be equal to the block size, so no change should occur
- the signature (number of parameters) does not change, so applications shouldn't crash because of that change
- applications that do use the parameter to estimate total download time now get a better chance to estimate since they learn about short reads.
haven't used the reporthook, but my reading of the documentation would have led me to believe that I should do count*blocksize to determine how much data I've gotten this far. changing the blocksize without setting the count to zero would break such code.

</F>
Fredrik Lundh wrote:
haven't used the reporthook, but my reading of the documentation would have led me to believe that I should do count*blocksize to determine how much data I've gotten this far. changing the blocksize without setting the count to zero would break such code.
Right - such code would break. I believe the code would also break when the count is set to zero; I can't see how this would help.

The question is whether this breakage is a strong enough reason not to change the code.

Regards,
Martin
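To make the breakage concrete, here is a minimal sketch rather than the urllib implementation. `simulate_hook_calls` is a hypothetical driver that mimics how a reporthook is invoked (once with count 0 before any data, then once after each block), and `bytes_so_far` is a hypothetical hook body that follows the count*blocksize reading of the documentation. What the patch would pass on the initial call is an assumption here (the nominal block size).

```python
def simulate_hook_calls(total_size, block_size, pass_actual_read_size):
    """Hypothetical driver: list the (count, size, total) tuples a hook would see."""
    # Initial call before any data is read; assumed to carry the nominal block size.
    calls = [(0, block_size, total_size)]
    remaining = total_size
    count = 0
    while remaining > 0:
        read = min(block_size, remaining)  # only the last block can be short
        remaining -= read
        count += 1
        # Documented behavior passes the block size; the patch passes the bytes read.
        size = read if pass_actual_read_size else block_size
        calls.append((count, size, total_size))
    return calls


def bytes_so_far(count, blocksize, totalsize):
    """Progress estimate as the documentation suggests: count * blocksize."""
    return count * blocksize


documented = [bytes_so_far(*c) for c in simulate_hook_calls(20000, 8192, False)]
proposed = [bytes_so_far(*c) for c in simulate_hook_calls(20000, 8192, True)]
```

With a 20000-byte transfer and 8192-byte blocks, the documented semantics overshoot on the last call (which a hook can clamp against totalsize), whereas under the proposed semantics the same hook would see its estimate drop on the last call, because the short final size gets multiplied by the full count.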
Is there any reason to assume the data size is ever less than the
block size except for the last data block? It's reading from a
pseudo-file tied to a socket, but Python files tend to have the
property that read(n) returns exactly n bytes unless at EOF.
BTW I left a longer comment at SF earlier.
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Is there any reason to assume the data size is ever less than the block size except for the last data block? It's reading from a pseudo-file tied to a socket, but Python files tend to have the property that read(n) returns exactly n bytes unless at EOF.
Right: socket._fileobject will invoke recv as many times as necessary to read the requested amount of data. I was somehow assuming that it maps read() to read(2), which, in turn, would directly map to recv(2), which could return less data.

So it's a semantic change only for the last block.

Regards,
Martin
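The difference between a single recv and a buffered read can be sketched as follows. This is an illustration of the looping behavior described above, not the code of socket._fileobject; `read_exactly` and `make_fake_recv` are hypothetical names, and the fake recv stands in for a socket that hands back at most a few bytes per call.

```python
def read_exactly(recv, n):
    """Call recv repeatedly until n bytes are collected or recv returns b'' (EOF)."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = recv(remaining)
        if not chunk:
            break  # EOF: the only case in which fewer than n bytes come back
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def make_fake_recv(data, max_chunk):
    """Hypothetical stand-in for socket.recv that returns at most max_chunk bytes."""
    state = {"pos": 0}

    def recv(bufsize):
        size = min(bufsize, max_chunk)
        chunk = data[state["pos"]:state["pos"] + size]
        state["pos"] += len(chunk)
        return chunk

    return recv


recv = make_fake_recv(b"x" * 10, max_chunk=3)
first = read_exactly(recv, 8)  # assembled from several short recv calls
last = read_exactly(recv, 8)   # hits EOF, so it comes back short
```

Each individual recv call here returns at most 3 bytes, yet the first read still yields the full 8 bytes requested; only the read that reaches the end of the data is short.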
Hi Martin,

On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. Löwis" wrote:
Right: socket._fileobject will invoke recv as many times as necessary to read the requested amount of data. I was somehow assuming that it maps read() to read(2), which, in turn, would directly map to recv(2), which could return less data.
So it's a semantic change only for the last block.
That means that it would be rather pointless to make the change, right? The original poster's motivation is to get accurate progress during the transfer - but he missed that he already gets that.

The proposed change only appears to be relevant together with a hypothetical rewrite of the underlying code, one that would use recv() instead of read().

A bientot,

Armin
OK, so let's reject the change.
--Guido van Rossum (home page: http://www.python.org/~guido/)
Armin Rigo wrote:
Hi Martin,
On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. Löwis" wrote:
Right: socket._fileobject will invoke recv as many times as necessary to read the requested amount of data. I was somehow assuming that it maps read() to read(2), which, in turn, would directly map to recv(2), which could return less data.
So it's a semantic change only for the last block.
That means that it would be rather pointless to make the change, right?
Right; I rejected the patch. Thanks for all your input.

Regards,
Martin
Fredrik Lundh wrote:
haven't used the reporthook, but my reading of the documentation would have led me to believe that I should do count*blocksize to determine how much data I've gotten this far. changing the blocksize without setting the count to zero would break such code.
</F>
I'm not sure where the error in your reading happened, but I read the docs and got the same thing out of it except that there is no problem with Martin's change. This API doesn't seem to make much sense anyways because who is going to be interested in the count? Fixing the count to one always and setting blocksize to the actual amount of data makes the most sense in recovering this API.

The only potential problem is if there is a non-null answer to "who is going to be interested in the count?"

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
participants (6)

- "Martin v. Löwis"
- Armin Rigo
- Fredrik Lundh
- Guido van Rossum
- Oleg Broytmann
- Scott Dial