[Python-Dev] Asking for feedback about fixing `ftplib' (issue25458)

Ivan Pozdeev vano at mail.mipt.ru
Wed Dec 28 09:57:12 EST 2016

Re: [Python-Dev] Asking for feedback about fixing `ftplib' (issue25458) 
On 25.12.2016 17:21, Giampaolo Rodola' wrote:
> From what I understand the problem should be fixed in urllib so that 
> it always closes the FTP connection. I understand this is what happens 
> in recent 3.x versions but not on 2.7.
urllib in 2.7 should indeed be fixed to close FTP connections rather 
than leave them dangling, no question about that.
(ftpcache probably has to go in all branches as a result since it's now 
useless - but that's another issue entirely)

The question is, in both 2.x and 3.x, urllib currently doesn't close the 
control connection gracefully. It just closes the socket immediately 
after reading all data on the data connection - without issuing QUIT or 
even reading the end-of-transfer response.
By doing this, it gets away with not calling voidresp().
(RFC 939 specifies that the server should execute the equivalent of ABOR 
and QUIT upon unexpected close of control connection.)
> As for ftplib, I don't like the solution of using a 
> `transfer_in_progress` flag in all commands. I see that as a cheap way 
> to work around the fact that the user may forget to call voidresp() 
> and close the socket after nt/transfercmd(). It probably makes sense 
> to just update nt/transfercmd() doc instead and make that clear.
I tried to fix urllib so that it's guaranteed to call voidresp(), and 
failed. Too complicated, effectively reimplementing a sizable part of 
FTP client logic is required, perhaps even monkey-patching ftplib to be 
able to set flags in the right places. That's why I thought that just 
requiring the user to call it themselves and calling it a day would be 
asking too much from users and the easy way out - ftplib currently dumps 
too much work on its users that it ought to do itself instead.

The fact that in FTP, one cannot send another command before the 
previous one is complete (or rather, one can, but the server won't 
respond to it until then) means that FTP is inherently stateful. So, 
ftplib, to be a usable library, needs to either encapsulate this 
statefulness, or provide building blocks with all the stock logic parts 
so that only truly application-specific logic has to be added to make a 
robust solution.

With simple commands (|voidcmd|) and self-contained transfer commands 
(retrlines/retrbinary), ftplib does incapsulate statefulness, by 
handling them atomically. But with transfercmd(), it returns control to 
the user halfway through.
At this point, if ftplib doesn't itself keep track that the control 
connection is currently in invalid state for the next command, the user 
is forced to do that themselves instead. And guess what - urllib has to 
use ftplib through a wrapper class that does exactly that. That's 
definitely a "stock logic part" 'cuz it's an integral part of FTP rather 
than something specific to user logic.

The other "stock logic part" currently missing is error handling during 
transfer. If the error's cause is local (like a local exception that 
results in closing the data socket prematurely, or the user closing it 
by hand, or an ABOR they sent midway), the error code from the 
end-of-transfer response should be ignored, otherwise, it shouldn't.
Nothing currently provides this logic. ftplib.retr*() never ignore the 
code - because they never abort the transfer - but they don't handle 
exceptions, either, so they wouldn't read the response if one is raised 
on data socket read or in the callback. urllib used to handle the 
response in an overridden socket close handler, and it was forced to 
always ignore the code because it's impossible to check from there if 
there was an exception or if the data socket was closed prematurely.
Other than making ftplib keep track of the session's internal state 
while the user has control and providing the custom error handling logic 
to them, another solution is to never release control to the user 
midway, i.e. ban transfercmd() entirely and only provide self-contained 
retr*()/dir()/whatever, but let the callback signal them to abort the 
transfer. That way, the state would be kept implicitly - in the 
execution pointer.

> -- 
> Giampaolo - http://grodola.blogspot.com


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20161228/318b7763/attachment.html>

More information about the Python-Dev mailing list