ftplib - Did the whole file get sent?

John Nagle nagle at animats.com
Mon Oct 25 15:26:12 EDT 2010


On 10/22/2010 10:03 PM, Sean DiZazzo wrote:
> Hi,
>
> I have some scripts that send files via ftplib to a client's ftp
> site.  The scripts have generally worked great for a few years.
> Recently, the client complained that only part of an important file
> made it to their server.  My boss got this complaint and brought it to
> my attention.
>
> The first thing I did was track down the specific file transfer in my
> logs.  My log showed a success, I told my boss that, but he wasn't
> satisfied with my response.  He began asking if there is a record of
> the file transfer ack and number of bytes sent for this transfer.  I'm
> not keeping a record of that...only success or failure (and some
> output)
>
> How can I assure him (and the client) that the transfer completed
> successfully like my log shows?  I'm using code similar to the
> following:
>
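> import ftplib
> import logging
>
> try:
>     ftp = ftplib.FTP(host)
>     ftp.login(user, password)   # "pass" is a reserved word in Python
>     with open(f.path, 'rb') as fh:
>         ftp.storbinary("STOR " + destfile, fh)
>     logging.info("sent %s", destfile)                 # log this as success
> except ftplib.all_errors:
>     logging.exception("failed to send %s", destfile)  # log this as an error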
>
> Is ftplib reliable enough to say that if an exception is not thrown,
> that the file was transferred in full?

    No.

    For years this was an outstanding problem with FTP under Windows.
See "http://www.fourmilab.ch/documents/corrupted_downloads/",
"http://us.generation-nt.com/answer/incomplete-ftp-upload-under-windows-xp-help-139017881.html",
and "http://winscp.net/forum/viewtopic.php?t=6458".  Many FTP
implementations have botched this.  TCP has all the machinery needed
to guarantee that both ends know the transfer completed properly,
but it's often misused.

    Looking at the Python source, it doesn't look good.  The "ftplib"
module sends data by calling socket.sendall(), which is implemented
as sock_sendall in "socketmodule.c".  That calls the OS-level "send"
repeatedly and returns once every byte has been handed to the OS.
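    A simplified Python sketch of what sendall() amounts to may help.
The name sendall_sketch is hypothetical, and the real C code also
handles timeouts and interrupted system calls, which this ignores.
The point is that the loop ends when the OS has accepted the bytes,
not when the peer has received them.

```python
import socket


def sendall_sketch(sock: socket.socket, data: bytes) -> int:
    """Simplified stand-in for socket.sendall(): keep calling send()
    until every byte has been accepted by the OS. Returns the count."""
    view = memoryview(data)
    total = 0
    while total < len(data):
        # send() may accept only part of the buffer; retry with the rest.
        total += sock.send(view[total:])
    # Returning here only means the kernel has queued the bytes.
    return total
```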

    But an OS-level socket send returns when the data is queued for
sending, not when it is delivered.  Only when the socket is closed,
and the close status checked, do you know if the data was delivered.
There's a final TCP close handshake that occurs when close has
been called at both ends, and only when it completes successfully
do you know that the data has been delivered.
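    The difference between "queued" and "delivered" can be shown
locally.  In this sketch a socket pair stands in for the FTP data
connection (an assumption for illustration), and the helper name is
hypothetical.  The sender is told all 1000 bytes were sent, the
receiver goes away without reading any of them, and neither send()
nor close() reports a problem.

```python
import socket


def send_then_peer_vanishes(payload: bytes) -> int:
    """Hypothetical demo: send() succeeds even though nothing is read."""
    sender, receiver = socket.socketpair()
    sent = sender.send(payload)  # succeeds: the bytes are queued in the kernel
    receiver.close()             # peer disappears without reading anything
    sender.close()               # no exception here; the loss is silent
    return sent
```

Usage: send_then_peer_vanishes(b"x" * 1000) reports 1000 bytes sent,
although the receiving side read none of them.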

    At the socket level, this is performed by "shutdown" (which
closes the connection and returns the proper network status
information), or by "close" (which forces a shutdown but doesn't
return status).
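    A minimal sketch of the shutdown-first pattern, assuming the peer
closes its side once it has read everything.  The name finish_sending
and the 30-second timeout are hypothetical choices.  shutdown(SHUT_WR)
sends our FIN and raises OSError if the connection is already broken;
reading until recv() returns b"" waits for the peer's own close.  This
shows the peer's TCP stack finished the exchange in an orderly way;
it is not proof that the remote application stored the data.

```python
import socket


def finish_sending(sock: socket.socket, timeout: float = 30.0) -> None:
    """Half-close our side, then wait for the peer to close its side."""
    sock.shutdown(socket.SHUT_WR)  # send FIN; raises OSError on a broken connection
    sock.settimeout(timeout)       # a timeout raises socket.timeout (an OSError)
    while True:
        chunk = sock.recv(4096)    # any stray data from the peer is discarded
        if not chunk:              # b"" means the peer closed in an orderly way
            break
    sock.close()
```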

    Look at sock_close in "socketmodule.c".  Note that it ignores the
return status on close, always returns None, and never raises an 
exception.  As the Linux manual page for "close" says:
"Not checking the return value of close() is a common but nevertheless 
serious programming error. It is quite possible that errors on a 
previous write(2) operation are first reported at the final close(). Not 
checking the return value when closing the file may lead to silent loss 
of data."
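    For what it's worth, Python 3.6 changed socket.close() to raise
OSError when the underlying close() call fails, so on current
versions the status described above is no longer thrown away.  A
small wrapper (the name close_and_check is hypothetical) makes the
check explicit at the call site.

```python
import socket


def close_and_check(sock: socket.socket) -> None:
    """Close a socket and surface any error reported by the OS close()."""
    try:
        sock.close()  # Python 3.6+: raises OSError if the underlying close() fails
    except OSError as exc:
        raise RuntimeError(
            "close() reported an error; queued data may have been lost") from exc
```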

    "ftplib", in "storlines" and "storbinary", calls "close"
without calling "shutdown" first.  So if the other end disconnects
after all data has been queued but not received, the sender will
never know.  FAIL.
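    Until that is fixed, one independent cross-check is to ask the
server how big the file it stored is.  This is a sketch under stated
assumptions, not a cure for the socket-level problem: the helper
upload_and_verify is hypothetical, and it relies on the server
supporting SIZE, an RFC 3659 extension that not every FTP server
implements.  ftplib.FTP.size() returns the size as an integer when
the server answers 213.

```python
import ftplib
import os


def upload_and_verify(ftp: ftplib.FTP, local_path: str, destfile: str) -> int:
    """Hypothetical helper: upload, then compare the server's SIZE reply
    with the local file size. Returns the confirmed byte count."""
    local_size = os.path.getsize(local_path)
    with open(local_path, "rb") as fh:
        # storbinary() switches to TYPE I, streams the file, and raises an
        # ftplib.error_* exception if the final server reply is not 2xx.
        ftp.storbinary("STOR " + destfile, fh)
    try:
        # Servers without SIZE support usually answer with a 5xx error.
        remote_size = ftp.size(destfile)
    except ftplib.error_perm as exc:
        raise RuntimeError(f"server cannot report size of {destfile}: {exc}") from exc
    if remote_size is None:
        raise RuntimeError(f"no usable SIZE reply for {destfile}")
    if remote_size != local_size:
        raise RuntimeError(
            f"size mismatch for {destfile}: sent {local_size}, server has {remote_size}")
    return remote_size
```

Logging the returned byte count gives the kind of record the boss
asked for.  A matching size is evidence, not proof, that the contents
are identical.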

    So there's your bug.

				John Nagle


