urllib.urlretrieve() and handling 550 errors when reading from FTP

Bug #635453 reports that when you use urllib.urlretrieve() to read from an FTP server and you do not have permission to read the file you are trying to get, you are given a listing of the directory instead of an exception (this is all done by Lib/urllib.py:752). This doesn't make sense to me, and I suspect most people would agree. But that line is explicit, and it has been in the file since it was renamed from urlopen way back in 1994, which makes me hesitate before deleting it. Does anyone object if I remove the line so that an exception is raised instead of a directory listing being returned? -Brett

If you remove this, you won't be able to get a listing of a directory. I tried this with and without the line; with the line in, it gives a directory listing, and without it, it gives an IOError (Not a regular file):
urllib.urlretrieve("ftp://ftp.python.org/pub", "xyzzy")
So I'd be against deleting this, unless there's a different way to get directory listings. --Guido van Rossum (home page: http://www.python.org/~guido/)

Try it again but with a trailing slash on the address. With the line removed, it will still fetch the directory listing; without the trailing slash, it errors out. The code apparently tries to get the file, and when that fails with a 550 (meaning the file cannot be accessed, either because of permissions or because it does not exist), the code then sends a LIST command, which lists whatever the argument is, file or directory. Unfortunately, I can't think of a good way to tell a file apart from a directory that simply lacks the trailing slash, short of reading the output and checking whether it is a single listing for a file with the same name as the LIST request. -Brett
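For illustration, the "read the LIST output" heuristic mentioned above could be sketched roughly as follows. This is not code from urllib; the function name is hypothetical, and it assumes Unix-style "ls -l" output, which FTP does not guarantee. It also breaks on file names containing spaces, on symlink entries, and on a directory whose only entry happens to share the directory's name, which is part of why the heuristic is unattractive.

```python
import posixpath


def looks_like_single_file(list_lines, requested_path):
    """Guess whether LIST output describes one file rather than a directory.

    Hypothetical helper: assumes Unix-style "ls -l" lines, which the FTP
    protocol does not actually promise.
    """
    # Drop blank lines and the "total N" summary that some servers emit.
    entries = [ln for ln in list_lines
               if ln.strip() and not ln.lower().startswith("total ")]
    if len(entries) != 1:
        return False
    entry = entries[0]
    # A leading "d" in the mode field marks a directory entry.
    if entry.startswith("d"):
        return False
    # Assume the name is the last whitespace-separated field.
    name = entry.split()[-1]
    wanted = posixpath.basename(requested_path.rstrip("/"))
    return name in (requested_path, wanted)
```

A caller would collect the lines of the LIST reply first (for example with ftplib's retrlines) and then pass them in together with the path that was requested.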

Why should I have to use a trailing slash to get a directory listing? That's not in the FTP standard and probably won't work everywhere. --Guido van Rossum (home page: http://www.python.org/~guido/)

It's the way the code is written for the FTP wrapper in urllib; it has nothing to do with ftp.python.org or FTP itself. Basically, the code checks whether there is a trailing slash. If there is, it assumes the path is a directory and only requests a listing of the path from the server. Otherwise it tries to get the file. If that attempt fails (with a 550), it then tries to get a listing for the path. If that works, it returns the listing; otherwise it fails completely. Basically, I don't think there is a way to make this work nicely in urllib. If no one has any objections, I will just clarify the docs to state that if an attempt to get a file fails with a 550 error (and of course I will say what that means), urllib will then try a LIST command, and if that succeeds, the listing is what is returned. And if you need more fine-grained control, use ftplib. -Brett
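As a rough sketch of the control flow described above, written directly against ftplib for anyone who wants the finer-grained control: this is a simplified illustration, not the actual urllib source. The function name and the returned tuple shape are made up for the example, and `ftp` is assumed to be an already connected and logged-in ftplib.FTP instance (or any object with the same retrbinary/retrlines methods).

```python
import ftplib


def retrieve_or_list(ftp, path):
    """Return ("file", data) or ("listing", lines) for path.

    Hypothetical helper mirroring the behaviour discussed in this thread:
    a trailing slash means "directory", otherwise RETR is tried first and
    a 550 reply falls back to LIST.
    """
    if not path.endswith("/"):
        chunks = []
        try:
            ftp.retrbinary("RETR " + path, chunks.append)
            return "file", b"".join(chunks)
        except ftplib.error_perm as exc:
            # Only a 550 reply triggers the LIST fallback;
            # any other permanent error is re-raised unchanged.
            if not str(exc).startswith("550"):
                raise
    lines = []
    ftp.retrlines("LIST " + path if path else "LIST", lines.append)
    return "listing", lines
```

Dropping the `except` branch gives the stricter behaviour asked for in the bug report: a 550 on RETR then surfaces as ftplib.error_perm, and a directory listing is only returned when the caller explicitly asks for one with a trailing slash.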

Why should I have to use a trailing slash to get a directory listing? That's not in the FTP standard and probably won't work everywhere.
It's the way the code is written for the FTP wrapper in urllib; it has nothing to do with ftp.python.org or FTP itself. Basically, the code checks whether there is a trailing slash. If there is, it assumes the path is a directory and only requests a listing of the path from the server.
Ah, right. Sorry. Alzheimer is setting in early today. :-)
Otherwise it tries to get the file. If that get fails (with a 550) it then decides to try getting a listing for the path. If that works it returns that, otherwise it completely fails.
Basically I don't think there is a way to make this work for urllib nicely.
Depends on what you call nicely. I think of the current behavior as "nice", because directories are a lot more common than files you can't read in the typical FTP setup.
If no one has any objections I will just clarify the docs stating that if an attempt to get a file fails on a 550 error (and of course I will say what that means) it will then try a LIST command and if that succeeds that is what is returned. And if you need more fine-grained control then use ftplib.
Sounds good to me. Thanks! --Guido van Rossum (home page: http://www.python.org/~guido/)

OK, it's in as rev. 1.53 of Doc/lib/liburllib.tex, in the "Restrictions" section. The wording seems fine to me, but after my already glorious email on why test_strptime was failing, I would appreciate someone else reading it and making sure it makes sense. Once people have cleared it for public consumption I will backport it. -Brett
Participants (2): Brett C., Guido van Rossum