Can an FTP URL start with file://?

Strictly not a Python question, but I wanted to know from the experience of others on this list. Is this a valid FTP URL?

# file://ftp.example.com/blah.txt (an ftp URL)

My answer is no. When the scheme is explicitly given as file://, there is no point in treating it as an FTP URL (which should start with ftp://). If I go ahead with this assumption and fix a bug in the stdlib, I am introducing a regression, because at the moment the above is considered an FTP URL. -- Senthil A real diplomat is one who can cut his neighbor's throat without having his neighbour notice it. -- Trygve Lie

On Fri, 9 Jul 2010 01:52:32 pm Senthil Kumaran wrote:
I agree. Just because the host is *called* ftp doesn't mean you should use the ftp protocol to get the file. http://en.wikipedia.org/wiki/File_URI_scheme
Do you have a url for the bug report? -- Steven D'Aprano

On Fri, Jul 09, 2010 at 02:23:40PM +1000, Steven D'Aprano wrote:
It was not just about the host being called ftp.example.com. The pattern is that file:/// is a local file (correct), file://localhost/somepath is again a local file (correct again), but file://anyhost.domain/file.txt is actually treated as FTP (pretty weird).
Do you have a url for the bug report?
http://bugs.python.org/issue8801 Never mind the suggestion in the report; just notice that a file URL led to an FTP error exception. -- Senthil

Senthil Kumaran <orsenthil@gmail.com> wrote:
RFC 1738 explicitly says that "file://<host>/<path>" is pretty much useless for anything except host=localhost: ``The file URL scheme is unusual in that it does not specify an Internet protocol or access method for such files; as such, its utility in network protocols between hosts is limited.'' So, FTP is *not* the "default protocol". On the other hand, if <host> actually begins with "ftp.", it's a pretty good guess that FTP will work. Similarly, if <host> actually begins with "www.", it's a pretty good guess that HTTP will work. This seems to me like a practicality-vs.-purity consideration. Bill

On Fri, Jul 9, 2010 at 12:41, Bill Janssen <janssen@parc.com> wrote:
Actually, FTP *is* the default protocol for most URLs with hostnames in urllib.py. urllib.open_file() delegates to open_ftp() if there is any host other than the exact string "localhost", and to open_local_file() otherwise.

To be clear, Python 2.x's urllib.urlopen() has this issue; 3.1's urllib.request.urlopen() rejects non-local hosts in a file URL. -- Tim Lesher <tlesher@gmail.com>
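
The dispatch rule described above can be modelled in a few lines. This is only a sketch of the behaviour as stated in this message, not the stdlib source: the function name legacy_file_url_handler is hypothetical, and it returns the name of the handler that would be chosen instead of opening anything.

```python
from urllib.parse import urlsplit


def legacy_file_url_handler(url):
    """Return the name of the handler the described 2.x logic would pick.

    Hypothetical helper for illustration only; it models the rule
    "any host other than the exact string 'localhost' goes to FTP".
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: %r" % (url,))
    host = parts.netloc
    if host and host != "localhost":
        # A non-empty, non-localhost host is handed to the FTP code path.
        return "open_ftp"
    # Empty host (file:///...) or localhost means a local file.
    return "open_local_file"


for example in ("file:///etc/hosts",
                "file://localhost/somepath",
                "file://anyhost.domain/file.txt"):
    print(example, "->", legacy_file_url_handler(example))
```

Only the third URL ends up on the FTP path, which is the surprising case discussed in this thread.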

Tim Lesher <tlesher@gmail.com> wrote:
I just meant to point out that it's not specified that way by any RFC. And, while FTP might have been a good default choice last century when urllib.py was originally written, times have changed. I'd suggest that HTTP is a better (more likely to succeed) default choice in this century. If we want to perpetuate these guessing heuristics, I'd suggest using FTP if the hostname starts with "ftp.", and HTTP if the hostname starts with "www.", and raise an error otherwise. Bill
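
For concreteness, the guessing heuristic suggested here could look roughly like the sketch below. The function name rewrite_by_hostname is made up for illustration, and the sketch assumes the caller has already handled local file: URLs (empty host or localhost); it is one possible reading of the proposal, not existing urllib behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_by_hostname(url):
    """Rewrite a non-local file: URL by guessing a scheme from the host.

    Hypothetical helper: "ftp." prefix -> ftp, "www." prefix -> http,
    anything else is an error.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased by urlsplit
    if host.startswith("ftp."):
        scheme = "ftp"
    elif host.startswith("www."):
        scheme = "http"
    else:
        raise ValueError("cannot guess a protocol for host %r" % (host,))
    return urlunsplit((scheme, parts.netloc, parts.path,
                       parts.query, parts.fragment))


print(rewrite_by_hostname("file://ftp.example.com/blah.txt"))
print(rewrite_by_hostname("file://www.example.com/index.html"))
```

A host such as anyhost.domain matches neither prefix, so it raises ValueError instead of silently falling back to FTP.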

Paul Moore wrote:
This /could/ become a bugfix in 2.7.x if people thought it was a sufficiently egregious bug to need fixing. Given that the matter is only now coming to light, it's probably best to let sleeping dogs lie. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 DjangoCon US September 7-9, 2010 http://djangocon.us/ See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/

On Fri, Jul 9, 2010 at 2:04 PM, Bill Janssen <janssen@parc.com> wrote:
I'd suggest that HTTP is a better (more likely to succeed) default choice in this century.
FTP access also more often reflected the actual file hierarchy of the machine, so trying that path as a system path is more likely to work than I'd expect to see for HTTP. Really, I'd expect any non-local file: URLs to be kicked back to the application, and let it decide to re-write if it wants to. -Fred -- Fred L. Drake, Jr. <fdrake at gmail.com> "A storm broke loose in my mind." --Albert Einstein

On Fri, Jul 09, 2010 at 02:32:04PM -0400, Fred Drake wrote:
I see the reason. But I doubt that this is a reliable approach. Also, when the URL begins with file:// it should not be confused with FTP, so I think the portion of code in urllib which works that way should be removed. Issue8801 was fixed in a different way so that no regression is introduced. For the 3.2 release, shall we do away with relying on FTP for the file:// scheme? Currently, for a URL like file://somehost.domain//filesys/file.txt, the control flow treats it as an FTP URL! The expected behaviour might be to throw an exception saying that a file:// URL won't make any sense if it is not localhost or an absolute path. What if 'somehost.domain' is actually the hostname of the machine? Should file:// be allowed in that case, or is that the reason to rely on FTP? But it still does not make much sense to use FTP, because there is no guarantee that an FTP service is running on that machine. -- Senthil The whole world is a scab. The point is to pick it constructively. -- Peter Beard
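
The question of whether 'somehost.domain' is the machine's own hostname could be answered with a check along these lines. This is a minimal sketch under stated assumptions: is_local_host is a hypothetical name, and the comparison is purely name-based (no DNS lookup), so aliases and IP addresses of the local machine are not recognised.

```python
import socket


def is_local_host(host):
    """Return True if `host` plainly refers to this machine.

    Hypothetical helper: an empty host, "localhost", or the name
    reported by socket.gethostname() count as local. Comparison is
    case-insensitive and does not resolve names.
    """
    if not host:
        return True
    local_names = {"localhost", socket.gethostname().lower()}
    return host.lower() in local_names


print(is_local_host("localhost"))
print(is_local_host(socket.gethostname()))
```

With such a check, a file:// URL naming the local machine could be opened as a local file, and anything else could be rejected without involving FTP.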

Senthil Kumaran wrote:
My own inclination would be to regard the current treatment of file: as a bug (albeit one not worth fixing on 2.x). RFC 1630 specification lists the "file" scheme as being for "local file access", and RFC 1738 says they are for "host-specific file names" and points out that "The file URL scheme is unusual in that it does not specify an Internet protocol or access method for such files; as such, its utility in network protocols between hosts is limited." Presumably a hostname in such a URI would require that some host-specific protocol be used (but this should be an access protocol like SMB or NFS, not a transfer protocol like FTP). It seems pretty clear that randomly interpreting particular host names to imply a specific remote-access protocol like FTP is bogus. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 DjangoCon US September 7-9, 2010 http://djangocon.us/ See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/

On Mon, 12 Jul 2010 12:01:07 am Steve Holden wrote:
KDE uses SMB for non-local hostnames in file URIs.
It seems pretty clear that randomly interpreting particular host names to imply a specific remote-access protocol like FTP is bogus.
Agreed. The right behaviour is to raise an exception and let the caller deal with it, or provide a means to register an alternative. -- Steven D'Aprano

On Sat, Jul 10, 2010 at 11:56 PM, Senthil Kumaran <orsenthil@gmail.com> wrote:
I'm not trying to defend the current behavior of defaulting to FTP as a good thing; it's definitely surprising. I am trying to rationalize it so I can be sure I understand why it might have been done to start with. My own preference is to kick out any non-local references with an exception that can be detected (possibly derived from ValueError) so that applications that want to rewrite (hopefully with user agreement!) can do so. (Checking for non-local file: URLs on input might be better, of course.) -Fred -- Fred L. Drake <fdrake at acm.org>
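
The preference stated above, a detectable exception derived from ValueError, might be sketched as follows. The names NonLocalFileURLError and local_path_from_file_url are invented for this example and do not exist in urllib; the sketch only maps a file: URL to a path and leaves opening the file to the caller.

```python
from urllib.parse import unquote, urlsplit


class NonLocalFileURLError(ValueError):
    """Hypothetical exception for file: URLs that name a remote host."""

    def __init__(self, url, host):
        super().__init__("file: URL names non-local host %r: %s" % (host, url))
        self.url = url
        self.host = host


def local_path_from_file_url(url):
    """Return the local path for a file: URL, or raise for remote hosts."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: %r" % (url,))
    if parts.netloc not in ("", "localhost"):
        # Kick the URL back to the application instead of guessing FTP.
        raise NonLocalFileURLError(url, parts.netloc)
    return unquote(parts.path)


try:
    local_path_from_file_url("file://anyhost.domain/file.txt")
except NonLocalFileURLError as exc:
    # The application decides here whether to rewrite the URL.
    print("refused:", exc.host)
```

Because the exception subclasses ValueError, existing code that already catches ValueError keeps working, while applications that want to rewrite the URL can catch the narrower type.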

participants (7)
- Bill Janssen
- Fred Drake
- Paul Moore
- Senthil Kumaran
- Steve Holden
- Steven D'Aprano
- Tim Lesher