[ python-Bugs-1436428 ] urllib has trouble with Windows filenames
SourceForge.net
noreply at sourceforge.net
Thu Apr 13 02:12:02 CEST 2006
Bugs item #1436428, was opened at 2006-02-22 07:03
Message generated for change (Comment added) made by zseil
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1436428&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Donovan Eastman (dpeastman)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib has trouble with Windows filenames
Initial Comment:
When you pass urllib the name of a local file including
a Windows drive letter (e.g. 'C:\dir\My File.txt')
URLopener.open() incorrectly interprets the drive
letter as the scheme of a URL. Of course, given that
there is no scheme 'C', this fails.
I have solved this in my own code by putting the
following test before calling urllib.urlopen():
    if url[1] == ':' and url[0].isalpha():
        url = 'file:' + url
Although this works fine in my particular case, it
seems like urllib should just simply "do the right
thing" without having to worry about it. Therefore I
propose that urllib should automatically assume that
any URL that begins with a single alpha followed by a
colon is a local file.
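
The check above can be wrapped in a small helper. This is only a minimal sketch: the name add_file_scheme is hypothetical, and it uses slicing rather than indexing so that strings shorter than two characters do not raise IndexError.

```python
def add_file_scheme(url):
    """Prefix 'file:' when the string starts with a drive letter and a colon.

    Hypothetical helper illustrating the workaround; not part of urllib.
    """
    # Slicing never raises on short strings, and ''.isalpha() is False.
    if url[1:2] == ':' and url[:1].isalpha():
        return 'file:' + url
    return url


print(add_file_scheme(r'C:\dir\My File.txt'))   # drive-letter path gets the prefix
print(add_file_scheme('http://example.com/'))   # multi-letter scheme is left alone
```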
The only potential downside would be that it would
preclude the use of single letter scheme names. I did
a little research on this. RFC 3986 suggests, but does
not explicitly state, that scheme names must be more
than one character
(http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#scheme).
That said, there are no currently recognized single
letter scheme names
(http://www.iana.org/assignments/uri-schemes.html) and
it seems very unlikely that there ever would be.
I would gladly write the code for this myself -- but I
suspect that it would take someone longer to review and
integrate my changes than it would to just write the code.
Thanks,
Donovan
----------------------------------------------------------------------
Comment By: Žiga Seilnacht (zseil)
Date: 2006-04-13 02:12
Message:
Logged In: YES
user_id=1326842
There are already two platform-specific functions
in the urllib module just for this purpose: pathname2url
and url2pathname. See
http://docs.python.org/lib/module-urllib.html#l2h-3193.
I agree that this should be closed as invalid.
----------------------------------------------------------------------
Comment By: Andrew Clover (bobince)
Date: 2006-03-20 18:41
Message:
Logged In: YES
user_id=311085
Filepaths aren't URIs and attempting to hide the difference
in the backend is doomed to fail (as it did for SAX).
Throw filenames with colons in, network paths, Mac paths and
RISC OS paths into the mix, and you've got a situation where
it is all but impossible to handle correctly.
In any case, the docs *don't* say you can pass in a filepath:
    If the URL does not have a scheme identifier, or if
    it has file: as its scheme identifier, this opens a
    local file
This means the string you pass in is unequivocally a URL
*not* a pathname... just that you can leave the scheme
prefix off for file: URLs. Effectively this is a relative URL.
r'C:\spam' is *not* a valid way to refer to a local file
using a relative URL. Pass it through pathname2url and
you'll get '///C|/spam', which is okay; 'C|/spam' and
'/C|/spam' will also work.
Even on Unix, a filepath won't always work when passed to
urlopen. Filenames can have percent signs in, which have to
be encoded in URLs, for example. Always use pathname2url or
you're going to trip up.
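
To illustrate the percent-sign point, here is a small sketch using the two helper functions. It assumes a current Python 3, where they live in urllib.request (in the Python 2.4 discussed here they were in urllib). The filename is made up, and a relative name is used so the result does not depend on platform-specific handling of drive letters or leading slashes.

```python
from urllib.request import pathname2url, url2pathname

filename = '100%.txt'            # hypothetical filename containing a percent sign

url_path = pathname2url(filename)
print(url_path)                  # the '%' is percent-encoded as '%25'

# Converting back recovers the original filename.
print(url2pathname(url_path))
```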
(Suggest setting status INVALID, possible clarification to
docs to warn against passing a filepath to urlopen?)
----------------------------------------------------------------------
Comment By: Donovan Eastman (dpeastman)
Date: 2006-03-14 03:32
Message:
Logged In: YES
user_id=757799
OK - Here's my suggested fix:
This can be fixed with a single if statement (and a comment
to explain it to confused unix programmers).
In splittype(), right after the line that reads:
    scheme = match.group(1)
add the following:
    # ignore single char schemes to avoid confusion with
    # win32 drive letters
    if len(scheme) > 1:
...and indent the next line.
Alternatively, the if statement could read:
    if len(scheme) > 1 or sys.platform != 'win32':
...which would allow single letter scheme names on
non-Windows systems. I would argue that it is better to be
consistent and have it work the same way on all OS's.
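
For illustration, the proposed change could look roughly like the sketch below. This is a standalone function, not urllib's actual source: the name split_scheme and the regular expression are assumptions chosen to mimic what splittype() is described as doing.

```python
import re

# Assumed pattern: text before the first ':' that contains no '/' or ':'.
_SCHEME_RE = re.compile(r'^([^/:]+):')


def split_scheme(url):
    """Return (scheme, rest), or (None, url) when no usable scheme is found."""
    match = _SCHEME_RE.match(url)
    if match:
        scheme = match.group(1)
        # Ignore single-character schemes to avoid confusion with
        # win32 drive letters.
        if len(scheme) > 1:
            return scheme.lower(), url[len(scheme) + 1:]
    return None, url


print(split_scheme('http://location'))
print(split_scheme(r'C:\location'))
```

With this variant a drive-letter path falls through to the "no scheme" branch on every platform, which matches the consistency argument made above.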
----------------------------------------------------------------------
Comment By: Donovan Eastman (dpeastman)
Date: 2006-03-14 02:56
Message:
Logged In: YES
user_id=757799
Reasons why urllib should open local files:
1) This allows you to write code that handles local files
and Internet files equally well -- without having to do any
special magic of your own.
2) The docs all say that it should.
I believe this would work just fine under Unix. In
URLopener.open() it looks for the protocol prefix and if it
can't find one, it assumes that it is a local file.
The problem on Windows is that you have these pesky drive
letters. The form 'C:\location' ends up looking a lot like
the form 'http://location'. Therefore it looks for a
protocol called 'c' -- which obviously isn't going to work.
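
The ambiguity is easy to reproduce with a URL parser. The sketch below uses urllib.parse.urlsplit from current Python 3 rather than the URLopener code path discussed in this report, so it only illustrates the general parsing issue.

```python
from urllib.parse import urlsplit

for candidate in (r'C:\location', 'http://location'):
    parts = urlsplit(candidate)
    # Text before the first ':' is treated as the scheme when it is
    # made up of valid scheme characters.
    print(repr(candidate), '->', repr(parts.scheme))
```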
----------------------------------------------------------------------
Comment By: Koen van de Sande (shadowmorpher)
Date: 2006-03-13 20:19
Message:
Logged In: YES
user_id=270334
Why should the URL lib module support opening of local
files? It already does so through the file: protocol prefix,
and I do not see why it should support automatic detection of
Windows filenames. AFAIK it does not do automatic detection
of Unix filenames (one could recognize it from /home/
something), so why would Windows work differently?
I'm not an expert or anything, so I might be wrong.
----------------------------------------------------------------------