[ python-Bugs-1436428 ] urllib has trouble with Windows filenames

SourceForge.net noreply at sourceforge.net
Wed May 3 07:35:54 CEST 2006

Bugs item #1436428, was opened at 2006-02-22 06:03
Message generated for change (Settings changed) made by gbrandl
You can respond by visiting: 

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
>Status: Closed
>Resolution: Wont Fix
Priority: 5
Submitted By: Donovan Eastman (dpeastman)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib has trouble with Windows filenames

Initial Comment:
When you pass urllib the name of a local file including
a Windows drive letter (e.g. 'C:\dir\My File.txt')
URLopener.open() incorrectly interprets the drive
letter as the scheme of a URL.  Of course, given that
there is no scheme 'C', this fails.
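The misparse is easy to reproduce. The sketch below uses Python 3's urllib.parse.urlsplit as a stand-in for the Python 2.4 splittype()/URLopener.open() code discussed in this report. That substitution is an assumption: the modern function is analogous, not identical. Note that the "scheme" comes back lower-cased.

```python
from urllib.parse import urlsplit

# A Windows path with a drive letter, passed where a URL is expected.
path = r"C:\dir\My File.txt"

parts = urlsplit(path)
# "C" followed by a colon satisfies the URL scheme syntax, so it is
# taken as the scheme and everything after the colon becomes the path.
print(parts.scheme)  # c
print(parts.path)    # \dir\My File.txt
```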

I have solved this in my own code by putting the
following test before calling urllib.urlopen():

# Treat "X:..." (a single drive letter plus colon) as a local file.
if len(url) > 1 and url[1] == ':' and url[0].isalpha():
    url = 'file:' + url

Although this works fine in my particular case, it
seems like urllib should simply "do the right thing"
without the caller having to worry about it.  Therefore I
propose that urllib should automatically assume that
any URL that begins with a single letter followed by a
colon is a local file.

The only potential downside would be that it would
preclude the use of single-letter scheme names.  I did
a little research on this.  RFC 3986 suggests, but does
not explicitly state, that scheme names must be more
than one character.  That said, there are no currently
recognized single-letter scheme names
(http://www.iana.org/assignments/uri-schemes.html) and
it seems very unlikely that there ever would be.
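For reference, RFC 3986 (section 3.1) defines the scheme as scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), so the grammar by itself accepts a one-letter scheme such as "C"; any restriction to longer names would come from registration practice rather than syntax. The small check below transcribes that production into a regular expression. The helper name is_valid_scheme is hypothetical and not part of urllib.

```python
import re

# RFC 3986, section 3.1:  scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """Return True if `name` matches the RFC 3986 scheme production."""
    return SCHEME_RE.fullmatch(name) is not None


for candidate in ["http", "svn+ssh", "C", "1abc", "my_scheme"]:
    # "C" passes: the grammar permits single-letter schemes.
    # "1abc" fails (must start with a letter); "_" is not allowed.
    print(candidate, is_valid_scheme(candidate))
```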

I would gladly write the code for this myself -- but I
suspect that it would take someone longer to review and
integrate my changes than it would to just write the code.



>Comment By: Georg Brandl (gbrandl)
Date: 2006-05-03 05:35

Logged In: YES 

I agree with zseil.


Comment By: Žiga Seilnacht (zseil)
Date: 2006-04-13 00:12

Logged In: YES 

There are already two platform-specific functions
in the urllib module just for this purpose: pathname2url
and url2pathname. See
I agree that this should be closed as invalid.
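In Python 3 these two functions live in urllib.request (in Python 2 they were in urllib itself). A minimal round-trip sketch: the exact URL text depends on the platform and on the temporary directory, so only the round-trip result is shown as expected output. On a POSIX system the space and the percent sign come out as %20 and %25.

```python
import os
import tempfile
from urllib.request import pathname2url, url2pathname

# A native path whose name contains a space and a percent sign,
# both of which must be escaped inside a URL.
path = os.path.join(tempfile.gettempdir(), "My File 100%.txt")

# Native path -> path component of a file: URL (percent-encoded).
url_path = pathname2url(path)
# URL path component -> native path again.
back = url2pathname(url_path)

print(url_path)
print(back == path)  # True
```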


Comment By: Andrew Clover (bobince)
Date: 2006-03-20 17:41

Logged In: YES 

Filepaths aren't URIs and attempting to hide the difference
in the backend is doomed to fail (as it did for SAX).

Throw filenames with colons in, network paths, Mac paths and
RISC OS paths into the mix, and you've got a situation where
it is all but impossible to handle correctly.

In any case, the docs *don't* say you can pass in a filepath:

  If the URL does not have a scheme identifier, or if
  it has file: as its scheme identifier, this opens a
  local file

This means the string you pass in is unequivocally a URL
*not* a pathname... just that you can leave the scheme
prefix off for file: URLs. Effectively this is a relative URL.

r'C:\spam' is *not* a valid way to refer to a local file
using a relative URL. Pass it through pathname2url and
you'll get '///C|/spam', which is okay; 'C|/spam' and
'/C|/spam' will also work.

Even on Unix, a filepath won't always work when passed to
urlopen. Filenames can have percent signs in, which have to
be encoded in URLs, for example. Always use pathname2url or
you're going to trip up.
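A minimal sketch of the percent-sign trap, assuming Python 3 on a POSIX system, with urllib.request.urlopen standing in for Python 2's urllib.urlopen. A file literally named 50%25.txt cannot be reached by gluing "file:" onto the raw path, because the URL layer decodes %25 to %; passing the path through pathname2url first works.

```python
import os
import tempfile
from urllib.error import URLError
from urllib.request import pathname2url, urlopen

# Assumes a POSIX system (paths like /tmp/...).
with tempfile.TemporaryDirectory() as tmp:
    # A legal filename that happens to look like a percent-escape.
    path = os.path.join(tmp, "50%25.txt")
    with open(path, "w") as f:
        f.write("hello")

    # Naive: treat the filepath as if it were already a URL path.
    # The URL layer decodes "%25" to "%" and looks for "50%.txt".
    try:
        urlopen("file:" + path).close()
        naive_ok = True
    except URLError:
        naive_ok = False

    # Correct: let pathname2url escape the path ("%" becomes "%25").
    with urlopen("file:" + pathname2url(path)) as resp:
        data = resp.read()

print(naive_ok)  # False
print(data)      # b'hello'
```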

(Suggest setting status INVALID, possible clarification to
docs to warn against passing a filepath to urlopen?)


Comment By: Donovan Eastman (dpeastman)
Date: 2006-03-14 02:32

Logged In: YES 

OK - Here's my suggested fix:
This can be fixed with a single if statement (and a comment
to explain it to confused Unix programmers).

In splittype(), right after the line that reads:

    scheme = match.group(1)

add the following:

    # ignore single-char schemes to avoid confusion with win32 drive letters
    if len(scheme) > 1:

...and indent the next line.

Alternatively, the if statement could read:

    if len(scheme) > 1 or sys.platform != 'win32':

...which would allow single-letter scheme names on
non-Windows systems.  I would argue that it is better to be
consistent and have it work the same way on all OSes.
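A runnable sketch of the proposed patch. This is a hypothetical standalone function modeled on the old urllib splittype() (a regex match on the leading "type:" prefix, lower-cased scheme), not the library's actual code; the windows_only parameter is an illustrative way to express the win32-only alternative.

```python
import re
import sys

# One or more characters other than "/" or ":" followed by a colon,
# anchored at the start of the string.
_typeprog = re.compile(r"^([^/:]+):")


def splittype(url, windows_only=False):
    """Split 'type:opaquestring' into ('type', 'opaquestring').

    Hypothetical patched version: a one-character "scheme" is treated
    as a Windows drive letter, so no scheme is returned.  With
    windows_only=True the guard applies only on win32.
    """
    match = _typeprog.match(url)
    if match:
        scheme = match.group(1)
        # ignore single-char schemes to avoid confusion with win32 drive letters
        if len(scheme) > 1 or (windows_only and sys.platform != "win32"):
            return scheme.lower(), url[len(scheme) + 1:]
    return None, url


print(splittype("http://www.python.org/"))  # ('http', '//www.python.org/')
print(splittype(r"C:\dir\My File.txt"))      # (None, 'C:\\dir\\My File.txt')
```

Returning None for the scheme is what lets the caller fall back to its "no scheme, so open a local file" branch.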


Comment By: Donovan Eastman (dpeastman)
Date: 2006-03-14 01:56

Logged In: YES 

Reasons why urllib should open local files:
1) This allows you to write code that handles local files
and Internet files equally well -- without having to do any
special magic of your own.
2) The docs all say that it should.

I believe this would work just fine under Unix. In
URLopener.open() it looks for the protocol prefix and if it
can't find one, it assumes that it is a local file.

The problem on Windows is that you have these pesky drive
letters.  The form 'C:\location' ends up looking a lot like
the form 'http://location'.  Therefore it looks for a
protocol called 'c' -- which obviously isn't going to work.


Comment By: Koen van de Sande (shadowmorpher)
Date: 2006-03-13 19:19

Logged In: YES 

Why should the urllib module support opening of local
files? It already does so through the file: protocol prefix,
and I do not see why it should support automatic detection of
Windows filenames. AFAIK it does not do automatic detection
of Unix filenames (one could recognize them from
/home/something), so why would Windows work differently?

I'm not an expert or anything, so I might be wrong.



More information about the Python-bugs-list mailing list