URL parsing for the hard cases

Miles semanticist at gmail.com
Mon Jul 23 07:51:45 CEST 2007

On 7/23/07, John Nagle wrote:
> Here's another hard case.  This one might be a bug in urlparse:
> import urlparse
> s = 'ftp://administrator:password@ june
> 07/ebay/login/ebayisapi.html'
> urlparse.urlparse(s)
> yields:
> (u'ftp', u'administrator:password at', u'/originals/6 june
> 07/ebay/login/ebayisapi.html', '', '', '')
> That second field is supposed to be the "hostport" (per the RFC usage
> of the term; Python uses the term "netloc"), and the username/password
> should have been parsed and moved to the "username" and "password" fields
> of the object. So it looks like urlparse doesn't really understand FTP URLs.

Those values aren't "moved" to the fields; the .username and .password
properties extract them on the fly from the netloc, which itself is
left unchanged.  Use the .hostname property of the result tuple to get
just the hostname.
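
For illustration, here is a minimal sketch of that behaviour. The URL is
a made-up placeholder (ftp.example.com is not the host from the original
report), and the import falls back to the Python 2 urlparse module used
in this thread when urllib.parse is not available.

```python
try:
    from urllib.parse import urlparse  # Python 3 location of the function
except ImportError:
    from urlparse import urlparse      # Python 2.5+, as used in the thread

# Hypothetical URL: the host and path are placeholders for illustration only.
s = 'ftp://administrator:password@ftp.example.com/ebay/login/ebayisapi.html'
parts = urlparse(s)

# The second tuple element (netloc) still holds the credentials and the host.
netloc = parts.netloc

# These properties are computed from netloc on access; nothing is moved out of it.
username = parts.username
password = parts.password
hostname = parts.hostname

print(netloc)
print(username, password, hostname)
```

So the netloc element keeps the full "user:password@host" string, and the
named properties are the way to get at the individual pieces.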

