[Web-SIG] So what's missing?

Sat Oct 25 16:54:06 EDT 2003

On Sat, 25 Oct 2003, Ian Bicking wrote:

> On Saturday, October 25, 2003, at 07:38 AM, John J Lee wrote:
[...]
> > It's a minor issue, but it seems nicer to me to have authentication
> > separate if it can easily be separate -- that fits in with the general
> > philosophy of urllib2 that you pick 'n mix the features you want.  What
> > are the trivial reasons for it breaking on non-HTTP auth?
>
> There's a HTTPBasicAuthHandler, but no HTTPSBasicAuthHandler, and
> though the two concepts are orthogonal they are still tied into each
> other.  Another option would be to take HTTPS out of the class
> hierarchy, and make SSL a feature of HTTPHandler (and maybe the other

Well, that would break code.  And adding an HTTPSBasicAuthHandler is only
five lines or so (even less if you want a class that handles both HTTP and
HTTPS).

[...]
> The AuthHandlers are a little annoying too, you can't just give them a
> username/password.  You have to give them some manager object that can
> be queried for a password for a username/realm/URL.  This is a nice
> option to have, but in most cases you don't need that kind of
> generality, and it makes it a lot harder to understand what you need to
> do.  username=x, password=y are very easy to understand.

That's just a documentation issue, I think -- and possibly a matter of
adding some convenience method.  I wrote some docs for this, and I keep
asking people who seem to be actually using these features to check this
documentation bug, but nobody has yet:

http://www.python.org/sf/798244

In fact, you don't have to provide a password manager object: just let the
HTTPBasicAuthHandler create one for you, and use its add_password method
(which admittedly does require realm and uri as well as username /
password -- perhaps None should act as a wildcard there?).
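
A minimal sketch of that usage, written against urllib.request (the
Python 3 module that took over from urllib2), so the import differs from
what this thread discusses.  The realm, URL, and credentials are made up
for illustration.  The second half shows HTTPPasswordMgrWithDefaultRealm,
which treats a realm of None as a catch-all; the uri is still required.

```python
import urllib.request

# Hypothetical realm, URL, and credentials, used only for illustration.
REALM = "Example Realm"
URI = "http://www.example.com/"

# No password manager is passed in, so the handler creates its own.
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(REALM, URI, "alice", "secret")

# The handler plugs into an opener like any other handler.
opener = urllib.request.build_opener(auth_handler)

# The manager the handler created is reachable as auth_handler.passwd.
found = auth_handler.passwd.find_user_password(
    REALM, URI + "private/page.html")

# With this manager class, a realm of None acts as a fallback for any realm.
default_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, URI, "alice", "secret")
fallback = default_mgr.find_user_password("Some Other Realm", URI)
```

Nothing here touches the network: the lookups only show which credentials
the handler would offer for a given realm and URL.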

> >> Cookie handling also fits into this, but from the opposite direction
> >> from a URL object, since we are creating something of a user agent.
> >> You'd almost want to do:
> >>
> >> ua = UserAgent()
> >> url = web.URL('http://whatever.com')
> >> content = ua.get(url)
> >>
> >> Or something like that.  I think an explicit agent is called for,
> >> separate from the URLs that it may retrieve.  But only when you start
> >> considering cookies and caching.
> > [...]
> >
> > Are you suggesting replacing urllib2, building on top of it, or
> > extending it?  urllib2's handlers already get a lot of the
> > 'user-agent' job done.  What requirements does caching impose that
> > urllib2 doesn't meet?  There's already a CacheFTPHandler.
>
> I think a URL class would probably build on top of urllib2, but
> would also need some more features.  And obviously urllib2 can't go
> anywhere, so we might as well use it.

OK.  Does this URL class proposal fit with that path module PEP, do you
think?  Somebody mentioned that PEP (it was a PEP, wasn't it...?) before,
but I've forgotten everything about it :-)

> The caching in CacheFTPHandler is connection caching, not result

OK.

> caching.  HTTP has a wide array of ways to indicate caching, check for
> updates, etc.  Enough that it becomes kind of complicated, which is why
> I don't think that fits well into the idea of a URL object (which
> should be quite simple, at least from the outside).

That doesn't answer my question.  To repeat: What requirements does
caching impose that *urllib2* doesn't meet?  And why do we need a new
UserAgent class when we already have urllib2 and its handlers?
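
To make the handler-based approach concrete, here is a rough sketch of an
opener assembled to behave like a small user agent.  It is written against
urllib.request and http.cookiejar from Python 3, not the urllib2 of this
thread, and make_agent and the User-Agent string are hypothetical names
chosen for the example.  It covers cookies and authentication only; result
caching is not handled.

```python
import http.cookiejar
import urllib.request


def make_agent(user_agent="example-agent/0.1"):
    """Build an opener that keeps cookies and can do basic auth.

    make_agent and the default user_agent value are illustrative names,
    not part of any library.
    """
    jar = http.cookiejar.CookieJar()
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    opener = urllib.request.build_opener(
        # Stores cookies from responses and sends them on later requests.
        urllib.request.HTTPCookieProcessor(jar),
        # Answers 401 challenges using whatever is added to `passwords`.
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    # Replace the default User-Agent header sent with each request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar, passwords


opener, jar, passwords = make_agent()
# opener.open(url) would then fetch a URL with cookie state kept in `jar`.
```

Each feature is a separate handler, so dropping or swapping one does not
affect the others.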

John