[Web-SIG] So what's missing?

Sun Oct 26 08:24:39 EST 2003

On Sat, 25 Oct 2003, Ian Bicking wrote:
> On Saturday, October 25, 2003, at 07:12 PM, John J Lee wrote:
[...]
> a) There's not a lot of different ways to deal with a 401 response.  Is
> there something that's not covered by basic and digest authentication?

You may have a point.

> b) Accessing a database should happen in the password manager, not the
> handler.  The handler handles the protocol, the database is not tied to
> the protocol.  I'm not proposing that the password manager go away
> (though it would be nice if it was hidden for simple usage)

OK, and another one.  :-)

> c) This doesn't have to affect backward compatibility anyway.  We can
> leave HTTPBasicAuthHandler in there (deprecated), but also fold its
> functionality into HTTPHandler.  HTTPBasicAuthHandler doesn't require
> that HTTPHandler *not* handle authentication.

Well, it does if you do something important in your auth handler that
never gets called because HTTPHandler has decided it knows best when it
comes to 40x.  But like you say, there's probably not much important that
you could do since password management is already abstracted out.

I *still* don't see why you're complaining about the current state of
affairs, though.

> > Anyway, it may or may not be the perfect system, but I'm not convinced
> > it needs changing.  Can you give a specific example of where having lots
> > of handlers becomes oppressive?
>
> The documentation is certainly a problem (e.g., the
> HTTPBasicAuthHandler page), though it could be organized differently
> without changing the code.  It's definitely ravioli code
> (http://c2.com/cgi/wiki?RavioliCode), with all that entails -- IMHO
> it's hard to document ravioli code well.  (It's not so important how
> things are structured internally, but currently urllib2 also exposes
> that complex class structure)

It's pretty simple conceptually: OpenerDirector asks all the handlers if
they want to handle, not handle, or abort a response.  It does the same
for errors.  Most of the handlers' functions are self-explanatory from
their class names (OK, I guessed CacheFTPHandler wrong, but it was 50-50
:-).  I wouldn't call that ravioli.
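
To make the dispatch concrete, here is a minimal sketch written against
present-day Python 3, where urllib2's classes live in urllib.request.  The
"echo" scheme and both handler classes are made up purely for illustration;
only OpenerDirector, BaseHandler and addinfourl are library names.

```python
import email.message
import io
import urllib.request
import urllib.response

calls = []  # records which handlers were consulted, in order


class DecliningHandler(urllib.request.BaseHandler):
    """Hypothetical handler that is asked first and declines."""

    handler_order = 100  # lower values are consulted earlier (default is 500)

    def echo_open(self, req):
        calls.append("DecliningHandler")
        return None  # "not handle": let the next handler try


class EchoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up 'echo' scheme."""

    def echo_open(self, req):
        calls.append("EchoHandler")
        body = req.full_url.encode("ascii")
        # "handle": returning a response object ends the search
        return urllib.response.addinfourl(
            io.BytesIO(body), email.message.Message(), req.full_url, code=200
        )


opener = urllib.request.OpenerDirector()
opener.add_handler(EchoHandler())
opener.add_handler(DecliningHandler())

response = opener.open("echo://hello")
body = response.read()
```

The third outcome, aborting, would be a handler raising an exception
(typically urllib.error.URLError) instead of returning.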

I'm still waiting for that example.

> Also urlopen is not really extensible.  You can't tell urlopen to use

Not directly, no.  You have to do it via build_opener, or via
OpenerDirector itself (or another class).  That's probably not ideal: what
did you have in mind instead?
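
For reference, the build_opener route looks roughly like this in
present-day Python 3 (urllib.request rather than urllib2).  It is a sketch
only: the URL and the credentials are placeholders, and nothing is fetched.

```python
import urllib.request

# The password manager holds the credentials; the handler only speaks the protocol.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(
    None,                   # None = use these credentials for any realm
    "http://example.com/",  # placeholder URL prefix
    "user",                 # placeholder user name
    "secret",               # placeholder password
)

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
auth_opener = urllib.request.build_opener(auth_handler)

# Optional: make plain urlopen() use this opener from now on.
# urllib.request.install_opener(auth_opener)
```

Calling auth_opener.open(url) would then answer a 401 basic-auth challenge
with the stored credentials.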

> authentication information (and it doesn't obey browser URL
> conventions, like http://user:password@domain/).

What is that convention?  Is it standardised in an RFC?  I see
ProxyHandler knows about that syntax.  Obviously it's not an intrinsic
limitation of the handler system.

> And we want to add
> structured POST data to that method (but also allow non-structured

We do?  Why not just have a function (to make file upload data, assuming
that's what you're thinking of)?

> data), and cookies, and it might be nice to set the user-agent, and
> maybe other things that I haven't thought of.  If urlopen doesn't
> support these extra features then programmers have to learn a new API
> as their program becomes more complex.

Well, I can do those things already (cookies, set user-agent) using
urllib2.  User-Agent is a bit ugly, I'll grant you, but I don't lose sleep
over it.  I did find an extension (backwards-compatible, I hope & believe)
made things much cleaner -- see the RFE I mentioned earlier.  But no need
for a whole new layer.
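
As an illustration of doing both without a new layer, here is a sketch
using present-day Python 3 names (urllib.request and http.cookiejar), which
may differ from what was available when this was written.  The User-Agent
string is a placeholder, and no request is made.

```python
import http.cookiejar
import urllib.request

# The jar stores cookies; the processor attaches them to outgoing requests
# and collects them from responses.
jar = http.cookiejar.CookieJar()
cookie_opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar)
)

# addheaders is a list of (name, value) pairs sent with every request;
# replacing it overrides the default User-Agent.
cookie_opener.addheaders = [("User-Agent", "ExampleClient/0.1")]
```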

Mind you, if your idea can do the same job as my RFE, then it should
certainly be considered alongside that.

> Yet none of these features
> would be all that difficult to add via urlopen or perhaps other simple
> functions (instead of via classes).  I don't think there's any need
> for classes in the external API -- fetching URLs is about doing things,
> not representing things, and functions are easier to understand for
> doing.

Details?  The only example you've given so far involved a UserAgent class.

[...]
> > So, merely because you think "it feels like a new object", you're
> > proposing to create a whole new layer of complexity for users to learn?
> > Why should people have to learn a new API just to get caching?  If
> > somebody had implemented HTTP caching and found the handler mechanism
> > lacking, or had a specific argument that showed it to be so, a new
> > layer *might* be justified.  Otherwise, I think it's a bad idea.
>
> I think fetching and caching are two separate things.  The caching
> requires a context.  The fetching doesn't.  I think fetching things

The context is provided by the handler.

[...]
> I also don't see how caching would fit very well into the handler
> structure.  Maybe there'd be a HTTPCachingHandler, and you'd
> instantiate it with your caching policy? (where it stores files, how
> many files, etc)  Also a HTTPBasicAuthCachingHandler,
> HTTPDigestAuthCachingHandler, HTTPSCachingHandler, and so on?  This
> caching is orthogonal -- not just to things like authentication, but

My assumption was that it wasn't orthogonal, since RFC 2616 seems to have
rather a lot to say on the subject.

If it *is* (or part of it is) orthogonal, three options come to mind.
Let's say you have a cache class.

1. All the normal handlers know about the cache class, but have caching
   off by default.

2. Write a CacheHandler with a default_open.  If there's a cache hit,
   return it, otherwise return None (let somebody else try to handle it).

3. Subclass (or replace without bothering to subclass) OpenerDirector.
   I guess open is probably what you'd want to change, but I don't know
   about HTTP and other protocols' caching rules.

I haven't thought it through so I certainly don't claim to know how any of
these will turn out (though I'd guess 2. would do the job of any caching
that's orthogonal to the various protocol schemes).  If you want to
justify a new layer, though, it's up to you to show caching *doesn't* fit
urllib2 as-is.  YAGNI.
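
A minimal sketch of option 2, assuming present-day Python 3
(urllib.request) and a cache that is nothing more than a dict mapping URLs
to bytes.  Both the CacheHandler class and the dict cache are hypothetical,
and the sketch deliberately ignores the HTTP freshness and validation rules
in RFC 2616.

```python
import email.message
import io
import urllib.request
import urllib.response


class CacheHandler(urllib.request.BaseHandler):
    """Hypothetical handler that answers from a cache before any fetch."""

    handler_order = 100  # consulted before the stock handlers (default 500)

    def __init__(self, cache):
        self.cache = cache  # assumption: dict mapping full URL -> body bytes

    def default_open(self, req):
        body = self.cache.get(req.full_url)
        if body is None:
            return None  # cache miss: let somebody else try to handle it
        # cache hit: build a response object without touching the network
        return urllib.response.addinfourl(
            io.BytesIO(body), email.message.Message(), req.full_url, code=200
        )


cache_handler = CacheHandler({"http://example.com/": b"cached copy"})

# A bare OpenerDirector keeps the example offline; real use would pass the
# handler to build_opener() alongside the usual protocol handlers.
cache_opener = urllib.request.OpenerDirector()
cache_opener.add_handler(cache_handler)

hit = cache_opener.open("http://example.com/").read()
miss = cache_handler.default_open(
    urllib.request.Request("http://example.com/other")
)
```

Because default_open runs before any scheme-specific handler, the same
class would serve HTTP, FTP or anything else, which is the sense in which
it is orthogonal to the protocol schemes.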

> even to HTTP (to some degree).  The handler structure doesn't allow
> orthogonal features.  Except through mixins, but don't get me started
> on mixins...

I don't think that's true -- see above.

Again, my 'processors' patch is relevant here (see that RFE).  But there's
no point in reiterating here the long discussion I posted on the SF bug
tracker.

> Using a separate class, not related to Handlers, isn't more complex.
> Either way we have to provide the same features and the same options,
> and document all of those.

I think it would be fruitless to comment on this until you put forward
some details.

> No matter which way you cut it, it's new
> stuff, it's another layer.  Implementing it in a new class is just
> calling it what it is.

Well, um, no.  Having a new layer is different to not having a new layer.
Otherwise, what was this little discussion of ours all about??

Another thing I think we shouldn't forget is that nobody has actually said
they're going to write any caching code yet!  Are you?  Do you have any
other requirements driving the need for this new layer, or is it all down
to caching?

John