[Catalog-sig] OpenID login to PyPI

Sun Nov 15 23:06:48 CET 2009

Ben Finney <ben+python at benfinney.id.au> writes:

> PyPI currently uses OpenID, but defeats much of the point of that
> system by skipping important parts of the authentication protocol and
> disallowing all but a small subset of identifiers (those that are
> within a small set of domains).

Here's a blow-by-blow. If you're just an OpenID user, you don't need to
know *any* of this; it's targeted at Martin, as someone *implementing*
OpenID (on PyPI, as an OpenID Relying Party). But it will likely be of
interest to those who want to know the details of what we're asking PyPI
to do.

On 15-Nov-2009, "Martin v. Löwis" wrote:
> > Well, when I login my registered ID is www.voidspace.org.uk and
> > *not* fuzzyman.myopenid.com - so I believe you are incorrect (and
> > in fact this very point was touted as one of the advantages of
> > openid - that your account is independent of your provider and
> > that you *can* change provider whilst retaining the same id).
> 
> On the wire (between relying party and provider),
> voidspace.org.co.uk does never appear.

That's because “between relying party and provider” is already in the
middle of the procedure: you've glossed over several steps.

The first step, Initiation, is when the relying party asks the user
“what is your OpenID?” with a form field named ‘openid_identifier’
<URL:http://openid.net/specs/openid-authentication-2_0.html#rfc.section.7.1>.
Michael's browser can recognise the field by that name and populate it
automatically, or he can type in ‘voidspace.org.uk’ himself and submit
the form. The provider is unknown at this point.

Then, the relying party performs Normalisation on Michael's input in
the ‘openid_identifier’ field to turn it into a full Identifier
<URL:http://openid.net/specs/openid-authentication-2_0.html#rfc.section.7.2>,
which (because the relying party must follow HTTP redirects, etc.)
results in ‘http://www.voidspace.org.uk/’ as the Claimed Identifier.
The provider is still unknown at this point.
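
A rough sketch of the local part of Normalisation, in Python. The
function name and the return shape are my own invention for
illustration; the rules are the ones from section 7.2 of the spec. It
does not follow HTTP redirects (which is how ‘voidspace.org.uk’ ends
up with the ‘www.’ prefix), so treat it as a first pass only.

```python
from urllib.parse import urlsplit, urlunsplit

# XRI Global Context Symbols, as listed in section 7.2 of the spec.
XRI_GLOBAL_CONTEXT_SYMBOLS = ("=", "@", "+", "$", "!")


def normalise_identifier(user_input):
    """Return (kind, identifier), where kind is "XRI" or "URL".

    Hypothetical helper: covers only the steps that need no network
    access. Following redirects is deliberately left out.
    """
    text = user_input.strip()

    # An explicit "xri://" prefix is stripped before classification.
    if text.lower().startswith("xri://"):
        text = text[len("xri://"):]

    # A leading global context symbol, or "(", marks an XRI.
    if text.startswith(XRI_GLOBAL_CONTEXT_SYMBOLS) or text.startswith("("):
        return ("XRI", text)

    # Anything else is treated as an HTTP URL; add a scheme if missing.
    if not text.lower().startswith(("http://", "https://")):
        text = "http://" + text

    # Drop the fragment and use "/" for an empty path.
    scheme, netloc, path, query, _fragment = urlsplit(text)
    return ("URL", urlunsplit((scheme, netloc.lower(), path or "/", query, "")))


print(normalise_identifier("voidspace.org.uk"))
```

Nothing in this step needs to know who the provider is, which is the
point being made above.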

The relying party only gets to the provider after performing Discovery
<URL:http://openid.net/specs/openid-authentication-2_0.html#rfc.section.7.3>
on the Claimed Identifier the user presents. For the Claimed
Identifier in this case, the path that succeeds is HTML discovery
<URL:http://openid.net/specs/openid-authentication-2_0.html#html_disco>,
and at *that* point the relying party has the OP-Local Identifier
<URL:http://openid.net/specs/openid-authentication-2_0.html#rfc.section.7.3.1>
to be used for this authentication session.
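
To make HTML discovery concrete, here is a minimal sketch using only
the Python standard library. It looks in the document's HEAD for the
‘openid2.provider’ and ‘openid2.local_id’ LINK relations described in
the spec. The class and function names are made up for this example,
the sample page and the provider endpoint URL are invented, and
fetching the page over HTTP is left out entirely.

```python
from html.parser import HTMLParser

WANTED_RELS = ("openid2.provider", "openid2.local_id")


class OpenIDLinkParser(HTMLParser):
    """Collect OpenID 2.0 LINK relations found inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr_map = dict(attrs)
            href = attr_map.get("href")
            # "rel" may hold several space-separated values.
            for rel in (attr_map.get("rel") or "").lower().split():
                if rel in WANTED_RELS and href:
                    self.links.setdefault(rel, href)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def html_discovery(claimed_identifier, html_text):
    """Return discovered information as a dict, or None on failure."""
    parser = OpenIDLinkParser()
    parser.feed(html_text)
    endpoint = parser.links.get("openid2.provider")
    if endpoint is None:
        return None
    return {
        "claimed_identifier": claimed_identifier,
        "op_endpoint": endpoint,
        # Without a local_id link, the Claimed Identifier is used as is.
        "op_local_identifier": parser.links.get(
            "openid2.local_id", claimed_identifier),
    }


# Invented sample page; the endpoint URL is a placeholder.
SAMPLE_PAGE = """
<html><head>
<link rel="openid2.provider" href="http://provider.example/endpoint">
<link rel="openid2.local_id" href="http://fuzzyman.myopenid.com/">
</head><body></body></html>
"""

result = html_discovery("http://www.voidspace.org.uk/", SAMPLE_PAGE)
print(result["op_local_identifier"])
```

The Claimed Identifier goes in and stays in the result unchanged; the
OP-Local Identifier is merely something found along the way.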

> So all I (as a relying party) get verified is
> fuzzyman.myopenid.com.

That (or rather, ‘http://fuzzyman.myopenid.com/’) is the OP-Local
Identifier. It is what the relying party presents to the OpenID
Provider during authentication.

But the OP-Local Identifier is transitory; it can change at any time
in the future if Michael chooses to modify the information at
<URL:http://www.voidspace.org.uk/>. The relying party should not
persistently associate the OP-Local Identifier with Michael's account;
only the Claimed Identifier is persistent. The rest of the information
is discovered each time Michael uses that identifier.
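
In code terms, the account lookup is keyed on the Claimed Identifier
and nothing else. The sketch below is only an illustration: the
account table, the function names, and the two stand-in callables
(‘discover’ and ‘provider_confirms’) are hypothetical, and the real
exchange with the provider (associations, signatures, and so on) is
reduced to a single yes-or-no call.

```python
# Hypothetical account table: keyed by Claimed Identifier only.
ACCOUNTS = {"http://www.voidspace.org.uk/": "michael"}


def log_in(claimed_id, discover, provider_confirms):
    """Return the account name for claimed_id, or None on failure.

    discover and provider_confirms are stand-ins for Discovery and for
    the authentication exchange with the OpenID Provider.
    """
    # Discovery runs on every login; nothing from last time is reused.
    info = discover(claimed_id)
    if info is None:
        return None
    if not provider_confirms(info["op_endpoint"],
                             info["op_local_identifier"]):
        return None
    # Only the Claimed Identifier selects the account.
    return ACCOUNTS.get(claimed_id)


def discover_before(claimed_id):
    # What discovery might return today (endpoint URL invented).
    return {"op_endpoint": "http://provider-a.example/endpoint",
            "op_local_identifier": "http://fuzzyman.myopenid.com/"}


def discover_after(claimed_id):
    # What it might return after a change of provider (all invented).
    return {"op_endpoint": "http://provider-b.example/endpoint",
            "op_local_identifier": "http://michael.provider-b.example/"}


def always_confirms(op_endpoint, op_local_identifier):
    return True


first_login = log_in("http://www.voidspace.org.uk/",
                     discover_before, always_confirms)
later_login = log_in("http://www.voidspace.org.uk/",
                     discover_after, always_confirms)
print(first_login, later_login)
```

Both logins reach the same account even though the provider and the
OP-Local Identifier differ between them.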

> Why should I trust that voidspace.org.uk is actually a valid ID?

It's valid if you can use it in the OpenID system. The relying party
needs to authenticate it each time Michael presents it for
authentication; the provider gets to say whether it's owned by the
user presenting it at the time.

> Can't you then produce hundreds of IDs, all delegating to the same
> identity?

Yes, that's part of the point of delegated identity. Of course,
Michael will only do that if he wants to go to the effort of
maintaining many separate identities (e.g. one for work, one for play,
one for his persona as a subversive garden gnome painter), because
they'll all be treated separately. Any time Michael wants to use the
*same* identity again, he'll present the same OpenID.

> IOW, why should I (as a relying party) pay any attention to the ID
> that you entered, rather than to what I get actually validated?

Because that's what the user has *told you* is the OpenID they want to
use. It is their “Claimed Identifier”, and is the identity token.

It would help if you'd ignore what's going on when you go to
StackOverflow and click on the Google icon. That's apparently
confusing you in this case, because it's glossing over precisely the
step that Michael, Glyph, and I want you to take note of: the
Initiation step, where someone deliberately says to the relying party
“this is my OpenID”.

Think of the OpenID as a token, presented by the person as their
identity, like a combination of username and password. They get to
decide what it is (which is, again, part of the point of OpenID), and
they get to decide which provider is in charge of authenticating their
ownership of that token.

The relying party uses the OpenID Authentication protocol each time to
verify whether the provider agrees the person presenting that identity
actually owns it.

When Michael goes to StackOverflow and logs in, he will (if he
chooses) log in with the OpenID ‘voidspace.org.uk’, and that is what
StackOverflow will store as the OpenID for his account. He can also,
in the future, add extra OpenIDs to the same account; this tells
StackOverflow that those identities are actually the same person.
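
A relying party could model this as a many-to-one mapping from
Claimed Identifiers to accounts. The sketch below is an assumption
about one way to store it, not a description of how StackOverflow (or
PyPI) actually does so; the class, its method names, and the second
identifier are all invented.

```python
class AccountStore:
    """Hypothetical store: many Claimed Identifiers, one account each."""

    def __init__(self):
        self._account_for = {}

    def register(self, claimed_id, account_name):
        # A Claimed Identifier may belong to at most one account.
        if claimed_id in self._account_for:
            raise ValueError("identifier already in use: " + claimed_id)
        self._account_for[claimed_id] = account_name

    def add_identifier(self, account_name, new_claimed_id):
        # Attach an extra OpenID to an account that already exists.
        if account_name not in self._account_for.values():
            raise KeyError(account_name)
        self.register(new_claimed_id, account_name)

    def account_for(self, claimed_id):
        return self._account_for.get(claimed_id)


store = AccountStore()
store.register("http://www.voidspace.org.uk/", "michael")
store.add_identifier("michael", "http://michael.example/")
print(store.account_for("http://michael.example/"))
```

An identifier that was never added (such as an OP-Local Identifier
seen only during authentication) maps to no account at all.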

Next time Michael goes to StackOverflow, he'll present an OpenID,
StackOverflow will perform the whole OpenID Authentication protocol on
it, and *any* of the information used with that OpenID last time could
have changed. If verification of Michael's OpenID succeeds again,
he'll get access to his StackOverflow account.

I hope that helps you in implementing standard OpenID for PyPI. Thanks
again for working through this.

-- 
 \        “Odious ideas are not entitled to hide from criticism behind |
  `\          the human shield of their believers' feelings.” —Richard |
_o__)                                                         Stallman |
Ben Finney <ben at benfinney.id.au>