[Catalog-sig] OpenID login to PyPI

Mon Nov 16 00:06:32 CET 2009

Martin von Löwis writes:
> Glyph Lefkowitz wrote:
> > Why do you want [PyPI] to support OpenID only for a very small set
> > of providers?

> There are two reasons, one more technical than the other:

> 1. I read that there are numerous complaints about interoperability:
> […] In particular, I would prefer to restrict attention to OpenID 2,
> which I understand now.

I think that would be quite reasonable; OpenID 2.0 is stable and fixes
several small (and not-so-small) glitches that 1.1 had. The
specification requires “OpenID Authentication 1.1 Compatibility”
<URL:http://openid.net/specs/openid-authentication-2_0.html#compat_mode>,
which is all you need to care about for OpenID 1.1.

> 2. I also want the provider to provide me with a verified email
> address,

You can get the user's chosen email address during registration, *after*
authentication. The OpenID Simple Registration Extension (SReg)
<URL:http://openid.net/specs/openid-simple-registration-extension-1_0.html>
is designed for exactly that: getting details from the user at account
registration time, after verifying that they own the OpenID they
presented.
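
As a rough illustration of what such a request looks like, here is a
minimal sketch of the extra parameters a relying party adds to its
authentication request. The helper name `sreg_request_params` is
hypothetical, and the namespace URI is the one I understand the SReg 1.1
draft to use; check both against the library you actually use.

```python
from urllib.parse import urlencode

# Assumption: namespace URI as given in the SReg 1.1 draft.
SREG_NS = "http://openid.net/extensions/sreg/1.1"


def sreg_request_params(required=(), optional=(), policy_url=None):
    """Build the SReg parameters to add to an OpenID authentication request.

    Hypothetical helper for illustration, not part of any library.
    """
    params = {"openid.ns.sreg": SREG_NS}
    if required:
        # Fields the relying party would like the user to supply.
        params["openid.sreg.required"] = ",".join(required)
    if optional:
        # Fields that are merely nice to have.
        params["openid.sreg.optional"] = ",".join(optional)
    if policy_url:
        # Where the user can read how the data will be used.
        params["openid.sreg.policy_url"] = policy_url
    return params


params = sreg_request_params(required=["email"],
                             optional=["nickname", "fullname"])
query = urlencode(params)
```

Note that "required" only tells the provider what the relying party
wants; the user can still decline, and nothing in the response says the
address has been verified.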

> so that a) users can start using the account
> right away, rather than having to perform an address verification
> first

The verification, though, should be done by PyPI. You shouldn't be
trusting the OpenID provider to verify an email address; that's not its
job (and, as you rightly point out, you can't trust an arbitrary OpenID
provider to do so).

Of course, if during registration it happens that you *do* trust the
provider to verify the user's email address, then that's up to you. But
this should not limit the providers from which you accept
*authentication* of an OpenID.

> and b) to simplify the control flow - it's called a "provider" because
> it provides something to *me* also, as a relying party, something that
> I want to rely on (rather than having to verify it myself).

What is provided to you is the process of verifying the user's ownership
of their chosen identifier.

> If I can't rely on the data that the provider provides, I gain nothing
> by accepting OpenID. Instead, I likely open up the service to spammers
> (which will recognize the openid login, and automatically create
> accounts).

That's exactly the reason why you shouldn't *trust* the data, beyond
“this is what the user wants me to know about them”. User-generated data
is still only as trustworthy as it ever was; using OpenID doesn't change
that, since all that data *about* the user from the OpenID provider is
generated by the user at some previous time.

> What I do rely on is a verified email address. I need to be able to
> contact package owners in case there is a problem with the package
> (which happens fairly often, given the large number of packages that
> are now indexed).

This is such a common requirement that most OpenID providers will store
common registration fields input once by the user, and present them
(with the user's permission) to the relying party using OpenID Simple
Registration. It's “Simple Registration” because the user doesn't have
to keep typing it in every time he registers at a new site, but need
merely give permission for it to be handed out to that new site.

> Your specific complaint ("I want to use a trusted provider, but still
> want to type my openid, because that's what I always do") is actually
> new, so I'm trying to understand it better.

I hope this is all starting to make it clearer: the OpenID *identifier*
(“what is my OpenID”) is intentionally decoupled from the OpenID
*provider*.

You don't have to trust the provider to do anything but verify that the
identifier is controlled by the person presenting it, just as you trust
the email verification procedure to verify that the user controls the
email address they specify.

> See above. If there was a legitimate use case for this UI (other than
> tradition), I could reluctantly add it to some place where it does no
> harm. I would then need to elaborate the reasoning as to why I
> consider verified email address as an important property.

With OpenID, registration should happen *after* authentication: since
the user needs to submit their OpenID before you can check whether it
corresponds to an existing account, you might as well authenticate them
*before* bothering with checking if there's an existing account.

Here it is in detail:

  * Initiation: the user is asked for their OpenID; Michael types in
    ‘voidspace.org.uk’.

  * Normalisation, then Discovery: after which PyPI knows the Claimed
    Identifier is ‘http://www.voidspace.org.uk/’ and where to go *this
    time* to verify that claim.

  * Authentication: the provider verifies Michael's claim; it then sends
    the result directly to PyPI.

  * Assuming Michael is authenticated, PyPI now knows Michael controls
    that identifier, and will use it to find his account.
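
The steps above can be sketched roughly as follows. This is a
simplified illustration, not PyPI's code: `normalize_identifier` and
`handle_verified_login` are hypothetical names, the normalisation covers
only plain URLs (no XRIs), and discovery and the exchange with the
provider are left to a real OpenID library. Discovery may follow
redirects, so the Claimed Identifier can differ from the normalised
input, as with the ‘www.’ prefix in the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_identifier(user_input):
    """Simplified URL normalisation of what the user typed.

    Adds a scheme if missing, drops any fragment, and gives an empty
    path a trailing slash. XRI identifiers are not handled here.
    """
    text = user_input.strip()
    if "://" not in text:
        text = "http://" + text
    scheme, netloc, path, query, _fragment = urlsplit(text)
    return urlunsplit((scheme.lower(), netloc.lower(), path or "/", query, ""))


def handle_verified_login(claimed_id, accounts):
    """Decide what to do once the provider has confirmed the claim.

    `accounts` is a stand-in mapping of Claimed Identifier to account;
    call this only after authentication has succeeded.
    """
    account = accounts.get(claimed_id)
    if account is None:
        # No account yet: only now does registration begin.
        return ("register", claimed_id)
    return ("login", account)


typed = normalize_identifier("voidspace.org.uk")
accounts = {"http://www.voidspace.org.uk/": "michael"}
known = handle_verified_login("http://www.voidspace.org.uk/", accounts)
unknown = handle_verified_login("http://example.org/", accounts)
```

The account lookup is keyed on the Claimed Identifier alone; which
provider performed the verification plays no part in it.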

If there's no account associated with ‘http://www.voidspace.org.uk/’,
*then* PyPI should begin Registration for a new account:

  * OpenID SReg is used to request the common parts (‘nickname’,
    ‘fullname’, ‘email’, ‘language’, ‘timezone’, etc.). I don't believe
    PyPI needs other information from the user than what is available
    with SReg, but in case it's needed, OpenID Attribute Exchange
    <URL:http://openid.net/specs/openid-attribute-exchange-1_0.html> can
    be used to request other arbitrary information about the user.

  * Any information thus provided can be used at PyPI to pre-populate
    the registration form, or (if all the information is successfully
    gathered) to skip the form altogether and automatically set up the
    user's new account.
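
A minimal sketch of that pre-population step, assuming the provider's
response has already been validated and is available as a plain
dictionary of parameters. The function name `registration_from_sreg` and
the choice of `REQUIRED_FIELDS` are hypothetical; what PyPI actually
requires is up to PyPI.

```python
# Assumption: the fields this site insists on before creating an account.
REQUIRED_FIELDS = ("nickname", "email")


def registration_from_sreg(response_params):
    """Turn SReg response parameters into registration form defaults.

    Values are treated as user-supplied: the email address is marked as
    unverified so the site still runs its own verification.
    """
    prefix = "openid.sreg."
    fields = {
        key[len(prefix):]: value
        for key, value in response_params.items()
        if key.startswith(prefix) and value
    }
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    return {
        "fields": fields,
        "missing": missing,
        # Skip the form only when nothing required is missing.
        "skip_form": not missing,
        "email_verified": False,
    }


complete = registration_from_sreg({
    "openid.mode": "id_res",
    "openid.sreg.nickname": "michael",
    "openid.sreg.email": "michael@example.org",
})
partial = registration_from_sreg({"openid.sreg.nickname": "michael"})
```

In the second case the form would be shown with the nickname already
filled in and the email field left for the user to complete.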

So, none of this replaces the need to gather and keep information about
the user's account on PyPI, nor to do verification of any user-provided
information that PyPI needs to use in a trusted manner. What it does do
is ensure that the user can re-use their *existing* answers to the
common questions, answered a zillion times before ever coming to PyPI.

-- 
 \           “A thorough reading and understanding of the Bible is the |
  `\                           surest path to atheism.” —Donald Morgan |
_o__)                                                                  |
Ben Finney