[Python-Dev] [ssl] The weird case of IDNA

Nathaniel Smith njs at pobox.com
Sun Dec 31 02:27:04 EST 2017


On Sat, Dec 30, 2017 at 2:28 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 29 Dec 2017 21:54:46 +0100
> Christian Heimes <christian at python.org> wrote:
>>
>> On the other hand ssl module is currently completely broken. It converts
>> hostnames from bytes to text with 'idna' codec in some places, but not
>> in all. The SSLSocket.server_hostname attribute and callback function
>> SSLContext.set_servername_callback() are decoded as U-label.
>> Certificate's common name and subject alternative name fields are not
>> decoded and therefore A-labels. They *must* stay A-labels because
>> hostname verification is only defined in terms of A-labels. We even had
>> a security issue once, because partial wildcard like 'xn*.example.org'
>> must not match IDN hosts like 'xn--bcher-kva.example.org'.
>>
>> In issue [2] and PR [3], we all agreed that the only sensible fix is to
>> make 'SSLContext.server_hostname' an ASCII text A-label.
>
> What are the changes in API terms?  If I'm calling wrap_socket(), can I
> pass `server_hostname='straße'` and it will IDNA-encode it?  Or do I
> have to encode it myself?  If the latter, it seems like we are putting
> the burden of protocol compliance on users.

Part of what makes this confusing is that there are actually three
intertwined issues here. (Also, anything that deals with Unicode *or*
SSL/TLS is automatically confusing, and this is about both!)

Issue 1: Python's built-in IDNA implementation is wrong (implements
IDNA 2003, not IDNA 2008).
Issue 2: The ssl module insists on using Python's built-in IDNA
implementation whether you want it to or not.
Issue 3: Also, the ssl module has a separate bug that means
client-side cert validation has never worked for any IDNA domain.

Issue 1 is potentially a security issue, because it means that in a
small number of cases, Python will misinterpret a domain name. IDNA
2003 and IDNA 2008 are very similar, but there are 4 characters that
are interpreted differently, with ß being one of them. Fixing this
though is a big job, and doesn't exactly have anything to do with the
ssl module -- for example, socket.getaddrinfo("straße.de", 80) and
sock.connect(("straße.de", 80)) also do the wrong thing. Christian's not
proposing to fix this here. It's issues 2 and 3 that he's proposing to
fix.
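
To make issue 1 concrete, here is a minimal sketch using only the
standard library codecs. The built-in "idna" codec applies the IDNA
2003 mapping, so ß gets folded to "ss". The second assertion builds the
IDNA 2008 style A-label by hand with the "punycode" codec; that is only
an illustration for this one label, not a real IDNA 2008 implementation
(it does no validation or normalization at all).

```python
# Built-in codec (IDNA 2003): ß is mapped to "ss" before encoding.
idna2003 = "straße.de".encode("idna")
assert idna2003 == b"strasse.de"

# Hand-rolled A-label for the single label "straße". This skips every
# IDNA 2008 validation rule, so treat it as an illustration only.
alabel = b"xn--" + "straße".encode("punycode")
assert alabel == b"xn--strae-oqa"
```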

Issue 2 is a problem because it makes it impossible to work around
issue 1, even for users who know what they're doing. In the socket
module, you can avoid Python's automagical IDNA handling by doing it
manually, and then calling socket.getaddrinfo("strasse.de", 80) or
socket.getaddrinfo("xn--strae-oqa.de", 80), whichever you prefer. In
the ssl module, this doesn't work. There are two places where ssl uses
hostnames. In client mode, the user specifies the server_hostname that
they want to see a certificate for, and then the module runs this
through Python's IDNA machinery *even if* it's already properly
encoded in ascii. And in server mode, when the user has specified an
SNI callback so they can find out which certificate an incoming client
connection is looking for, the module runs the incoming name through
Python's IDNA machinery before handing it to user code. In both cases,
the right thing to do would be to just pass through the ascii A-label
versions, so savvy users can do whatever they want with them. (This
also matches the general design principle around IDNA, which assumes
that the pretty unicode U-labels are used only for UI purposes, and
everything internal uses A-labels.)
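
As a rough sketch of what "just pass it through" could look like: the
helper below is hypothetical (the name passthrough_hostname is mine, it
is not part of the ssl module or of Christian's PR). It accepts a
hostname that is already ASCII and hands it back untouched, and refuses
anything else instead of guessing which IDNA flavor the caller wanted.

```python
def passthrough_hostname(name):
    """Hypothetical helper: return an already-ASCII hostname unchanged."""
    if isinstance(name, bytes):
        # Non-ASCII bytes raise UnicodeDecodeError here.
        name = name.decode("ascii")
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        # Not an A-label: the caller has to pick an IDNA encoder.
        raise ValueError("encode to an A-label first: %r" % (name,))
    return name


# Both spellings survive as-is; no IDNA machinery is involved.
kept_2003 = passthrough_hostname("strasse.de")
kept_2008 = passthrough_hostname(b"xn--strae-oqa.de")
```

The point of the design is that the savvy user decides how "straße.de"
becomes ASCII, and the library never second-guesses the result.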

Issue 3 is just a silly bug that needs to be fixed, but it's tangled
up here because the fix is the same as for Issue 2: the reason
client-side cert validation has never worked is that we've been taking
the A-label from the server's certificate and checking if it matches
the U-label we expect, and of course it never does because we're
comparing strings in different encodings. If we consistently converted
everything to A-labels as soon as possible and kept it that way, then
this bug would never have happened.
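
A tiny illustration of the mismatch, using the host name from
Christian's example. This is not the ssl module's actual matching code,
just the string comparison it boils down to; the variable names are
made up for the sketch.

```python
# Name as it appears in the certificate: always an A-label.
cert_name = "xn--bcher-kva.example.org"

# Name the client kept around internally: a U-label.
expected_u = "bücher.example.org"
naive_match = (cert_name == expected_u)

# Convert to an A-label first, then compare like with like.
expected_a = expected_u.encode("idna").decode("ascii")
fixed_match = (cert_name == expected_a)

assert not naive_match
assert fixed_match
```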

What makes it tricky is that on both the client and the server, fixing
this is actually user-visible.

On the client, checking sslsock.server_hostname used to always show a
U-label, but if we stop using U-labels internally then this doesn't
make sense. Fortunately, since this case has never worked at all,
fixing it shouldn't cause any problems.

On the server, the obvious fix would be to start passing
A-label-encoded names to the servername_callback, instead of
U-label-encoded names. Unfortunately, this is a bit trickier, because
this *has* historically worked (AFAIK) for IDNA names, so long as they
didn't use one of the four magic characters that changed meaning
between IDNA 2003 and IDNA 2008. But we do still need to do something.
For example, right now, it's impossible to use the ssl module to
implement a web server at https://straße.de, because incoming
connections will use SNI to say that they expect a cert for
"xn--strae-oqa.de", and then the ssl module will freak out and throw
an exception instead of invoking the servername callback.
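
The failure can be reproduced without any TLS at all, assuming the SNI
path does roughly the equivalent of the decode below (I have not traced
the exact call in the C code). The built-in codec decodes the punycode,
re-encodes the result with IDNA 2003 rules, notices it gets "strasse"
back instead of the original label, and gives up.

```python
sni_name = b"xn--strae-oqa.de"  # what an IDNA 2008 client sends

try:
    decoded = sni_name.decode("idna")
except UnicodeError as exc:
    # The codec's round-trip check rejects this label.
    decoded = None
    failure = exc
```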

It's ugly, but probably the simplest thing is to add a new function
like set_servername_callback2 that uses the A-label, and then redefine
set_servername_callback as a deprecated compatibility shim:

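def set_servername_callback(self, cb):
    # Deprecated shim: wrap cb so it keeps receiving a U-label.
    def shim_cb(sslobj, servername, sslctx):
        if servername is not None:
            # A-label str -> ASCII bytes -> U-label via the built-in codec
            servername = servername.encode("ascii").decode("idna")
        return cb(sslobj, servername, sslctx)
    self.set_servername_callback2(shim_cb)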
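
To see the shim in action without a real handshake, here is a
self-contained sketch. DemoContext and simulate_sni are hypothetical
stand-ins invented for this example, and set_servername_callback2 is
only the placeholder name from this mail, not an existing API.

```python
class DemoContext:
    """Hypothetical stand-in for SSLContext; it only stores a callback."""

    def __init__(self):
        self._cb = None

    def set_servername_callback2(self, cb):
        # Placeholder new API: cb receives the A-label as an ASCII str.
        self._cb = cb

    def set_servername_callback(self, cb):
        # Compatibility shim: old-style callbacks still get a U-label.
        def shim_cb(sslobj, servername, sslctx):
            if servername is not None:
                servername = servername.encode("ascii").decode("idna")
            return cb(sslobj, servername, sslctx)
        self.set_servername_callback2(shim_cb)

    def simulate_sni(self, a_label):
        # Pretend a ClientHello arrived carrying this server name.
        return self._cb(None, a_label, self)


seen = []
ctx = DemoContext()
ctx.set_servername_callback(lambda sslobj, name, sslctx: seen.append(name))
ctx.simulate_sni("xn--bcher-kva.example.org")
ctx.simulate_sni(None)
assert seen == ["bücher.example.org", None]
```

Note that a name like "xn--strae-oqa.de" would still raise inside the
shim, since it relies on the same built-in codec; only callbacks
registered through the new A-label API would see such names.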

We can bikeshed what the new name should be. Maybe set_sni_callback?
or set_server_hostname_callback, since the corresponding client-mode
argument is server_hostname?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

