[Python-Dev] [ssl] The weird case of IDNA

Nathaniel Smith njs at pobox.com
Sun Dec 31 01:13:10 EST 2017


On Sat, Dec 30, 2017 at 7:26 AM, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> Christian Heimes writes:
>  > Questions:
>  > - Is everybody OK with breaking backwards compatibility? The risk is
>  > small. ASCII-only domains are not affected
>
> That's not quite true, as your German example shows.  In some Oriental
> renderings it is impossible to distinguish halfwidth digits from
> full-width ones as the same glyphs are used.  (This occasionally
> happens with other ASCII characters, but users are more fussy about
> digits lining up.)  That is, while technically ASCII-only domain names
> are not affected, users of ASCII-only domain names are potentially
> vulnerable to confusable names when IDNA is introduced.  (Hopefully
> the Asian registrars are as woke as the German ones!  But you could
> still register a .com containing full-width digits or letters.)

This particular example isn't an issue: in IDNA encoding, full-width
and half-width digits are normalized together, so number１.com (written
with U+FF11 FULLWIDTH DIGIT ONE) and number1.com actually refer to the
same domain name. This is true in both the 2003 and 2008 versions:

# IDNA 2003
In [7]: "number\uff11.com".encode("idna")
Out[7]: b'number1.com'

# IDNA 2008 (using the 'idna' package from pypi)
In [8]: idna.encode("number\uff11.com", uts46=True)
Out[8]: b'number1.com'
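
For the IDNA 2003 case, a rough sketch of why the two spellings collapse:
nameprep includes NFKC compatibility normalization, which folds full-width
forms onto their ASCII counterparts. The snippet below uses only the
standard library; the variable names are just for illustration.

```python
import unicodedata

fullwidth_name = "number\uff11.com"  # U+FF11 FULLWIDTH DIGIT ONE
ascii_name = "number1.com"

# NFKC maps the full-width digit to the plain ASCII digit.
normalized = unicodedata.normalize("NFKC", fullwidth_name)

# The built-in "idna" codec (IDNA 2003) applies nameprep before encoding,
# so both spellings end up as the same bytes on the wire.
encoded_fullwidth = fullwidth_name.encode("idna")
encoded_ascii = ascii_name.encode("idna")

print(normalized == ascii_name)
print(encoded_fullwidth == encoded_ascii)
```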

That said, IDNA does still allow for a bunch of spoofing opportunities
that aren't possible with pure ASCII, and this requires some care:
https://unicode.org/faq/idn.html#16
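
As an illustration of the kind of confusable that normalization does not
catch, here is a minimal sketch using a Cyrillic "а" (U+0430) in place of
the Latin "a". The domain is a made-up example, and the built-in codec
used here is IDNA 2003 only.

```python
import unicodedata

real_name = "apple.com"
spoof_name = "\u0430pple.com"  # first letter is CYRILLIC SMALL LETTER A

# NFKC leaves the Cyrillic letter alone, so the names stay distinct.
spoof_normalized = unicodedata.normalize("NFKC", spoof_name)

# The two names encode to different A-labels, i.e. different domains,
# even though many fonts render them identically.
real_encoded = real_name.encode("idna")
spoof_encoded = spoof_name.encode("idna")

print(real_encoded)
print(spoof_encoded)
print(real_encoded == spoof_encoded)
```

The two names look alike on screen but are different on the wire, which
is why the distinction has to be surfaced to the user somewhere above the
socket layer.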

This is mostly a UI issue, though; there's not much that the socket or
ssl modules can do to help here.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

