[Python-Dev] Internationalizing domain names

Martin v. Löwis martin@v.loewis.de
Sat, 8 Mar 2003 12:43:22 +0100

The IETF has recently published a series of RFCs to support non-ASCII
characters in domain names. This is called IDNA, Internationalizing
Domain Names in Applications. It works by having applications convert
Unicode domain names into ASCII ones (using an ACE, ASCII Compatible
Encoding), which are then sent to the DNS.
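For illustration, here is the round trip described above, assuming the "idna" codec that present-day Python 3 ships in its standard library. The domain name is an arbitrary example. The application encodes the Unicode name to its ACE form before handing it to the DNS, and decodes ACE back to Unicode for display:

```python
# Assumes a Python with the standard "idna" codec (present-day Python 3).
name = "bücher.example"

ace = name.encode("idna")    # Unicode -> ACE form that is sent to the DNS
print(ace)                   # b'xn--bcher-kva.example'

back = ace.decode("idna")    # ACE -> Unicode, e.g. for display to the user
print(back)                  # bücher.example
```

Only the non-ASCII label changes; "example" is already ASCII and passes through unmodified.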

I have implemented this technology for Python, and would like to see
it included in Python 2.3. It consists of the following pieces:
- Tools/unicode/mkstringprep.py, which generates Lib/stringprep.py
  from the source of RFC 3454 (stringprep);
- Lib/encodings/punycode.py (patch 632643), implementing RFC 3492
  (Punycode);
- Lib/encodings/idna.py, implementing both RFC 3491 (nameprep)
  and RFC 3490 (IDNA);
- modifications to the socket module, so that it accepts Unicode
  host names and converts them using IDNA;
- various test cases.
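To show how these pieces fit together, the following is a simplified sketch of the per-label ToASCII operation of RFC 3490: nameprep (RFC 3491, itself built on the stringprep tables) followed by Punycode (RFC 3492) and the "xn--" ACE prefix. It assumes the nameprep() function and "punycode" codec found in present-day CPython's encodings package; the helper names label_to_ascii and domain_to_ascii are hypothetical, and the STD3 ASCII rules and trailing-dot handling are deliberately omitted.

```python
from encodings.idna import nameprep  # RFC 3491 profile as shipped in CPython

ACE_PREFIX = "xn--"

def label_to_ascii(label: str) -> str:
    """Hypothetical, simplified RFC 3490 ToASCII for one label (no STD3 checks)."""
    if not label.isascii():
        label = nameprep(label)              # map, NFKC-normalize, check prohibited chars
    if not label.isascii():
        if label.lower().startswith(ACE_PREFIX):
            raise ValueError("label must not already start with the ACE prefix")
        # Punycode-encode the prepared label and prepend the ACE prefix.
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= 63:
        raise ValueError("label length must be between 1 and 63")
    return label

def domain_to_ascii(name: str) -> str:
    """Hypothetical helper: apply ToASCII to each dot-separated label."""
    return ".".join(label_to_ascii(lbl) for lbl in name.split("."))

print(domain_to_ascii("Bücher.example"))
```

Note that nameprep case-folds "Bücher" to "bücher" before Punycode is applied, so the result matches what the standard "idna" codec produces for the same input.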

Changes to httplib, ftplib, etc. are not necessary, as these modules
just pass the host names through to the socket calls.

I have made no changes to the urllib* modules, as the work on IRIs
(Internationalized Resource Identifiers) is still in progress. As a
result, if one puts non-ASCII characters into just the hostname part
of a URL, urllib will do the right thing, but urllib2 will complain
about the non-ASCII characters.
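As a rough illustration of the "hostname part only" case, the sketch below converts just the host of a URL to ACE and leaves the rest untouched, since IDNA covers only domain names. It assumes present-day Python 3's urllib.parse and "idna" codec; ace_url is a hypothetical helper, not how urllib itself behaves, and it drops any user:password@ information and does not handle IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit

def ace_url(url: str) -> str:
    """Hypothetical helper: return url with only its host name in ACE form.

    Simplifications: userinfo (user:password@) is dropped and
    bracketed IPv6 literals are not handled.
    """
    parts = urlsplit(url)
    host = parts.hostname            # lower-cased host name, or None
    if host is None:
        return url
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"   # keep an explicit port, if any
    return urlunsplit(parts._replace(netloc=netloc))

print(ace_url("http://bücher.example:8080/path?q=1"))
```

Path, query and fragment are left alone here; internationalizing those parts is the job of IRIs.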

Would anybody like to review these changes?