[issue9377] socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

Wed Jul 28 04:44:44 CEST 2010

STINNER Victor <victor.stinner at haypocalc.com> added the comment:

I like the idea of using PEP 383 for hostnames, but I don't understand the relation to IDNA (maybe because I don't know this encoding).

+this leaves IDNA ASCII-compatible encodings in ASCII
+form, but converts any non-ASCII bytes in the hostname to the Unicode
+lone surrogate codes U+DC80...U+DCFF.

What is an "IDNA ASCII-compatible encoding"?
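
For context, an ASCII-compatible encoding (ACE) is the "xn--" Punycode form that IDNA uses to carry non-ASCII labels in pure ASCII. The behaviour described in the quoted text can be sketched in Python as below. This is only an illustration, not the C code from the patch: decode_hostname() is the name suggested in this review, and the hostnames are made up.

```python
def decode_hostname(raw: bytes) -> str:
    """Illustrative stand-in for the patch's C helper (name is an assumption)."""
    # ASCII bytes are kept as-is; each byte 0x80..0xFF becomes a lone
    # surrogate U+DC80..U+DCFF, as specified by PEP 383.
    return raw.decode("ascii", "surrogateescape")


# Hypothetical hostname containing a raw non-ASCII byte (0xE9).
raw = b"caf\xe9.example"
name = decode_hostname(raw)
print(ascii(name))

# The escaping is reversible: encoding gives back the original bytes.
print(name.encode("ascii", "surrogateescape") == raw)

# An IDNA ACE name is already pure ASCII, so it passes through unchanged.
ace = "caf\u00e9.example".encode("idna")
print(ace, decode_hostname(ace))
```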

--

ascii-surrogateescape.diff: 
 - I don't like the name unicode_from_hostname(): "decode_hostname()" would be better.
 - It doesn't patch the doc and so cannot be applied alone. It doesn't matter, it's better to apply both patches at the same time. But thanks for splitting them, it makes them easier to review :-)

try-surrogateescape-first.diff:
 - hostname_to_bytes() should be called "encode_hostname()"
 - if (!PyErr_ExceptionMatches(PyExc_UnicodeError)): you should catch UnicodeEncodeError here
 - "if this is not possible, :exc:`UnicodeError` is raised.": is it a UnicodeEncodeError?
 - use PyUnicode_AsEncodedString() instead of PyUnicode_AsEncodedObject(): it's faster for ASCII and ensures that the result is a bytes object (so you don't need to re-check the type)
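
To make the exception-handling point concrete, here is a rough Python sketch of the flow the patch name suggests: try ASCII with surrogateescape first, and fall back to IDNA only when that fails. It is an assumption about the intended logic, not a translation of the C patch, and encode_hostname() is just the name proposed above.

```python
def encode_hostname(name: str) -> bytes:
    """Illustrative sketch only; the real helper is C code in the patch."""
    try:
        # Pure ASCII names and PEP 383 lone surrogates round-trip to bytes.
        return name.encode("ascii", "surrogateescape")
    except UnicodeEncodeError:
        # Only the encode failure is caught here, not every UnicodeError.
        # Genuine non-ASCII characters fall back to the IDNA codec.
        return name.encode("idna")


print(encode_hostname("example.org"))
print(encode_hostname("caf\udce9.example"))   # escaped byte 0xE9 restored
print(encode_hostname("caf\u00e9.example"))   # IDNA ACE ("xn--...") form
```

Catching the narrower UnicodeEncodeError keeps unrelated Unicode failures from silently triggering the fallback.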

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9377>
_______________________________________