Unexpected exception from socket.getaddrinfo on Unicode URL

Steve Holden steve at holdenweb.com
Sat Apr 21 16:44:21 CEST 2007

John Nagle wrote:
>      Here's a strange little bug.  "socket.getaddrinfo" blows up
> if given a bad domain name containing ".." in Unicode.  The
> same string in ASCII produces the correct "gaierror" exception.
>      Actually, this deserves a documentation mention.  The "socket" module,
> given a Unicode string, calls the International Domain Name parser,
> "idna.py", which has a whole error system of its own.  The IDNA
> documentation says that "Furthermore, the socket module transparently converts 
> Unicode host names to ACE, so that applications need not be concerned about 
> converting host names themselves when they pass them to the socket module."
> However, that's not quite true; the IDNA rules say that syntax errors must
> be treated as errors, so you have to be prepared for IDNA exceptions.
> They are all "UnicodeError" exceptions.
>      It's worth a mention in the documentation for "socket".
> 					John Nagle
> D:\>/python25/python.exe
> Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> ss = 'www.gallery84..com'
>  >>> uss = unicode(ss)
>  >>> import socket
>  >>> socket.getaddrinfo(ss,"http")
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> socket.gaierror: (11001, 'getaddrinfo failed')
>  >>> socket.getaddrinfo(uss,"http")
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "D:\python25\lib\encodings\idna.py", line 164, in encode
>      result.append(ToASCII(label))
>    File "D:\python25\lib\encodings\idna.py", line 73, in ToASCII
>      raise UnicodeError("label empty or too long")
> UnicodeError: label empty or too long
>  >>>
I took a look at the documentation but couldn't see where to add what, 
given that the documentation for socket already says:

"""All errors raise exceptions. The normal exceptions for invalid 
argument types and out-of-memory conditions can be raised; errors 
related to socket or address semantics raise the error socket.error."""

Do we really need to specifically mention Unicode errors?
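
Whatever the documentation ends up saying, a caller can guard against both 
failure modes in one place. The following is only a minimal sketch, not 
anything the socket module provides: the names "BadHostName" and "lookup" 
are hypothetical, and the "except ... as" syntax assumes a Python newer 
than the 2.5 shown in the session above.

```python
import socket


class BadHostName(ValueError):
    """Hypothetical application-level error for unusable host names."""


def lookup(host, port):
    """Call socket.getaddrinfo, folding both failure modes into one error.

    UnicodeError comes from the IDNA codec (for example an empty label,
    as in 'www.gallery84..com'); socket.gaierror comes from the resolver.
    """
    try:
        return socket.getaddrinfo(host, port)
    except UnicodeError as exc:
        # The host name was rejected before any lookup was attempted.
        raise BadHostName("invalid host name %r: %s" % (host, exc))
    except socket.gaierror as exc:
        # The name was syntactically acceptable but could not be resolved.
        raise BadHostName("cannot resolve %r: %s" % (host, exc))
```

With this wrapper, lookup(u'www.gallery84..com', 'http') raises BadHostName 
whether the name is rejected by the IDNA codec or by the resolver, so the 
calling code needs only one except clause.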

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb     http://del.icio.us/steve.holden
Recent Ramblings       http://holdenweb.blogspot.com
