Unexpected exception from socket.getaddrinfo on Unicode URL
Steve Holden
steve at holdenweb.com
Sat Apr 21 10:44:21 EDT 2007
John Nagle wrote:
> Here's a strange little bug. "socket.getaddrinfo" blows up
> if given a bad domain name containing ".." in Unicode. The
> same string in ASCII produces the correct "gaierror" exception.
>
> Actually, this deserves a documentation mention. The "socket" module,
> given a Unicode string, calls the International Domain Name parser,
> "idna.py", which has a whole error system of its own. The IDNA
> documentation says that "Furthermore, the socket module transparently converts
> Unicode host names to ACE, so that applications need not be concerned about
> converting host names themselves when they pass them to the socket module."
> However, that's not quite true; the IDNA rules say that syntax errors must
> be treated as errors, so you have to be prepared for IDNA exceptions.
> They are all "UnicodeError" exceptions.
>
> It's worth a mention in the documentation for "socket".
>
> John Nagle
>
> D:\>/python25/python.exe
> Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> ss = 'www.gallery84..com'
> >>> uss = unicode(ss)
> >>> import socket
> >>> socket.getaddrinfo(ss,"http")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> socket.gaierror: (11001, 'getaddrinfo failed')
> >>> socket.getaddrinfo(uss,"http")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "D:\python25\lib\encodings\idna.py", line 164, in encode
>     result.append(ToASCII(label))
>   File "D:\python25\lib\encodings\idna.py", line 73, in ToASCII
>     raise UnicodeError("label empty or too long")
> UnicodeError: label empty or too long
> >>>
>
I took a look at the documentation but couldn't see where to add what,
given that the documentation for socket already says:
"""All errors raise exceptions. The normal exceptions for invalid
argument types and out-of-memory conditions can be raised; errors
related to socket or address semantics raise the error socket.error.
""".
Do we really need to specifically mention Unicode errors?
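Whatever the documentation ends up saying, a caller can guard against both
failure modes by catching both exception types. The following is only a
minimal sketch, not anything the socket module provides: resolve() is a
hypothetical helper name, and the code assumes a current Python 3, where
every str is Unicode and so the IDNA conversion is always applied to the
host name.

```python
import socket

def resolve(host, port):
    """Return getaddrinfo() results, or an empty list if the name is unusable.

    resolve() is a hypothetical helper, shown for illustration only.
    """
    try:
        return socket.getaddrinfo(host, port)
    except UnicodeError:
        # Raised by the IDNA codec before any lookup is attempted,
        # e.g. for the empty label in "www.gallery84..com".
        return []
    except socket.gaierror:
        # Raised by the resolver itself when the lookup fails.
        return []
```

Called as resolve("www.gallery84..com", "http"), this returns an empty list
instead of propagating UnicodeError. Keeping the two except clauses separate
leaves room to log or report a malformed name differently from a failed
lookup.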
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com