urllib.quote fails on Unicode URL
John Nagle
nagle at animats.com
Fri May 4 02:15:28 EDT 2007
The code in urllib.quote fails on Unicode input when
called by robotparser.
That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in Python
for a while now.
- The initialization may not be thread-safe; a table is being initialized
on first use. The code is too clever and uncommented.
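To make the two points above concrete, here is a minimal sketch of a quoting function that avoids both problems. It is not the urllib code. The names `quote_utf8` and `_ALWAYS_SAFE` are hypothetical, and it assumes non-ASCII characters should be encoded as UTF-8 before escaping. It is written for current Python, where `str` is Unicode.

```python
# Hypothetical sketch, not the urllib implementation.
import string

# Bytes that are never escaped. The table is built once at import time
# instead of lazily on first call, so no thread can see it half-built.
_ALWAYS_SAFE = frozenset(
    (string.ascii_letters + string.digits + "_.-").encode("ascii")
)


def quote_utf8(text, safe="/"):
    """Percent-encode text; non-ASCII characters are encoded as UTF-8 first.

    Assumption: UTF-8 is the wanted encoding for non-ASCII input.
    """
    allowed = _ALWAYS_SAFE | frozenset(safe.encode("ascii"))
    # Work on bytes, so every value is in 0..255 no matter what the
    # original code points were.
    data = text.encode("utf-8") if isinstance(text, str) else bytes(text)
    return "".join(
        chr(byte) if byte in allowed else "%%%02X" % byte
        for byte in data
    )
```

Because the escaping is done per byte after encoding, there is no table lookup that can miss for a character above 255.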
"robotparser" was trying to check if a URL,
"http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
could be accessed, and there are some weird characters in there. Unicode
URLs are legal, so this is a real bug.
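For reference, a short sketch of what the escapes in that URL decode to. It uses `urllib.parse` from current Python rather than the 2007-era `urllib` module discussed here, and assumes the escapes are UTF-8. The variable names are only for illustration.

```python
from urllib.parse import quote, unquote, urlparse

URL = ("http://www.highbeam.com/DynamicContent/"
       "%E2%80%9D/mysaved/privacyPref.asp%22")

path = urlparse(URL).path

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(path)

# Collect the characters that are not ordinary path characters.
odd = [c for c in decoded if not (c.isalnum() or c in "/._-")]
# odd is ['\u201d', '"']: a right double quotation mark and an ASCII quote

# Quoting the decoded path again gives back the original escaped path.
requoted = quote(decoded)
```

So the odd characters are a typographic closing quote (three bytes in UTF-8) and a plain double quote, both of which look like stray punctuation from an HTML attribute.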
Logged in as Bug #1712522.
John Nagle