[issue3300] urllib.quote and unquote - Unicode issues

Jim Jewett report at bugs.python.org
Wed Aug 6 23:03:12 CEST 2008


Jim Jewett <jimjjewett at users.sourceforge.net> added the comment:

Matt pointed out that the email package assumes Latin-1 rather than UTF-8.
I assume Bill could adjust his patch the same way Matt did, which would
resolve the email test failures (unless you pronounce that we should stick
with Latin-1).

The cookiejar failure probably has the same root cause; that test is 
encoding (non-ASCII) Latin-1 characters, and urllib.parse.py/Quoter assumes 
Latin-1.
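
To illustrate why the assumed charset matters, here is a minimal sketch. It
uses the encoding keyword of urllib.parse.quote/unquote as found in current
Python 3 releases, which is not necessarily how either patch under
discussion behaved; the sample string is made up for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical sample: 'cafe' with an acute accent on the e (U+00E9).
text = "caf\u00e9"

# The same string yields different percent-encodings per charset.
latin1_quoted = quote(text, encoding="latin-1")  # one byte for U+00E9
utf8_quoted = quote(text, encoding="utf-8")      # two bytes for U+00E9

print(latin1_quoted)
print(utf8_quoted)

# Unquoting with the wrong charset does not raise; it silently produces
# different text, which is how a test written for one charset can fail
# against code that assumes the other.
mismatched = unquote(utf8_quoted, encoding="latin-1")
print(repr(mismatched))
```

A test that expects the Latin-1 form will fail against code that defaults
to UTF-8, and the reverse, even though both sides are internally consistent.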

So I see some evidence (probably not enough) for sticking with Latin-1 
instead of UTF-8.  But I don't see any evidence that fixing the semantics 
(encoded results should be bytes) at the same time made the conversion any 
more painful.  

On the other hand, Matt shows that some of those extra str->byte code 
changes might never need to be done at all, except for purity.
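
For reference, a small sketch of what bytes-level semantics look like. It
relies on quote_from_bytes and unquote_to_bytes from current Python 3's
urllib.parse, which may not match the names or behaviour in the patches
discussed here; the input string is a made-up example.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical input: a percent-encoded Latin-1 byte for U+00E9.
quoted = "caf%E9"

# Unquoting to bytes makes no charset assumption at all.
raw = unquote_to_bytes(quoted)

# The caller picks the charset explicitly when text is needed.
as_latin1 = raw.decode("latin-1")

# The same bytes are not valid UTF-8, so a wrong guess fails loudly here.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Quoting from bytes round-trips without involving any charset.
roundtrip = quote_from_bytes(raw)
print(raw, as_latin1, utf8_ok, roundtrip)
```

With bytes in the middle, the Latin-1 versus UTF-8 decision moves to the
caller instead of being baked into the quoting functions.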

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3300>
_______________________________________

