[Python-Dev] urllib.quote and unquote - Unicode issues
Bill Janssen
janssen at parc.com
Wed Jul 30 18:52:26 CEST 2008
> On Wed, Jul 30, 2008 at 8:09 AM, André Malo <nd at perlig.de> wrote:
> > I'm actually in favour of encoding bytes only back and forth. A useful
> > extension would be *another* function which wraps quote/unquote and encodes
> > and decodes characters.
>
> I'd reverse this. By all means, add a new pair of functions that is
> bytes in / bytes out. But keep the existing functions purely string in
> / string out, hardcoded to UTF-8. People wanting another encoding can
> use the bytes functions and explicit encode / decode calls.
Actually (as I pointed out before) the existing functions are not
string-in/string-out. They are something-in and bytes-out. They just
look like string-in/string-out because of the confusion between byte
strings and Unicode strings in Python 1 and 2.
Look, Matt's suggestion is a degradation of the integrity of the
stdlib, because it enthrones a broken understanding, a misreading of
the RFC, in a very prominent place. I'd prefer not to have Python
contribute to that breakage. Keep the functions the way they are now:
bytes-in and bytes-out.
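To make the bytes-level point concrete, here is a minimal sketch. It assumes
the urllib.parse functions as they exist in current Python 3 releases
(quote_from_bytes, unquote_to_bytes), not the 2008 code under discussion; the
sample text and variable names are made up for illustration. It shows that a
percent-escape stands for octets, so the same character gets different
escapes under different encodings, and decoding with the wrong charset
loses data.

```python
from urllib.parse import quote_from_bytes, unquote, unquote_to_bytes

text = "caf\u00e9"  # hypothetical sample: "cafe" with an acute accent

# Percent-encoding operates on octets, so the caller picks the encoding.
utf8_escaped = quote_from_bytes(text.encode("utf-8"))
latin1_escaped = quote_from_bytes(text.encode("latin-1"))

# Unquoting to bytes is lossless; no charset has to be guessed.
raw = unquote_to_bytes(latin1_escaped)

# Unquoting straight to str assumes UTF-8 by default and substitutes
# U+FFFD for octets that are not valid UTF-8.
guessed = unquote(latin1_escaped)

print(utf8_escaped, latin1_escaped, raw, repr(guessed))
```

Only the caller knows which charset the octets were meant to be in, which
is why the bytes-level pair is the one that round-trips safely.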
Bill