[Python-Dev] urllib.quote and unquote - Unicode issues

Matt Giuca matt.giuca at gmail.com
Thu Jul 31 19:19:57 CEST 2008


> so you can use quote_from_bytes on strings?

Yes, currently.

> I assumed Guido meant it was okay to have quote accept string/byte input and have a function that was redundant but limited in what it accepted (i.e. quote_from_bytes accepts only bytes)
>
> I suppose your implementation doesn't break anything... it just strikes me as "odd"

Yeah. I get exactly what you mean. Worse is it takes an
encoding/replace argument.

I'm in two minds about whether it should allow this or not. On one
hand, it kind of goes with the Python philosophy of not artificially
restricting the allowed types. And it avoids redundancy in the code.

But I'd be quite happy to let quote_from_bytes restrict its input to
just bytes, to avoid confusion. It would basically be a
slightly-modified version of quote:

def quote_from_bytes(s, safe = '/'):
    if isinstance(safe, str):
        safe = safe.encode('ascii', 'ignore')
    cachekey = (safe, always_safe)
    if not isinstance(s, bytes) or isinstance(s, bytearray):
        raise TypeError("quote_from_bytes() expected a bytes")
    try:
        quoter = _safe_quoters[cachekey]
    except KeyError:
        quoter = Quoter(safe)
        _safe_quoters[cachekey] = quoter
    res = map(quoter, s)
    return ''.join(res)

(Passes test suite).

I think I'm happier with this option. But the "if not isinstance(s,
bytes) or isinstance(s, bytearray)" is not very nice.
(The only difference to quote besides the missing arguments is the two
lines beginning "if not isinstance". Maybe we can generalise the rest
of the function).


More information about the Python-Dev mailing list