urllib.parse.quote() and RFC 2396: unreserved characters get encoded

dieter dieter at handshake.de
Thu Feb 12 01:48:06 EST 2015


Bruno Cauet <brunocauet at gmail.com> writes:
> Unicode characters outside the ASCII range also get encoded when they
> have no reason to, e.g.
>    >>> pathlib.PurePath("/home/싸이/").as_uri()
>    'file:///home/%EC%8B%B8%EC%9D%B4'

Non-ASCII characters are not legal URI characters.
Look at section 2.3 of "http://www.faqs.org/rfcs/rfc2396.html".
There you will see "unreserved = alphanum | mark", with "alphanum"
defined in section 1.6 as the ASCII letters and digits.
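A minimal sketch of that grammar, assuming Python 3.7 or later (where
quote() always leaves only ASCII letters, digits and "_.-~" unescaped).
The names MARK, RFC2396_UNRESERVED and quote_rfc2396() are hypothetical,
made up for illustration. The helper passes the RFC 2396 "mark"
characters through quote()'s real `safe` parameter, so the marks stay
as they are, while non-ASCII input is still escaped:

```python
from urllib.parse import quote

# RFC 2396, section 2.3: unreserved = alphanum | mark
# "alphanum" is ASCII letters and digits only (section 1.6).
ALPHANUM = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
MARK = "-_.!~*'()"
RFC2396_UNRESERVED = set(ALPHANUM + MARK)


def quote_rfc2396(text: str) -> str:
    """Hypothetical helper: escape everything outside RFC 2396 "unreserved".

    Letters and digits are always safe in quote(); passing MARK as `safe`
    keeps the marks too. Because `safe` replaces the default "/", a slash
    IS escaped here.
    """
    return quote(text, safe=MARK)


print(quote("it's(ok)!"))          # default safe="/": ' ( ) ! are escaped
print(quote_rfc2396("it's(ok)!"))  # marks are left alone
print(quote_rfc2396("싸이"))        # non-ASCII is still escaped
```

Whether the marks should be kept is exactly what the thread is about;
the sketch only shows how to opt in if a caller wants RFC 2396 behaviour.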

See also section 2.1 ("URI and non-ASCII characters"). It says
that non-ASCII characters should be UTF-8 encoded and then percent-escaped.
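A short sketch of the two steps described above, UTF-8 encoding
followed by "%XX" escaping of each byte. utf8_percent_escape() is a
hypothetical helper written for illustration. It is stricter than
quote(), which also leaves "_.-~" and (by default) "/" unescaped. For
the name in Bruno's example, both produce the same escaped form:

```python
from urllib.parse import quote, unquote


def utf8_percent_escape(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then escape every byte that is
    not an ASCII letter or digit as %XX (uppercase hex, as quote() does)."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)            # ASCII alphanum: copy as-is
        else:
            out.append(f"%{byte:02X}")  # any other byte: percent-escape
    return "".join(out)


name = "싸이"
print(utf8_percent_escape(name))   # two Hangul syllables -> six escaped bytes
print(quote(name))                 # urllib gives the same result here
print(unquote(quote(name)) == name)  # unquote() decodes UTF-8 by default
```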

Thus, urllib's handling of non-ASCII Unicode characters seems
to be correct.



