[Python-Dev] urllib.quote and unquote - Unicode issues
matt.giuca at gmail.com
Thu Aug 7 12:37:39 CEST 2008
Wow .. a lot of replies today!
On Thu, Aug 7, 2008 at 2:09 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> It hasn't been given priority: There are currently 606 patches in the
> tracker, many fixing bugs of some sort. It's not clear (to me, at least)
> why this should be given priority over all the other things such as
> interpreter crashes.
Sorry ... when I said "it hasn't been given priority" I meant "it hasn't been
given *a* priority" - as in, nobody's assigned a priority to it, whatever
that priority should rightfully be.
> We all agree it's a bug: no, I don't. I think it's a missing feature,
> at best, but I'm staying out of the discussion. As-is, urllib only
> supports ASCII in URLs, and that is fine for most purposes.
Seriously, Mr. L%C3%B6wis, that's a tremendously na%C3%AFve statement.
> URLs are just not made for non-ASCII characters. Implement IRIs if you
> want non-ASCII characters; the rules are much clearer for these.
Python 3.0 fully supports Unicode. URIs support *encoding* of arbitrary
characters (as of more recent revisions). The difference is that URIs *may
only consist* of ASCII characters (even though they can encode Unicode
characters), while IRIs may also consist of Unicode characters. It's our
responsibility to implement URIs here ... IRIs are a separate issue.
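To make the URI/IRI distinction concrete, here is a minimal sketch. It assumes the urllib.parse API as it exists in current Python 3 (quote and unquote taking an encoding argument that defaults to UTF-8), which is not necessarily the state of the code under review in this thread. The point is that the quoted form is pure ASCII yet still carries the non-ASCII character.

```python
from urllib.parse import quote, unquote

name = "Löwis"

# Percent-encode using UTF-8 (the default encoding in current Python 3).
encoded = quote(name)
print(encoded)  # L%C3%B6wis

# The URI form contains only ASCII characters.
is_ascii = all(ord(ch) < 128 for ch in encoded)
print(is_ascii)  # True

# Decoding with the same encoding recovers the original text.
decoded = unquote(encoded)
print(decoded)  # Löwis

# A different encoding gives a different, equally valid ASCII URI.
latin1_encoded = quote(name, encoding="latin-1")
print(latin1_encoded)  # L%F6wis
```

The last line is why the choice of encoding matters: the same character has more than one percent-encoded form, and the URI itself does not say which one was used.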
Having said this, I'm pretty sure Martin can't be convinced, so I'll leave
On Thu, Aug 7, 2008 at 3:34 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> So unquote() should probably try to decode using UTF-8 first
> and then fall back to Latin-1 if that doesn't work.
That's an interesting proposal. I don't think I like it, though - for a user
application that's a good policy, but a programming language library
should not do guesswork. It should use the encoding supplied, and
have a single default. But I'd be interested to hear if anyone else wants
As-is, it passes 'replace' to the errors argument, so encoding errors get
replaced by '�' characters.
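The two behaviours being compared can be sketched as follows. This assumes the current Python 3 urllib.parse API (unquote defaulting to UTF-8 with errors='replace', plus unquote_to_bytes). The helper unquote_with_fallback is a hypothetical name used only to illustrate the UTF-8-then-Latin-1 proposal as something a caller could do; it is not part of urllib.

```python
from urllib.parse import unquote, unquote_to_bytes

# "Löwis" percent-encoded as Latin-1 rather than UTF-8.
latin1_quoted = "L%F6wis"

# Single default, no guessing: the invalid UTF-8 byte becomes U+FFFD.
replaced = unquote(latin1_quoted)
print(ascii(replaced))  # 'L\ufffdwis'

# Passing the correct encoding explicitly recovers the text.
explicit = unquote(latin1_quoted, encoding="latin-1")
print(explicit)  # Löwis


def unquote_with_fallback(quoted):
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1."""
    raw = unquote_to_bytes(quoted)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return raw.decode("latin-1")


print(unquote_with_fallback("L%F6wis"))     # Löwis
print(unquote_with_fallback("L%C3%B6wis"))  # Löwis
```

Because the Latin-1 step accepts any byte sequence, the fallback never reports an error, which is exactly the guesswork at issue: a caller who wants it can build it on unquote_to_bytes without the library adopting it as the default.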
OK I haven't looked at the review yet .. guess it's off to the tracker :)