<div dir="ltr">Wow .. a lot of replies today!<br><br><div class="gmail_quote">On Thu, Aug 7, 2008 at 2:09 AM, "Martin v. Löwis" <span dir="ltr"><<a href="mailto:martin@v.loewis.de">martin@v.loewis.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">It hasn't been given priority: There are currently 606 patches in the<br>
tracker, many fixing bugs of some sort. It's not clear (to me, at least)<br>
why this should be given priority over all the other things such as<br>
interpreter crashes.</blockquote><div><br>Sorry ... when I said "it hasn't been given priority" I meant "it hasn't been given <b>a</b> priority" - as in, nobody has assigned a priority to it, whatever that priority should rightfully be.<br>
</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">We all agree it's a bug: no, I don't. I think it's a missing feature,<br>
at best, but I'm staying out of the discussion. As-is, urllib only<br>
supports ASCII in URLs, and that is fine for most purposes.</blockquote><div><br>Seriously, Mr. L%C3%B6wis, that's a tremendously na%C3%AFve statement.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
URLs are just not made for non-ASCII characters. Implement IRIs if you<br>
want non-ASCII characters; the rules are much clearer for these.</blockquote><div><br>Python 3.0 fully supports Unicode. URIs support <i>encoding</i> of arbitrary characters (as of the more recent revisions of the specification). The difference is that URIs <i>may only consist</i> of ASCII characters (even though they can encode Unicode characters), while IRIs may also consist of Unicode characters directly. It's our responsibility to implement URIs here; IRIs are a separate issue.<br>
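<br>To make the distinction concrete, here is a rough sketch. It assumes the quote() and unquote() functions as they exist in today's urllib.parse (UTF-8 by default), which is not necessarily how the patch under review behaves. The point is only that the encoded form is pure ASCII while still carrying a non-ASCII character:<br>

```python
from urllib.parse import quote, unquote

# A name containing a non-ASCII character (U+00F6).
name = "Löwis"

# Percent-encode it; current urllib.parse.quote() encodes as UTF-8 by default.
encoded = quote(name)
print(encoded)  # L%C3%B6wis

# The encoded form is a legal URI component: ASCII characters only.
is_ascii = all(ord(c) < 128 for c in encoded)
print(is_ascii)  # True

# Decoding with the same encoding round-trips to the original string.
decoded = unquote(encoded)
print(decoded == name)  # True
```

An IRI would carry the "ö" itself; the URI carries only the ASCII escape sequence.<br>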
<br>Having said this, I'm pretty sure Martin can't be convinced, so I'll leave that alone.<br><br>On Thu, Aug 7, 2008 at 3:34 AM, M.-A. Lemburg <span dir="ltr"><<a href="mailto:mal@egenix.com">mal@egenix.com</a>></span> wrote:<br>
<blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">
So unquote() should probably try to decode using UTF-8 first<br></blockquote><blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">
and then fall back to Latin-1 if that doesn't work.</blockquote><div><br>That's an interesting proposal, but I don't think I like it. For a user application that's a good policy, but a programming language library should not do guesswork. It should use the encoding supplied, and have a single default. But I'd be interested to hear if anyone else wants this.<br>
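<br>Here is a small sketch of why the guesswork worries me. unquote_guess() is a hypothetical helper written only for illustration; it is not part of any patch. The calls to unquote() and unquote_to_bytes() assume the signatures found in today's urllib.parse, which may differ from what is on the tracker:<br>

```python
from urllib.parse import unquote, unquote_to_bytes


def unquote_guess(s):
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1."""
    raw = unquote_to_bytes(s)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return raw.decode("latin-1")


# A lone 0xE9 byte is not valid UTF-8, so the fallback kicks in.
print(unquote_guess("caf%E9"))  # café

# But the bytes C3 A9 are valid UTF-8, so a sender who really meant the
# two Latin-1 characters "Ã©" silently gets "é" instead.
print(unquote_guess("%C3%A9"))  # é

# With an explicit encoding and a single default there is no guessing:
explicit = unquote("caf%E9", encoding="latin-1")
default = unquote("caf%E9")  # UTF-8 with errors='replace'
print(explicit)  # café
print(ascii(default))  # 'caf\ufffd'
```

The fallback gives the caller no way to tell which encoding was actually used, which is the part I would rather keep out of a library.<br>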
<br>As-is, it passes 'replace' to the errors argument, so bytes that can't be decoded get replaced by '�' (U+FFFD) characters.<br><br>OK, I haven't looked at the review yet .. guess it's off to the tracker :)<br><br>Matt<br>
</div></div></div></div>