<div dir="ltr">Wow .. a lot of replies today!<br><br><div class="gmail_quote">On Thu, Aug 7, 2008 at 2:09 AM, &quot;Martin v. Löwis&quot; <span dir="ltr">&lt;<a href="mailto:martin@v.loewis.de">martin@v.loewis.de</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">It hasn&#39;t been given priority: There are currently 606 patches in the<br>

tracker, many fixing bugs of some sort. It&#39;s not clear (to me, at least)<br>

why this should be given priority over all the other things such as<br>

interpreter crashes.</blockquote><div><br>Sorry ... when I said &quot;it hasn&#39;t been given priority&quot; I meant &quot;it hasn&#39;t been given <b>a</b> priority&quot; - as in, nobody&#39;s assigned a priority to it, whatever that priority should rightfully be.<br>

&nbsp;</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">We all agree it&#39;s a bug: no, I don&#39;t. I think it&#39;s a missing feature,<br>


at best, but I&#39;m staying out of the discussion. As-is, urllib only<br>

supports ASCII in URLs, and that is fine for most purposes.</blockquote><div><br>Seriously, Mr. L%C3%B6wis, that&#39;s a tremendously na%C3%AFve statement.<br>&nbsp;</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

URLs are just not made for non-ASCII characters. Implement IRIs if you<br>

want non-ASCII characters; the rules are much clearer for these.</blockquote><div><br>Python 3.0 fully supports Unicode. URIs support <i>encoding</i> of arbitrary characters (as of more recent revisions). The difference is that URIs <i>may only consist</i> of ASCII characters (even though they can encode Unicode characters), while IRIs may also consist of Unicode characters. It&#39;s our responsibility to implement URIs here ... IRIs are a separate issue.<br>
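<br>To make the URI side of that distinction concrete, here is a rough sketch. It assumes the urllib.parse API as it exists in current Python 3 releases (quote and unquote with an encoding argument), which may differ in detail from the patch under discussion; the name used is only illustrative.<br>

```python
from urllib.parse import quote, unquote

# A string containing one non-ASCII character (o with diaeresis).
name = "L\u00f6wis"

# quote() percent-encodes the UTF-8 bytes by default, so the result
# is a pure-ASCII string that is legal inside a URI.
encoded = quote(name)
print(encoded)  # L%C3%B6wis
assert all(ord(ch) < 128 for ch in encoded)

# unquote() reverses it, decoding the bytes as UTF-8 by default.
print(unquote(encoded) == name)  # True

# A different encoding can be supplied explicitly by the caller.
print(quote(name, encoding="latin-1"))  # L%F6wis
```

The URI itself never contains anything but ASCII; the Unicode character only exists before quoting and after unquoting.<br>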

<br>Having said this, I&#39;m pretty sure Martin can&#39;t be convinced, so I&#39;ll leave that alone.<br><br>On Thu, Aug 7, 2008 at 3:34 AM, M.-A. Lemburg <span dir="ltr">&lt;<a href="mailto:mal@egenix.com">mal@egenix.com</a>&gt;</span> wrote:<br>

<blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">

So unquote() should probably try to decode using UTF-8 first<br></blockquote><blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">

and then fall back to Latin-1 if that doesn&#39;t work.</blockquote><div><br>That&#39;s an interesting proposal, but I don&#39;t think I like it. For a user application that&#39;s a good policy, but a programming language library should not do guesswork: it should use the encoding supplied, and have a single default. I&#39;d be interested to hear if anyone else wants this, though.<br>
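<br>For comparison, here is a minimal sketch of the two policies. It assumes the urllib.parse functions available in current Python 3 releases (unquote and unquote_to_bytes); unquote_guessing is a hypothetical helper written for illustration, not part of any library or patch.<br>

```python
from urllib.parse import unquote, unquote_to_bytes


def unquote_guessing(s):
    """Hypothetical application-level policy: strict UTF-8, then Latin-1."""
    raw = unquote_to_bytes(s)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


# Guessing accepts either encoding of the same word.
print(unquote_guessing("na%C3%AFve") == "na\u00efve")  # True (valid UTF-8)
print(unquote_guessing("na%EFve") == "na\u00efve")     # True (fell back)

# Library-style policy: the caller states the encoding, no guessing.
print(unquote("na%EFve", encoding="latin-1") == "na\u00efve")  # True

# With the UTF-8 default and errors='replace', the invalid byte
# becomes U+FFFD instead of being reinterpreted.
print(unquote("na%EFve") == "na\ufffdve")  # True
```

The guessing version is convenient, but it silently picks an interpretation, which is why it seems better placed in application code than in the library default.<br>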

<br>As-is, it passes &#39;replace&#39; to the errors argument, so encoding errors get replaced by &#39;�&#39; characters.<br><br>OK I haven&#39;t looked at the review yet .. guess it&#39;s off to the tracker :)<br><br>Matt<br>

</div></div></div></div>