[Python-Dev] urllib.quote and unquote - Unicode issues

Wed Aug 6 17:05:16 CEST 2008

André Malo wrote:
> * Matt Giuca wrote: 
>> We've reached, to quote Guido, "as close as consensus as we can get on
>> this issue".
> 
> There are a lot of quotes around. Including "After the most recent flurry of 
> discussion I've lost track of what's the right thing to do."
> But I don't talk for other people.

Let's not play the who-can-find-the-best-quote-to-make-their-point game. 
Yes, there was a lot of discussion, and yes, it would be difficult to 
follow if it wasn't something you were paying much attention to. I 
believe (as someone who didn't even participate in the discussion) that 
it was clear that:

   * quote/unquote should "just work" for strings in 3.x. Meaning that
     quote should be str->str and unquote str->str, because *most* uses
     of quote/unquote are naive. There is a quite clear proclamation from
     the BDFL that quote/unquote should not be bound by pedantic readings
     of the RFCs.

   * an alternative set of functions that support bytes->str and str->bytes
     should be added to support other use-cases that are less naive. Matt
     added unquote_to_bytes/quote_from_bytes to fill this gap. Bill
     agreed it was sufficient to satisfy his use-cases (stipulating that
     they are a necessary addition if any change should be made at all).
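
For anyone who lost track, here is a minimal sketch of the two pairs of
functions described above. It assumes the names and defaults as they
exist in Python 3's urllib.parse: quote/unquote encode and decode text
as UTF-8 by default, and quote_from_bytes/unquote_to_bytes work on raw
octets. The sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote, quote_from_bytes, unquote_to_bytes

# Naive use: str in, str out. Non-ASCII text is encoded as UTF-8
# before percent-encoding, and decoded as UTF-8 when unquoting.
quoted = quote("café/menu")           # '/' is in the default safe set
print(quoted)                         # caf%C3%A9/menu
print(unquote(quoted))                # café/menu

# Less naive use: raw octets, with no text encoding involved at all.
raw = b"\xff\x00/data"
quoted_raw = quote_from_bytes(raw)    # bytes -> str
print(quoted_raw)                     # %FF%00/data
print(unquote_to_bytes(quoted_raw))   # str -> bytes: b'\xff\x00/data'
```

The naive pair covers the common case of text in URLs; the bytes pair
is the escape hatch for data that is not text in any encoding.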

>> There is a bug in Python. I've proposed a working fix, and nobody else
>> has.
> 
> Well, you proposed a patch ;-)
> It may fix things, it will break a lot. While this was denied over and over 
> again, it's still gonna happen, because the axioms are still not accounting 
> for the reality.

I haven't seen anyone other than Bill come forward to say they have a
lot of code using quote/unquote that would break. Matt has gone through
the stdlib and found most uses consistent with the "quote/unquote users
are naive" statement. I would suggest that the onus is on you to
substantiate the claim that "it will break a lot."

>> I made all the changes the community suggested. 
> 
> I don't think so.

This short reply is useless. What are those changes? If your problem is 
that *your* suggestions were dropped, then I remind you that they *were 
discussed*. And Matt correctly pointed out that defaulting to latin-1
for quoting and utf-8 for unquoting would be a nightmare. It would be
much saner to pick one encoding for both. Which encoding that should be
is debatable; the apparent consensus was that most people didn't care
as long as they could override it.
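
To make the latin-1/utf-8 point concrete, here is a small sketch. It
assumes Python 3's urllib.parse, where both quote() and unquote() take
an encoding argument and unquote() defaults to errors='replace'. The
"mismatched defaults" below are simulated by passing the encodings
explicitly; they are not anyone's actual proposal in code form.

```python
from urllib.parse import quote, unquote

text = "é"

# Simulated mismatch: quote with latin-1, unquote with utf-8.
q = quote(text, encoding="latin-1")        # '%E9' (a single octet)
bad = unquote(q, encoding="utf-8")         # lone 0xE9 is invalid UTF-8 -> '\ufffd'

# The reverse mismatch gives mojibake instead of a replacement char.
mojibake = unquote(quote(text), encoding="latin-1")   # 'Ã©'

# One encoding on both sides round-trips, whichever one is chosen,
# and callers can still override it explicitly.
good_utf8 = unquote(quote(text))                                        # UTF-8 both ways
good_latin1 = unquote(quote(text, encoding="latin-1"), encoding="latin-1")

print(q, repr(bad), mojibake, good_utf8, good_latin1)
```

Note that neither mismatch raises an error: the text is silently
corrupted, which is what makes mismatched defaults so unpleasant.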

>> What more needs to be discussed here?
> 
> Huh? You feel, the discussion is over?

Can we please avoid discussions about discussion? Arguing about arguing 
does not benefit this discussion. If you have a problem with his 
proposed patch, then please elaborate on that rather than /just/ 
complain that it is unsatisfactory in some way.

Do you agree there is a bug? Do you agree it needs to be solved? And, 
what about the proposed solution is unsatisfactory?

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu