[Python-Dev] urllib.quote and unquote - Unicode issues
Scott Dial
scott at scottdial.com
Wed Aug 6 17:05:16 CEST 2008
André Malo wrote:
> * Matt Giuca wrote:
>> We've reached, to quote Guido, "as close as consensus as we can get on
>> this issue".
>
> There are a lot of quotes around. Including "After the most recent flurry of
> discussion I've lost track of what's the right thing to do."
> But I don't talk for other people.
Let's not play the who-can-find-the-best-quote-to-make-their-point game.
Yes, there was a lot of discussion, and yes, it would be difficult to
follow if it wasn't something you were paying much attention to. I
believe (as someone who didn't even participate in the discussion) that
it was clear that:
* quote/unquote should "just work" for strings in 3.x. Meaning that
quote should be str->str and unquote str->str, because *most* uses
of quote/unquote are naive. There is a quite clear proclamation from
the BDFL that quote/unquote should not be bound by pedantic readings
of the RFCs.
* an alternative set of functions that support byte->str and str->byte
should be added to support other use-cases that are less naive. Matt
added unquote_to_bytes/quote_from_bytes to fill this gap. Bill
agreed it was sufficient to satisfy his use-cases (stipulating that
they are a necessary addition if any change should be made at all).
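
To make the two points above concrete, here is a minimal sketch using the names as they later shipped in Python 3's urllib.parse. It assumes the released behaviour (UTF-8 as the default encoding); the patch under discussion in this thread may have differed in detail.

```python
from urllib.parse import quote, unquote, quote_from_bytes, unquote_to_bytes

text = "caf\u00e9 au lait"

# str -> str: the "just works" path for naive callers.
quoted = quote(text)
round_tripped = unquote(quoted)

# str -> bytes and bytes -> str: for callers who care about the raw octets.
raw = unquote_to_bytes(quoted)
requoted = quote_from_bytes(raw)

print(quoted)                  # percent-encoded form of the UTF-8 bytes
print(round_tripped == text)   # the str round trip is lossless
print(raw)                     # the undecoded octets
print(requoted == quoted)      # the bytes round trip is lossless too
```

The str functions hide the encoding step entirely, while the bytes variants leave the choice of encoding to the caller.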
>> There is a bug in Python. I've proposed a working fix, and nobody else
>> has.
>
> Well, you proposed a patch ;-)
> It may fix things, it will break a lot. While this was denied over and over
> again, it's still gonna happen, because the axioms are still not accounting
> for the reality.
I've not seen anyone other than Bill come forward saying they had a lot
of code that uses quote/unquote that will be broken. Matt has gone
through the stdlib and found most uses consistent with the
"quote/unquote users are naive" statement. I would suggest that the onus
is on you to substantiate this claim that "it will break a lot."
>> I made all the changes the community suggested.
>
> I don't think so.
This short reply is useless. What are those changes? If your problem is
that *your* suggestions were dropped, then I remind you that they *were
discussed*. And, Matt correctly pointed out that setting the defaults to
encoding in latin-1 and decoding in utf-8 would be a nightmare. It would
be much more sane to pick one encoding for both. However, which encoding
it should be is an arguable point. The apparent consensus was that most
people didn't care as long as they could override it.
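
A small sketch of why mismatched defaults would be painful, again assuming the encoding parameter as it exists in released Python 3 (and unquote's default errors='replace'), not necessarily the exact patch being argued over here:

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"

# Encode with latin-1, decode with utf-8: the round trip is lossy.
mismatched = unquote(quote(text, encoding="latin-1"), encoding="utf-8")

# Same encoding on both sides: the round trip is lossless, whichever one it is.
matched_utf8 = unquote(quote(text, encoding="utf-8"), encoding="utf-8")
matched_latin1 = unquote(quote(text, encoding="latin-1"), encoding="latin-1")

print(mismatched == text)      # False: the lone 0xE9 byte is not valid UTF-8
print(matched_utf8 == text)    # True
print(matched_latin1 == text)  # True
```

Either encoding works as a default as long as both directions share it and callers can override it.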
>> What more needs to be discussed here?
>
> Huh? You feel, the discussion is over?
Can we please avoid discussions about discussion? Arguing about arguing
does not benefit this discussion. If you have a problem with his
proposed patch, then please elaborate on that rather than /just/
complaining that it is unsatisfactory in some way.
Do you agree there is a bug? Do you agree it needs to be solved? And,
what about the proposed solution is unsatisfactory?
-Scott
--
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu