[Python-Dev] PEP 383 (again)
v+python at g.nevcal.com
Wed Apr 29 09:49:37 CEST 2009
On approximately 4/29/2009 12:17 AM, came the following characters from
the keyboard of Martin v. Löwis:
>> OK, so you are saying that under PEP 383, utf-8b wouldn't be used
>> anywhere on Windows by default. That's not clear from your proposal.
> You didn't read it carefully enough. The first three paragraphs of
> the "Specification" section make that clear.
Sorry, rereading those paragraphs even with this declaration in mind,
does not make that clear. It is not enough to have a solution that
works; it is necessary to communicate that solution clearly enough that
people understand it. By the huge amount of feedback you have received,
it is clear that either the solution doesn't work, or that it wasn't
The following comments are an attempt to help you make the PEP clear,
based on your above declaration that UTF-8b wouldn't be used on Windows.
I may still be unclear about what you mean, but if you can accept
these enhancements to the PEP, then maybe we are approaching a common
understanding; if not, you should be aware that the PEP still needs
In the first paragraph, you should make it clear that Python 3.0 does
not use the Windows bytes interfaces, if it doesn't. "Python uses
*only* the wide character APIs..." would suffice. As stated, it seems
like Python *does* use the wide character APIs, but leaves open the
possibility that it might use byte APIs also. A short description of
what happens on Windows when Python code uses bytes APIs would also be
In the second paragraph, it speaks of "currently" but then speaks of
using the half-surrogates. I don't believe that happens "currently".
You did change tense, but that paragraph is quite confusing, currently,
because of the tense change. You should describe there the action that
is currently taken by Python for non-decodable bytes, and then in the
next paragraph talk about what the PEP changes.
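
To make the "currently" versus "with the PEP" distinction concrete, here
is a minimal sketch. It assumes a recent Python 3, where the PEP 383 error
handler is available under the name "surrogateescape" (a name not used in
this message); the byte string is a made-up example, not taken from the PEP.

```python
# Hypothetical file name: Latin-1 bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding: the non-decodable byte 0xE9 raises an error.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With the PEP 383 handler (assumed name: "surrogateescape"), the byte
# 0xE9 is mapped to the half-surrogate U+DCE9 instead of raising.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")

print(strict_failed, ascii(name), roundtrip == raw)
```

Describing the strict behavior first and the half-surrogate mapping second
would mirror the order suggested above.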
The 4th paragraph is now confusing too... would it not be the decode
error handler that returns the byte strings, in addition to the Unicode
The 5th paragraph has apparently confused some people into thinking this
PEP only applies to locales using UTF-8 encodings; you should have an
"else clause" to clear that up, pointing out that the reverse encoding
of half-surrogates by other encodings already produces errors, and that
UTF-8 is a special case, not the only case.
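
The "else clause" could be illustrated along these lines. This is only a
sketch, run against a recent Python 3 with the handler name
"surrogateescape" assumed; in such a version the strict UTF-8 codec also
rejects half-surrogates, which may differ from the 3.0 behavior that the
paragraph in the PEP is describing.

```python
# A lone half-surrogate standing in for the non-decodable byte 0xE9.
lone = "\udce9"

# Strict encoding: record which codecs reject the half-surrogate.
rejected = {}
for codec in ("ascii", "latin-1", "utf-8"):
    try:
        lone.encode(codec)
        rejected[codec] = False
    except UnicodeEncodeError:
        rejected[codec] = True

# With the PEP 383 handler (assumed name: "surrogateescape"), every one
# of these codecs turns the half-surrogate back into the original byte.
restored = {codec: lone.encode(codec, "surrogateescape") for codec in rejected}

print(rejected)
print(restored)
```

The point is that the reverse mapping is not tied to UTF-8 locales: the
same handler applies to other ASCII-compatible encodings.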
The code added to the discussion has mismatched (), making me wonder if
it is complete. There is a reasonable possibility that only the final )
Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking