[Python-Dev] PEP 383 (again)

Wed Apr 29 09:49:37 CEST 2009

On approximately 4/29/2009 12:17 AM, came the following characters from 
the keyboard of Martin v. Löwis:
>> OK, so you are saying that under PEP 383, utf-8b wouldn't be used
>> anywhere on Windows by default.  That's not clear from your proposal.
> 
> You didn't read it carefully enough. The first three paragraphs of
> the "Specification" section make that clear.

Sorry, but rereading those paragraphs, even with this declaration in
mind, does not make that clear.  It is not enough to have a solution
that works; it is necessary to communicate that solution clearly enough
that people understand it.  Judging by the huge amount of feedback you
have received, either the solution doesn't work or it wasn't
communicated clearly.

The following comments are an attempt to help you make the PEP clear, 
based on your above declaration that UTF-8b wouldn't be used on Windows.
I may still be unclear about what you mean, but if you can accept
these enhancements to the PEP, then maybe we are approaching a common 
understanding; if not, you should be aware that the PEP still needs 
clarification.

In the first paragraph, you should make it clear that Python 3.0 does 
not use the Windows bytes interfaces, if it doesn't.  "Python uses 
*only* the wide character APIs..." would suffice.  As stated, it seems 
like Python *does* use the wide character APIs, but leaves open the 
possibility that it might use byte APIs also.  A short description of 
what happens on Windows when Python code uses bytes APIs would also be 
helpful.

The second paragraph speaks of "currently" but then speaks of using the
half-surrogates.  I don't believe that happens "currently".  You did
change tense, but the paragraph is quite confusing as it stands because
of the tense change.  You should describe there the action that Python
currently takes for non-decodable bytes, and then in the next paragraph
talk about what the PEP changes.
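
To make the "before" and "after" concrete, here is a minimal sketch of
the distinction I would like the two paragraphs to draw.  It assumes an
interpreter in which the proposed error handler is available under the
name "surrogateescape"; that name, and the sample file name, are
assumptions of mine for illustration, not text from the PEP.

```python
# A file name in Latin-1 bytes; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# "Before": a strict decode simply refuses the non-decodable byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "After" (assumed handler name): the bad byte 0xE9 is mapped to the
# half-surrogate U+DCE9 instead of raising an error.
escaped = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
round_tripped = escaped.encode("utf-8", "surrogateescape")

print(strict_failed, ascii(escaped), round_tripped)
```

The point of the sketch is only the ordering: first what a plain decode
does with such bytes, then what changes once the handler is in use.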

The 4th paragraph is now confusing too... would it not be the decode 
error handler that returns the byte strings, in addition to the Unicode 
strings?

The 5th paragraph has apparently confused some people into thinking this
PEP only applies to locales using UTF-8 encodings; you should have an
"else clause" to clear that up, pointing out that the reverse encoding
of half-surrogates by other encodings already produces errors, and that
UTF-8 is a special case, not the only case.
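
A rough sketch of what such an "else clause" would be describing,
again assuming the handler is reachable as "surrogateescape" (my
assumption) and using a helper, try_encode, that is purely hypothetical.
UTF-8 is left out of the strict loop on purpose, since it is the special
case and its strict behaviour may depend on the Python version.

```python
# A string holding a half-surrogate, as produced for the byte 0xE9.
name = "caf\udce9.txt"

def try_encode(text, encoding, errors="strict"):
    """Hypothetical helper: return the encoded bytes, or None on error."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None

other_encodings = ("ascii", "latin-1", "iso8859-15", "cp1252")

# Strict encoding with non-UTF-8 codecs already fails on half-surrogates.
strict_results = {enc: try_encode(name, enc) for enc in other_encodings}

# With the (assumed) handler, each codec gives back the original byte.
escaped_results = {
    enc: try_encode(name, enc, "surrogateescape") for enc in other_encodings
}

print(strict_results)
print(escaped_results)
```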

The code added to the discussion has mismatched (), making me wonder if 
it is complete.  There is a reasonable possibility that only the final ) 
is missing.

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking