Skip Montanaro wrote:
I just noticed that PEP 100 (Python/Unicode integration) references
http://starship.python.net/~lemburg/unicode-proposal.txt
as the latest version. Sure enough, I visited that and found that it's newer than the PEP (1.8 v. 1.7).
True. I'm not sure why the above file is 1.8 and the CVS PEP at 1.7. I guess I forgot to update the PEP.

FYI, here's a diff between the 1.7 and 1.8 version:

--- unicode-proposal-1.7.txt Tue Oct 17 17:38:40 2000
+++ unicode-proposal.txt Tue Oct 17 17:38:40 2000
@@ -1,7 +1,7 @@
 =============================================================================
- Python Unicode Integration Proposal Version: 1.7
+ Python Unicode Integration Proposal Version: 1.8
 -----------------------------------------------------------------------------
 Introduction:
 -------------
@@ -612,11 +612,11 @@
 Case Conversion:
 ----------------
 Case conversion is rather complicated with Unicode data, since there are many different conditions to respect. See
- http://www.unicode.org/unicode/reports/tr13/
+ http://www.unicode.org/unicode/reports/tr21/
 for some guidelines on implementing case conversion.
 For Python, we should only implement the 1-1 conversions included in Unicode. Locale dependent and other special case conversions (see the
@@ -631,11 +631,15 @@
 possible.
 Line Breaks:
 ------------
 Line breaking should be done for all Unicode characters having the B property as well as the combinations CRLF, CR, LF (interpreted in that
-order) and other special line separators defined by the standard.
+order) and other special line separators defined by the standard. See
+
+ http://www.unicode.org/unicode/reports/tr13/
+
+for some guidelines on implementing line breaks and newline handling.
 The Unicode type should provide a .splitlines() method which returns a list of lines according to the above specification. See Unicode Methods.
@@ -1010,11 +1014,11 @@
 Unicode 3.0:
 Unicode-TechReports:
 http://www.unicode.org/unicode/reports/techreports.html
 Unicode-Mappings:
- ftp://ftp.unicode.org/Public/MAPPINGS/
+ http://www.unicode.org/Public/MAPPINGS/
 Introduction to Unicode (a little outdated by still nice to read):
 http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html
 For comparison:
@@ -1047,10 +1051,11 @@
 Encodings:
 http://www.uazone.com/multiling/unicode/wg2n1035.html
 History of this Proposal:
 -------------------------
+1.8: Fixed some URLs to the unicode.org site.
 1.7: Added note about the changed behaviour of "s#".
 1.6: Changed <defencstr> to <defenc> since this is the name used in the implementation. Added notes about the usage of <defenc> in the buffer protocol implementation.
 1.5: Added notes about setting the <default encoding>. Fixed some
Shouldn't the PEP be the most up-to-date public document? The comment right after that suggests this should be so:
[ed. note: new revisions should be made to this PEP document, while the historical record previous to version 1.7 should be retrieved from MAL's url, or Misc/unicode.txt]
Since this is now an informational PEP, I believe the wording should change to reflect functionality that has already been implemented. For instance, instead of
Python should provide a built-in constructor for Unicode strings which is available through __builtins__:
it should read
Python provides a built-in constructor for Unicode strings which is available through __builtins__:
True again; I just didn't find time to rewrite these bits. The PEP is basically a reformatted proposal. That's where the "should" wording originates from.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/