[Python-Dev] PEP 100 references & wording
M.-A. Lemburg
mal@lemburg.com
Fri, 11 Jan 2002 16:03:27 +0100
Skip Montanaro wrote:
>
> I just noticed that PEP 100 (Python/Unicode integration) references
>
> http://starship.python.net/~lemburg/unicode-proposal.txt
>
> as the latest version. Sure enough, I visited that and found that it's
> newer than the PEP (1.8 v. 1.7).
True. I'm not sure why the above file is at 1.8 while the CVS PEP is
still at 1.7. I guess I forgot to update the PEP.
FYI, here's a diff between the 1.7 and 1.8 versions:
--- unicode-proposal-1.7.txt Tue Oct 17 17:38:40 2000
+++ unicode-proposal.txt Tue Oct 17 17:38:40 2000
@@ -1,7 +1,7 @@
=============================================================================
- Python Unicode Integration Proposal Version: 1.7
+ Python Unicode Integration Proposal Version: 1.8
-----------------------------------------------------------------------------
Introduction:
-------------
@@ -612,11 +612,11 @@ Case Conversion:
----------------
Case conversion is rather complicated with Unicode data, since there
are many different conditions to respect. See
- http://www.unicode.org/unicode/reports/tr13/
+ http://www.unicode.org/unicode/reports/tr21/
for some guidelines on implementing case conversion.
For Python, we should only implement the 1-1 conversions included in
Unicode. Locale dependent and other special case conversions (see the
@@ -631,11 +631,15 @@ possible.
Line Breaks:
------------
Line breaking should be done for all Unicode characters having the B
property as well as the combinations CRLF, CR, LF (interpreted in that
-order) and other special line separators defined by the standard.
+order) and other special line separators defined by the standard. See
+
+ http://www.unicode.org/unicode/reports/tr13/
+
+for some guidelines on implementing line breaks and newline handling.
The Unicode type should provide a .splitlines() method which returns a
list of lines according to the above specification. See Unicode
Methods.
@@ -1010,11 +1014,11 @@ Unicode 3.0:
Unicode-TechReports:
http://www.unicode.org/unicode/reports/techreports.html
Unicode-Mappings:
- ftp://ftp.unicode.org/Public/MAPPINGS/
+ http://www.unicode.org/Public/MAPPINGS/
Introduction to Unicode (a little outdated by still nice to read):
http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html
For comparison:
@@ -1047,10 +1051,11 @@ Encodings:
http://www.uazone.com/multiling/unicode/wg2n1035.html
History of this Proposal:
-------------------------
+1.8: Fixed some URLs to the unicode.org site.
1.7: Added note about the changed behaviour of "s#".
1.6: Changed <defencstr> to <defenc> since this is the name used in the
implementation. Added notes about the usage of <defenc> in the
buffer protocol implementation.
1.5: Added notes about setting the <default encoding>. Fixed some
> Shouldn't the PEP be the most up-to-date public document? The comment right
> after that suggests this should be so:
>
> [ed. note: new revisions should be made to this PEP document, while the
> historical record previous to version 1.7 should be retrieved from
> MAL's url, or Misc/unicode.txt]
>
> Since this is now an informational PEP, I believe the wording should change
> to reflect functionality that has already been implemented. For instance,
> instead of
>
> Python should provide a built-in constructor for Unicode strings which
> is available through __builtins__:
>
> it should read
>
> Python provides a built-in constructor for Unicode strings which is
> available through __builtins__:
True again; I just haven't found the time to rewrite these bits. The PEP
is basically a reformatted version of the proposal, which is where the
"should" wording comes from.
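
For readers who want to see the line-break rule from the diff in action, here is a minimal sketch. It assumes a modern Python 3 interpreter, where str plays the role of the Unicode type described in the proposal; the sample string is made up for illustration and is not taken from the proposal.

```python
# Illustration only: in Python 3, str is the Unicode string type.
# The sample text mixes CRLF, CR, LF and U+2028 (LINE SEPARATOR).
text = "one\r\ntwo\rthree\nfour\u2028five"

# CRLF is treated as a single line break, not as CR followed by LF.
lines = text.splitlines()
print(lines)

# Passing True keeps the line-break characters at the end of each line.
kept = "a\r\nb".splitlines(True)
print(kept)
```

Note that CRLF yields one break rather than two, which matches the "interpreted in that order" wording in the Line Breaks section.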
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/