[Python-checkins] CVS: python/nondist/peps pep-0263.txt,1.6,1.7
M.-A. Lemburg
lemburg@users.sourceforge.net
Wed, 27 Feb 2002 03:07:18 -0800
Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv5930
Modified Files:
pep-0263.txt
Log Message:
Changes regarding the default encoding and other minor tweaks.
See history for details.
Index: pep-0263.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0263.txt,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** pep-0263.txt 26 Feb 2002 20:26:07 -0000 1.6
--- pep-0263.txt 27 Feb 2002 11:07:16 -0000 1.7
***************
*** 40,45 ****
Defining the Encoding
! Python will default to Latin-1 as standard encoding if no other
! encoding hints are given.
To define a source code encoding, a magic comment must
--- 40,47 ----
Defining the Encoding
! Just as when coercing strings to Unicode, Python will default to
! the interpreter's default encoding (which is ASCII in standard
! Python installations) as the source encoding if no other encoding
! hints are given.
To define a source code encoding, a magic comment must
***************
*** 50,53 ****
--- 52,60 ----
# -*- coding: <encoding name> -*-
+ More precisely, the first or second line must match the regular
+ expression "coding[:=]\s*([-\w]+)". The first group of this
+ expression is then interpreted as the encoding name. If the encoding
+ is unknown to Python, an error is raised during compilation.
+
To aid with platforms such as Windows, which add Unicode BOM marks
to the beginning of Unicode files, the UTF-8 signature
***************
*** 67,71 ****
Embedding of differently encoded data is not allowed and will
result in a decoding error during compilation of the Python
! source code.
Only ASCII compatible encodings are allowed as source code
--- 74,78 ----
Embedding of differently encoded data is not allowed and will
result in a decoding error during compilation of the Python
! source code.
Only ASCII compatible encodings are allowed as source code
***************
*** 102,115 ****
subset of the encoding.
- For backwards compatibility, the implementation must assume
- Latin-1 as the original file encoding if not given (otherwise,
- binary data currently stored in 8-bit strings wouldn't make the
- roundtrip).
-
Implementation
Since changing the Python tokenizer/parser combination will
! require major changes in the internals of the interpreter, the
! proposed solution should be implemented in two phases:
1. Implement the magic comment detection and default encoding
--- 109,120 ----
subset of the encoding.
Implementation
Since changing the Python tokenizer/parser combination will
! require major changes in the internals of the interpreter, as
! well as enforcing the use of magic comments in source code files
! that place non-default-encoding characters in string literals,
! comments and Unicode literals, the proposed solution should be
! implemented in two phases:
1. Implement the magic comment detection and default encoding
***************
*** 117,133 ****
literals in the source file.
2. Change the tokenizer/compiler base string type from char* to
Py_UNICODE* and apply the encoding to the complete file.
Scope
! This PEP only affects Python source code which makes use of the
! proposed magic comment. Without the magic comment in the proposed
! position, Python will treat the source file as it does currently
! (using the Latin-1 encoding assumption) to maintain backwards
! compatibility.
History
1.3: Worked in comments by Martin v. Loewis:
UTF-8 BOM mark detection, Emacs style magic comment,
--- 122,153 ----
literals in the source file.
+ In addition to this step and to aid in the transition to
+ explicit encoding declaration, the tokenizer must check the
+ complete source file for compliance with the default encoding
+ (which is usually ASCII). If the source file does not decode
+ properly, a single warning is generated per file.
+
2. Change the tokenizer/compiler base string type from char* to
Py_UNICODE* and apply the encoding to the complete file.
+ Source files which fail to decode cause an error to be raised
+ during compilation.
+
+ The built-in compile() function will be enhanced to accept Unicode
+ as input. 8-bit string input is subject to the standard procedure
+ for encoding detection as described above.
+
Scope
! This PEP intends to provide an upgrade path from the current
! (more-or-less) undefined source code encoding situation to a more
! robust and portable definition.
History
+ 1.7: Added warnings to phase 1 implementation. Replaced the
+ Latin-1 default encoding with the interpreter's default
+ encoding. Added tweaks to compile().
+ 1.4 - 1.6: Minor tweaks
1.3: Worked in comments by Martin v. Loewis:
UTF-8 BOM mark detection, Emacs style magic comment,
***************
*** 138,146 ****
This document has been placed in the public domain.
-
Local Variables:
mode: indented-text
indent-tabs-mode: nil
- fill-column: 70
End:
--- 158,164 ----
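The phase 1 rules in revision 1.7 (magic comment in the first or second line, error for unknown encodings, fallback to the default encoding, one warning per file that does not decode) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PEP's reference implementation. The function name detect_source_encoding and the default_encoding parameter (standing in for the interpreter's default encoding, ASCII here) are hypothetical. Raising SyntaxError and warning with SyntaxWarning are also assumptions. So is the choice to run the full-file check only when no magic comment is present. The character class is written as [-\w] because \w already covers the underscore and Python's re module rejects a range such as \w-_.

```python
import codecs
import re
import warnings

# Magic-comment pattern from the PEP text, with the character class written
# as [-\w]: \w already matches "_", and a range like "\w-_" is rejected by re.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w]+)")


def detect_source_encoding(source: bytes, filename: str = "<string>",
                           default_encoding: str = "ascii") -> str:
    """Return the source encoding for `source` under the phase 1 rules.

    Hypothetical helper: default_encoding stands in for the interpreter's
    default encoding (ASCII in standard installations).
    """
    # Only the first or second line may carry the magic comment.
    for line in source.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            name = match.group(1).decode("ascii")
            try:
                codecs.lookup(name)
            except LookupError:
                # Unknown encodings are a compile-time error; using
                # SyntaxError for it is an assumption.
                raise SyntaxError(
                    f"{filename}: unknown encoding {name!r}") from None
            return name

    # No magic comment: check the whole file against the default encoding
    # and emit a single warning for this file instead of failing (phase 1).
    try:
        source.decode(default_encoding)
    except UnicodeDecodeError:
        warnings.warn(
            f"{filename}: source does not decode as {default_encoding}; "
            "consider adding an encoding declaration",
            SyntaxWarning,
            stacklevel=2,
        )
    return default_encoding
```

In phase 2 the warning branch would become an error, because the revised text states that source files which fail to decode cause an error to be raised during compilation.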