[Python-checkins] peps: Some typo fixes in PEP 3138; also add variables footer.

Sun Sep 30 08:56:39 CEST 2012

http://hg.python.org/peps/rev/d8b05eefbf45
changeset:   4526:d8b05eefbf45
user:        Georg Brandl <georg at python.org>
date:        Sun Sep 30 08:55:27 2012 +0200
summary:
  Some typo fixes in PEP 3138; also add variables footer.

files:
  pep-3138.txt |  273 ++++++++++++++++++++------------------
  1 files changed, 146 insertions(+), 127 deletions(-)

diff --git a/pep-3138.txt b/pep-3138.txt
--- a/pep-3138.txt
+++ b/pep-3138.txt
@@ -13,11 +13,11 @@
 Abstract
 ========
 
-This PEP proposes a new string representation form for Python 3000. In
-Python prior to Python 3000, the repr() built-in function converted
-arbitrary objects to printable ASCII strings for debugging and logging.
-For Python 3000, a wider range of characters, based on the Unicode
-standard, should be considered 'printable'.
+This PEP proposes a new string representation form for Python 3000.
+In Python prior to Python 3000, the repr() built-in function converted
+arbitrary objects to printable ASCII strings for debugging and
+logging.  For Python 3000, a wider range of characters, based on the
+Unicode standard, should be considered 'printable'.
 
 
 Motivation
@@ -28,8 +28,8 @@
 
 - Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
 
-- Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
-  characters(>=0x80) to '\\xXX'.
+- Convert other non-printable characters (0x00-0x1f, 0x7f) and
+  non-ASCII characters (>= 0x80) to '\\xXX'.
 
 - Backslash-escape quote characters (apostrophe, ') and add the quote
   character at the beginning and the end.
@@ -39,177 +39,184 @@
 - Convert leading surrogate pair characters without trailing character
   (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
 
-- Convert 16-bit characters(>=0x100) to '\\uXXXX'.
+- Convert 16-bit characters (>= 0x100) to '\\uXXXX'.
 
-- Convert 21-bit characters(>=0x10000) and surrogate pair characters to
-  '\\U00xxxxxx'.
+- Convert 21-bit characters (>= 0x10000) and surrogate pair characters
+  to '\\U00xxxxxx'.
 
 This algorithm converts any string to printable ASCII, and repr() is
 used as a handy and safe way to print strings for debugging or for
-logging. Although all non-ASCII characters are escaped, this does not
-matter when most of the string's characters are ASCII. But for other
+logging.  Although all non-ASCII characters are escaped, this does not
+matter when most of the string's characters are ASCII.  But for other
 languages, such as Japanese where most characters in a string are not
 ASCII, this is very inconvenient.
 
 We can use ``print(aJapaneseString)`` to get a readable string, but we
 don't have a similar workaround for printing strings from collections
-such as lists or tuples. ``print(listOfJapaneseStrings)`` uses repr() to
-build the string to be printed, so the resulting strings are always
-hex-escaped. Or when ``open(japaneseFilemame)`` raises an exception, the
-error message is something like ``IOError: [Errno 2] No such file or
-directory: '\u65e5\u672c\u8a9e'``, which isn't helpful.
+such as lists or tuples.  ``print(listOfJapaneseStrings)`` uses repr()
+to build the string to be printed, so the resulting strings are always
+hex-escaped.  Or when ``open(japaneseFilename)`` raises an exception,
+the error message is something like ``IOError: [Errno 2] No such file
+or directory: '\u65e5\u672c\u8a9e'``, which isn't helpful.
 
 Python 3000 has a lot of nice features for non-Latin users such as
-non-ASCII identifiers, so it would be helpful if Python could also progress
-in a similar way for printable output.
+non-ASCII identifiers, so it would be helpful if Python could also
+progress in a similar way for printable output.
 
 Some users might be concerned that such output will mess up their
-console if they print binary data like images. But this is unlikely to
-happen in practice because bytes and strings are different types in
+console if they print binary data like images.  But this is unlikely
+to happen in practice because bytes and strings are different types in
 Python 3000, so printing an image to the console won't mess it up.
 
-This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.
+This issue was once discussed by Hye-Shik Chang [1]_, but was rejected.
 
 
 Specification
 =============
 
 - Add a new function to the Python C API ``int Py_UNICODE_ISPRINTABLE
-  (Py_UNICODE ch)``. This function returns 0 if repr() should escape the
-  Unicode character ``ch``; otherwise it returns 1. Characters that should
-  be escaped are defined in the Unicode character database as:
+  (Py_UNICODE ch)``.  This function returns 0 if repr() should escape
+  the Unicode character ``ch``; otherwise it returns 1.  Characters
+  that should be escaped are defined in the Unicode character database
+  as:
 
- * Cc (Other, Control)
- * Cf (Other, Format)
- * Cs (Other, Surrogate)
- * Co (Other, Private Use)
- * Cn (Other, Not Assigned)
- * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
- * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
- * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
-   this category should be escaped to avoid ambiguity.
+  * Cc (Other, Control)
+  * Cf (Other, Format)
+  * Cs (Other, Surrogate)
+  * Co (Other, Private Use)
+  * Cn (Other, Not Assigned)
+  * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
+  * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR
+    ('\\u2029').
+  * Zs (Separator, Space) other than ASCII space ('\\x20').  Characters
+    in this category should be escaped to avoid ambiguity.
 
 - The algorithm to build repr() strings should be changed to:
 
- * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
+  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
 
- * Convert non-printable ASCII characters(0x00-0x1f, 0x7f) to '\\xXX'.
+  * Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to
+    '\\xXX'.
 
- * Convert leading surrogate pair characters without trailing character
-   (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
+  * Convert leading surrogate pair characters without trailing
+    character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to
+    '\\uXXXX'.
 
- * Convert non-printable characters(Py_UNICODE_ISPRINTABLE() returns 0)
-   to 'xXX', '\\uXXXX' or '\\U00xxxxxx'.
+  * Convert non-printable characters (Py_UNICODE_ISPRINTABLE() returns
+    0) to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.
 
- * Backslash-escape quote characters (apostrophe, 0x27) and add quote
-   character at the beginning and the end.
+  * Backslash-escape quote characters (apostrophe, 0x27) and add a
+    quote character at the beginning and the end.
 
-- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
-  default.
+- Set the Unicode error-handler for sys.stderr to 'backslashreplace'
+  by default.
 
 - Add a new function to the Python C API ``PyObject *PyObject_ASCII
-  (PyObject *o)``. This function converts any python object to a string
-  using PyObject_Repr() and then hex-escapes all non-ASCII characters. 
-  ``PyObject_ASCII()`` generates the same string as ``PyObject_Repr()``
-  in Python 2.
+  (PyObject *o)``.  This function converts any Python object to a
+  string using PyObject_Repr() and then hex-escapes all non-ASCII
+  characters.  ``PyObject_ASCII()`` generates the same string as
+  ``PyObject_Repr()`` in Python 2.
 
-- Add a new built-in function, ``ascii()``. This function converts any
-  python object to a string using repr() and then hex-escapes all non-ASCII
-  characters. ``ascii()`` generates the same string as ``repr()`` in
-  Python 2.
+- Add a new built-in function, ``ascii()``.  This function converts
+  any Python object to a string using repr() and then hex-escapes all
+  non-ASCII characters.  ``ascii()`` generates the same string as
+  ``repr()`` in Python 2.
 
-- Add ``'%a'`` string format operator. ``'%a'`` converts any python
+- Add a ``'%a'`` string format operator.  ``'%a'`` converts any Python
   object to a string using repr() and then hex-escapes all non-ASCII
-  characters. The ``'%a'`` format operator generates the same string as
-  ``'%r'`` in Python 2. Also, add ``'!a'`` conversion flags to the
+  characters.  The ``'%a'`` format operator generates the same string
+  as ``'%r'`` in Python 2.  Also, add ``'!a'`` conversion flags to the
   ``string.format()`` method and add ``'%A'`` operator to the
-  PyUnicode_FromFormat(). They converts any object to an ASCII string
+  PyUnicode_FromFormat().  They convert any object to an ASCII string
   as ``'%a'`` string format operator.
 
-- Add an ``isprintable()`` method to the string type. ``str.isprintable()``
-  returns False if repr() should escape any character in the string;
-  otherwise returns True. The ``isprintable()`` method calls the
-  ``Py_UNICODE_ISPRINTABLE()`` function internally.
+- Add an ``isprintable()`` method to the string type.
+  ``str.isprintable()`` returns False if repr() would escape any
+  character in the string; otherwise returns True.  The
+  ``isprintable()`` method calls the ``Py_UNICODE_ISPRINTABLE()``
+  function internally.
 
 
 Rationale
 =========
 
-The repr() in Python 3000 should be Unicode not ASCII based, just like
-Python 3000 strings. Also, conversion should not be affected by the
-locale setting, because the locale is not necessarily the same as the
-output device's locale. For example, it is common for a daemon process
-to be invoked in an ASCII setting, but writes UTF-8 to its log files.
-Also, web applications might want to report the error information in
-more readable form based on the HTML page's encoding.
+The repr() in Python 3000 should be Unicode, not ASCII based, just
+like Python 3000 strings.  Also, conversion should not be affected by
+the locale setting, because the locale is not necessarily the same as
+the output device's locale.  For example, it is common for a daemon
+process to be invoked in an ASCII setting, but write UTF-8 to its log
+files.  Also, web applications might want to report the error
+information in more readable form based on the HTML page's encoding.
 
 Characters not supported by the user's console could be hex-escaped on
-printing, by the Unicode encoder's error-handler. If the error-handler
-of the output file is 'backslashreplace', such characters are
-hex-escaped without raising UnicodeEncodeError. For example, if your default
-encoding is ASCII, ``print('Hello ¢')`` will print 'Hello \\xa2'. If
-your encoding is ISO-8859-1, 'Hello ¢' will be printed.
+printing, by the Unicode encoder's error-handler.  If the
+error-handler of the output file is 'backslashreplace', such
+characters are hex-escaped without raising UnicodeEncodeError.  For
+example, if the default encoding is ASCII, ``print('Hello ¢')`` will
+print 'Hello \\xa2'.  If the encoding is ISO-8859-1, 'Hello ¢' will be
+printed.
 
-The default error-handler for sys.stdout is 'strict'. Other applications
-reading the output might not understand hex-escaped characters, so
-unsupported characters should be trapped when writing. If you need to
-escape unsupported characters, you should explicitly change the
-error-handler. Unlike sys.stdout, sys.stderr doesn't raise
+The default error-handler for sys.stdout is 'strict'.  Other
+applications reading the output might not understand hex-escaped
+characters, so unsupported characters should be trapped when writing.
+If unsupported characters must be escaped, the error-handler should be
+changed explicitly.  Unlike sys.stdout, sys.stderr doesn't raise
 UnicodeEncodingError by default, because the default error-handler is
-'backslashreplace'. So printing error messeges containing non-ASCII
-characters to sys.stderr will not raise an exception. Also, information
-about uncaught exceptions (exception object, traceback) are printed by
-the interpreter without raising exceptions.
+'backslashreplace'.  So printing error messages containing non-ASCII
+characters to sys.stderr will not raise an exception.  Also,
+information about uncaught exceptions (exception object, traceback) is
+printed by the interpreter without raising exceptions.
 
 Alternate Solutions
 -------------------
 
-To help debugging in non-Latin languages without changing repr(), other
-suggestions were made.
+To help debugging in non-Latin languages without changing repr(),
+other suggestions were made.
 
 - Supply a tool to print lists or dicts.
 
-  Strings to be printed for debugging are not only contained by lists or
-  dicts, but also in many other types of object. File objects contain a
-  file name in Unicode, exception objects contain a message in Unicode,
-  etc. These strings should be printed in readable form when repr()ed.
-  It is unlikely to be possible to implement a tool to print all
-  possible object types.
+  Strings to be printed for debugging are not only contained by lists
+  or dicts, but also in many other types of object.  File objects
+  contain a file name in Unicode, exception objects contain a message
+  in Unicode, etc.  These strings should be printed in readable form
+  when repr()ed.  It is unlikely to be possible to implement a tool to
+  print all possible object types.
 
 - Use sys.displayhook and sys.excepthook.
 
   For interactive sessions, we can write hooks to restore hex escaped
-  characters to the original characters. But these hooks are called only
-  when printing the result of evaluating an expression entered in an
-  interactive Python session, and doesn't work for the ``print()`` function,
-  for non-interactive sessions or for ``logging.debug("%r", ...)``, etc.
+  characters to the original characters.  But these hooks are called
+  only when printing the result of evaluating an expression entered in
+  an interactive Python session, and don't work for the ``print()``
+  function, for non-interactive sessions or for ``logging.debug("%r",
+  ...)``, etc.
 
 - Subclass sys.stdout and sys.stderr.
 
   It is difficult to implement a subclass to restore hex-escaped
-  characters since there isn't enough information left by the time it's
-  a string to undo the escaping correctly in all cases. For example,
-  ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
-  there is no chance to tell file objects apart.
+  characters since there isn't enough information left by the time
+  it's a string to undo the escaping correctly in all cases.  For
+  example, ``print("\\"+"u0041")`` should be printed as '\\u0041', not
+  'A'.  But there is no chance to tell file objects apart.
 
 - Make the encoding used by unicode_repr() adjustable, and make the
   existing repr() the default.
 
   With adjustable repr(), the result of using repr() is unpredictable
   and would make it impossible to write correct code involving repr().
-  And if current repr() is the default, then the old convention remains
-  intact and users may expect ASCII strings as the result of repr().
-  Third party applications or libraries could be confused when a custom
-  repr() function is used.
+  And if current repr() is the default, then the old convention
+  remains intact and users may expect ASCII strings as the result of
+  repr().  Third party applications or libraries could be confused
+  when a custom repr() function is used.
 
 
 Backwards Compatibility
 =======================
 
 Changing repr() may break some existing code, especially testing code.
-Five of Python's regression tests fail with this modification. If you
-need repr() strings without non-ASCII character as Python 2, you can use
-the following function. ::
+Five of Python's regression tests fail with this modification.  If you
+need repr() strings without non-ASCII characters, as in Python 2, you
+can use the following function. ::
 
   def repr_ascii(obj):
       return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
@@ -221,25 +228,25 @@
   log.write(repr(data))     # UnicodeEncodeError will be raised
                             # if data contains unsupported characters.
 
-To avoid exceptions being raised, you can explicitly specify the error-
-handler. ::
+To avoid exceptions being raised, you can explicitly specify the
+error-handler. ::
 
   log = open("logfile", "w", errors="backslashreplace")
   log.write(repr(data))  # Unsupported characters will be escaped.
 
 
-For a console that uses a Unicode-based encoding, for example, en_US.
-utf8 or de_DE.utf8, the backslashescape trick doesn't work and all
-printable characters are not escaped. This will cause a problem of
-similarly drawing characters in Western, Greek and Cyrillic languages.
-These languages use similar (but different) alphabets (descended from a
-common ancestor) and contain letters that look similar but have
-different character codes. For example, it is hard to distinguish Latin
-'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'. (The visual
-representation, of course, very much depends on the fonts used but
-usually these letters are almost indistinguishable.) To avoid the
-problem, the user can adjust the terminal encoding to get a result
-suitable for their environment.
+For a console that uses a Unicode-based encoding, for example,
+en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and
+no printable characters are escaped.  This can cause problems with
+similar-looking characters in Western, Greek and Cyrillic
+languages.  These languages use similar (but different) alphabets
+(descended from a common ancestor) and contain letters that look
+similar but have different character codes.  For example, it is hard
+to distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'.
+(The visual representation, of course, very much depends on the fonts
+used but usually these letters are almost indistinguishable.)  To
+avoid the problem, the user can adjust the terminal encoding to get a
+result suitable for their environment.
 
 
 Rejected Proposals
@@ -252,20 +259,21 @@
   idea. [2]_
 
 - Use character names to escape characters, instead of hex character
-  codes. For example, ``repr('\u03b1')`` can be converted to ``"\N{GREEK
-  SMALL LETTER ALPHA}"``.
+  codes.  For example, ``repr('\u03b1')`` can be converted to
+  ``"\N{GREEK SMALL LETTER ALPHA}"``.
 
-  Using character names can be very verbose compared to hex-escape. 
-  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR
-  KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
+  Using character names can be very verbose compared to hex-escape.
+  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE
+  UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED
+  FORM}"``.
 
 - Default error-handler of sys.stdout should be 'backslashreplace'.
 
   Stuff written to stdout might be consumed by another program that
-  might misinterpret the \ escapes. For interactive session, it is
-  possible to make 'backslashreplace' error-handler to default, but may
-  add confusion of the kind "it works in interactive mode but not when
-  redirecting to a file".
+  might misinterpret the \\ escapes.  For interactive sessions, it is
+  possible to make the 'backslashreplace' error-handler the default,
+  but this may add confusion of the kind "it works in interactive mode
+  but not when redirecting to a file".
 
 
 Implementation
@@ -288,3 +296,14 @@
 =========
 
 This document has been placed in the public domain.
+
+
+
+..
+  Local Variables:
+  mode: indented-text
+  indent-tabs-mode: nil
+  sentence-end-double-space: t
+  fill-column: 70
+  coding: utf-8
+  End:
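
The behaviour that PEP 3138 specifies in the text patched above can be
checked with a short Python 3 sketch.  It is illustrative only and is
not part of this changeset: the variable names are hypothetical, and
``repr_ascii`` is copied from the PEP's Backwards Compatibility
section.  It assumes an interpreter that implements the PEP (Python
3.0 or later).

```python
def repr_ascii(obj):
    """ASCII-only repr, as given in the PEP's Backwards Compatibility section."""
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")


# The three characters from the PEP's IOError example.
sample = "\u65e5\u672c\u8a9e"

readable = repr(sample)             # printable non-ASCII characters are kept
escaped = ascii(sample)             # non-ASCII characters are hex-escaped
via_percent_a = "%a" % sample       # '%a' format operator
via_bang_a = "{!a}".format(sample)  # '!a' conversion flag

# Zl, Zp and Zs (other than ASCII space) count as non-printable.
line_separator_printable = "\u2028".isprintable()
ideographic_space_printable = "\u3000".isprintable()
ascii_space_printable = " ".isprintable()
ideographic_space_repr = repr("\u3000")

# Error handlers: 'backslashreplace' escapes, 'strict' raises.
cent = "Hello \u00a2"
cent_backslashreplace = cent.encode("ascii", "backslashreplace")
cent_latin1 = cent.encode("iso-8859-1")
try:
    cent.encode("ascii")            # 'strict' is the default handler
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

if __name__ == "__main__":
    print(escaped)
    print(repr_ascii(sample))
    print(cent_backslashreplace)
    print(strict_raised)
```

Under these assumptions ``ascii()``, ``'%a'``, ``'!a'`` and
``repr_ascii()`` all yield the same escaped string, while ``repr()``
leaves the printable characters intact.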

-- 
Repository URL: http://hg.python.org/peps