[Python-checkins] python/nondist/peps pep-0000.txt, 1.341, 1.342 pep-0349.txt, 1.2, 1.3

Mon Aug 22 23:12:20 CEST 2005

Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27113

Modified Files:
	pep-0000.txt pep-0349.txt 
Log Message:
New version of PEP 349.  Propose that str() be changed rather than
adding a new built-in function.

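For readers skimming the diff below, here is a minimal sketch of the
behaviour the new revision proposes for str().  It is adapted from the
Specification hunk in pep-0349.txt and is not the reference patch.  The
names proposed_str and Name are hypothetical, and the fallback
"unicode = str" is an assumption added only so that the sketch also runs
on interpreters without a separate unicode type.

```python
# Sketch of the str() semantics proposed in PEP 349 (revision 1.3).
# proposed_str is a hypothetical stand-in; the PEP changes str() itself.
try:
    unicode  # Python 2: a separate unicode type exists
except NameError:
    unicode = str  # assumption: on Python 3, str is already Unicode


def proposed_str(s):
    """Return a nice string representation of the object.  The
    return value is a str or unicode instance.
    """
    # Exact str/unicode instances are returned unchanged.
    if type(s) is str or type(s) is unicode:
        return s
    r = s.__str__()
    # Unlike the current str(), a unicode result is not coerced to str.
    if not isinstance(r, (str, unicode)):
        raise TypeError('__str__ returned non-string')
    return r


class Name(object):
    """Hypothetical class whose __str__ returns non-ASCII text."""

    def __str__(self):
        return u'L\xf6wis'


print(repr(proposed_str(Name())))
print(repr(proposed_str(42)))
```

On Python 2 the first call would hand back a unicode instance where the
current str() raises UnicodeEncodeError; objects whose __str__ returns a
plain str are unaffected.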

Index: pep-0000.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v
retrieving revision 1.341
retrieving revision 1.342
diff -u -d -r1.341 -r1.342
--- pep-0000.txt	13 Aug 2005 12:37:53 -0000	1.341
+++ pep-0000.txt	22 Aug 2005 21:12:08 -0000	1.342
@@ -105,7 +105,7 @@
  S   345  Metadata for Python Software Packages 1.2    Jones
  P   347  Migrating the Python CVS to Subversion       von Löwis
  S   348  Exception Reorganization for Python 3.0      Cannon
- S   349  Generalized String Coercion                  Schemenauer
+ S   349  Allow str() to return unicode strings        Schemenauer
  S   754  IEEE 754 Floating Point Special Values       Warnes
 
  Finished PEPs (done, implemented in CVS)
@@ -393,7 +393,7 @@
  SR  346  User Defined ("with") Statements             Coghlan
  P   347  Migrating the Python CVS to Subversion       von Löwis
  S   348  Exception Reorganization for Python 3.0      Cannon
- S   349  Generalized String Coercion                  Schemenauer
+ S   349  Allow str() to return unicode strings        Schemenauer
  SR  666  Reject Foolish Indentation                   Creighton
  S   754  IEEE 754 Floating Point Special Values       Warnes
  I  3000  Python 3.0 Plans                             Kuchling, Cannon

Index: pep-0349.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0349.txt,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- pep-0349.txt	6 Aug 2005 04:05:48 -0000	1.2
+++ pep-0349.txt	22 Aug 2005 21:12:08 -0000	1.3
@@ -1,5 +1,5 @@
 PEP: 349
-Title: Generalised String Coercion
+Title: Allow str() to return unicode strings
 Version: $Revision$
 Last-Modified: $Date$
 Author: Neil Schemenauer <nas at arctrix.com>
@@ -7,20 +7,18 @@
 Type: Standards Track
 Content-Type: text/plain
 Created: 02-Aug-2005
-Post-History:
+Post-History: 06-Aug-2005
 Python-Version: 2.5
 
 
 Abstract
 
-    This PEP proposes the introduction of a new built-in function,
-    text(), that provides a way of generating a string representation
-    of an object without forcing the result to be a particular string
-    type.  In addition, the behavior %s format specifier would be
-    changed to call text() on the argument.  These two changes would
-    make it easier to write library code that can be used by
-    applications that use only the str type and by others that also use
-    the unicode type.
+    This PEP proposes to change the str() built-in function so that it
+    can return unicode strings.  This change would make it easier to
+    write code that works with either string type and would also make
+    some existing code handle unicode strings.  The C function
+    PyObject_Str() would remain unchanged and the function
+    PyString_New() would be added instead.
 
 
 Rationale
@@ -64,51 +62,35 @@
     object; an operation traditionally accomplished by using the str()
     built-in function.
     
-    Using str() makes the code not Unicode-safe.  Replacing a str()
-    call with a unicode() call makes the code not str-stable.  Using a
-    string format almost accomplishes the goal but not quite.
-    Consider the following code:
-
-        def text(obj):
-            return '%s' % obj
-
-    It behaves as desired except if 'obj' is not a basestring instance
-    and needs to return a Unicode representation of itself.  In that
-    case, the string format will attempt to coerce the result of
-    __str__ to a str instance.  Defining a __unicode__ method does not
-    help since it will only be called if the right-hand operand is a
-    unicode instance.  Using a unicode instance for the right-hand
-    operand does not work because the function is no longer str-stable
-    (i.e. it will coerce everything to unicode).
+    Using the current str() function makes the code not Unicode-safe.
+    Replacing a str() call with a unicode() call makes the code not
+    str-stable.  Changing str() so that it could return unicode
+    instances would solve this problem.  As a further benefit, some code
+    that is currently not Unicode-safe because it uses str() would
+    become Unicode-safe.
 
 
 Specification
 
-    A Python implementation of the text() built-in follows:
+    A Python implementation of the str() built-in follows:
 
-        def text(s):
+        def str(s):
             """Return a nice string representation of the object.  The
-            return value is a basestring instance.
+            return value is a str or unicode instance.
             """
-            if isinstance(s, basestring):
+            if type(s) is str or type(s) is unicode:
                 return s
             r = s.__str__()
-            if not isinstance(r, basestring):
+            if not isinstance(r, (str, unicode)):
                 raise TypeError('__str__ returned non-string')
             return r
             
-    Note that it is currently possible, although not very useful, to
-    write __str__ methods that return unicode instances.
-
-    The %s format specifier for str objects would be changed to call
-    text() on the argument.  Currently it calls str() unless the
-    argument is a unicode instance (in which case the object is
-    substituted as is and the % operation returns a unicode instance).
-
     The following function would be added to the C API and would be the
-    equivalent of the text() function:
+    equivalent of the str() built-in (ideally it would be called
+    PyObject_Str, but changing that function could cause a massive
+    number of compatibility problems):
 
-        PyObject *PyObject_Text(PyObject *o);
+        PyObject *PyString_New(PyObject *);
 
     A reference implementation is available on Sourceforge [1] as a
     patch.
@@ -116,52 +98,36 @@
                 
 Backwards Compatibility
 
-    The change to the %s format specifier would result in some %
-    operations returning a unicode instance rather than raising a
-    UnicodeDecodeError exception.  It seems unlikely that the change
-    would break currently working code.
-
+    Some code may require that str() returns a str instance.  In the
+    standard library, only one such case has been found so far.  The
+    function email.header_decode() requires a str instance and the
+    email.Header.decode_header() function tries to ensure this by
+    calling str() on its argument.  The code was fixed by changing
+    the line "header = str(header)" to:
 
-Alternative Solutions
+        if isinstance(header, unicode):
+            header = header.encode('ascii')
 
-    Rather than adding the text() built-in, if PEP 246 were
-    implemented then adapt(s, basestring) could be equivalent to
-    text(s).  The advantage would be one less built-in function.  The
-    problem is that PEP 246 is not implemented.
+    Whether this is truly a bug is questionable since decode_header()
+    really operates on byte strings, not character strings.  Code that
+    passes it a unicode instance could itself be considered buggy.
 
-    Fredrik Lundh has suggested [2] that perhaps a new slot should be
-    added (e.g. __text__), that could return any kind of string that's
-    compatible with Python's text model.  That seems like an
-    attractive idea but many details would still need to be worked
-    out.
 
-    Instead of providing the text() built-in, the %s format specifier
-    could be changed and a string format could be used instead of
-    calling text().  However, it seems like the operation is important
-    enough to justify a built-in.
+Alternative Solutions
 
-    Instead of providing the text() built-in, the basestring type
-    could be changed to provide the same functionality.  That would
-    possibly be confusing behaviour for an abstract base type.
+    A new built-in function could be added instead of changing str().
+    Doing so would introduce virtually no backwards compatibility
+    problems.  However, since the compatibility problems are expected to
+    be rare, changing str() seems preferable to adding a new built-in.
 
-    Some people have suggested [3] that an easier migration path would
-    be to change the default encoding to be UTF-8.  Code that is not
-    Unicode safe would then encode Unicode strings as UTF-8 and
-    operate on them as str instances, rather than raising a
-    UnicodeDecodeError exception.  Other code would assume that str
-    instances were encoded using UTF-8 and decode them if necessary.
-    While that solution may work for some applications, it seems
-    unsuitable as a general solution.  For example, some applications
-    get string data from many different sources and assuming that all
-    str instances were encoded using UTF-8 could easily introduce
-    subtle bugs.
+    The basestring type could be changed to have the proposed behaviour,
+    rather than changing str().  However, that would be confusing
+    behaviour for an abstract base type.
 
 
 References
 
-    [1] http://www.python.org/sf/1159501
-    [2] http://mail.python.org/pipermail/python-dev/2004-September/048755.html
-    [3] http://blog.ianbicking.org/illusive-setdefaultencoding.html
+    [1] http://www.python.org/sf/1266570
 
 
 Copyright