[Python-checkins] python/nondist/peps pep-0000.txt, 1.341, 1.342 pep-0349.txt, 1.2, 1.3
nascheme@users.sourceforge.net
nascheme at users.sourceforge.net
Mon Aug 22 23:12:20 CEST 2005
Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv27113
Modified Files:
pep-0000.txt pep-0349.txt
Log Message:
New version of PEP 349. Propose that str() be changed rather than
adding a new built-in function.
Index: pep-0000.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v
retrieving revision 1.341
retrieving revision 1.342
diff -u -d -r1.341 -r1.342
--- pep-0000.txt 13 Aug 2005 12:37:53 -0000 1.341
+++ pep-0000.txt 22 Aug 2005 21:12:08 -0000 1.342
@@ -105,7 +105,7 @@
S 345 Metadata for Python Software Packages 1.2 Jones
P 347 Migrating the Python CVS to Subversion von Löwis
S 348 Exception Reorganization for Python 3.0 Cannon
- S 349 Generalized String Coercion Schemenauer
+ S 349 Allow str() to return unicode strings Schemenauer
S 754 IEEE 754 Floating Point Special Values Warnes
Finished PEPs (done, implemented in CVS)
@@ -393,7 +393,7 @@
SR 346 User Defined ("with") Statements Coghlan
P 347 Migrating the Python CVS to Subversion von Löwis
S 348 Exception Reorganization for Python 3.0 Cannon
- S 349 Generalized String Coercion Schemenauer
+ S 349 Allow str() to return unicode strings Schemenauer
SR 666 Reject Foolish Indentation Creighton
S 754 IEEE 754 Floating Point Special Values Warnes
I 3000 Python 3.0 Plans Kuchling, Cannon
Index: pep-0349.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0349.txt,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- pep-0349.txt 6 Aug 2005 04:05:48 -0000 1.2
+++ pep-0349.txt 22 Aug 2005 21:12:08 -0000 1.3
@@ -1,5 +1,5 @@
PEP: 349
-Title: Generalised String Coercion
+Title: Allow str() to return unicode strings
Version: $Revision$
Last-Modified: $Date$
Author: Neil Schemenauer <nas at arctrix.com>
@@ -7,20 +7,18 @@
Type: Standards Track
Content-Type: text/plain
Created: 02-Aug-2005
-Post-History:
+Post-History: 06-Aug-2005
Python-Version: 2.5
Abstract
- This PEP proposes the introduction of a new built-in function,
- text(), that provides a way of generating a string representation
- of an object without forcing the result to be a particular string
- type. In addition, the behavior of the %s format specifier would be
- changed to call text() on the argument. These two changes would
- make it easier to write library code that can be used by
- applications that use only the str type and by others that also use
- the unicode type.
+ This PEP proposes to change the str() built-in function so that it
+ can return unicode strings. This change would make it easier to
+ write code that works with either string type and would also make
+ some existing code handle unicode strings. The C function
+ PyObject_Str() would remain unchanged and the function
+ PyString_New() would be added instead.
Rationale
@@ -64,51 +62,35 @@
object; an operation traditionally accomplished by using the str()
built-in function.
- Using str() makes the code not Unicode-safe. Replacing a str()
- call with a unicode() call makes the code not str-stable. Using a
- string format almost accomplishes the goal but not quite.
- Consider the following code:
-
- def text(obj):
- return '%s' % obj
-
- It behaves as desired except if 'obj' is not a basestring instance
- and needs to return a Unicode representation of itself. In that
- case, the string format will attempt to coerce the result of
- __str__ to a str instance. Defining a __unicode__ method does not
- help since it will only be called if the right-hand operand is a
- unicode instance. Using a unicode instance for the right-hand
- operand does not work because the function is no longer str-stable
- (i.e. it will coerce everything to unicode).
+ Using the current str() function makes the code not Unicode-safe.
+ Replacing a str() call with a unicode() call makes the code not
+ str-stable. Changing str() so that it could return unicode
+ instances would solve this problem. As a further benefit, some code
+ that is currently not Unicode-safe because it uses str() would
+ become Unicode-safe.
Specification
- A Python implementation of the text() built-in follows:
+ A Python implementation of the str() built-in follows:
- def text(s):
+ def str(s):
"""Return a nice string representation of the object. The
- return value is a basestring instance.
+ return value is a str or unicode instance.
"""
- if isinstance(s, basestring):
+ if type(s) is str or type(s) is unicode:
return s
r = s.__str__()
- if not isinstance(r, basestring):
+ if not isinstance(r, (str, unicode)):
raise TypeError('__str__ returned non-string')
return r
- Note that it is currently possible, although not very useful, to
- write __str__ methods that return unicode instances.
-
- The %s format specifier for str objects would be changed to call
- text() on the argument. Currently it calls str() unless the
- argument is a unicode instance (in which case the object is
- substituted as is and the % operation returns a unicode instance).
-
The following function would be added to the C API and would be the
- equivalent of the text() function:
+ equivalent of the str() built-in (ideally it would be called PyObject_Str,
+ but changing that function could cause a massive number of
+ compatibility problems):
- PyObject *PyObject_Text(PyObject *o);
+ PyObject *PyString_New(PyObject *);
A reference implementation is available on Sourceforge [1] as a
patch.
@@ -116,52 +98,36 @@
Backwards Compatibility
- The change to the %s format specifier would result in some %
- operations returning a unicode instance rather than raising a
- UnicodeDecodeError exception. It seems unlikely that the change
- would break currently working code.
-
+ Some code may require that str() returns a str instance. In the
+ standard library, only one such case has been found so far. The
+ function email.header_decode() requires a str instance and the
+ email.Header.decode_header() function tries to ensure this by
+ calling str() on its argument. The code was fixed by changing
+ the line "header = str(header)" to:
-Alternative Solutions
+ if isinstance(header, unicode):
+ header = header.encode('ascii')
- Rather than adding the text() built-in, if PEP 246 were
- implemented then adapt(s, basestring) could be equivalent to
- text(s). The advantage would be one less built-in function. The
- problem is that PEP 246 is not implemented.
+ Whether this is truly a bug is questionable since decode_header()
+ really operates on byte strings, not character strings. Code that
+ passes it a unicode instance could itself be considered buggy.
- Fredrik Lundh has suggested [2] that perhaps a new slot should be
- added (e.g. __text__), that could return any kind of string that's
- compatible with Python's text model. That seems like an
- attractive idea but many details would still need to be worked
- out.
- Instead of providing the text() built-in, the %s format specifier
- could be changed and a string format could be used instead of
- calling text(). However, it seems like the operation is important
- enough to justify a built-in.
+Alternative Solutions
- Instead of providing the text() built-in, the basestring type
- could be changed to provide the same functionality. That would
- possibly be confusing behaviour for an abstract base type.
+ A new built-in function could be added instead of changing str().
+ Doing so would introduce virtually no backwards compatibility
+ problems. However, since the compatibility problems are expected to
+ be rare, changing str() seems preferable to adding a new built-in.
- Some people have suggested [3] that an easier migration path would
- be to change the default encoding to be UTF-8. Code that is not
- Unicode safe would then encode Unicode strings as UTF-8 and
- operate on them as str instances, rather than raising a
- UnicodeDecodeError exception. Other code would assume that str
- instances were encoded using UTF-8 and decode them if necessary.
- While that solution may work for some applications, it seems
- unsuitable as a general solution. For example, some applications
- get string data from many different sources and assuming that all
- str instances were encoded using UTF-8 could easily introduce
- subtle bugs.
+ The basestring type could be changed to have the proposed behaviour,
+ rather than changing str(). However, that would be confusing
+ behaviour for an abstract base type.
References
- [1] http://www.python.org/sf/1159501
- [2] http://mail.python.org/pipermail/python-dev/2004-September/048755.html
- [3] http://blog.ianbicking.org/illusive-setdefaultencoding.html
+ [1] http://www.python.org/sf/1266570
Copyright
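
Note on the str() behaviour specified in revision 1.3 of pep-0349.txt:
the Python implementation given in the Specification section can be
tried out with a sketch along the following lines. This is only an
illustration, not the reference implementation from the Sourceforge
patch. The names proposed_str, string_types, Greeting and Broken are
hypothetical, and the try/except around the unicode built-in is an
assumption added so that the snippet also runs on interpreters where
unicode no longer exists and str is the only text type.

```python
# Sketch of the str() behaviour described in the PEP 349 Specification.
# Assumption: `proposed_str` and `string_types` are illustrative names only.
try:
    string_types = (str, unicode)  # Python 2: byte strings and unicode strings
except NameError:
    string_types = (str,)          # Python 3: str is the only text type


def proposed_str(s):
    """Return a nice string representation of the object.

    The return value is an instance of one of `string_types`.
    """
    # Exact-type check, as in the PEP: subclasses still go through __str__.
    if type(s) in string_types:
        return s
    r = s.__str__()
    if not isinstance(r, string_types):
        raise TypeError('__str__ returned non-string')
    return r


class Greeting(object):
    """Hypothetical class whose __str__ returns non-ASCII text."""
    def __str__(self):
        return u'caf\xe9'   # passed through as is, not coerced to ASCII


class Broken(object):
    """Hypothetical class whose __str__ returns a non-string."""
    def __str__(self):
        return 42           # proposed_str raises TypeError for this
```

With these definitions, proposed_str(Greeting()) hands back the
non-ASCII result of __str__ unchanged, while proposed_str(Broken())
raises TypeError, matching the check in the specification.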