[Patches] [ python-Patches-446754 ] Enhanced unicode constructor

noreply@sourceforge.net noreply@sourceforge.net
Wed, 01 Aug 2001 06:55:39 -0700


Patches item #446754, was opened at 2001-08-01 05:59
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=446754&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: Enhanced unicode constructor

Initial Comment:
This patch (against descr-branch) uses a slightly 
enhanced version of PyObject_Unicode instead of 
PyUnicode_FromEncodedObject in the unicode constructor 
(unicode_new in Objects/unicodeobject.c). This gives the 
unicode constructor the same functionality as the str 
constructor: creating string representations via the 
__str__ method/tp_str slot. Example:

Python 2.2a1 (#10, Aug  1 2001, 14:26:06) 
[GCC 2.95.2 19991024 (release)] on linux2
Type "help", "copyright", "credits" or "license" for 
more information.
>>> str("u"), unicode("u")
('u', u'u')
>>> str(u"u"), unicode(u"u")
('u', u'u')
>>> str(None), unicode(None)
('None', u'None')
>>> str(42), unicode(42)
('42', u'42')
>>> str(23.), unicode(23.)
('23.0', u'23.0')
>>> str([1,2,3]), unicode([1,2,3])
('[1, 2, 3]', u'[1, 2, 3]')
>>> str({"u": 23, u"ü": 42}), unicode({"u": 23, u"ü": 42})
("{'u': 23, u'\xfc': 42}", u"{'u': 23, u'\\xfc': 42}")
>>> class foo:
...    def __init__(self, x):
...       self.x = x
...    def __str__(self):
...       return self.x
... 
>>> str(foo("bar")), unicode(foo("bar"))
('bar', u'bar')
>>> str(foo(u"bar")), unicode(foo(u"bar"))
('bar', u'bar')

Passing the encoding and errors arguments still works,
and they will be used to decode any 8-bit string returned 
from __str__.
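
The decoding rule described above can be sketched in modern 
Python 3, where bytes plays the role of the 8-bit string. This 
is only an emulation under stated assumptions, not the patch 
itself: to_text, Legacy and the to_string hook are hypothetical 
names standing in for the unicode constructor, the foo class 
and __str__.

```python
# Hypothetical Python 3 emulation; none of these names come from the patch.

def to_text(obj, encoding="ascii", errors="strict"):
    """Return a text string for obj, decoding 8-bit results if needed."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    # Stand-in for __str__: the hook may return text or an 8-bit string.
    hook = getattr(obj, "to_string", None)
    result = hook() if hook is not None else str(obj)
    if isinstance(result, bytes):
        # encoding/errors only matter for 8-bit results.
        return result.decode(encoding, errors)
    return result


class Legacy:
    """Hypothetical counterpart of the foo class from the example."""

    def __init__(self, x):
        self.x = x

    def to_string(self):
        return self.x
```

With these definitions, to_text(Legacy(b"\xfc"), "latin-1") 
decodes the 8-bit result, while to_text(Legacy("bar")) passes 
the text result through untouched.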

Perhaps, for symmetry, encoding and errors arguments 
should be added to the str constructor too. They would 
be used to encode the result when __str__ returns a 
unicode object.

One problem is that unicode([u"ü"]) returns 
u"[u'\\xfc']" because __repr__ returns an 8-bit 
escape-encoded string. It would be better if the result 
were u"[u'\xfc']", but this would require a 
PyObject_UnicodeRepr (and/or changing the list tp_str 
slot (and many others) to return Unicode).
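
A rough modern analogue of this difference, assuming Python 3 
rather than the 2.2 descr-branch: the built-in ascii() produces 
an escape-encoded representation, while repr() keeps the 
character itself. The variable names are illustrative only.

```python
# Python 3 analogue (an assumption; the patch targets Python 2.2).
items = ["\u00fc"]          # a list holding the single character ü

escaped = ascii(items)      # escape-encoded, like the 8-bit __repr__ result
readable = repr(items)      # keeps the real character, the preferred outcome

print(escaped)
print(readable)
```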


----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2001-08-01 06:55

Message:
Logged In: YES 
user_id=89016

Oops, sorry, I accidentally hit the submit button! :-/
The foo class is the one from the first message.

str(...) always does a unicode-escape encoding when it 
encounters a Unicode object, so you'll get escape 
sequences, but these will not be decoded when constructing 
the unicode object. I.e. u"..." != unicode(str(u"...")).
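
The lost round trip can be illustrated in Python 3 with the 
unicode_escape codec. This is a sketch of the idea only, under 
the assumption that encoding with unicode_escape stands in for 
what str(...) does here; the variable names are made up.

```python
# Python 3 sketch; unicode_escape stands in for the escaping done by str(...).
original = "\u00fc"                           # the character ü

escaped = original.encode("unicode_escape")   # backslash, 'x', 'f', 'c'
naive = escaped.decode("ascii")               # escapes are left undecoded
restored = escaped.decode("unicode_escape")   # escapes decoded explicitly

print(naive != original)     # the plain round trip does not restore the text
print(restored == original)  # decoding the escapes does
```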

Basically the unicode type should have the same 
functionality as the str type to ease unicode migration.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-08-01 06:49

Message:
Logged In: YES 
user_id=89016

>>> unicode(u"ü")
u'\xfc'
*>>> unicode(str(u"ü"))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

>>> unicode(foo(u"ü"))
u'\xfc'
*>>> unicode(str(foo(u"ü")))    
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-08-01 06:45

Message:
Logged In: YES 
user_id=6380

What does this have to offer over unicode(str(x))?

----------------------------------------------------------------------
