[Python-bugs-list] [ python-Bugs-460020 ] bug or feature: unicode() and subclasses
noreply@sourceforge.net
Tue, 11 Sep 2001 05:01:02 -0700
Bugs item #460020, was opened at 2001-09-09 08:41
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=460020&group_id=5470
Category: Type/class unification
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Tim Peters (tim_one)
Summary: bug or feature: unicode() and subclasses
Initial Comment:
The unicode constructor returns the object passed in,
when an instance of a subclass of unicode is passed in:
--
class U(unicode):
    pass
u1 = U(u"foo")
print type(u1)
u2 = unicode(u1)
print type(u2)
--
this gives
--
<type '__main__.U'>
<type '__main__.U'>
--
instead of
--
<type '__main__.U'>
<type 'unicode'>
--
as it probably should be (the unicode constructor
should construct unicode objects). With the current
behaviour it is nearly impossible to construct a
unicode object with the value of an instance of a
unicode subclass, because most methods are optimized
to return the original object if possible, e.g.
--
print type(unicode.__getslice__(u1, 0, 3))
print type(unicode.__getslice__(u1, 0, 2))
--
gives
--
<type '__main__.U'>
<type 'unicode'>
--
This should be made consistent, so that either a
unicode object is always returned, or all methods use
a "virtual constructor", i.e. create an object of the
type passed in. This would simplify deriving classes
from unicode, as far fewer methods would have to be
overridden.
But first of all the constructor should be fixed, so
that the argument is returned unmodified only when it
is an instance of unicode and not of a unicode
subclass.
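
The requested behaviour can be checked with a short script. This is only
a sketch, and it assumes a present-day Python 3 interpreter, where str
plays the role that unicode plays in this report (that substitution is an
assumption, not something stated in the report). The class U mirrors the
one above.

```python
# Sketch under the assumption of Python 3, where str stands in for unicode.
class U(str):
    pass

u1 = U("foo")
u2 = str(u1)      # constructor applied to a subclass instance
full = u1[0:3]    # slice covering the whole string
part = u1[0:2]    # slice covering part of the string

# Collect the type names so the results are easy to compare.
names = [type(x).__name__ for x in (u1, u2, full, part)]
print(names)
```

If the constructor and the slicing code both construct base-class
objects, only the first name in the list is U.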
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-11 05:01
Message:
Logged In: YES
user_id=6380
You're asking for the impossible though. I don't think any
other OO language supports this automatically (although I
could be wrong). The problem is, what to do with a subclass
of unicode like this:
class U(unicode):
    def __init__(self, arg):
        self.orig = arg
How is U("foobar")[0:3] going to know what argument to pass
in to __init__? The base class simply can't know what
additional invariants the subclass imposes.
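
The difficulty can be made concrete with a small sketch. It assumes
Python 3, with str standing in for unicode, and the helper
naive_virtual_slice is hypothetical: it is one guess at what a "virtual
constructor" might do, namely pass the sliced value to the subclass.

```python
# Sketch under the assumption of Python 3, where str stands in for unicode.
class U(str):
    def __init__(self, arg):
        # The subclass records the argument it was constructed from.
        self.orig = arg

def naive_virtual_slice(s, start, stop):
    # Hypothetical helper: slice with the base-class method, then rebuild
    # an instance of the subclass from the sliced value alone.
    return type(s)(str.__getitem__(s, slice(start, stop)))

u = U("foobar")
piece = naive_virtual_slice(u, 0, 3)

print(u.orig)      # the original constructor argument
print(piece.orig)  # whatever the helper happened to pass in
```

Here piece.orig ends up as the sliced text rather than "foobar", and
nothing in the base class can tell whether that is what the subclass
author intended.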
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-09-11 04:31
Message:
Logged In: YES
user_id=89016
Thanks for the quick fix, but the second problem still
remains:
---
class U(unicode):
    pass
u = U(u"foo")
print type(u[0:3])
print type(u[0:2])
---
This gives:
---
<type '__main__.U'>
<type 'unicode'>
---
I think this should be changed to either always return a
unicode object, or to always return an instance of the real
class passed in. (This should be done for all unicode
methods that return a new unicode object). The second
solution would simplify creating derived classes, because
all the methods that return unicode objects would
automatically return the derived type, so these methods
would not have to be overridden.
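
For comparison, this is roughly what a subclass author has to write by
hand when the base class always returns base-class objects. It is a
sketch only, assuming Python 3 with str standing in for unicode, and it
covers just indexing and slicing; every other method that returns a new
string would need the same treatment.

```python
# Sketch under the assumption of Python 3, where str stands in for unicode.
class U(str):
    def __getitem__(self, index):
        # Let the base class do the work, then rewrap the result so the
        # derived type is preserved for every index or slice.
        return type(self)(super().__getitem__(index))

u = U("foo")
whole = u[0:3]
part = u[0:2]
print(type(whole).__name__, type(part).__name__)
```

This works only because U can be constructed from a single string
argument.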
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-09-10 20:09
Message:
Logged In: YES
user_id=31435
unicode() repaired in
Include/unicodeobject.h; new revision: 2.33
Lib/test/test_descr.py; new revision: 1.39
Objects/unicodeobject.c; new revision: 2.111
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-09-10 18:43
Message:
Logged In: YES
user_id=31435
str() repaired (yes, unicode is next <wink>), in
Include/stringobject.h; new revision: 2.31
Lib/test/test_descr.py; new revision: 1.37
Objects/object.c; new revision: 2.146
Objects/stringobject.c; new revision: 2.130
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-09-10 16:39
Message:
Logged In: YES
user_id=31435
tuple() repaired, in
Include/tupleobject.h; new revision: 2.27
Lib/test/test_descr.py; new revision: 1.36
Objects/abstract.c; new revision: 2.77
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-09-10 14:29
Message:
Logged In: YES
user_id=31435
float() also repaired, in
Include/floatobject.h; new revision: 2.20
Lib/test/test_descr.py; new revision: 1.34
Objects/abstract.c; new revision: 2.76
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-09-10 13:57
Message:
Logged In: YES
user_id=31435
Partially repaired (for int and long) in:
Include/intobject.h; new revision: 2.24
Include/longintrepr.h; new revision: 2.12
Include/longobject.h; new revision: 2.24
Lib/test/test_descr.py; new revision: 1.33
Objects/abstract.c; new revision: 2.75
Objects/longobject.c; new revision: 1.104
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-09-10 13:45
Message:
Logged In: YES
user_id=31435
Reassigned to me.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-10 07:48
Message:
Logged In: YES
user_id=6380
Good catch! Other types also suffer from this, e.g. int.
Added to my to-do list.
----------------------------------------------------------------------