
__str__ and __unicode__ seem to behave differently. A __str__ override in a str subclass is used when calling str(), but a __unicode__ override in a unicode subclass is *not* used when calling unicode():

-------------------------------
class str2(str):
    def __str__(self):
        return "foo"

x = str2("bar")
print str(x)

class unicode2(unicode):
    def __unicode__(self):
        return u"foo"

x = unicode2(u"bar")
print unicode(x)
-------------------------------

This outputs:

foo
bar

IMHO this should be fixed so that __unicode__() is used in the second case too.

Bye,
   Walter Dörwald

Walter Dörwald wrote:
If you drop the base class for unicode, this already works.

This code in object.c:PyObject_Unicode() is responsible for the sub-class version not doing what you'd expect:

    if (PyUnicode_Check(v)) {
        /* For a Unicode subtype that's not a Unicode object,
           return a true Unicode object with the same data. */
        return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(v),
                                     PyUnicode_GET_SIZE(v));
    }

So the question is whether conversion of a Unicode sub-type to a true Unicode object should honor __unicode__ or not.

The same question can be asked for many other types, e.g. floats (and __float__), integers (and __int__), etc.

I think we need general consensus on what the strategy should be: honor these special hooks in conversions to base types or not?

Maybe the string case is the real problem ... :-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 10 2005)
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

M.-A. Lemburg wrote:
That's cheating! ;) My use case is an XML DOM API: __unicode__() should extract the character data from the DOM. For Text nodes this is the text, for comments and processing instructions this is u"" etc. To reduce memory footprint and to inherit all the unicode methods, it would be good if Text, Comment and ProcessingInstruction could be subclasses of unicode.
I'd say, these hooks should be honored, because it gives us more possibilities: If you want the original value, simply don't implement the hook.
> Maybe the string case is the real problem ... :-)
At least it seems that the string case is the exception. So if we fix __str__ this would be a bugfix for 2.4.1. If we fix the rest, this would be a new feature for 2.5. Bye, Walter Dörwald
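The DOM use case described above can be sketched in Python 3, where the unicode type was renamed str and str() honors a `__str__` override in a subclass. This is only an illustration of the desired behaviour, not code from the thread: the class names `Text` and `Comment` follow the message, but their bodies are assumptions.

```python
# Python 3 sketch of the DOM use case. In Python 3 the text type is str,
# so __str__ plays the role that __unicode__ plays in the thread.

class Text(str):
    """Hypothetical text node: its character data is the string itself."""
    # No hook is defined, so str(node) yields the original value.


class Comment(str):
    """Hypothetical comment node: it holds text but has no character data."""

    def __str__(self):
        # Extracting character data from a comment gives the empty string.
        return ""


text = Text("hello")
comment = Comment("remark")

# Both nodes inherit every string method from the base type.
shouted = comment.upper()

# Conversion to the base type goes through the hook when one is defined.
text_data = str(text)
comment_data = str(comment)
```

Here `comment` still behaves as the string "remark" for all inherited methods, while `str(comment)` returns the value chosen by the hook.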

On Jan 19, 2005, at 4:40, Walter Dörwald wrote:
It sounds like a really bad idea to have a class that supports both of these properties:

- unicode as a base class
- non-trivial result from unicode(foo)

Do you REALLY think this should be True?!

    isinstance(foo, unicode) and foo != unicode(foo)

Why don't you just call this "extract character data" method something other than __unicode__? That way, you get the reduced memory footprint and convenience methods of unicode, with none of the craziness.

-bob

Bob Ippolito wrote:
IMHO __unicode__ is the most natural and logical choice. isinstance(foo, unicode) is just an implementation detail. But you're right: the consequences of this can be a bit scary.
> That way, you get the reduced memory footprint and convenience methods of unicode, with none of the craziness.
Without this craziness we wouldn't have discovered the problem. ;) Whether this craziness gets implemented depends on the solution to this problem. Bye, Walter Dörwald

On 2005 Jan 19, at 11:10, Bob Ippolito wrote:
> Do you REALLY think this should be True?! isinstance(foo, unicode) and foo != unicode(foo)
Hmmmm -- why not? In the generic case, talking about some class B, it certainly violates no programming principle known to me that "isinstance(foo, B) and foo != B(foo)"; it seems a rather common case -- ``casting to the base class'' (in C++ terminology, I guess) ``slices off'' some parts of foo, and thus equality does not hold. If this is specifically a bad idea for the specific case where B is unicode, OK, that's surely possible, but if so it seems it should be possible to explain this in terms of particular properties of type unicode. Alex
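Alex's generic point can be illustrated with a small Python 3 sketch. The classes `Money` and `TaggedMoney` are hypothetical and do not appear in the thread; they only show that constructing the base class from a subclass instance can "slice off" state, so the result no longer compares equal to the original.

```python
class Money:
    """Hypothetical base class B: holds an amount."""

    def __init__(self, source):
        # Accept a plain number, or copy the amount from another Money.
        self.amount = source.amount if isinstance(source, Money) else source

    def _key(self):
        # The state that takes part in equality comparisons.
        return (self.amount,)

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return self._key() == other._key()


class TaggedMoney(Money):
    """Hypothetical subclass: adds a currency to the base state."""

    def __init__(self, source, currency):
        super().__init__(source)
        self.currency = currency

    def _key(self):
        return (self.amount, self.currency)


foo = TaggedMoney(5, "EUR")
sliced = Money(foo)  # "casting to the base class" drops the currency

is_instance = isinstance(foo, Money)
differs = foo != sliced
```

With these definitions both `is_instance` and `differs` are true, which is the situation `isinstance(foo, B) and foo != B(foo)` describes.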

Walter Dörwald wrote:
Indeed.
> So if we fix __str__ this would be a bugfix for 2.4.1. If we fix the rest, this would be a new feature for 2.5.
I have a feeling that we're better off with the bug fix than the new feature.

__str__ and __unicode__ as well as the other hooks were specifically added for the type constructors to use. However, these were added at a time where sub-classing of types was not possible, so it's time now to reconsider whether this functionality should be extended to sub-classes as well.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 10 2005)

M.-A. Lemburg wrote:
It seems oddly inconsistent though:

"""Define __str__ to determine what your class returns for str(x).

NOTE: This won't work if your class directly or indirectly inherits from str. If that is the case, you cannot alter the results of str(x)."""

At present, most of the type constructors need the caveat, whereas __str__ actually agrees with the simple explanation in the first line.

Going back to PyUnicode, PyObject_Unicode's handling of subclasses of builtins is decidedly odd:

Py> class C(str):
...     def __str__(self): return "I am a string!"
...     def __unicode__(self): return "I am not unicode!"
...
Py> c = C()
Py> str(c)
'I am a string!'
Py> unicode(c)
u''

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Nick Coghlan wrote:
Those APIs were all written long before there were sub-classes of types.
Ah, looks as if the function needs a general overhaul :-)

This section should do a PyString_CheckExact():

    if (PyString_Check(v)) {
        Py_INCREF(v);
        res = v;
    }

But before we start hacking the function, we need a general picture of what we think is right.

Note, BTW, that there is also a tp_str slot that serves as hook. The overall solution to this apparent mess should be consistent for all hooks (__str__, tp_str, __unicode__ and a future tp_unicode).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 10 2005)

M.-A. Lemburg wrote:
> Those APIs were all written long before there were sub-classes of types.
Understood. PyObject_Unicode certainly looked like an 'evolved' piece of code :)
> But before we start hacking the function, we need a general picture of what we think is right.
Aye.
I imagine many people are like me, with __str__ being the only one of these hooks they use frequently (helping out with the Decimal implementation is the only time I can recall using the slots for the numeric types, and I rarely need to deal with Unicode).

Anyway, their heavy use suggests to me that __str__ and str() are likely to provide a good model for the desired behaviour - they're the ones that are likely to have been nudged in the most useful direction by bug reports and the like.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Nick Coghlan wrote:
+1 __foo__ provides conversion to foo, no matter whether foo is among the direct or indirect base classes. Simply moving the PyUnicode_Check() call in PyObject_Unicode() after the __unicode__ call (after the PyErr_Clear() call) will implement this (but does not fix Nick's bug). Running the test suite with this change reveals no other problems. Bye, Walter Dörwald
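The reordering Walter describes can be modelled in pure Python 3. This is a simplified sketch and not the C code of PyObject_Unicode(): the function names `to_unicode_old` and `to_unicode_proposed` and the class `Text` are made up for illustration, `str` stands in for the Python 2 unicode type, and the hook is looked up on the type for simplicity.

```python
class Text(str):
    """Hypothetical subclass of the text type with a conversion hook."""

    def __unicode__(self):
        return "foo"


def base_copy(v):
    # Build a true base-type string holding the same data as v.
    return "".join(v)


def to_unicode_old(v):
    # Old order: the subtype check runs first, so a subclass instance
    # is copied and its __unicode__ hook is never consulted.
    if isinstance(v, str):
        return base_copy(v)
    hook = getattr(type(v), "__unicode__", None)
    if hook is not None:
        return hook(v)
    return str(v)


def to_unicode_proposed(v):
    # Proposed order: try the hook first and fall back to the
    # subtype copy only when no hook is defined.
    hook = getattr(type(v), "__unicode__", None)
    if hook is not None:
        return hook(v)
    if isinstance(v, str):
        return base_copy(v)
    return str(v)


node = Text("bar")
old_result = to_unicode_old(node)
new_result = to_unicode_proposed(node)
```

In this model `old_result` is "bar" and `new_result` is "foo". Objects without a hook take the same path in both versions.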

Walter Dörwald wrote:
I don't have a clear picture of what the consensus currently looks like :-)

If we're going for a solution that implements the hook awareness for all __<typename>__ hooks, I'd be +1 on that. If we only touch the __unicode__ case, we'd only be creating yet another special case. I'd vote -0 on that.

Another solution would be to have all type constructors ignore the __<typename>__ hooks (which were originally added to provide classes with a way to mimic type behavior).

In general, I think we should try to get rid of special cases and go for a clean solution (either way).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 23 2005)

M.-A. Lemburg wrote:
Here's the patch that implements this for int/long/float/unicode: http://www.python.org/sf/1109424 Note that complex already did the right thing. For int/long/float this is implemented in the following way: Converting an instance of a subclass to the base class is done in the appropriate slot of the type (i.e. intobject.c::int_int() etc.) instead of in PyNumber_Int()/PyNumber_Long()/PyNumber_Float(). It's still possible for a conversion method to return an instance of a subclass of int/long/float. Bye, Walter Dörwald
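For comparison, current Python 3 releases honor these hooks for subclasses of the numeric types. The sketch below only demonstrates that present-day behaviour; it is not taken from the patch, and the class names `Answer` and `Half` are hypothetical.

```python
class Answer(int):
    """Hypothetical int subclass with a non-trivial __int__ hook."""

    def __int__(self):
        return 42


class Half(float):
    """Hypothetical float subclass with a non-trivial __float__ hook."""

    def __float__(self):
        return 0.5


a = Answer(7)
h = Half(2.0)

# Conversion to the base type consults the hook of the subclass ...
as_int = int(a)        # 42
as_float = float(h)    # 0.5

# ... while arithmetic still uses the stored value.
stored_int = a + 0
stored_float = h * 1.0
```

The instance keeps its underlying value for arithmetic, and only the explicit conversion through the type constructor goes through `__int__` or `__float__`.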

On Wed, Jan 19, 2005, Walter Dörwald wrote:
Nope. Unless you're claiming the __str__ behavior is new in 2.4? (Haven't been following the thread closely.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "19. A language that doesn't affect the way you think about programming, is not worth knowing." --Alan Perlis
participants (7): Aahz, Alex Martelli, Bob Ippolito, Brett C., M.-A. Lemburg, Nick Coghlan, Walter Dörwald