[Python-bugs-list] [ python-Bugs-504343 ] Unicode docstrings and new style classes
noreply@sourceforge.net
Sun, 27 Jan 2002 01:37:12 -0800
Bugs item #504343, was opened at 2002-01-16 04:10
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=504343&group_id=5470
Category: Type/class unification
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode docstrings and new style classes
Initial Comment:
Unicode docstrings don't work with new-style
classes. With old-style classes they work:
----
class foo:
    u"föö"
class bar(object):
    u"bär"
print repr(foo.__doc__)
print repr(bar.__doc__)
----
This prints
----
u'f\xf6\xf6'
None
----------------------------------------------------------------------
Comment By: James Henstridge (jhenstridge)
Date: 2002-01-27 01:37
Message:
Logged In: YES
user_id=146903
I am posting some comments about this patch after my similar
bug was closed as a duplicate:
http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=507394
I just tested the typeobject.c patch, and it doesn't work
when using a descriptor as the __doc__ for an object (the
descriptor itself is returned for class.__doc__ rather than
the result of the tp_descr_get function).
With the patch applied, the output of the program attached
to the above mentioned bug is:
OldClass.__doc__ = 'object=None
type=OldClass'
OldClass().__doc__ = 'object=OldClass instance
type=OldClass'
NewClass.__doc__ = <__main__.DocDescr object at 0x811ce34>
NewClass().__doc__ = 'object=NewClass instance
type=NewClass'
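A minimal sketch of the situation described above, run under a current Python 3 interpreter rather than the 2002 one under discussion. DocDescr here is a hypothetical stand-in, not the program attached to bug 507394; its output format only imitates the listing. In current CPython, reading __doc__ on the class calls the descriptor's __get__ with obj=None, so both lookups return a computed string instead of the descriptor object:

```python
class DocDescr:
    """Hypothetical descriptor that computes __doc__ on demand."""

    def __get__(self, obj, objtype=None):
        # obj is None when __doc__ is read from the class itself.
        who = "None" if obj is None else type(obj).__name__ + " instance"
        return "object=%s\ntype=%s" % (who, objtype.__name__)


class NewClass:
    __doc__ = DocDescr()


print(repr(NewClass.__doc__))    # class access: __get__(None, NewClass)
print(repr(NewClass().__doc__))  # instance access: __get__(instance, NewClass)
```

The instance lookup goes through ordinary non-data-descriptor handling; only the class lookup depends on how type.__doc__ itself is implemented.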
The suggestion I gave in the other bug is to get rid of the
type.__doc__ property/getset altogether, and make
PyType_Ready() set __doc__ in tp_dict based on the value of
tp_doc. Is there any reason why this wouldn't work? (It
would seem to give behaviour more consistent with old-style
classes, which would be good.)
I will look at producing a patch to do this shortly.
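The PyType_Ready() suggestion can be modelled in a few lines of Python. This is a sketch under stated assumptions: ToyType, ready() and get_doc() are hypothetical names, a Python str stands in for the char * tp_doc slot, and a plain dict stands in for tp_dict.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyType:
    """Hypothetical stand-in for a type object: only the fields discussed here."""
    name: str
    tp_doc: Optional[str] = None                  # models the C-level char * slot
    tp_dict: dict = field(default_factory=dict)   # models the type's __dict__


def ready(t: ToyType) -> None:
    """Sketch of the suggestion: ready() seeds tp_dict['__doc__'] from tp_doc."""
    # Leave any __doc__ supplied by a class body alone (a unicode string or a
    # descriptor); only types without one get it copied from tp_doc.
    t.tp_dict.setdefault("__doc__", t.tp_doc)


def get_doc(t: ToyType) -> Any:
    # With __doc__ always present in tp_dict, no special getset is needed:
    # reading it is a plain dictionary lookup, as for classic classes.
    return t.tp_dict["__doc__"]


builtin_like = ToyType("int", tp_doc="C-level docstring")
user_class = ToyType("bar", tp_dict={"__doc__": "bär"})
for t in (builtin_like, user_class):
    ready(t)
    print(t.name, repr(get_doc(t)))
```

In a real type, the generic attribute lookup would still invoke __get__ on a descriptor stored in the dict; this toy lookup does not model that.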
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-01-17 08:14
Message:
Logged In: YES
user_id=89016
This sounds much better. With my current patch all the
docstrings for the builtin types are gone, because int
etc. never go through typeobject.c/type_new().
I updated the patch to use Guido's method.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-17 06:25
Message:
Logged In: YES
user_id=6380
Wouldn't it be easier to set the __doc__ attribute in
tp_dict and be done with it? That's what classic classes do.
The accessor should still be a bit special: it should be
implemented as a property (in tp_getsets), and first look
for __doc__ in tp_dict and fall back to tp_doc.
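The accessor described above can be sketched the same way, as a property that prefers tp_dict and falls back to tp_doc. ToyType and its doc property are hypothetical; they model only the lookup order given in the comment, not CPython's actual C code.

```python
from typing import Any, Optional


class ToyType:
    """Hypothetical model of a type object with a tp_doc slot and a tp_dict."""

    def __init__(self, name: str, tp_doc: Optional[str] = None,
                 tp_dict: Optional[dict] = None) -> None:
        self.name = name
        self.tp_doc = tp_doc                 # C-level docstring (char * in CPython)
        self.tp_dict = dict(tp_dict or {})   # the type's attribute dictionary

    @property
    def doc(self) -> Any:
        # First look for __doc__ in tp_dict, which may hold any object,
        # including a unicode string set by a class statement ...
        if "__doc__" in self.tp_dict:
            return self.tp_dict["__doc__"]
        # ... and fall back to tp_doc for types that never put one there.
        return self.tp_doc


print(repr(ToyType("int", tp_doc="C-level docstring").doc))
print(repr(ToyType("bar", tp_dict={"__doc__": "bär"}).doc))
```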
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-01-17 06:19
Message:
Logged In: YES
user_id=89016
OK, I've attached the patch.
Note that I had to change the return type of
PyStructSequence_InitType from void to int.
Introducing tp_docobject should provide backwards
compatibility for C extensions that still want to use
tp_doc as a char *. If this is not relevant, we could
switch to PyObject *tp_doc immediately, but this
complicates initializing a static type structure.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-01-17 05:45
Message:
Logged In: YES
user_id=21627
Adding tp_docobject would work, although it may be somewhat
hackish (why should we have this kind of redundancy?). I'm
not sure how you will convert that to the 8bit version,
though: what encoding? If you use the default encoding,
tp_doc will sometimes be set and sometimes not.
In any case, I'd encourage you to produce a patch.
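The point about the default encoding can be illustrated with a hypothetical helper that derives an 8-bit tp_doc using strict error handling. 'ascii' is assumed as the default encoding (it was Python 2's default), and bytes plays the role of the 8-bit string; to_8bit_strict() is an illustrative name only.

```python
from typing import Optional


def to_8bit_strict(doc: str, encoding: str = "ascii") -> Optional[bytes]:
    """Hypothetical helper: derive an 8-bit tp_doc from a unicode docstring.

    With strict error handling, any character the encoding cannot represent
    makes the conversion fail, and tp_doc is left unset (None).
    """
    try:
        return doc.encode(encoding)  # errors="strict" is the default
    except UnicodeEncodeError:
        return None


for text in ("foo", "föö"):
    print(repr(text), "->", repr(to_8bit_strict(text)))
```

Whether tp_doc ends up set therefore depends on both the docstring's characters and the encoding in effect.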
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-01-16 05:03
Message:
Logged In: YES
user_id=89016
What we could do is add a new slot, tp_docobject, that holds
the doc object. Then type_members would include

    {"__doc__", T_OBJECT, offsetof(PyTypeObject, tp_docobject), READONLY},

and tp_doc should be initialized with an 8bit version of
tp_docobject (using the default encoding and errors='ignore'
if tp_docobject is unicode).
Does this sound reasonable?
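The proposal above can be sketched as a function returning both slots. init_doc_slots() is a hypothetical name; bytes stands in for an 8-bit str, str for unicode, and 'ascii' for the default encoding. It also shows the cost of errors='ignore': characters the encoding cannot represent are silently dropped from tp_doc.

```python
from typing import Optional, Tuple, Union

Doc = Union[bytes, str, None]


def init_doc_slots(docobject: Doc, encoding: str = "ascii") -> Tuple[Doc, Optional[bytes]]:
    """Hypothetical sketch: return (tp_docobject, tp_doc) for a class's __doc__."""
    if isinstance(docobject, bytes):
        # Already an 8-bit string: tp_doc can use it unchanged.
        return docobject, docobject
    if isinstance(docobject, str):
        # Unicode: keep the full object in tp_docobject, and give tp_doc a lossy
        # 8-bit copy in which unencodable characters are dropped.
        return docobject, docobject.encode(encoding, errors="ignore")
    # No docstring: tp_doc stays unset.
    return docobject, None


print(init_doc_slots("bär"))
```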
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-01-16 04:18
Message:
Logged In: YES
user_id=21627
There is a good chance that this is caused by the lines following

    XXX What if it's a Unicode string?  Don't know -- this ignores it.

in Objects/typeobject.c. :-) Would you like to investigate
the options and propose a patch?
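A simplified model of the behaviour the report implies, assuming (as the quoted comment suggests) that only an 8-bit __doc__ is copied into the char * tp_doc slot and that bar.__doc__ is then read back from that slot. bytes stands in for Python 2's str and str for unicode; old_type_new_doc() is a hypothetical name, not a CPython function.

```python
from typing import Optional


def old_type_new_doc(class_dict: dict) -> Optional[bytes]:
    """Hypothetical model: only an 8-bit __doc__ reaches the char * tp_doc slot."""
    doc = class_dict.get("__doc__")
    if isinstance(doc, bytes):
        return doc
    # XXX What if it's a Unicode string?  Don't know -- this ignores it.
    return None


# Mirrors the report: the unicode docstring of bar is lost, so __doc__ reads None.
print(repr(old_type_new_doc({"__doc__": "bär"})))
```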
----------------------------------------------------------------------