[Python-bugs-list] [ python-Bugs-504343 ] Unicode docstrings and new style classes

noreply@sourceforge.net noreply@sourceforge.net
Sun, 27 Jan 2002 02:10:48 -0800


Bugs item #504343, was opened at 2002-01-16 04:10
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=504343&group_id=5470

Category: Type/class unification
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode docstrings and new style classes

Initial Comment:
Unicode docstrings don't work with new style
classes. With old style classes they work:
----
class foo:
   u"föö"
class bar(object):
   u"bär"

print repr(foo.__doc__)
print repr(bar.__doc__)
----
This prints
----
u'f\xf6\xf6'
None
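
For comparison, here are the same two classes under a modern Python 3 interpreter. This is an assumption for illustration, since this report concerns Python 2.2. In Python 3 every class is new-style and every str is Unicode, so both docstrings survive. ascii() is used so the output mirrors the escaped form shown above:

```python
# Python 3 sketch: every class is new-style and every str literal is
# Unicode (the u prefix is accepted but has no effect).
class foo:
    u"föö"

class bar(object):
    u"bär"

print(ascii(foo.__doc__))  # 'f\xf6\xf6'
print(ascii(bar.__doc__))  # 'b\xe4r'
```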


----------------------------------------------------------------------

Comment By: James Henstridge (jhenstridge)
Date: 2002-01-27 02:10

Message:
Logged In: YES 
user_id=146903

Put together a patch that gets rid of the type.__doc__
property, and sets __doc__ in PyType_Ready() (if
appropriate).  Seems to work okay in my tests and as a
bonus, "print type.__doc__" actually prints documentation on
using the type() function :)

SF doesn't seem to give me a way to attach a patch to this
bug, so I will paste a copy of the patch here (if it is
mangled, email me at james@daa.com.au for a copy):

--- Python-2.2/Objects/typeobject.c.orig	Tue Dec 18 01:14:22 2001
+++ Python-2.2/Objects/typeobject.c	Sun Jan 27 17:56:37 2002
@@ -8,7 +8,6 @@ static PyMemberDef type_members[] = {
 	{"__basicsize__", T_INT, offsetof(PyTypeObject,tp_basicsize),READONLY},
 	{"__itemsize__", T_INT, offsetof(PyTypeObject, tp_itemsize), READONLY},
 	{"__flags__", T_LONG, offsetof(PyTypeObject, tp_flags), READONLY},
-	{"__doc__", T_STRING, offsetof(PyTypeObject, tp_doc), READONLY},
 	{"__weakrefoffset__", T_LONG,
 	 offsetof(PyTypeObject, tp_weaklistoffset), READONLY},
 	{"__base__", T_OBJECT, offsetof(PyTypeObject, tp_base), READONLY},
@@ -1044,9 +1043,9 @@ type_new(PyTypeObject *metatype, PyObjec
 	}
 
 	/* Set tp_doc to a copy of dict['__doc__'], if the latter is there
-	   and is a string (tp_doc is a char* -- can't copy a general object
-	   into it).
-	   XXX What if it's a Unicode string?  Don't know -- this ignores it.
+	   and is a string.  Note that the tp_doc slot will only be used
+	   by C code -- python code will use the version in tp_dict, so
+	   it isn't that important that non string __doc__'s are ignored.
 	*/
 	{
 		PyObject *doc = PyDict_GetItemString(dict, "__doc__");
@@ -2024,6 +2023,19 @@ PyType_Ready(PyTypeObject *type)
 			inherit_slots(type, (PyTypeObject *)b);
 	}
 
+	/* if the type dictionary doesn't contain a __doc__, set it from
+	   the tp_doc slot.
+	 */
+	if (PyDict_GetItemString(type->tp_dict, "__doc__") == NULL) {
+		if (type->tp_doc != NULL) {
+			PyObject *doc = PyString_FromString(type->tp_doc);
+			PyDict_SetItemString(type->tp_dict, "__doc__", doc);
+			Py_DECREF(doc);
+		} else {
+			PyDict_SetItemString(type->tp_dict, "__doc__", Py_None);
+		}
+	}
+
 	/* Some more special stuff */
 	base = type->tp_base;
 	if (base != NULL) {
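
The __doc__ logic in the PyType_Ready() hunk can be modelled in a few lines of Python. This is a hypothetical sketch, not CPython source: ready_doc is an illustrative name, a plain dict stands in for type->tp_dict, and None stands in for a NULL tp_doc pointer.

```python
from typing import Optional


def ready_doc(tp_dict: dict, tp_doc: Optional[str]) -> None:
    """Model of the __doc__ step the patch adds to PyType_Ready().

    Hypothetical helper: tp_dict stands in for type->tp_dict, and
    tp_doc for the char* slot, with None modelling a NULL pointer.
    """
    # Only fill in __doc__ when the class body did not define one, so a
    # Unicode docstring (or any other object) stored there is kept as-is.
    if "__doc__" not in tp_dict:
        if tp_doc is not None:
            tp_dict["__doc__"] = tp_doc  # PyString_FromString(type->tp_doc)
        else:
            tp_dict["__doc__"] = None    # Py_None


# A C-defined type with a tp_doc string gains a __doc__ entry ...
builtin_dict = {}
ready_doc(builtin_dict, "docstring from tp_doc")
# ... while a Unicode docstring set by a class body survives unchanged.
bar_dict = {"__doc__": "b\xe4r"}
ready_doc(bar_dict, None)

print(builtin_dict["__doc__"])     # docstring from tp_doc
print(ascii(bar_dict["__doc__"]))  # 'b\xe4r'
```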


----------------------------------------------------------------------

Comment By: James Henstridge (jhenstridge)
Date: 2002-01-27 01:37

Message:
Logged In: YES 
user_id=146903

I am posting some comments about this patch after my similar
bug was closed as a duplicate:
 
http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=507394

I just tested the typeobject.c patch, and it doesn't work
when using a descriptor as the __doc__ for an object (the
descriptor itself is returned for class.__doc__ rather than
the result of the tp_descr_get function).

With the patch applied, the output of the program attached
to the above mentioned bug is:
  OldClass.__doc__   = 'object=None              type=OldClass'
  OldClass().__doc__ = 'object=OldClass instance type=OldClass'
  NewClass.__doc__   = <__main__.DocDescr object at 0x811ce34>
  NewClass().__doc__ = 'object=NewClass instance type=NewClass'
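
A hypothetical reconstruction of such a descriptor follows. The actual test program attached to bug 507394 is not reproduced in this thread, so the body of DocDescr here is an assumption. In current CPython 3, type.__doc__ calls the __get__ of a descriptor found in the class dictionary with obj=None, so both lookups return strings:

```python
class DocDescr:
    """Hypothetical descriptor that computes a docstring from its context."""

    def __get__(self, obj, objtype=None):
        # obj is None for class access and the instance for instance access.
        who = "None" if obj is None else objtype.__name__ + " instance"
        return "object=%s type=%s" % (who, objtype.__name__)


class NewClass(object):
    __doc__ = DocDescr()


print(NewClass.__doc__)    # object=None type=NewClass
print(NewClass().__doc__)  # object=NewClass instance type=NewClass
```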

The suggestion I gave in the other bug is to get rid of the
type.__doc__ property/getset all together, and make
PyType_Ready() set __doc__ in tp_dict based on the value of
tp_doc.  Is there any reason why this wouldn't work?  (it
would seem to give behaviour more consistent with old style
classes, which would be good).

I will look at producing a patch to do this shortly.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-01-17 08:14

Message:
Logged In: YES 
user_id=89016

This sounds much better. With my current patch all the 
docstrings for the builtin types are gone, because int 
etc. never go through typeobject.c/type_new().

I updated the patch to use Guido's method.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-17 06:25

Message:
Logged In: YES 
user_id=6380

Wouldn't it be easier to set the __doc__ attribute in
tp_dict and be done with it? That's what classic classes do.
The accessor should still be a bit special: it should be
implemented as a property (in tp_getsets), and first look
for __doc__ in tp_dict and fall back to tp_doc.
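
The suggested accessor can be sketched as a Python property over a stand-in type object. TypeModel and its doc property are hypothetical names. The real accessor would be a C getter registered in tp_getsets:

```python
from typing import Optional


class TypeModel:
    """Hypothetical stand-in for a type object: tp_dict plus a C tp_doc."""

    def __init__(self, tp_dict: dict, tp_doc: Optional[str] = None):
        self.tp_dict = tp_dict
        self.tp_doc = tp_doc  # None models a NULL char*

    @property
    def doc(self):
        # First look for __doc__ in tp_dict, as classic classes do ...
        if "__doc__" in self.tp_dict:
            return self.tp_dict["__doc__"]
        # ... and fall back to the tp_doc slot only when it is absent.
        return self.tp_doc


builtin = TypeModel({}, "doc from tp_doc")  # e.g. a type defined in C
heap = TypeModel({"__doc__": "b\xe4r"})     # e.g. a class statement
print(builtin.doc)      # doc from tp_doc
print(ascii(heap.doc))  # 'b\xe4r'
```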

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-01-17 06:19

Message:
Logged In: YES 
user_id=89016

OK, I've attached the patch.

Note that I had to change the return value of 
PyStructSequence_InitType from void to int.

Introducing tp_docobject should provide backwards
compatibility for C extensions that still want to use
tp_doc as char *. If this is not relevant then we could
switch to PyObject *tp_doc immediately, but this 
complicates initializing a static type structure.



----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-17 05:45

Message:
Logged In: YES 
user_id=21627

Adding tp_docobject would work, although it may be somewhat
hackish (why should we have this kind of redundancy?). I'm
not sure how you will convert that to the 8bit version,
though: what encoding? If you use the default encoding,
tp_doc will sometimes be set and sometimes not.
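
The "sometimes set, sometimes not" problem can be illustrated by modelling the conversion with strict error handling. This is a hypothetical sketch in Python 3: it assumes the default encoding is ASCII, as it was out of the box in Python 2.2, and uses None to stand for a tp_doc left NULL.

```python
from typing import Optional


def to_tp_doc_strict(doc: str) -> Optional[bytes]:
    """Hypothetical 8-bit conversion for tp_doc with strict errors.

    Assumes ASCII as the default encoding; None models tp_doc left NULL.
    """
    try:
        return doc.encode("ascii")
    except UnicodeEncodeError:
        return None


print(to_tp_doc_strict("foo"))        # b'foo'
print(to_tp_doc_strict("f\xf6\xf6"))  # None
```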

In any case, I'd encourage you to produce a patch.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2002-01-16 05:03

Message:
Logged In: YES 
user_id=89016

What we could do is add a new slot tp_docobject, that holds 
the doc object. Then type_members would include

{"__doc__", T_OBJECT, offsetof(PyTypeObject, tp_docobject), READONLY},

tp_doc should be initialized with an 8bit version of 
tp_docobject (using the default encoding and error='ignore' 
if tp_docobject is unicode).
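
With errors='ignore' the conversion never fails, but non-ASCII characters silently disappear from the 8-bit copy, while tp_docobject would still hold the full string. This is a hypothetical sketch that assumes ASCII as the default encoding; to_tp_doc_ignore is an illustrative name.

```python
def to_tp_doc_ignore(doc: str) -> bytes:
    """Hypothetical 8-bit conversion for tp_doc using errors='ignore'.

    Assumes ASCII as the default encoding. It never raises, but
    characters outside ASCII are dropped from the result.
    """
    return doc.encode("ascii", "ignore")


print(to_tp_doc_ignore("b\xe4r"))     # b'br'
print(to_tp_doc_ignore("f\xf6\xf6"))  # b'f'
```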

Does this sound reasonable?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-01-16 04:18

Message:
Logged In: YES 
user_id=21627

There is a good chance that this is caused by the lines following

XXX What if it's a Unicode string?  Don't know -- this ignores it.

in Objects/typeobject.c. :-) Would you like to investigate
the options and propose a patch?

----------------------------------------------------------------------
