Changes in semantics to str()?
When we changed floats to behave different on repr() than on str(), we briefly discussed changes to the container objects as well, but nothing came of it.

Currently, str() of a tuple, list or dictionary is the same as repr() of those objects. This is not very consistent. For example, when we have a float like 1.1 which can't be represented exactly, str() yields "1.1" but repr() yields "1.1000000000000001". But if we place the same number in a list, it doesn't matter which function we use: we always get "[1.1000000000000001]".

Below I have included changes to listobject.c, tupleobject.c and dictobject.c that fix this. The fixes change the print and str() callbacks for these objects to use PyObject_Str() on the contained items -- except if the item is a string or Unicode string. I made these exceptions because I don't like the idea of str(["abc"]) yielding [abc] -- I'm too used to the idea of seeing ['abc'] here. And str() of a Unicode object fails when it contains non-ASCII characters, so that's no good either -- it would break too much code.

Is it too late to check this in? Another negative consequence would be that for user-defined or 3rd party extension objects that have different repr() and str(), like NumPy arrays, it might break some code -- but I think this is not very likely.

--Guido van Rossum (home page: http://www.python.org/~guido/)

*** dictobject.c 2000/09/01 23:29:27 2.65
--- dictobject.c 2000/09/30 16:03:04
***************
*** 594,599 ****
--- 594,601 ----
      register int i;
      register int any;
      register dictentry *ep;
+     PyObject *item;
+     int itemflags;
  
      i = Py_ReprEnter((PyObject*)mp);
      if (i != 0) {
***************
*** 609,620 ****
          if (ep->me_value != NULL) {
              if (any++ > 0)
                  fprintf(fp, ", ");
!             if (PyObject_Print((PyObject *)ep->me_key, fp, 0)!=0) {
                  Py_ReprLeave((PyObject*)mp);
                  return -1;
              }
              fprintf(fp, ": ");
!             if (PyObject_Print(ep->me_value, fp, 0) != 0) {
                  Py_ReprLeave((PyObject*)mp);
                  return -1;
              }
--- 611,630 ----
          if (ep->me_value != NULL) {
              if (any++ > 0)
                  fprintf(fp, ", ");
!             item = (PyObject *)ep->me_key;
!             itemflags = flags;
!             if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
!                 itemflags = 0;
!             if (PyObject_Print(item, fp, itemflags)!=0) {
                  Py_ReprLeave((PyObject*)mp);
                  return -1;
              }
              fprintf(fp, ": ");
!             item = ep->me_value;
!             itemflags = flags;
!             if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
!                 itemflags = 0;
!             if (PyObject_Print(item, fp, itemflags) != 0) {
                  Py_ReprLeave((PyObject*)mp);
                  return -1;
              }
***************
*** 661,666 ****
--- 671,722 ----
      return v;
  }
  
+ static PyObject *
+ dict_str(dictobject *mp)
+ {
+     auto PyObject *v;
+     PyObject *sepa, *colon, *item, *repr;
+     register int i;
+     register int any;
+     register dictentry *ep;
+ 
+     i = Py_ReprEnter((PyObject*)mp);
+     if (i != 0) {
+         if (i > 0)
+             return PyString_FromString("{...}");
+         return NULL;
+     }
+ 
+     v = PyString_FromString("{");
+     sepa = PyString_FromString(", ");
+     colon = PyString_FromString(": ");
+     any = 0;
+     for (i = 0, ep = mp->ma_table; i < mp->ma_size && v; i++, ep++) {
+         if (ep->me_value != NULL) {
+             if (any++)
+                 PyString_Concat(&v, sepa);
+             item = ep->me_key;
+             if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
+                 repr = PyObject_Repr(item);
+             else
+                 repr = PyObject_Str(item);
+             PyString_ConcatAndDel(&v, repr);
+             PyString_Concat(&v, colon);
+             item = ep->me_value;
+             if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
+                 repr = PyObject_Repr(item);
+             else
+                 repr = PyObject_Str(item);
+             PyString_ConcatAndDel(&v, repr);
+         }
+     }
+     PyString_ConcatAndDel(&v, PyString_FromString("}"));
+     Py_ReprLeave((PyObject*)mp);
+     Py_XDECREF(sepa);
+     Py_XDECREF(colon);
+     return v;
+ }
+ 
  static int
  dict_length(dictobject *mp)
  {
***************
*** 1193,1199 ****
      &dict_as_mapping, /*tp_as_mapping*/
      0, /* tp_hash */
      0, /* tp_call */
!     0, /* tp_str */
      0, /* tp_getattro */
      0, /* tp_setattro */
      0, /* tp_as_buffer */
--- 1249,1255 ----
      &dict_as_mapping, /*tp_as_mapping*/
      0, /* tp_hash */
      0, /* tp_call */
!     (reprfunc)dict_str, /* tp_str */
      0, /* tp_getattro */
      0, /* tp_setattro */
      0, /* tp_as_buffer */
Index: listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.88
diff -c -r2.88 listobject.c
*** listobject.c 2000/09/26 05:46:01 2.88
--- listobject.c 2000/09/30 16:03:04
***************
*** 197,203 ****
  static int
  list_print(PyListObject *op, FILE *fp, int flags)
  {
!     int i;
  
      i = Py_ReprEnter((PyObject*)op);
      if (i != 0) {
--- 197,204 ----
  static int
  list_print(PyListObject *op, FILE *fp, int flags)
  {
!     int i, itemflags;
!     PyObject *item;
  
      i = Py_ReprEnter((PyObject*)op);
      if (i != 0) {
***************
*** 210,216 ****
      for (i = 0; i < op->ob_size; i++) {
          if (i > 0)
              fprintf(fp, ", ");
!         if (PyObject_Print(op->ob_item[i], fp, 0) != 0) {
              Py_ReprLeave((PyObject *)op);
              return -1;
          }
--- 211,221 ----
      for (i = 0; i < op->ob_size; i++) {
          if (i > 0)
              fprintf(fp, ", ");
!         item = op->ob_item[i];
!         itemflags = flags;
!         if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
!             itemflags = 0;
!         if (PyObject_Print(item, fp, itemflags) != 0) {
              Py_ReprLeave((PyObject *)op);
              return -1;
          }
***************
*** 245,250 ****
--- 250,285 ----
      return s;
  }
  
+ static PyObject *
+ list_str(PyListObject *v)
+ {
+     PyObject *s, *comma, *item, *repr;
+     int i;
+ 
+     i = Py_ReprEnter((PyObject*)v);
+     if (i != 0) {
+         if (i > 0)
+             return PyString_FromString("[...]");
+         return NULL;
+     }
+     s = PyString_FromString("[");
+     comma = PyString_FromString(", ");
+     for (i = 0; i < v->ob_size && s != NULL; i++) {
+         if (i > 0)
+             PyString_Concat(&s, comma);
+         item = v->ob_item[i];
+         if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
+             repr = PyObject_Repr(item);
+         else
+             repr = PyObject_Str(item);
+         PyString_ConcatAndDel(&s, repr);
+     }
+     Py_XDECREF(comma);
+     PyString_ConcatAndDel(&s, PyString_FromString("]"));
+     Py_ReprLeave((PyObject *)v);
+     return s;
+ }
+ 
  static int
  list_compare(PyListObject *v, PyListObject *w)
  {
***************
*** 1484,1490 ****
      0, /*tp_as_mapping*/
      0, /*tp_hash*/
      0, /*tp_call*/
!     0, /*tp_str*/
      0, /*tp_getattro*/
      0, /*tp_setattro*/
      0, /*tp_as_buffer*/
--- 1519,1525 ----
      0, /*tp_as_mapping*/
      0, /*tp_hash*/
      0, /*tp_call*/
!     (reprfunc)list_str, /*tp_str*/
      0, /*tp_getattro*/
      0, /*tp_setattro*/
      0, /*tp_as_buffer*/
***************
*** 1561,1567 ****
      0, /*tp_as_mapping*/
      0, /*tp_hash*/
      0, /*tp_call*/
!     0, /*tp_str*/
      0, /*tp_getattro*/
      0, /*tp_setattro*/
      0, /*tp_as_buffer*/
--- 1596,1602 ----
      0, /*tp_as_mapping*/
      0, /*tp_hash*/
      0, /*tp_call*/
!     (reprfunc)list_str, /*tp_str*/
      0, /*tp_getattro*/
      0, /*tp_setattro*/
      0, /*tp_as_buffer*/
Index: tupleobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/tupleobject.c,v
retrieving revision 2.46
diff -c -r2.46 tupleobject.c
*** tupleobject.c 2000/09/15 07:32:39 2.46
--- tupleobject.c 2000/09/30 16:03:04
***************
*** 167,178 ****
  static int
  tupleprint(PyTupleObject *op, FILE *fp, int flags)
  {
!     int i;
      fprintf(fp, "(");
      for (i = 0; i < op->ob_size; i++) {
          if (i > 0)
              fprintf(fp, ", ");
!         if (PyObject_Print(op->ob_item[i], fp, 0) != 0)
              return -1;
      }
      if (op->ob_size == 1)
--- 167,183 ----
  static int
  tupleprint(PyTupleObject *op, FILE *fp, int flags)
  {
!     int i, itemflags;
!     PyObject *item;
      fprintf(fp, "(");
      for (i = 0; i < op->ob_size; i++) {
          if (i > 0)
              fprintf(fp, ", ");
!         item = op->ob_item[i];
!         itemflags = flags;
!         if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
!             itemflags = 0;
!         if (PyObject_Print(item, fp, itemflags) != 0)
              return -1;
      }
      if (op->ob_size == 1)
***************
*** 200,205 ****
--- 205,234 ----
      return s;
  }
  
+ static PyObject *
+ tuplestr(PyTupleObject *v)
+ {
+     PyObject *s, *comma, *item, *repr;
+     int i;
+     s = PyString_FromString("(");
+     comma = PyString_FromString(", ");
+     for (i = 0; i < v->ob_size && s != NULL; i++) {
+         if (i > 0)
+             PyString_Concat(&s, comma);
+         item = v->ob_item[i];
+         if (item == NULL || PyString_Check(item) || PyUnicode_Check(item))
+             repr = PyObject_Repr(item);
+         else
+             repr = PyObject_Str(item);
+         PyString_ConcatAndDel(&s, repr);
+     }
+     Py_DECREF(comma);
+     if (v->ob_size == 1)
+         PyString_ConcatAndDel(&s, PyString_FromString(","));
+     PyString_ConcatAndDel(&s, PyString_FromString(")"));
+     return s;
+ }
+ 
  static int
  tuplecompare(register PyTupleObject *v, register PyTupleObject *w)
  {
***************
*** 412,418 ****
      0, /*tp_as_mapping*/
      (hashfunc)tuplehash, /*tp_hash*/
      0, /*tp_call*/
!     0, /*tp_str*/
      0, /*tp_getattro*/
      0, /*tp_setattro*/
      0, /*tp_as_buffer*/
--- 441,447 ----
      0, /*tp_as_mapping*/
      (hashfunc)tuplehash, /*tp_hash*/
      0, /*tp_call*/
!     (reprfunc)tuplestr, /*tp_str*/
      0, /*tp_getattro*/
      0, /*tp_setattro*/
      0, /*tp_as_buffer*/
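For readers who would rather not trace the C, here is a rough Python model of the behaviour the patch above aims for. It is only a sketch in modern Python 3: `proposed_str`, `item_text` and the toy `Celsius` class are hypothetical names made up for illustration. The toy class stands in for the 2.0-era float because current Python floats no longer have differing str() and repr(). The recursion guard that the C code gets from Py_ReprEnter() is left out.

```python
class Celsius:
    """Toy type whose str() and repr() differ (hypothetical, for illustration)."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return f"Celsius({self.degrees!r})"

    def __str__(self):
        return f"{self.degrees} C"


def item_text(item):
    """Render one contained item the way the patch describes."""
    # Exception made by the patch: strings keep their quotes via repr().
    if isinstance(item, (str, bytes)):
        return repr(item)
    # Nested built-in containers would pick up the new str() as well.
    if isinstance(item, (list, tuple, dict)):
        return proposed_str(item)
    # Everything else is shown with str() instead of repr().
    return str(item)


def proposed_str(container):
    """Model of the proposed str() for dict, tuple and list (no cycle guard)."""
    if isinstance(container, dict):
        pairs = (f"{item_text(k)}: {item_text(v)}" for k, v in container.items())
        return "{" + ", ".join(pairs) + "}"
    parts = ", ".join(item_text(x) for x in container)
    if isinstance(container, tuple):
        # A one-element tuple keeps its trailing comma, as in tuplestr().
        trailing = "," if len(container) == 1 else ""
        return "(" + parts + trailing + ")"
    return "[" + parts + "]"


temps = [Celsius(1.1), "abc"]
print(str(temps))           # current behaviour: [Celsius(1.1), 'abc']
print(proposed_str(temps))  # proposed behaviour: [1.1 C, 'abc']
```

The string exception is visible in the last line: the `Celsius` item switches to its str() form while `'abc'` keeps its quotes.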
On Sat, Sep 30, 2000 at 03:56:18PM -0500, Guido van Rossum wrote:
Below I have included changes to listobject.c, tupleobject.c and dictobject.c that fix this. The fixes change the print and str() callbacks for these objects to use PyObject_Str() on the contained items -- except if the item is a string or Unicode string. I made these exceptions because I don't like the idea of str(["abc"]) yielding [abc] -- I'm too used to the idea of seeing ['abc'] here. And str() of a Unicode object fails when it contains non-ASCII characters, so that's no good either -- it would break too much code.
Personally, I'm -0, or perhaps -1 (I'll make up my mind while going out for some lunch/breakfast ;) I would be in favor if the str() output were only used to display something to a user, but that's not the case. And too many people are under the impression that the 'print' handler is the same as the 'str' handler to make that distinction now, I'm afraid.

My main gripe with this change is that it makes str() for container objects unreliable... Strings are a special case, but class-instances are not -- so something like UserString will be displayed without quotes. I don't like the idea of sometimes doing 'str' and sometimes doing 'repr'.

I understand what it's trying to solve, but I don't think that's a worse inconsistency than the one this change introduces. It's also easy to explain: 'str(list or dict) is the same as repr(list or dict)'. The new phrase would be something like 'str(list or dict) calls str() on the objects it contains, except for string and unicode objects.'.

And even that breaks when you mix in instances that wrap a container. A list containing a UserList containing a set of (unicode-)strings would display the strings without quotes. And you can't see that the second container is a UserList.

I also don't think this change should be made between the final beta and 2.0. Jeremy, don't let him ruin your feature freeze! :-)

--
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
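The UserString objection can be tried out with a small model of the proposed rule. This is a sketch under assumptions: it uses Python 3's `collections.UserString` and `collections.UserList` rather than the 2.0-era modules, and `item_text` and `proposed_list_str` are hypothetical helpers that imitate the patch, not functions from any Python release.

```python
from collections import UserList, UserString


def item_text(item):
    # Rule from the patch: only real string types keep repr(); everything
    # else, including string-like wrapper classes, goes through str().
    if isinstance(item, (str, bytes)):
        return repr(item)
    return str(item)


def proposed_list_str(items):
    # Hypothetical model of the proposed str() for a list.
    return "[" + ", ".join(item_text(x) for x in items) + "]"


plain = proposed_list_str(["abc"])
wrapped = proposed_list_str([UserString("abc")])
nested = proposed_list_str([UserList(["abc"])])

print(plain)    # ['abc']
print(wrapped)  # [abc]       the wrapper loses its quotes
print(nested)   # [['abc']]   nothing shows that the inner object is a UserList
```

In this model the same text is rendered differently depending on whether it is a built-in string or a wrapper instance, which is the unreliability described above.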
I guess Thomas has settled the fate of my str() patch -- UserString won't be dealt with properly. I hereby withdraw the patch.

(I'm not sure what Marc-Andre means by buffer objects whose str() is long but whose repr() is short, but it's probably a similar issue.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
I guess Thomas has settled the fate of my str() patch -- UserString won't be dealt with properly. I hereby withdraw the patch.
(I'm not sure what Marc-Andre means by buffer objects whose str() is long but whose repr() is short, but it's probably a similar issue.)
Example for the record:
>>> b = buffer('long string')
>>> b
<read-only buffer for 0x82a0160, ptr 0x82a0174, size 11 at 0x82db170>
>>> l = [b]
>>> l
[<read-only buffer for 0x82a0160, ptr 0x82a0174, size 11 at 0x82db170>]
>>> [str(b)]
['long string']
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Guido van Rossum wrote:
When we changed floats to behave different on repr() than on str(), we briefly discussed changes to the container objects as well, but nothing came of it.
Currently, str() of a tuple, list or dictionary is the same as repr() of those objects. This is not very consistent. For example, when we have a float like 1.1 which can't be represented exactly, str() yields "1.1" but repr() yields "1.1000000000000001". But if we place the same number in a list, it doesn't matter which function we use: we always get "[1.1000000000000001]".
Below I have included changes to listobject.c, tupleobject.c and dictobject.c that fix this. The fixes change the print and str() callbacks for these objects to use PyObject_Str() on the contained items -- except if the item is a string or Unicode string. I made these exceptions because I don't like the idea of str(["abc"]) yielding [abc] -- I'm too used to the idea of seeing ['abc'] here. And str() of a Unicode object fails when it contains non-ASCII characters, so that's no good either -- it would break too much code.
Is it too late to check this in? Another negative consequence would be that for user-defined or 3rd party extension objects that have different repr() and str(), like NumPy arrays, it might break some code -- but I think this is not very likely.
-1

I don't think that such a change is really worth breaking code which relies on the current output of repr(list) or str(list). As the test script from Fredrik for the Unicode database showed there are some very subtle implications to changes in str() and repr() -- e.g. in the mentioned case the hash value would change.

Also, if I understand the patch correctly, str(list) would be almost the same as '[' + ', '.join(map(str, entries)) + ']' and '[' + ', '.join(map(repr, entries)) + ']' for repr().

While this may seem more transparent, I think it will cause problems in practice: e.g. for large data buffers, str(list) would now return the contents of the buffers rather than just an abstract note about a buffer containing the memory mapped data of file xyz.txt. As a consequence, you'd suddenly get a few MBs of output instead of a 100 char string...

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
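The size concern can be made concrete with the two join expressions quoted above. This is a sketch only: `MappedFile` is a hypothetical stand-in for the 2.0-era buffer object (which no longer exists in current Python), with a short repr() and a str() that returns its full contents. The join with str skips the string exception in the patch, which does not matter here because the list holds no strings.

```python
class MappedFile:
    """Hypothetical buffer-like object: short repr(), potentially huge str()."""

    def __init__(self, name, data):
        self.name = name
        self.data = data

    def __repr__(self):
        return f"<buffer for {self.name}, size {len(self.data)}>"

    def __str__(self):
        # Like str() of a buffer, this exposes the whole contents.
        return self.data


entries = [MappedFile("xyz.txt", "x" * 1_000_000)]

# What repr(list) amounts to, and what str(list) still does today.
as_repr = "[" + ", ".join(map(repr, entries)) + "]"

# Roughly what str(list) would become under the patch.
as_str = "[" + ", ".join(map(str, entries)) + "]"

print(as_repr == repr(entries))   # True
print(len(as_repr), len(as_str))  # a short note versus about a megabyte
```

A caller that logs str(some_list) would see its output grow from a short note to the full contents of every buffer in the list.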
participants (3)
- Guido van Rossum
- M.-A. Lemburg
- Thomas Wouters