Inconsistency in 2.4.3 for __repr__() returning unicode
We got an inconsistency for __repr__() returning unicode, as reported in http://python.org/sf/1459029 :

    class s1:
        def __repr__(self):
            return '\\n'

    class s2:
        def __repr__(self):
            return u'\\n'

    print repr(s1()), repr(s2())

Until 2.4.2: \n \n
2.4.3: \n \\n

\\n looks a bit weird, but it's correct. As discussed[1] on python-dev before, if __repr__ returns a unicode object, PyObject_Repr encodes it via the unicode-escape codec, so non-Latin characters can also appear in a repr neutrally.

But our unicode-escape codec has had a bug ever since it was introduced: it didn't escape backslashes. Therefore, backslashes weren't escaped in repr, although they should have been because we used the unicode-escape codec.

So, fixing the bug made a behavior inconsistency. How do we resolve the problem?

Hye-Shik

[1] http://mail.python.org/pipermail/python-dev/2000-July/005353.html
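The backslash behaviour described above can be reproduced with the codec directly. The following is only a rough sketch in Python 3 syntax, not the 2.4 code path: it models what the fixed unicode-escape codec does to a string, and the helper name `escape_for_repr` is hypothetical.

```python
# Python 3 sketch. The thread concerns Python 2.4, so this only models
# the codec step, not PyObject_Repr() itself.

def escape_for_repr(text):
    """Encode text with the unicode-escape codec (hypothetical helper)."""
    return text.encode("unicode_escape")


two_chars = "\\n"                  # a backslash followed by the letter n
escaped = escape_for_repr(two_chars)

print(len(two_chars))              # 2
print(len(escaped))                # 3: the backslash has been doubled
print(escaped == b"\\\\n")         # True

# A non-Latin character comes out as a pure-ASCII escape sequence.
hangul = escape_for_repr("\ud55c")
print(hangul == b"\\ud55c")        # True
```

A codec that left the backslash alone would return the two-byte input unchanged, which corresponds to the pre-2.4.3 output shown above.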
Hye-Shik Chang wrote:
We got an inconsistency for __repr__() returning unicode as reported in http://python.org/sf/1459029 :
    class s1:
        def __repr__(self):
            return '\\n'

    class s2:
        def __repr__(self):
            return u'\\n'

    print repr(s1()), repr(s2())

Until 2.4.2: \n \n
2.4.3: \n \\n
\\n looks a bit weird, but it's correct. As discussed[1] on python-dev before, if __repr__ returns a unicode object, PyObject_Repr encodes it via the unicode-escape codec, so non-Latin characters can also appear in a repr neutrally.
I don't think that using unicode-escape is the right choice for converting a Unicode string returned by __repr__ to a string - why would you want to escape a Unicode string that was specifically prepared to provide the representation of an object?
But our unicode-escape codec has had a bug ever since it was introduced: it didn't escape backslashes. Therefore, backslashes weren't escaped in repr, although they should have been because we used the unicode-escape codec.
So, fixing the bug made a behavior inconsistency. How do we resolve the problem?
Change PyObject_Repr() to use the default encoding (again) which is also consistent with how PyObject_Str() works. To make repr() conversion more robust, we could have PyObject_Repr() apply the conversion using the 'replace' error strategy - after all, repr() is usually only used for debugging, where it's more important that you do get an output rather than an exception.
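As a rough illustration of this proposal, here is a sketch in Python 3 syntax. It assumes the default encoding is ASCII (the usual Python 2 setting unless it was changed), and `encode_repr` is a hypothetical helper, not the actual C implementation of PyObject_Repr().

```python
# Python 3 sketch of the proposed conversion. "ascii" stands in for the
# Python 2 default encoding, which is an assumption made here.

def encode_repr(text, encoding="ascii", errors="replace"):
    """Encode a __repr__ result without escaping it (hypothetical helper)."""
    return text.encode(encoding, errors)


# Backslashes pass through untouched, unlike with unicode-escape.
plain = encode_repr("\\n")
print(plain == b"\\n")             # True

# Characters the encoding cannot represent become '?' instead of raising.
lossy = encode_repr("caf\u00e9")
print(lossy == b"caf?")            # True

# With the strict error handler the same input raises an exception.
try:
    encode_repr("caf\u00e9", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)               # True
```

The trade-off is that 'replace' loses information about the original characters, which may be acceptable for debugging output but not for anything that has to round-trip.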
Hye-Shik
[1] http://mail.python.org/pipermail/python-dev/2000-July/005353.html
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 27 2006)
On Monday 27 March 2006 21:14, M.-A. Lemburg wrote:
Change PyObject_Repr() to use the default encoding (again) which is also consistent with how PyObject_Str() works.
For 2.4.3, I plan to just revert the following patch (and supply a test case)

Index: Objects/object.c
===================================================================
--- Objects/object.c    (revision 16197)
+++ Objects/object.c    (revision 16198)
@@ -267,7 +267,7 @@
                return NULL;
        if (PyUnicode_Check(res)) {
                PyObject* str;
-               str = PyUnicode_AsEncodedString(res, NULL, NULL);
+               str = PyUnicode_AsUnicodeEscapeString(res);
                Py_DECREF(res);
                if (str)
                        res = str;

Does anyone have any objections to this? The test suite passes with this (including the new test) as do various random tests I could string together.

I need to apply this in the next short while, so if you have an issue with it, please speak up now!

Thanks,
Anthony
To make repr() conversion more robust, we could have PyObject_Repr() apply the conversion using the 'replace' error strategy - after all, repr() is usually only used for debugging, where it's more important that you do get an output rather than an exception.
--
Anthony Baxter
Never mind. For 2.4.3, I reverted perky's patch for the unicode-escape, and reverted the old patch for PyObject_Repr on the trunk. After talking to perky and Neal, this seemed like the safest option for 2.4.3. Anthony
participants (3): Anthony Baxter, Hye-Shik Chang, M.-A. Lemburg