[issue15952] format(value) and value.__format__() behave differently with unicode format

New submission from Chris Jerdonek: format(value) and value.__format__() behave differently even though the documentation says otherwise: "Note: format(value, format_spec) merely calls value.__format__(format_spec)." (from http://docs.python.org/library/functions.html?#format ) The difference happens when the format string is unicode. For example:
>>> format(10, u'n')
u'10'
>>> (10).__format__(u'n')  # parentheses needed to prevent SyntaxError
'10'
So either the documentation should be changed, or the behavior should be changed to match.

Related to this: neither the "Format Specification Mini-Language" documentation nor the string.Formatter docs seem to say anything about the effect that a unicode format string should have on the return value (in particular, should it cause the return value to be unicode or not):

http://docs.python.org/library/string.html#formatspec
http://docs.python.org/library/string.html#string-formatting

See also issue 15276 (int formatting), issue 15951 (empty format string), and issue 7300 (unicode arguments).

----------
assignee: docs@python
components: Documentation
messages: 170575
nosy: cjerdonek, docs@python
priority: normal
severity: normal
status: open
title: format(value) and value.__format__() behave differently with unicode format
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue15952>
_______________________________________

Changes by Arfrever Frehtes Taifersar Arahesis <Arfrever.FTA@GMail.Com>:

----------
nosy: +Arfrever

Chris Jerdonek added the comment:

See this code comment:

/* don't define FORMAT_LONG, FORMAT_FLOAT, and FORMAT_COMPLEX, since
   we can live with only the string versions of those.  The builtin
   format() will convert them to unicode. */

from http://hg.python.org/cpython/file/19601d451d4c/Python/formatter_unicode.c

In other words, it was deliberate not to make value.__format__(format_spec) return unicode when format_spec is unicode. So the docs should be adjusted to say that they are not always the same.

Eric V. Smith added the comment:

I believe the conversion is happening in Objects/abstract.c in PyObject_Format, around line 864, near this comment:

/* Convert to unicode, if needed.  Required if spec is unicode
   and result is str */

I think changing the docs will result in more confusion than clarity, but if you can come up with some good wording, I'd be okay with it. I think changing the code will likely break things with little or no benefit.

----------
nosy: +eric.smith
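The conversion step described above can be modeled in a few lines of Python. This is only a sketch of the observable behavior, not CPython's code: `format_like_py2`, `TEXT`, and `BYTES` are hypothetical names, and decoding as ASCII is an assumption standing in for the default-encoding conversion. The explicit b'' and u'' literals let it run the same way on Python 2 (where b'' is the 8-bit str) and Python 3.

```python
TEXT = type(u'')   # unicode on Python 2, str on Python 3
BYTES = type(b'')  # 8-bit str on Python 2, bytes on Python 3


def format_like_py2(value, format_spec):
    """Hypothetical model of what the Python 2.7 format() builtin does."""
    result = value.__format__(format_spec)
    # Convert only when the spec is text and the result is an 8-bit string.
    if isinstance(format_spec, TEXT) and isinstance(result, BYTES):
        result = result.decode('ascii')  # assumption: ASCII default encoding
    return result


class Uni(object):
    def __format__(self, format_spec):
        return u'uni'  # always text, whatever the spec type


class Str(object):
    def __format__(self, format_spec):
        return b'str'  # always an 8-bit string, whatever the spec type


# Record, for each (class, spec kind), whether the wrapper and the direct
# call return objects of the same type.
same_type = {}
for name, obj in (('Uni', Uni()), ('Str', Str())):
    for kind, spec in (('bytes', b'd'), ('text', u'd')):
        wrapped = format_like_py2(obj, spec)
        direct = obj.__format__(spec)
        same_type[(name, kind)] = type(wrapped) is type(direct)
```

In this model the two calls disagree in exactly one of the four combinations: a text spec paired with a `__format__` that returns an 8-bit string.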

Chris Jerdonek added the comment:

Here is a proposed patch.

One note on the patch. I feel the second sentence of the note is worth adding because value.__format__() departs from what PEP 3101 says:

"Note for Python 2.x: The 'format_spec' argument will be either a string object or a unicode object, depending on the type of the original format string. The __format__ method should test the type of the specifiers parameter to determine whether to return a string or unicode object. It is the responsibility of the __format__ method to return an object of the proper type."

The extra sentence will help both in heading off and in responding to issues about value.__format__() that are similar to issue 15951.

----------
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file27218/issue-15952-1-branch-27.patch
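For comparison, a `__format__` method that follows the quoted PEP 3101 note might look like the sketch below. `Tag`, `TEXT`, and `BYTES` are hypothetical names used for illustration only, and the ASCII encoding is an assumption that holds only for ASCII content. The b'' and u'' literals keep the example runnable on both Python 2 and Python 3.

```python
TEXT = type(u'')   # unicode on Python 2, str on Python 3
BYTES = type(b'')  # 8-bit str on Python 2, bytes on Python 3


class Tag(object):
    """Hypothetical type whose __format__ tests the type of the spec."""

    def __init__(self, name):
        self.name = name  # assumed to be ASCII-only text

    def __format__(self, format_spec):
        rendered = u'<' + self.name + u'>'
        if isinstance(format_spec, TEXT):
            # Text spec: return text.
            return rendered
        # 8-bit spec: return an 8-bit string (assumption: ASCII content).
        return rendered.encode('ascii')


tag = Tag(u'b')
as_text = tag.__format__(u'')
as_bytes = tag.__format__(b'')
```

A method written this way returns the same type as its format_spec, so calling it directly and going through a converting wrapper give the same type of result.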

Chris Jerdonek added the comment:

To clarify, one of the sentences above should have read, "I feel the second sentence of the note *in the patch* was worth adding..." (not the second sentence of the PEP note I quoted).

Changes by Chris Jerdonek <chris.jerdonek@gmail.com>:

----------
nosy: +ezio.melotti

Ezio Melotti added the comment:

   ``format(value, format_spec)`` merely calls
-  ``value.__format__(format_spec)``.
+  ``value.__format__(format_spec)`` and, if *format_spec* is Unicode,
+  converts the value to Unicode if it is not already Unicode.

This is correct, but should be rephrased (and "value" should be "return value").

+  The method ``value.__format__(format_spec)`` may return 8-bit strings
+  for some built-in types when *format_spec* is Unicode.

This is not limited to built-in types. __format__() might return either str or unicode, and format() returns the same -- except for the aforementioned case.

This is a summary of the possible cases.

__format__ can return unicode or str:
>>> class Uni(object):
...   def __format__(*args): return u'uni'
...
>>> class Str(object):
...   def __format__(*args): return 'str'
...
format() and __format__ return the same value, except when the format_spec is unicode and __format__ returns str:
>>> format(Uni(),  'd'), Uni().__format__( 'd')  # same
(u'uni', u'uni')
>>> format(Uni(), u'd'), Uni().__format__(u'd')  # same
(u'uni', u'uni')
>>> format(Str(),  'd'), Str().__format__( 'd')  # same
('str', 'str')
>>> format(Str(), u'd'), Str().__format__(u'd')  # different
(u'str', 'str')
It is also not true that the type of the return value is the same as that of the format_spec, because in the first case the returned type is unicode even if the format_spec is str. Therefore this part of the patch should be changed:

+  Per :pep:`3101`, the function returns a Unicode object if *format_spec* is
+  Unicode.  Otherwise, it returns an 8-bit string.

The behavior might be against PEP 3101 (see quotation in msg170669), even though the wording of the PEP is somewhat lenient IMHO ("proper type" doesn't necessarily mean "same type").

Change by Serhiy Storchaka <storchaka+cpython@gmail.com>:

----------
resolution:  -> out of date
stage: patch review -> resolved
status: open -> closed
participants (5)
- Arfrever Frehtes Taifersar Arahesis
- Chris Jerdonek
- Eric V. Smith
- Ezio Melotti
- Serhiy Storchaka