bpo-34595: How to format a type name?
Hi,
Last week, I opened an issue proposing a new %T formatter for PyUnicode_FromFormatV(), and so indirectly for PyUnicode_FromFormat() and PyErr_Format():
https://bugs.python.org/issue34595
I merged my change, but then Serhiy Storchaka asked if we can add something to get the "fully qualified name" (FQN) of a type, e.g. "datetime.timedelta" (FQN) vs "timedelta" (what I call the "short" name). I proposed a second pull request to add %t (short) in addition to %T (FQN).
But then Petr Viktorin asked me to open a thread on python-dev to get a wider discussion. So here I am.
The rationale for this change is to fix multiple issues:
* C extensions use Py_TYPE(obj)->tp_name, which is a fully qualified name for C types but only the name (without the module) for Python types. Python modules use type(obj).__name__, which always returns the short name.
* Currently, many C extensions truncate the type name: they use "%.80s" instead of "%s" to format a type name.
* "%s" with Py_TYPE(obj)->tp_name is used more than 200 times in the C code, and I dislike this complex pattern. IMHO "%t" with obj would be simpler to read, write and maintain.
* I want C extensions and Python modules to have the same behavior: respect PEP 399. Petr considers that error messages are not part of PEP 399, but the issue is wider than only error messages.
The main issue is that at the C level, Py_TYPE(obj)->tp_name is "usually" the fully qualified name for types defined in C, but it's only the "short" name for types defined in Python.
For example, if you get the C accelerator "_datetime", Py_TYPE(obj)->tp_name of a datetime.timedelta object gives you "datetime.timedelta", but if you don't have the accelerator, tp_name is just "timedelta".
Another example: this script displays "mytimedelta(0)" if you have the C accelerator, but "__main__.mytimedelta(0)" if you use the Python implementation:
---
import sys
#sys.modules['_datetime'] = None
import datetime
class mytimedelta(datetime.timedelta):
    pass
print(repr(mytimedelta()))
---
So I would like to fix this kind of issue.
Type names are mainly used for two purposes:
* format an error message
* obj.__repr__()
It's unclear to me if we should use the "short" or the "fully qualified" name. It should maybe be decided on a case by case basis.
There is also a 3rd usage: to implement __reduce__; here backward compatibility matters.
Note: The discussion has evolved since my first implementation of %T, which just used the not well defined Py_TYPE(obj)->tp_name.
--
Petr asked me why not expose functions to get these names. For example, with my second PR (not merged), there are 3 (private) functions:
/* type.__name__ */
const char* _PyType_Name(PyTypeObject *type);
/* type.__qualname__ */
PyObject* _PyType_QualName(PyTypeObject *type);
/* type.__module__ "." type.__qualname__ (but type.__qualname__ for builtin types) */
PyObject * _PyType_FullName(PyTypeObject *type);
My concern here is that each caller has to handle errors:
PyErr_Format(PyExc_TypeError, "must be str, not %.100s", Py_TYPE(obj)->tp_name);
would become:
PyObject *type_name = _PyType_FullName(Py_TYPE(obj));
if (type_name == NULL) { /* do something with this error ... */ }
PyErr_Format(PyExc_TypeError, "must be str, not %U", type_name);
Py_DECREF(type_name);
When I report an error, I dislike having to handle *new* errors... I prefer that the error handling is done inside PyErr_Format() for me, to reduce the risk of additional bugs.
--
Serhiy also asked if we could expose the same feature at the *Python* level: provide something to get the fully qualified name of a type. It's not just f"{type(obj).__module__}.{type(obj).__name__}": you have to skip the module for builtin types like "str" (and not return "builtins.str").
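The rule described here (module plus name, with the module dropped for builtin types) can be sketched in pure Python. This is only an illustration: type_fullname() is a hypothetical helper name, not an existing API and not the code from either pull request, and it assumes __qualname__ is the name wanted, as in the comment on _PyType_FullName().

```python
import fractions


def type_fullname(tp):
    """Return "module.qualname" for a type, or just "qualname" for builtins.

    Hypothetical helper, for illustration only.
    """
    if not isinstance(tp, type):
        raise TypeError("expected a type, not an instance")
    module = tp.__module__
    if module == "builtins":
        # Builtin types are shown as "str", not "builtins.str".
        return tp.__qualname__
    return f"{module}.{tp.__qualname__}"


print(type_fullname(str))                 # builtin type: module is omitted
print(type_fullname(fractions.Fraction))  # other types: module is prepended
```

The helper takes the type itself, so a caller holding an instance would pass type(obj).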
Maybe we can have "name: {0:t}, FQN: {0:T}".format(type(obj)): "t" for the name and "T" for the fully qualified name. We would only have to modify type.__format__().
I'm not sure if we need to add new formatters to str % args.
Example of Python code:
raise TypeError("must be str, not %s" % type(fmt).__name__)
I'm not sure about Python changes. My first concern was just to avoid Py_TYPE(obj)->tp_name at the C level. But again, we should keep C and Python consistent. If the behavior of C extensions changes, Python modules should be adapted as well, to get the same behavior.
Note: I reverted my change which added the %T formatter to PyUnicode_FromFormatV() to clarify the status of this issue.
Victor
On 09/11/18 15:23, Victor Stinner wrote:
Hi,
Last week, I opened an issue to propose to add a new %T formatter to PyUnicode_FromFormatV() and so indirectly to PyUnicode_FromFormat() and PyErr_Format():
https://bugs.python.org/issue34595
I merged my change, but then Serhiy Storchaka asked if we can add something to get the "fully qualified name" (FQN) of a type, ex "datetime.timedelta" (FQN) vs "timedelta" (what I call "short" name). I proposed a second pull request to add %t (short) in addition to %T (FQN).
But then Petr Viktorin asked me to open a thread on python-dev to get a wider discussion. So here I am.
The rationale for this change is to fix multiple issues:
* C extensions use Py_TYPE(obj)->tp_name which returns a fully qualified name for C types, but the name (without the module) for Python name. Python modules use type(obj).__name__ which always return the short name.
That might be a genuine problem, but I wonder if "%T" is fixing the symptom rather than the cause here. Or is this only an issue for PyUnicode_FromFormat()?
* currently, many C extensions truncate the type name: use "%.80s" instead of "%s" to format a type name
That's an orthogonal issue -- you can change "%.80s" to "%s", and presumably you could use "%.80t" as well.
* "%s" with Py_TYPE(obj)->tp_name is used more than 200 times in the C code, and I dislike this complex pattern. IMHO "%t" with obj would be simpler to read, write and maintain.
I consider `Py_TYPE(obj)->tp_name` much more understandable than "%t". It's longer to spell out, but it's quite self-documenting.
* I want C extensions and Python modules to have the same behavior: respect the PEP 399. Petr considers that error messages are not part of the PEP 399, but the issue is wider than only error messages.
The other major use is for __repr__, which AFAIK we also don't guarantee to be stable, so I don't think PEP 399 applies to it. Having the same behavior between C and Python versions of a module is nice, but PEP 399 doesn't prescribe it. There are other differences as well -- for example, `_datetime.datetime` is immutable, and that's OK. If error messages and __repr__s should be consistent between Python and the C accelerator, are you planning to write tests for all the affected modules when switching them to %T/%t?
On 2018-09-11 23:23, Victor Stinner wrote:
Hi,
Last week, I opened an issue to propose to add a new %T formatter to PyUnicode_FromFormatV() and so indirectly to PyUnicode_FromFormat() and PyErr_Format():
https://bugs.python.org/issue34595
I merged my change, but then Serhiy Storchaka asked if we can add something to get the "fully qualified name" (FQN) of a type, ex "datetime.timedelta" (FQN) vs "timedelta" (what I call "short" name). I proposed a second pull request to add %t (short) in addition to %T (FQN).
I'm not sure about having 2 different, though similar, format codes for 2 similar, though slightly different, cases. (And, for all we know, we might want to use "%t" at some later date for something else.)
Perhaps we could have a single format code plus an optional '#' for the "alternate form":
%T for short form
%#T for fully qualified name
FWIW, I personally think this went to python-dev prematurely. This is a shallow problem and IMO doesn't need all that much handwringing. Yes, there are a few different alternatives. So list them all concisely in the tracker and think about it for a few minutes and then pick one. No matter what's picked it'll be better than the status quo -- which is that printing a class name produces either a full or a short name depending on whether it's defined in C or Python, and there's no simple pattern (in C) to print either the full or the short name.
On Tue, Sep 11, 2018 at 4:09 PM MRAB <python@mrabarnett.plus.com> wrote:
I'm not sure about having 2 different, though similar, format codes for 2 similar, though slightly different, cases. (And, for all we know, we might want to use "%t" at some later date for something else.)
Perhaps we could have a single format code plus an optional '#' for the "alternate form":
%T for short form
%#T for fully qualified name
-- --Guido van Rossum (python.org/~guido)
MRAB wrote on 9/11/18 16:06:
On 2018-09-11 23:23, Victor Stinner wrote:
Last week, I opened an issue to propose to add a new %T formatter to PyUnicode_FromFormatV() and so indirectly to PyUnicode_FromFormat() and PyErr_Format():
https://bugs.python.org/issue34595
I merged my change, but then Serhiy Storchaka asked if we can add something to get the "fully qualified name" (FQN) of a type, ex "datetime.timedelta" (FQN) vs "timedelta" (what I call "short" name). I proposed a second pull request to add %t (short) in addition to %T (FQN).
But then Petr Viktorin asked me to open a thread on python-dev to get a wider discussion. So here I am.
+1 for adding format specs for the types, and for giving a consistent way to specify which you want. %t (short) and %T (long) do seem like the logical choices, and I think in context it'll be pretty evident what they mean.
Perhaps we could have a single format code plus an optional '#' for the "alternate form":
%T for short form
%#T for fully qualified name
OTOH, if %T and variants meant "type" but %t meant something entirely different, that *would* probably be confusing.
-Barry
On 09/11/2018 05:21 PM, Barry Warsaw wrote:
MRAB wrote on 9/11/18 16:06:
Perhaps we could have a single format code plus an optional '#' for the "alternate form":
%T for short form
%#T for fully qualified name
OTOH, if %T and variants meant "type" but %t mean something entirely different, that *would* probably be confusing.
I think folks used to %-formatting are already used to un-related but similar codes (and related but dissimilar):
- %M for minute
- %m for month (or maybe I have that backwards)
- %H for 24-hour clock
- %I for 12-hour clock
- %w for weekday as decimal number
- %W for week number of the year
I always have to look it up. :(
--
~Ethan~
On 2018-09-12 02:00, Ethan Furman wrote:
On 09/11/2018 05:21 PM, Barry Warsaw wrote:
MRAB wrote on 9/11/18 16:06:
Perhaps we could have a single format code plus an optional '#' for the "alternate form":
%T for short form
%#T for fully qualified name
OTOH, if %T and variants meant "type" but %t mean something entirely different, that *would* probably be confusing.
I think folks used to %-formatting are already used to un-related but similar codes (and related but dissimilar):
- %M for minute
- %m for month (or maybe I have that backwards)
- %H for 24-hour clock
- %I for 12-hour clock
- %w for weekday as decimal number
- %W for week number of the year
I always have to look it up. :(
Well, for the time of day (24-hour) it's %H:%M:%S, all uppercase.
On 09/11/18 15:23, Victor Stinner wrote:
Hi,
Last week, I opened an issue to propose to add a new %T formatter to PyUnicode_FromFormatV() and so indirectly to PyUnicode_FromFormat() and PyErr_Format():
https://bugs.python.org/issue34595
I merged my change, but then Serhiy Storchaka asked if we can add something to get the "fully qualified name" (FQN) of a type, ex "datetime.timedelta" (FQN) vs "timedelta" (what I call "short" name). I proposed a second pull request to add %t (short) in addition to %T (FQN).
But then Petr Viktorin asked me to open a thread on python-dev to get a wider discussion. So here I am.
After a discussion with Victor, I'll summarize where we are now.
There are actually two inconsistencies to fix:
- Python modules use `type(obj).__name__` and C extensions use `Py_TYPE(obj)->tp_name`, which is inconsistent.
- Usage of __name__ or __qualname__, and prepending __module__ or not, is inconsistent across types/modules.
It turns out that today, when you want to print out a type name, you nearly always want the fully qualified name (including the module unless it's "builtins"). So we can just have "%T" and not "%t". (Or we can add "%t" if a use case arises.)
It should be possible to do this also in Python, preferably using a name similar to "%T".
Most of the usage is in error messages and __repr__, where we don't need to worry about compatibility too much.
It should be possible to get the name if you have the type object, but not an instance of it. So, the proposed `PyUnicode_FromFormat("%T", obj)` is incomplete -- if we go that way, we'll also need a function like PyType_GetFullName. Making "%T" work on the type, e.g. `PyUnicode_FromFormat("%T", Py_TYPE(obj))`, would be more general.
---
So, I propose adding a "%T" formatter to PyUnicode_FromFormat, to be used like this:
PyUnicode_FromFormat("%T", Py_TYPE(obj))
and a "T" format code for type.__format__, to be used like this:
f"{type(obj):T}"
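As a rough illustration of what a "T" format code could feel like from Python, the behavior can be emulated today with a metaclass, since type.__format__ itself cannot be changed from Python code. FullNameMeta and Demo are hypothetical names introduced only for this sketch; the proposal is about type.__format__ itself, not about a metaclass.

```python
class FullNameMeta(type):
    """Hypothetical metaclass emulating a "T" format code for its classes."""

    def __format__(cls, spec):
        if spec == "T":
            module = cls.__module__
            if module == "builtins":
                return cls.__qualname__
            return f"{module}.{cls.__qualname__}"
        # Any other spec keeps the default behavior inherited by type.
        return super().__format__(spec)


class Demo(metaclass=FullNameMeta):
    pass


obj = Demo()
print(f"{type(obj):T}")  # module-qualified name of Demo
print(f"{type(obj)}")    # default formatting: <class '...Demo'>
```

The format code is applied to the type, not to the instance, which matches the `Py_TYPE(obj)` form proposed for C.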
12.09.18 01:23, Victor Stinner wrote:
But then Petr Viktorin asked me to open a thread on python-dev to get a wider discussion. So here I am.
Thank you for opening this discussion, Victor. I wanted to do it myself, but you have written a much better description of the problem.
See also related issues:
https://bugs.python.org/issue21861
https://bugs.python.org/issue22032 (solved)
https://bugs.python.org/issue22033 (solved)
https://bugs.python.org/issue27541 (solved)
https://bugs.python.org/issue28062
There were also attempts to change the repr/str of types so that they return just a FQN. It would help to solve the issue from the Python side. This idea was initially suggested by Guido, but later he changed his mind.
The rationale for this change is to fix multiple issues:
* C extensions use Py_TYPE(obj)->tp_name which returns a fully qualified name for C types, but the name (without the module) for Python name. Python modules use type(obj).__name__ which always return the short name.
Sometimes Python modules use FQN, but this is not common, and the code is cumbersome. It is more common to use obj.__class__ instead of type(obj), the difference is intentionally ignored.
* currently, many C extensions truncate the type name: use "%.80s" instead of "%s" to format a type name
AFAIK the rationale for this in PyUnicode_FromFormat() is that if you have a corrupted type object, tp_name can point to an arbitrary place in memory, and an attempt to interpret it as a null-terminated string can output a large amount of trash. It is better to get a truncated type name in the error message (names of real types are usually below that limit) than to get tons of trash or an error in an attempt to format it.
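The precision in "%.80s" caps how many characters are copied from the string. Python's own %-formatting accepts the same precision syntax, so the effect can be shown without C. This is only an analogy to the C formatter, and the long name below is made up to stand in for a bogus tp_name.

```python
# A made-up, overly long name standing in for a corrupted tp_name.
long_name = "x" * 500
prefix = "must be str, not "

# "%.80s" copies at most 80 characters; "%s" copies the whole string.
truncated = (prefix + "%.80s") % long_name
full = (prefix + "%s") % long_name

print(len(truncated) - len(prefix))  # characters kept with the precision
print(len(full) - len(prefix))       # characters kept without it
```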
Maybe we can have "name: {0:t}, FQN: {0:T}".format(type(obj)). "t" for name and "T" for fully qualified name. We would only have to modify type.__format__().
This will make the feature inconsistent between Python and C. In Python, the argument is a type; in C it is an instance of the type. We need a way to format a FQN in C for types themselves. It is a less common case, but using _PyType_FullName() for it is very inconvenient, as you have shown above.
Hi,
For the type name, sometimes, we only get a type (not an instance), and we want to format its FQN. IMHO we need to provide ways to format the FQN of a type for *types* and for *instances*. Here is my proposal:
* Add !t conversion to format string
* Add ":T" format to type.__format__()
* Add "%t" and "%T" formatters to PyUnicode_FromFormatV()
* Add a read-only type.__fqn__ property
# Python: "!t" for instance
raise TypeError(f"must be str, not {obj!t}")
/* C: "%t" for instance */
PyErr_Format(PyExc_TypeError, "must be str, not %t", obj);
/* C: "%T" for type */
PyErr_Format(PyExc_TypeError, "must be str, not %T", mytype);
# Python: ":T" for type
raise TypeError(f"must be str, not {mytype:T}")
Open question: Should we also add "%t" and "%T" formatters to the str % args operator at the Python level?
I have a proof-of-concept implementation: https://github.com/python/cpython/pull/9251
Victor
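To make the read-only type.__fqn__ idea concrete, here is a minimal sketch that emulates it with a property on a metaclass. FqnMeta and Demo are hypothetical names; the proposal would put the property on type itself, which cannot be done from Python code, so this shows only the intended behavior and not the proof-of-concept implementation.

```python
class FqnMeta(type):
    """Hypothetical metaclass giving its classes a read-only __fqn__."""

    @property
    def __fqn__(cls):
        module = cls.__module__
        if module == "builtins":
            return cls.__qualname__
        return f"{module}.{cls.__qualname__}"


class Demo(metaclass=FqnMeta):
    pass


print(Demo.__fqn__)  # module-qualified name of Demo

try:
    # A property without a setter rejects assignment.
    Demo.__fqn__ = "something.else"
except AttributeError as exc:
    print("read-only:", exc)
```

Because the property lives on the metaclass, it is reachable from the class but not from its instances, so code holding an instance would write type(obj).__fqn__.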
On 13 Sep 2018, at 2:33, Victor Stinner wrote:
Hi,
For the type name, sometimes, we only get a type (not an instance), and we want to format its FQN. IMHO we need to provide ways to format the FQN of a type for *types* and for *instances*. Here is my proposal:
* Add !t conversion to format string
* Add ":T" format to type.__format__()
* Add "%t" and "%T" formatters to PyUnicode_FromFormatV()
As far as I can remember, the distinction between lowercase and uppercase format letters for PyUnicode_FromFormatV() and friends was: lowercase letters are for formatting C types (like `char *` etc.) and uppercase formatting letters are for Python types (i.e. the C type is `PyObject *`). IMHO we should keep that distinction.
* Add a read-only type.__fqn__ property
I like that.
# Python: "!t" for instance
raise TypeError(f"must be str, not {obj!t}")
/* C: "%t" for instance */
PyErr_Format(PyExc_TypeError, "must be str, not %t", obj);
/* C: "%T" for type */
PyErr_Format(PyExc_TypeError, "must be str, not %T", mytype);
# Python: ":T" for type
raise TypeError(f"must be str, not {mytype:T}")
We could solve the problem with instances and classes by adding two new ! operators to str.format/f-strings and making them chainable. The !t operator would get the class of the argument and the !c operator would require a class argument and would convert it to its name (which is obj.__module__ + "." + obj.__qualname__ (or only obj.__qualname__ for builtin types)). So:
>>> import pathlib
>>> p = pathlib.Path("spam.py")
>>> print(f"{pathlib.Path}")
<class 'pathlib.Path'>
>>> print(f"{pathlib.Path!c}")
pathlib.Path
>>> print(f"{pathlib.Path!c!r}")
'pathlib.Path'
>>> print(f"{p!t}")
<class 'pathlib.Path'>
>>> print(f"{p!t!c}")
pathlib.Path
>>> print(f"{p!c}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object is not a class
This would also give us:
>>> print(f"{p!s!r}")
'spam.py'
Which is different from:
>>> print(f"{p}")
spam.py
>>> print(f"{p!r}")
PosixPath('spam.py')
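The two conversions can be approximated today with a string.Formatter subclass that overrides convert_field(). This is a sketch under stated assumptions: TypeNameFormatter and _fullname are hypothetical names, f-strings and str.format() are not affected, and the stock format-string parser accepts only one conversion per field, so the chained "!t!c" form is not emulated here.

```python
import fractions
import string


def _fullname(tp):
    # Hypothetical helper: module.qualname, without the module for builtins.
    module = tp.__module__
    if module == "builtins":
        return tp.__qualname__
    return f"{module}.{tp.__qualname__}"


class TypeNameFormatter(string.Formatter):
    """Hypothetical formatter adding "!t" and "!c" conversions."""

    def convert_field(self, value, conversion):
        if conversion == "t":
            # "!t": replace the value by its class.
            return type(value)
        if conversion == "c":
            # "!c": the value must be a class; convert it to its name.
            if not isinstance(value, type):
                raise TypeError("object is not a class")
            return _fullname(value)
        # "!r", "!s" and "!a" keep their usual meaning.
        return super().convert_field(value, conversion)


fmt = TypeNameFormatter()
half = fractions.Fraction(1, 2)
print(fmt.format("{0!c}", fractions.Fraction))  # name of a class
print(fmt.format("{0!t}", half))                # class of an instance
print(fmt.format("{0!c}", type(half)))          # explicit two-step form
```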
Open question: Should we also add "%t" and "%T" formatters to the str % args operator at the Python level?
I have a proof-of-concept implementation: https://github.com/python/cpython/pull/9251
Victor
Servus, Walter
On 9/12/2018 8:33 PM, Victor Stinner wrote:
Hi,
For the type name, sometimes, we only get a type (not an instance), and we want to format its FQN. IMHO we need to provide ways to format the FQN of a type for *types* and for *instances*. Here is my proposal:
* Add !t conversion to format string
I'm strongly opposed to this. This !t conversion would not be widely applicable enough to be generally useful, and would need to be exposed in the f-string and str.format() documentation, even though 99% of programmers would never need or see it.
The purpose of the conversions is not to save you from making a function call when you know the type of the arguments. The purpose was specifically to convert arguments to strings so that your format specifier could always use the string formatting mini-language. It was more useful in str.format(), where the format string might be written separately (as user input or a translation, say) and not know the types of the arguments. You can argue (and I have!) that the conversions are completely unneeded in f-strings.
raise TypeError(f"must be str, not {obj!t}")
Should be written as:
raise TypeError(f"must be str, not {type(obj)}")
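For reference, this is what the spellings that already exist produce, shown on a nested class so that __name__ and __qualname__ differ. Outer and Inner are made-up names for the illustration; nothing here is new API.

```python
class Outer:
    class Inner:
        pass


obj = Outer.Inner()

print(f"{type(obj)}")               # repr-style form: <class '...Outer.Inner'>
print(f"{type(obj).__name__}")      # short name
print(f"{type(obj).__qualname__}")  # qualified name, without the module
print(f"{type(obj).__module__}.{type(obj).__qualname__}")  # module added by hand
```

Formatting the type object directly gives the "<class '...'>" form rather than a bare name, which is the gap a "T" format code for type.__format__() would fill.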
* Add ":T" format to type.__format__()
As you know (I've read the patch) this is just "T". I mention it here for future readers. They should understand that the ":" is a str.format() and f-string construct, and is unknown to __format__().
That said, I think this is a good idea. type.__format__() could also understand "#" to specify qualname.
* Add "%t" and "%T" formatters to PyUnicode_FromFormatV()
I think "T" is a good idea, but I think you're adding in obj vs type(obj) just because of the borrowed reference issue in Py_TYPE(). That issue is so much larger than string formatting the type of an object that it shouldn't be addressed here.
* Add a read-only type.__fqn__ property
I'm not sure of the purpose of this. When in your examples is it used?
# Python: "!t" for instance
raise TypeError(f"must be str, not {obj!t}")
/* C: "%t" for instance */
PyErr_Format(PyExc_TypeError, "must be str, not %t", obj);
/* C: "%T" for type */
PyErr_Format(PyExc_TypeError, "must be str, not %T", mytype);
# Python: ":T" for type
raise TypeError(f"must be str, not {mytype:T}")
Open question: Should we also add "%t" and "%T" formatters to the str % args operator at the Python level?
No. Again, I think any formatting of type names should not be in a widely used interface, and should remain in our type-specific interface, __format__. %-formatting has no per-type extensibility, and I don't think we should start adding codes for every possible use case. Format codes for datetimes would be way more useful than %t, and I'd be opposed to adding them, too. (I realize my analogy is stretched, because every object has a type. But still.)
Eric
I have a proof-of-concept implementation: https://github.com/python/cpython/pull/9251
Victor
Victor _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40tru...
On 09/13/2018 07:01 AM, Eric V. Smith wrote:
On 9/12/2018 8:33 PM, Victor Stinner wrote:
Hi,
For the type name, sometimes, we only get a type (not an instance), and we want to format its FQN. IMHO we need to provide ways to format the FQN of a type for *types* and for *instances*. Here is my proposal:
* Add !t conversion to format string
I'm strongly opposed to this. This !t conversion would not be widely applicable enough to be generally useful, and would need to be exposed in the f-string and str.format() documentation, even though 99% of programmers would never need or see it.
I discussed this with Eric in person this morning at the core dev sprints. Eric's understanding is that this is motivated by the fact that Py_TYPE() returns a borrowed reference, and by switching to this !t conversion we could avoid using Py_TYPE() when formatting error messages. My quick thoughts on this:
* If Py_TYPE() is a bad API, then it's a bad API and should be replaced. We should have a new version of Py_TYPE() that returns a strong reference.
* If we're talking about formatting error messages, we're formatting an exception, which means we're already no longer in performance-sensitive code. So we should use the new API that returns a strong reference. The negligible speed hit of taking the extra reference will be irrelevant.
Cheers,
//arry/
Le jeu. 13 sept. 2018 à 16:01, Eric V. Smith <eric@trueblade.com> a écrit :
* Add !t conversion to format string
I'm strongly opposed to this. This !t conversion would not be widely applicable enough to be generally useful, and would need to be exposed in the f-string and str.format() documentation, even though 99% of programmers would never need or see it.
(I'm thinking aloud.) In the Python code base, I found 115 lines using type(obj).__name__ and 228 lines using obj.__class__.__name__.
$ scm.py grep 'type(.*).__name__'|wc -l
115
$ scm.py grep '.__class__.__name__'|wc -l
228
I don't know how to compare these numbers, so I tried to count the number of f-strings:
$ git grep '[^"%-]\<f"'|wc -l
405
$ git grep "[^'%-]\<f'"|wc -l
864
I'm not sure if type(obj) or obj.__class__ should be used, but I can say that they are different: obj.__class__ can be overridden:
---
class OtherType:
    pass

class MyType:
    __class__ = OtherType

x = MyType()
print(f"type(x): {type(x)}")
print(f"x.__class__: {x.__class__}")
---
Output:
---
type(x): <class '__main__.MyType'>
x.__class__: <class '__main__.OtherType'>
---
Moreover, it's also possible to override the "type" symbol in the global or local scope:
---
type = id
num = 42
print(f"type(num): {type(num)}")
# Output: "type(num): 139665950357856"
---
One advantage of having a builtin formatter would be that it always uses the builtin type() function internally to get the type of an object, rather than whatever "type" is bound to in the current scope. The second advantage is that nobody has to decide between type(obj) and obj.__class__ :-)
raise TypeError(f"must be str, not {obj!t}")
Should be written as: raise TypeError(f"must be str, not {type(obj)}")
f"{type(obj)}" behaves as str(type(obj)), but in practice it uses repr(type(obj)):
>>> f"{type(42)}"
"<class 'int'>"
My proposed f"{obj!t}" returns the fully qualified name of the object type:
>>> f"{42!t}"
"int"
Do you want to modify str(type) to return a value different than repr(type)? Or maybe it's just a typo and you wanted to write f"{type(obj):T}"?
That said, I think this is a good idea. type.__format__() could also understand "#" to specify qualname.
When I discussed this with Petr Viktorin, we failed to find a use case where __qualname__ was needed. We agreed that we always want the fully qualified name, not just the qualified name.
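For readers following along, the "fully qualified name" discussed here can be approximated in pure Python. This is only a sketch under the definition given earlier in the thread (module plus qualified name, with the module omitted for builtin types); fully_qualified_name is a hypothetical helper, not an existing CPython API:

```python
import datetime


def fully_qualified_name(tp):
    """Return "module.qualname" for a type, omitting the module for builtins.

    Hypothetical helper for illustration; not part of CPython.
    """
    module = tp.__module__
    qualname = tp.__qualname__
    if module == "builtins":
        # Builtin types are shown as "str", not "builtins.str".
        return qualname
    return f"{module}.{qualname}"


print(fully_qualified_name(str))
print(fully_qualified_name(type(datetime.timedelta())))
```

The special case for "builtins" is what makes this more than a one-line f-string.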
I think "T" is a good idea, but I think you're adding in obj vs type(obj) just because of the borrowed reference issue in Py_TYPE(). That issue is so much larger than string formatting the type of an object that it shouldn't be addressed here.
Right, that's a side effect of the discussion on the C API. It seems like Py_TYPE() has to go in the new C API. Sorry, the rationale is not written down yet, but Dino convinced me that Py_TYPE() has to go :-)
Open question: Should we also add "%t" and "%T" formatters to the str % args operator at the Python level?
No. Again, I think any formatting of type names should not be in a widely used interface, (...)
Ok, that's fine with me :-) Victor
On 09/13/18 14:08, Victor Stinner wrote:
Le jeu. 13 sept. 2018 à 16:01, Eric V. Smith <eric@trueblade.com> a écrit :
* Add !t conversion to format string
I'm strongly opposed to this. This !t conversion would not be widely applicable enough to be generally useful, and would need to be exposed in the f-string and str.format() documentation, even though 99% of programmers would never need or see it.
(I'm thinking aloud.)
In the Python code base, I found 115 lines using type(obj).__name__ and 228 lines using obj.__class__.__name__. [...]
"!t" is not a big improvement over ":T" and "type(obj)".
I'm not sure if type(obj) or obj.__class__ should be used, but I can say that they are different: obj.__class__ can be overridden: [...]
Moreover, it's also possible to override the "type" symbol in the global or local scope: [...]
I don't think either of those are problematic. If you override `__class__` or `type`, things will behave weirdly, and that's OK.
One advantage of having a builtin formatter would be to always use internally the builtin type() function to get the type of an object, or not use "type()" in the current scope. The second advantage is to prevent the need of having to decide between type(obj) and obj.__class__ :-)
raise TypeError(f"must be str, not {obj!t}")
Should be written as: raise TypeError(f"must be str, not {type(obj)}") [...]
Do you want to modify str(type) to return a value different than repr(type)?
Or maybe it's just a typo and you wanted to write f"{type(obj):T}"?
Yes, AFAIK that was a typo.
I think "T" is a good idea, but I think you're adding in obj vs type(obj) just because of the borrowed reference issue in Py_TYPE(). That issue is so much larger than string formatting the type of an object that it shouldn't be addressed here.
Right, that's a side effect of the discussion on the C API. It seems like Py_TYPE() has to go in the new C API. Sorry, the rationale is not written down yet, but Dino convinced me that Py_TYPE() has to go :-)
I'll be happy when we get rid of Py_TYPE and get to use moving garbage collectors... but now is not the time. The API for "%T" should be "give me the type". The best way to do that might change in the future. But at this point, we're bikeshedding. I think all the relevant voices have been heard.
On 9/13/2018 5:52 PM, Petr Viktorin wrote:
On 09/13/18 14:08, Victor Stinner wrote:
Le jeu. 13 sept. 2018 à 16:01, Eric V. Smith <eric@trueblade.com> a écrit :
* Add !t conversion to format string
I'm strongly opposed to this. This !t conversion would not be widely applicable enough to be generally useful, and would need to be exposed in the f-string and str.format() documentation, even though 99% of programmers would never need or see it.
(I'm thinking aloud.)
In the Python code base, I found 115 lines using type(obj).__name__ and 228 lines using obj.__class__.__name__. [...]
"!t" is not a big improvement over ":T" and "type(obj)".
I'm not sure if type(obj) or obj.__class__ should be used, but I can say that they are different: obj.__class__ can be overridden: [...]
Moreover, it's also possible to override the "type" symbol in the global or local scope: [...]
I don't think either of those are problematic. If you override `__class__` or `type`, things will behave weirdly, and that's OK.
One advantage of having a builtin formatter would be to always use internally the builtin type() function to get the type of an object, or not use "type()" in the current scope. The second advantage is to prevent the need of having to decide between type(obj) and obj.__class__ :-)
raise TypeError(f"must be str, not {obj!t}")
Should be written as: raise TypeError(f"must be str, not {type(obj)}") [...]
Do you want to modify str(type) to return a value different than repr(type)?
Or maybe it's just a typo and you wanted to write f"{type(obj):T}"?
Yes, AFAIK that was a typo.
f'{type(obj)}' becomes type(obj).__format__(''), so you can return something other than __str__ or __repr__ does. It's only by convention that an object's __format__ returns __str__: it need not do so. Eric
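A small sketch of that convention being broken on purpose. The Celsius class is hypothetical; the point is only that str() and %-formatting go through __str__, while f-strings and str.format() call __format__ with an empty spec:

```python
class Celsius:
    """Hypothetical class whose __format__('') differs from __str__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f"Celsius({self.degrees})"

    def __format__(self, spec):
        if spec == "":
            # Deliberately not the same as str(self).
            return f"{self.degrees} C"
        # Delegate any other spec to the underlying number.
        return format(self.degrees, spec)


t = Celsius(21.5)
via_str = str(t)              # uses __str__
via_percent = "%s" % t        # %-formatting also uses str()
via_fstring = f"{t}"          # calls type(t).__format__(t, "")
via_format = "{}".format(t)   # same path as the f-string
print(via_str, via_percent, via_fstring, via_format, sep=" | ")
```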
I think "T" is a good idea, but I think you're adding in obj vs type(obj) just because of the borrowed reference issue in Py_TYPE(). That issue is so much larger than string formatting the type of an object that it shouldn't be addressed here.
Right, that's a side effect of the discussion on the C API. It seems like Py_TYPE() has to go in the new C API. Sorry, the rationale is not written down yet, but Dino convinced me that Py_TYPE() has to go :-)
I'll be happy when we get rid of Py_TYPE and get to use moving garbage collectors... but now is not the time. The API for "%T" should be "give me the type". The best way to do that might change in the future.
But at this point, we're bikeshedding. I think all the relevant voices have been heard.
Le ven. 14 sept. 2018 à 00:09, Eric V. Smith <eric@trueblade.com> a écrit :
f'{type(obj)}' becomes type(obj).__format__(''), so you can return something other than __str__ or __repr__ does. It's only by convention that an object's __format__ returns __str__: it need not do so.
What's New in Python 3.7 contains:
object.__format__(x, '') is now equivalent to str(x) rather than format(str(self), ''). (Contributed by Serhiy Storchaka in bpo-28974.)
https://bugs.python.org/issue28974
Oh, I didn't know that a type is free to change this behavior: return something different than str(obj) if the format spec is an empty string. So are you suggesting to change type(obj).__format__('') to return the fully qualified name instead of repr(type)? So "%s" % type(obj) would use repr(), but "{}".format(type(obj)) and f"{type(obj)}" would return the fully qualified name?
Victor
On 9/13/2018 8:04 PM, Victor Stinner wrote:
Le ven. 14 sept. 2018 à 00:09, Eric V. Smith <eric@trueblade.com> a écrit :
f'{type(obj)}' becomes type(obj).__format__(''), so you can return something other than __str__ or __repr__ does. It's only by convention that an object's __format__ returns __str__: it need not do so. What's New in Python 3.7 contains:
object.__format__(x, '') is now equivalent to str(x) rather than format(str(self), ''). (Contributed by Serhiy Storchaka in bpo-28974.) https://bugs.python.org/issue28974
Oh, I didn't know that a type is free to change this behavior: return something different than str(obj) if the format spec is an empty string.
True! That issue was specific to object.__format__, not any other class's implementation of __format__.
So are you suggesting to change type(obj).__format__('') to return the fully qualified name instead of repr(type)?
I'm not suggesting it, I'm saying it's possible. It indeed might be the most useful behavior.
So "%s" % type(obj) would use repr(), but "{}".format(type(obj)) and f"{type(obj)}" would return the fully qualified name?
"%s" % type(obj) would use str(), not repr. You could either: - keep with convention and have type(obj).__format__('') return type(obj).__str__(), while type(obj).__format__('#') (or what other char you want to use) return the qualname; or - just have type(obj).__format__('') return the qualname, if that's the more useful behavior. Eric
On 2018-09-13, Victor Stinner wrote:
Right, that's a side effect of the discussion on the C API. It seems like Py_TYPE() has to go in the new C API. Sorry, the rationale is not written down yet, but Dino convinced me that Py_TYPE() has to go :-)
My understanding is that using Py_TYPE() inside the CPython internals is okay (i.e. using a borrowed reference). However, extension modules would preferably not use APIs that give back borrowed references. A clean API redesign would remove all of those.
So, what are extension modules supposed to do? We want to give them an easy to use API. If we give them %t that takes an object and internally does the Py_TYPE() call, they have a simple way to do the right thing. E.g.
PyErr_Format(PyExc_TypeError, "\"%s\" must be string, not %.200s", name, src->ob_type->tp_name);
becomes
PyErr_Format(PyExc_TypeError, "\"%s\" must be string, not %t", name, src);
This kind of code occurs often in extension modules. If you make them get a strong reference to the type, they have to remember to decref it. It's not a huge deal but is a bit harder to use. I like the proposal to provide both %t and %T. Our format code is a bit more complicated but many extension modules get a bit simpler. That's a win, IMHO.
For the Python side, I don't think you need the % format codes. You need an idiomatic way of getting the type name. repr() and str() of the type object is not it. I don't think changing them at this point is a good idea. So, having a new property would seem the obvious solution. E.g.
f'"{name}" must be string, not {src.__class__.__qualname__}'
That __qualname__ property will be useful for other things, not just building type error messages.
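A runnable version of that last f-string, as a sketch only. check_str, Outer and Inner are hypothetical names; the nested class is there to show where __qualname__ and __name__ differ:

```python
class Outer:
    class Inner:
        pass


def check_str(name, src):
    """Hypothetical validator mirroring the C error message quoted above."""
    if not isinstance(src, str):
        raise TypeError(
            f'"{name}" must be string, not {src.__class__.__qualname__}'
        )
    return src


try:
    check_str("title", Outer.Inner())
except TypeError as exc:
    message = str(exc)

print(message)
print(Outer.Inner.__name__, Outer.Inner.__qualname__)
```

Note that __qualname__ does not include the module, so this is the qualified name rather than the fully qualified name discussed elsewhere in the thread.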
participants (11)
- Barry Warsaw
- Eric V. Smith
- Ethan Furman
- Guido van Rossum
- Larry Hastings
- MRAB
- Neil Schemenauer
- Petr Viktorin
- Serhiy Storchaka
- Victor Stinner
- Walter Dörwald