Re: [Python-Dev] bpo-34595: How to format a type name?

11 Sep 2018


On 09/11/18 15:23, Victor Stinner wrote:
...
Hi,
Last week, I opened an issue to propose to add a new %T formatter to
PyUnicode_FromFormatV() and so indirectly to PyUnicode_FromFormat()
and PyErr_Format():
https://bugs.python.org/issue34595
I merged my change, but then Serhiy Storchaka asked if we can add
something to get the "fully qualified name" (FQN) of a type, e.g.
"datetime.timedelta" (FQN) vs "timedelta" (what I call "short" name).
I proposed a second pull request to add %t (short) in addition to %T
(FQN).
But then Petr Viktorin asked me to open a thread on python-dev to get
a wider discussion. So here I am.
The rationale for this change is to fix multiple issues:
* C extensions use Py_TYPE(obj)->tp_name, which returns a fully
qualified name for C types, but the name (without the module) for
Python types. Python modules use type(obj).__name__, which always returns
the short name.
That might be a genuine problem, but I wonder if "%T" is fixing the 
symptom rather than the cause here.
Or is this only an issue for PyUnicode_FromFormat()?
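On the Python side, the short name and the module are separate attributes of a type. This minimal sketch only reads the standard `__name__` and `__module__` attributes; `tp_name` itself is not exposed to Python code, and the `"datetime"` module value assumes the usual CPython build where the `_datetime` accelerator is available.

```python
import datetime

td = datetime.timedelta(0)
cls = type(td)

# type(obj).__name__ is the short name, with or without the accelerator
short_name = cls.__name__
# the module is stored in a separate attribute
module_name = cls.__module__

# builtin types report the pseudo-module "builtins"
str_module = type("abc").__module__

print(short_name, module_name, str_module)
```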
...
* currently, many C extensions truncate the type name: they use "%.80s"
instead of "%s" to format a type name
That's an orthogonal issue -- you can change "%.80s" to "%s", and 
presumably you could use "%.80t" as well.
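Python's own % operator applies the same precision rule to strings, which makes the effect of "%.80s" easy to see. The class below, with its unusually long name, is hypothetical and exists only for illustration.

```python
# A hypothetical class with a 120-character name, created dynamically
LongName = type("VeryLong" * 15, (), {})
name = LongName.__name__

# "%.80s" keeps at most 80 characters, mirroring the C pattern
truncated = "%.80s" % name
# "%s" keeps the whole name
full = "%s" % name

print(len(truncated), len(full))
```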
...
* "%s" with Py_TYPE(obj)->tp_name is used more than 200 times in the C
code, and I dislike this complex pattern. IMHO "%t" with obj would be
simpler to read, write and maintain.
I consider `Py_TYPE(obj)->tp_name` much more understandable than "%t".
It's longer to spell out, but it's quite self-documenting.
...
* I want C extensions and Python modules to have the same behavior:
respect PEP 399. Petr considers that error messages are not part
of PEP 399, but the issue is wider than only error messages.
The other major use is for __repr__, which AFAIK we also don't guarantee 
to be stable, so I don't think PEP 399 applies to it.
Having the same behavior between C and Python versions of a module is 
nice, but PEP 399 doesn't prescribe it. There are other differences as 
well -- for example, `_datetime.datetime` is immutable, and that's OK.

If error messages and __repr__s should be consistent between Python and 
the C accelerator, are you planning to write tests for all the affected 
modules when switching them to %T/%t?
...
The main issue is that at the C level, Py_TYPE(obj)->tp_name is
"usually" the fully qualified name for types defined in C, but it's
only the "short" name for types defined in Python.
For example, if the C accelerator "_datetime" is available,
Py_TYPE(obj)->tp_name of a datetime.timedelta object gives you
"datetime.timedelta", but without the accelerator, tp_name
is just "timedelta".
Another example, this script displays "mytimedelta(0)" if you have the
C accelerator, but "__main__.mytimedelta(0)" if you use the Python
implementation:
---
import sys
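# Uncomment the next line to block the _datetime C accelerator,
# forcing the pure-Python implementation
#sys.modules['_datetime'] = None
import datetime
class mytimedelta(datetime.timedelta):
    pass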
print(repr(mytimedelta()))
---
So I would like to fix this kind of issue.
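One way a pure-Python __repr__ can report subclasses consistently is to build the name at runtime from `type(self).__module__` and `type(self).__qualname__`, roughly as the pure-Python datetime does. This is a minimal sketch; the `Interval` class is hypothetical.

```python
class Interval:
    """Hypothetical value type used only to illustrate __repr__."""

    def __init__(self, days=0):
        self.days = days

    def __repr__(self):
        cls = type(self)
        # Use the runtime type, so subclasses report their own name,
        # qualified by the module where they were defined
        return f"{cls.__module__}.{cls.__qualname__}({self.days})"


class MyInterval(Interval):
    pass


# e.g. "__main__.MyInterval(3)" when run as a script
print(repr(MyInterval(3)))
```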
Type names are mainly used for two purposes:
* format an error message
* obj.__repr__()
It's unclear to me if we should use the "short" or the "fully
qualified" name. It should maybe be decided on a case by case basis.
There is also a third usage: implementing __reduce__, where backward
compatibility matters.
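To see why compatibility matters for __reduce__: timedelta's __reduce__ returns the type itself plus its state, and pickle records that type by reference, writing its module and name into the data. A pickle created today can only be loaded later if the type is still found under those names. A small sketch using the real `datetime.timedelta`:

```python
import datetime
import pickle

td = datetime.timedelta(days=1)

# timedelta.__reduce__ returns (type(self), state); pickle then records
# the class by reference, i.e. by its module and qualified name
cls, state = td.__reduce__()
print(cls, state)

data = pickle.dumps(td)
print(data)
```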
Note: The discussion has evolved since my first implementation of %T, which
just used the poorly defined Py_TYPE(obj)->tp_name.
--
Petr asked me why we don't expose functions to get these names. For
example, with my second PR (not merged), there are 3 (private)
functions:
/* type.__name__ */
const char* _PyType_Name(PyTypeObject *type);
/* type.__qualname__ */
PyObject* _PyType_QualName(PyTypeObject *type);
/* type.__module__ "." type.__qualname__ (but type.__qualname__ for
builtin types) */
PyObject * _PyType_FullName(PyTypeObject *type);
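The semantics described in the comments above can be mirrored at the Python level. The function names below are hypothetical, and this is a sketch of what each helper is meant to return, not a port of the C implementations. The nested class shows where __name__ and __qualname__ differ.

```python
import datetime


def type_name(tp: type) -> str:
    """Short name: type.__name__."""
    return tp.__name__


def type_qualname(tp: type) -> str:
    """Qualified name within the module: type.__qualname__."""
    return tp.__qualname__


def type_fullname(tp: type) -> str:
    """type.__module__ "." type.__qualname__, but only the qualname for builtins."""
    module = tp.__module__
    if module == "builtins":
        return tp.__qualname__
    return f"{module}.{tp.__qualname__}"


class Outer:
    class Inner:
        pass


print(type_name(Outer.Inner))      # Inner
print(type_qualname(Outer.Inner))  # Outer.Inner
print(type_fullname(str))          # str
print(type_fullname(datetime.timedelta))
```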
My concern here is that each caller has to handle errors:
PyErr_Format(PyExc_TypeError, "must be str, not %.100s",
Py_TYPE(obj)->tp_name);
would become:
PyObject *type_name = _PyType_FullName(Py_TYPE(obj));
if (type_name == NULL) {
    /* do something with this error ... */
    return NULL;
}
PyErr_Format(PyExc_TypeError, "must be str, not %U", type_name);
Py_DECREF(type_name);
When I report an error, I dislike having to handle *new* errors... I
prefer that the error handling is done inside PyErr_Format() for me,
to reduce the risk of additional bugs.
--
Serhiy also asked if we could expose the same feature at the *Python*
level: provide something to get the fully qualified name of a type.
It's not just f"{type(obj).__module__}.{type(obj).__name__}": you
have to skip the module for builtin types like "str" (return "str", not
"builtins.str").
Maybe we can have "name: {0:t}, FQN: {0:T}".format(type(obj)): "t" for
the name and "T" for the fully qualified name. We would only have to modify
type.__format__().
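type.__format__ itself cannot be changed from Python code, so this is only an approximation: a hypothetical metaclass `FormatType` gives its classes a __format__ that understands the proposed "t" and "T" specs. It only affects classes created with that metaclass, so builtin types like str are not covered, and the builtins special case is left out.

```python
class FormatType(type):
    """Hypothetical metaclass approximating the proposed format specs."""

    def __format__(cls, spec):
        if spec == "t":
            # short name
            return cls.__name__
        if spec == "T":
            # fully qualified name: module plus qualified name
            return f"{cls.__module__}.{cls.__qualname__}"
        # any other spec: fall back to the default behavior
        # (str(cls) for an empty spec, TypeError otherwise)
        return super().__format__(spec)


class Timedelta(metaclass=FormatType):
    pass


print("name: {0:t}, FQN: {0:T}".format(Timedelta))
```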
I'm not sure if we need to add new formatters to str % args.
Example of Python code:
raise TypeError("must be str, not %s" % type(fmt).__name__)
I'm not sure about Python changes. My first concern was just to avoid
Py_TYPE(obj)->tp_name at the C level. But again, we should keep C and
Python consistent. If the behavior of C extensions changes, Python
modules should be adapted as well, to get the same behavior.
Note: I reverted my change which added the %T formatter to
PyUnicode_FromFormatV() to clarify the status of this issue.
Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev