Mailman 3 bpo-34595: How to format a type name? - Python-Dev

11 Sep 2018

Hi,

Last week, I opened an issue proposing a new %T formatter for
PyUnicode_FromFormatV(), and so indirectly for PyUnicode_FromFormat()
and PyErr_Format():

   https://bugs.python.org/issue34595

I merged my change, but then Serhiy Storchaka asked if we could add
something to get the "fully qualified name" (FQN) of a type, e.g.
"datetime.timedelta" (FQN) vs "timedelta" (what I call the "short"
name). I proposed a second pull request to add %t (short) in addition
to %T (FQN).

But then Petr Viktorin asked me to open a thread on python-dev to get
a wider discussion. So here I am.

The rationale for this change is to fix multiple issues:

* C extensions use Py_TYPE(obj)->tp_name, which gives a fully
qualified name for types defined in C, but only the name (without the
module) for types defined in Python. Python modules use
type(obj).__name__, which always returns the short name.

* currently, many C extensions truncate the type name: they use
"%.80s" instead of "%s" to format a type name

* "%s" with Py_TYPE(obj)->tp_name is used more than 200 times in the C
code, and I dislike this complex pattern. IMHO "%t" with obj would be
simpler to read, write and maintain.

* I want C extensions and Python modules to have the same behavior,
to respect PEP 399. Petr considers that error messages are not
covered by PEP 399, but the issue is wider than just error messages.

The main issue is that at the C level, Py_TYPE(obj)->tp_name is
"usually" the fully qualified name for types defined in C, but it's
only the "short" name for types defined in Python.

For example, if you have the C accelerator "_datetime",
Py_TYPE(obj)->tp_name of a datetime.timedelta object gives you
"datetime.timedelta", but if you don't have the accelerator, tp_name
is just "timedelta".

Another example, this script displays "mytimedelta(0)" if you have the
C accelerator, but "__main__.mytimedelta(0)" if you use the Python
implementation:
---
import sys
#sys.modules['_datetime'] = None
import datetime

class mytimedelta(datetime.timedelta):
    pass

print(repr(mytimedelta()))
---

So I would like to fix this kind of issue.

Type names are mainly used for two purposes:

* format an error message
* obj.__repr__()

It's unclear to me if we should use the "short" or the "fully
qualified" name. It should maybe be decided on a case by case basis.

There is also a 3rd usage: implementing __reduce__, where backward
compatibility matters.

Note: The discussion has evolved since my first implementation of %T,
which just used the not well-defined Py_TYPE(obj)->tp_name.

--

Petr asked me why not expose functions to get these names. For
example, with my second PR (not merged), there are 3 (private)
functions:

/* type.__name__ */
const char* _PyType_Name(PyTypeObject *type);
/* type.__qualname__ */
PyObject* _PyType_QualName(PyTypeObject *type);
/* type.__module__ "." type.__qualname__ (but type.__qualname__ for
   builtin types) */
PyObject* _PyType_FullName(PyTypeObject *type);

My concern here is that each caller has to handle errors:

  PyErr_Format(PyExc_TypeError, "must be str, not %.100s",
               Py_TYPE(obj)->tp_name);

would become:

  PyObject *type_name = _PyType_FullName(Py_TYPE(obj));
  if (type_name == NULL) { /* do something with this error ... */ }
  PyErr_Format(PyExc_TypeError, "must be str, not %U", type_name);
  Py_DECREF(type_name);

When I report an error, I dislike having to handle *new* errors... I
prefer that the error handling is done inside PyErr_Format() for me,
to reduce the risk of additional bugs.

--

Serhiy also asked if we could expose the same feature at the *Python*
level: provide something to get the fully qualified name of a type.
It's not just f"{type(obj).__module__}.{type(obj).__name__}": you
have to skip the module for builtin types like "str" (not return
"builtins.str").
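
As a rough illustration of that rule, here is a minimal Python sketch. The helper name type_fullname() is hypothetical (it is not an existing or proposed API), and it assumes that "builtins" is the only module to skip and that __qualname__ is the right name to append, as in the comment on _PyType_FullName() above:

```python
import datetime


def type_fullname(tp):
    """Return "module.qualname", or just "qualname" for builtin types.

    Hypothetical helper, for illustration only.
    """
    module = tp.__module__
    qualname = tp.__qualname__
    # Assumption: only types from the "builtins" module drop the prefix.
    if module == "builtins":
        return qualname
    return f"{module}.{qualname}"


print(type_fullname(str))                 # str
print(type_fullname(datetime.timedelta))  # datetime.timedelta
```

The result for datetime.timedelta should be the same with or without the "_datetime" accelerator, since it relies on __module__ and __qualname__ rather than on tp_name.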

Maybe we can have "name: {0:t}, FQN: {0:T}".format(type(obj)): "t" for
the name and "T" for the fully qualified name. We would only have to
modify type.__format__().
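
To give an idea of what this could look like from Python code, here is a sketch that emulates the two format codes with a metaclass instead of changing type.__format__() itself. NameFormatMeta and Widget are hypothetical names used only for this illustration, and the handling of builtin types is an assumption:

```python
class NameFormatMeta(type):
    """Hypothetical metaclass emulating the "t" and "T" format codes."""

    def __format__(cls, spec):
        if spec == "t":
            # Short name, like type(obj).__name__
            return cls.__name__
        if spec == "T":
            # Fully qualified name; assumption: skip the module
            # for types living in "builtins".
            if cls.__module__ == "builtins":
                return cls.__qualname__
            return f"{cls.__module__}.{cls.__qualname__}"
        # Any other format spec keeps the default behavior.
        return super().__format__(spec)


class Widget(metaclass=NameFormatMeta):
    pass


print("name: {0:t}, FQN: {0:T}".format(Widget))
```

Only classes using this metaclass understand the new codes here; the real change would make them available for every type.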

I'm not sure if we need to add new formatters to str % args.

Example of Python code:

   raise TypeError("must be str, not %s" % type(fmt).__name__)

I'm not sure about Python changes. My first concern was just to avoid
Py_TYPE(obj)->tp_name at the C level. But again, we should keep C and
Python consistent. If the behavior of C extensions changes, Python
modules should be adapted as well, to get the same behavior.

Note: I reverted my change which added the %T formatter to
PyUnicode_FromFormatV(), to clarify the status of this issue.

Victor
