On 09/11/18 15:23, Victor Stinner wrote:
Hi,
Last week, I opened an issue proposing a new %T formatter for PyUnicode_FromFormatV(), and so indirectly for PyUnicode_FromFormat() and PyErr_Format():
https://bugs.python.org/issue34595
I merged my change, but then Serhiy Storchaka asked if we can add something to get the "fully qualified name" (FQN) of a type, ex "datetime.timedelta" (FQN) vs "timedelta" (what I call "short" name). I proposed a second pull request to add %t (short) in addition to %T (FQN).
But then Petr Viktorin asked me to open a thread on python-dev to get a wider discussion. So here I am.
The rationale for this change is to fix multiple issues:
* C extensions use Py_TYPE(obj)->tp_name which returns a fully qualified name for C types, but the name (without the module) for Python types. Python modules use type(obj).__name__ which always returns the short name.
That might be a genuine problem, but I wonder if "%T" is fixing the symptom rather than the cause here. Or is this only an issue for PyUnicode_FromFormat()?
* currently, many C extensions truncate the type name: use "%.80s" instead of "%s" to format a type name
That's an orthogonal issue -- you can change "%.80s" to "%s", and presumably you could use "%.80t" as well.
* "%s" with Py_TYPE(obj)->tp_name is used more than 200 times in the C code, and I dislike this complex pattern. IMHO "%t" with obj would be simpler to read, write and maintain.
I consider `Py_TYPE(obj)->tp_name` much more understandable than "%t". It's longer to spell out, but it's quite self-documenting.
* I want C extensions and Python modules to have the same behavior: respect the PEP 399. Petr considers that error messages are not part of the PEP 399, but the issue is wider than only error messages.
The other major use is for __repr__, which AFAIK we also don't guarantee to be stable, so I don't think PEP 399 applies to it. Having the same behavior between C and Python versions of a module is nice, but PEP 399 doesn't prescribe it. There are other differences as well -- for example, `_datetime.datetime` is immutable, and that's OK. If error messages and __repr__s should be consistent between Python and the C accelerator, are you planning to write tests for all the affected modules when switching them to %T/%t?
The main issue is that at the C level, Py_TYPE(obj)->tp_name is "usually" the fully qualified name for types defined in C, but it's only the "short" name for types defined in Python.
For example, if you have the C accelerator "_datetime", Py_TYPE(obj)->tp_name of a datetime.timedelta object gives you "datetime.timedelta", but if you don't have the accelerator, tp_name is just "timedelta".
Another example, this script displays "mytimedelta(0)" if you have the C accelerator, but "__main__.mytimedelta(0)" if you use the Python implementation:
---
import sys
#sys.modules['_datetime'] = None
import datetime

class mytimedelta(datetime.timedelta):
    pass

print(repr(mytimedelta()))
---
So I would like to fix this kind of issue.
Type names are mainly used for two purposes:
* format an error message
* obj.__repr__()
It's unclear to me if we should use the "short" or the "fully qualified" name. It should maybe be decided on a case by case basis.
There is also a 3rd usage: to implement __reduce__, here backward compatibility matters.
Note: The discussion evolved since my first implementation of %T which just used the not well defined Py_TYPE(obj)->tp_name.
--
Petr asked me why not expose functions to get these names. For example, with my second PR (not merged), there are 3 (private) functions:
/* type.__name__ */
const char* _PyType_Name(PyTypeObject *type);
/* type.__qualname__ */
PyObject* _PyType_QualName(PyTypeObject *type);
/* type.__module__ "." type.__qualname__ (but type.__qualname__ for builtin types) */
PyObject * _PyType_FullName(PyTypeObject *type);
My concern here is that each caller has to handle errors:
PyErr_Format(PyExc_TypeError, "must be str, not %.100s", Py_TYPE(obj)->tp_name);
would become:
PyObject *type_name = _PyType_FullName(Py_TYPE(obj));
if (type_name == NULL) {
    /* do something with this error ... */
}
PyErr_Format(PyExc_TypeError, "must be str, not %U", type_name);
Py_DECREF(type_name);
When I report an error, I dislike having to handle *new* errors... I prefer that the error handling is done inside PyErr_Format() for me, to reduce the risk of additional bugs.
--
Serhiy also asked if we could expose the same feature at the *Python* level: provide something to get the fully qualified name of a type. It's not just f"{type(obj).__module__}.{type(obj).__name__}", because you have to skip the module for builtin types like "str" (not return "builtins.str").
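To make the builtin special case concrete, here is a minimal sketch of such a helper. The name fully_qualified_name() is hypothetical (it is not an existing or proposed API), and it uses __qualname__ rather than __name__, on the assumption that it should match what _PyType_FullName() is described as returning:

```python
import datetime


def fully_qualified_name(tp):
    """Hypothetical helper: "module.qualname", without the module for builtins."""
    module = tp.__module__
    qualname = tp.__qualname__
    # Builtin types such as str should read "str", not "builtins.str".
    if module == "builtins":
        return qualname
    return f"{module}.{qualname}"


print(fully_qualified_name(str))                 # str
print(fully_qualified_name(datetime.timedelta))  # datetime.timedelta
```

This is only a sketch of the rule described above; it does not try to handle types whose __module__ is missing or is not a string.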
Maybe we can have "name: {0:t}, FQN: {0:T}".format(type(obj)). "t" for name and "T" for fully qualified name. We would only have to modify type.__format__().
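Since type.__format__() cannot be changed from pure Python, the idea can only be approximated today with a wrapper object. The sketch below assumes a hypothetical TypeName class (not part of any proposal) whose __format__() accepts the "t" and "T" specs and defers everything else to the wrapped type:

```python
import datetime


class TypeName:
    """Hypothetical wrapper emulating "t" / "T" format specs for a type."""

    def __init__(self, tp):
        self._tp = tp

    def __format__(self, spec):
        if spec == "t":
            # Short name, e.g. "timedelta".
            return self._tp.__name__
        if spec == "T":
            # Fully qualified name, skipping the module for builtin types.
            module = self._tp.__module__
            qualname = self._tp.__qualname__
            return qualname if module == "builtins" else f"{module}.{qualname}"
        # Any other spec falls back to the type's normal formatting.
        return format(self._tp, spec)


td = datetime.timedelta()
print("name: {0:t}, FQN: {0:T}".format(TypeName(type(td))))
# name: timedelta, FQN: datetime.timedelta
```

With a real change to type.__format__() the wrapper would be unnecessary and the call would be written directly as in the sentence above.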
I'm not sure if we need to add new formatters to str % args.
Example of Python code:
raise TypeError("must be str, not %s" % type(fmt).__name__)
I'm not sure about Python changes. My first concern was just to avoid Py_TYPE(obj)->tp_name at the C level. But again, we should keep C and Python consistent. If the behavior of C extensions changes, Python modules should be adapted as well, to get the same behavior.
Note: I reverted my change which added the %T formatter from PyUnicode_FromFormatV() to clarify the status of this issue.
Victor