Experiment with an opt-in new C API for Python? (leave current API unchanged)
Victor Stinner wrote:
Moreover, I failed to find anyone who can explain to me how the C API is used in the wild, which functions are important or not, what the C API even is, etc.
In practice people desperately *have* to use whatever is there, including functions with underscores that are not even officially in the C-API. I have to use _PyFloat_Pack* in order to be compatible with CPython, I need PySlice_Unpack() etc., I need PyUnicode_KIND(), need PyUnicode_AsUTF8AndSize(), I *wish* there were PyUnicode_AsAsciiAndSize(). In general, in daily use of the C-API I wish it were *larger* and not smaller. I often want functions that return C instead of Python values or functions that take C instead of Python values.

The ideal situation for me would be a lower-layer library, say libcpython.a, that has all those functions like _PyFloat_Pack*. It would be an enormous amount of work though, especially since the status quo kind of works.

Stefan Krah
Hi Stefan,
On Mon, Nov 19, 2018 at 1:18 PM Stefan Krah wrote:
In practice people desperately *have* to use whatever is there, including functions with underscores that are not even officially in the C-API.
I have to use _PyFloat_Pack* in order to be compatible with CPython,
Oh, I never used this function. These functions are private (name prefixed by "_") and excluded from the limited API. For me, the limited API should be functions available on all Python implementations. Does it make sense to provide PyFloat_Pack4() in MicroPython, Jython, IronPython and PyPy? Or is it something more specific to CPython? I don't know the answer. If so, maybe open an issue to propose making this function public?
I need PyUnicode_KIND()
IMHO this one should not be part of the public API. The only usage would be to micro-optimize, but such an API is very specific to one Python implementation. For example, PyPy doesn't use "compact strings" but UTF-8 internally. If you use PyUnicode_KIND(), your code becomes incompatible with PyPy. What is your use case? I would prefer to expose the "_PyUnicodeWriter" API than PyUnicode_KIND().
need PyUnicode_AsUTF8AndSize(),
Again, that's a micro-optimization, and it's very specific to CPython: the result is cached in the "immutable" str object. I don't want to put it in a public API. PyUnicode_AsUTF8String() is better since it doesn't require an internal cache.
I *wish* there were PyUnicode_AsAsciiAndSize().
PyUnicode_AsASCIIString() looks good to me. Sadly, it doesn't return the length, but usually the length is not needed. Victor
On 19/11/2018 15:08, Victor Stinner wrote:
... For me, the limited API should be functions available on all Python implementations. Does it make sense to provide PyFloat_Pack4() in ..., Jython, ... ? Or is it something more specific to CPython? I don't know the answer.
I'd say it's a CPython thing. It is helpful to copy a lot of things from the reference implementation, but generally the lexical conventions of the C-API would seem ludicrous in Java, where scope is already provided by a class. And then there's the impossibility of a C-like pointer to byte. Names related to the C-API have mnemonic value, though, in translation. Maybe "static void PyFloat.pack4(double, ByteBuffer, boolean)" would do the trick. It makes sense for JyNI to supply it by the exact C API name, and all other API that C extensions are likely to use.
Jeff Allen
On Mon, Nov 19, 2018 at 04:08:07PM +0100, Victor Stinner wrote:
On Mon, Nov 19, 2018 at 1:18 PM Stefan Krah wrote:
In practice people desperately *have* to use whatever is there, including functions with underscores that are not even officially in the C-API.
I have to use _PyFloat_Pack* in order to be compatible with CPython,
Oh, I never used this function. These functions are private (name prefixed by "_") and excluded from the limited API.
For me, the limited API should be functions available on all Python implementations. Does it make sense to provide PyFloat_Pack4() in MicroPython, Jython, IronPython and PyPy? Or is it something more specific to CPython? I don't know the answer. If yes, open an issue to propose to make this function public?
It depends on what the goal is: If PyPy wants to be able to use as many C extensions as possible, then yes. The function is just one example of what people have to use to be 100% compatible with CPython (or copy these functions and maintain them ...). Intuitively, it should probably not be part of a limited API, but I never quite understood the purpose of this API, because I regularly need any function that I can get my hands on.
I need PyUnicode_KIND()
IMHO this one should not be part of the public API. The only usage would be to micro-optimize, but such API is very specific to one Python implementation. For example, PyPy doesn't use "compact string" but UTF-8 internally. If you use PyUnicode_KIND(), your code becomes incompatible with PyPy.
What is your use case?
Reading typed strings directly into an array with minimal overhead.
I would prefer to expose the "_PyUnicodeWriter" API than PyUnicode_KIND().
need PyUnicode_AsUTF8AndSize(),
Again, that's a micro-optimization and it's very specific to CPython: result cached in the "immutable" str object. I don't want to put it in a public API. PyUnicode_AsUTF8String() is better since it doesn't require an internal cache.
I *wish* there were PyUnicode_AsAsciiAndSize().
PyUnicode_AsASCIIString() looks good to me. Sadly, it doesn't return the length, but usually the length is not needed.
Yes, these are all just examples. It's also very useful to be able to do PyLong_Type.tp_as_number->nb_multiply or grab as_integer_ratio from the float PyMethodDef. The latter two cases are for speed reasons but also because sometimes you *don't* want a method from a subclass (Serhiy was very good in finding corner cases :-).

Most C modules that I've seen have some internals. Psycopg2:

  PyDateTime_DELTA_GET_MICROSECONDS
  PyDateTime_DELTA_GET_DAYS
  PyDateTime_DELTA_GET_SECONDS
  PyList_GET_ITEM
  Bytes_GET_SIZE
  Py_BEGIN_ALLOW_THREADS
  Py_END_ALLOW_THREADS

floatobject.h and longintrepr.h are also popular.

Stefan Krah
On Tue, Nov 20, 2018 at 11:08 PM Stefan Krah wrote:
Intuitively, it should probably not be part of a limited API, but I never quite understood the purpose of this API, because I regularly need any function that I can get my hands on. (...) Reading typed strings directly into an array with minimal overhead.
IMHO performance and hiding implementation details are mutually exclusive. You should either use the C API with implementation details for best performance, or use a "limited" C API for best compatibility. Since I would like to not touch the C API with implementation details, you can imagine having two compilation modes: one for best performance on CPython, one for best compatibility (ex: compatible with PyPy). I'm not sure how the "compilation mode" will be selected.

Victor
On 11/20/2018 2:17 PM, Victor Stinner wrote:
On Tue, Nov 20, 2018 at 11:08 PM Stefan Krah wrote:
Intuitively, it should probably not be part of a limited API, but I never quite understood the purpose of this API, because I regularly need any function that I can get my hands on. (...) Reading typed strings directly into an array with minimal overhead.
IMHO performance and hiding implementation details are exclusive. You should either use the C API with impl. details for best performances, or use a "limited" C API for best compatibility.
The "limited" C API concept would seem to be quite sufficient for extensions that want to extend Python functionality to include new system calls, etc. (pywin32, pyMIDI, pySide, etc.) whereas the numpy and decimal might want best performance.
Since I would like to not touch the C API with impl. details, you can imagine to have two compilation modes: one for best performances on CPython, one for best compatibility (ex: compatible with PyPy). I'm not sure how the "compilation mode" will be selected.
The nicest interface from a compilation point of view would be to have two #include files: One to import the limited API, and one to import the performance API. Importing both should be allowed and should work. If you import the performance API, you have to learn more, and be more careful. Of course, there might be appropriate subsets of each API, having multiple include files, to avoid including everything, but that is a refinement.
Victor

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
On Tue, Nov 20, 2018 at 6:05 PM Glenn Linderman wrote:
On 11/20/2018 2:17 PM, Victor Stinner wrote:
IMHO performance and hiding implementation details are exclusive. You should either use the C API with impl. details for best performances, or use a "limited" C API for best compatibility.
The "limited" C API concept would seem to be quite sufficient for extensions that want to extend Python functionality to include new system calls, etc. (pywin32, pyMIDI, pySide, etc.) whereas the numpy and decimal might want best performance.
To make things more complicated: numpy and decimal are in a category of modules where if you want them to perform well on JIT-based VMs, then there's no possible C API that can achieve that. To get the benefits of a JIT on code using numpy or decimal, the JIT has to be able to see into their internals to do inlining etc., which means they can't be written in C at all [1], at which point the C API becomes irrelevant. It's not clear to me how this affects any of the discussion in CPython, since supporting JITs might not be part of the goal of a new C API, and I'm not sure how many packages fall between the numpy/decimal side and the pure-ffi side.

-n

[1] Well, there's also the option of teaching your Python JIT to handle LLVM bitcode as a source language, which is the approach that Graal is experimenting with. It seems completely wacky to me to hope you could write a C API emulation layer like PyPy's cpyext, compile that + C extension code to LLVM bitcode, translate the LLVM bitcode to JVM bytecode, inline the whole mess into your Python JIT, and then fold everything away to produce something reasonable. But I could be wrong, and Oracle is throwing a lot of money at Graal so I guess we'll find out.

--
Nathaniel J. Smith -- https://vorpus.org
On 11/20/2018 10:33 PM, Nathaniel Smith wrote:
On Tue, Nov 20, 2018 at 6:05 PM Glenn Linderman wrote:
On 11/20/2018 2:17 PM, Victor Stinner wrote:
IMHO performance and hiding implementation details are exclusive. You should either use the C API with impl. details for best performances, or use a "limited" C API for best compatibility.
The "limited" C API concept would seem to be quite sufficient for extensions that want to extend Python functionality to include new system calls, etc. (pywin32, pyMIDI, pySide, etc.) whereas the numpy and decimal might want best performance.
To make things more complicated: numpy and decimal are in a category of modules where if you want them to perform well on JIT-based VMs, then there's no possible C API that can achieve that. To get the benefits of a JIT on code using numpy or decimal, the JIT has to be able to see into their internals to do inlining etc., which means they can't be written in C at all [1], at which point the C API becomes irrelevant. It's not clear to me how this affects any of the discussion in CPython, since supporting JITs might not be part of the goal of a new C API, and I'm not sure how many packages fall between the numpy/decimal side and the pure-ffi side.
-n
[1] Well, there's also the option of teaching your Python JIT to handle LLVM bitcode as a source language, which is the approach that Graal is experimenting with. It seems completely wacky to me to hope you could write a C API emulation layer like PyPy's cpyext, and compile that + C extension code to LLVM bitcode, translate the LLVM bitcode to JVM bytecode, inline the whole mess into your Python JIT, and then fold everything away to produce something reasonable. But I could be wrong, and Oracle is throwing a lot of money at Graal so I guess we'll find out.
Interesting, thanks for the introduction to wacky. I was quite content with the idea that numpy, and other modules that would choose to use the unlimited API, would be sacrificing portability to non-CPython implementations... except by providing a Python equivalent (decimal, and some others, do that, IIRC). Regarding JITs in general, though, it would seem that "precompiled" extensions like numpy would not need to be re-compiled by the JIT. But if they are, then the JIT had better understand/support C syntax, and JVM JITs probably don't! So that leads to the scenario you describe.
On Tue, 20 Nov 2018 23:17:05 +0100
Victor Stinner wrote:
On Tue, Nov 20, 2018 at 11:08 PM Stefan Krah wrote:
Intuitively, it should probably not be part of a limited API, but I never quite understood the purpose of this API, because I regularly need any function that I can get my hands on. (...) Reading typed strings directly into an array with minimal overhead.
IMHO performance and hiding implementation details are exclusive. You should either use the C API with impl. details for best performances, or use a "limited" C API for best compatibility.
Since I would like to not touch the C API with impl. details, you can imagine to have two compilation modes: one for best performances on CPython, one for best compatibility (ex: compatible with PyPy). I'm not sure how the "compilation mode" will be selected.
You mean the same API can compile to two different things depending on a configuration? I expect that to be error-prone. For example, let's suppose I want to compile in a given mode, but I also use Numpy's C API. Will the compile mode "leak" to Numpy as well? What if a third-party header includes "Python.h" before I do the "#define" that's necessary?

Regards

Antoine.
On Wed, Nov 21, 2018 at 12:11 PM Antoine Pitrou wrote:
You mean the same API can compile to two different things depending on a configuration?
Yes, my current plan is to keep #include <Python.h>.
I expect it to be error-prone. For example, let's suppose I want to compile in a given mode, but I also use Numpy's C API. Will the compile mode "leak" to Numpy as well?
For example, if we continue to use Py_LIMITED_API: I don't think that Numpy currently uses #ifdef Py_LIMITED_API, nor plans to do that. If we add a new define (ex: my current proof-of-concept uses Py_NEWCAPI), we can make sure that it's not already used by Numpy :-)
What if a third-party header includes "Python.h" before I do the "#define" that's necessary?
IMHO the define should be added by distutils directly, using -D.
On 2018-11-19, 11:59 GMT, Stefan Krah wrote:
In practice people desperately *have* to use whatever is there, including functions with underscores that are not even officially in the C-API.
Yes, there are some functions which evaporated, and I have never heard a reason why or how I am supposed to overcome their removal. E.g., when porting M2Crypto to Python 3 I had to reimplement my own (bad) version of the `FILE* PyFile_AsFile(PyObject *pyfile)` function (https://is.gd/tgQGDw). I think it is obvious why it is necessary for C bindings, and I have never found a way to get the underlying FILE handle from the Python File object properly.

Just my €0.02.

Matěj

--
https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8

All of us could take a lesson from the weather. It pays no attention to criticism. -- somewhere on the Internet
On Wed, Nov 21, 2018, at 06:53, Matěj Cepl wrote:
On 2018-11-19, 11:59 GMT, Stefan Krah wrote:
In practice people desperately *have* to use whatever is there, including functions with underscores that are not even officially in the C-API.
Yes, there are some functions which evaporated and I have never heard a reason why and how I am supposed to overcome their removal. E.g., when porting M2Crypto to Python3 I had to reimplement my own (bad) version of `FILE* PyFile_AsFile(PyObject *pyfile)` function (https://is.gd/tgQGDw). I think it is obvious why it is necessary for C bindings, and I have never found a way how to get the underlying FILE handle from the Python File object properly.
In Python 3, there is no underlying FILE* because the io module is implemented using fds directly rather than C stdio.
On 2018-11-21, 14:54 GMT, Benjamin Peterson wrote:
In Python 3, there is no underlying FILE* because the io module is implemented using fds directly rather than C stdio.
OK, so the proper solution is to kill all functions which expect FILE, and if you are anal retentive about stability of API, then you have to fake it by creating a FILE structure around the underlying fd handle as I did in M2Crypto, right?

Best,

Matěj

--
https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8

If you have a problem and you think awk(1) is the solution, then you have two problems. -- David Tilbrook (at least 1989, source of the later famous jwz rant on regular expressions).
On 11/21/18 4:11 PM, Matěj Cepl wrote:
On 2018-11-21, 14:54 GMT, Benjamin Peterson wrote:
In Python 3, there is no underlying FILE* because the io module is implemented using fds directly rather than C stdio.
OK, so the proper solution is to kill all functions which expect FILE
Indeed. This has another side to it: there are file-like objects that aren't backed by a FILE*. In most cases, being a "real" file is an unnecessary distinction, like that between the old `int` vs. `long`. "Fits in the machine register" is a detail from a level below Python, and so is "the kernel treats this as a file". Of course, this is not how C libraries work -- so, sadly, it makes wrappers harder to write. And a perfect solution might require adding more generic I/O to the C library.
and if you are anal retentive about stability of API, then you have to fake it by creating FILE structure around the underlying fd handler as I did in M2Crypto, right?
Yes, AFAIK that is the least bad solution. I did something very similar here: https://github.com/encukou/py3c/blob/master/include/py3c/fileshim.h
On 2018-11-22, 10:13 GMT, Petr Viktorin wrote:
Yes, AFAIK that is the least bad solution. I did something very similar here: https://github.com/encukou/py3c/blob/master/include/py3c/fileshim.h
Thank you. Matěj
Please open a bug report once you hit such an issue ;-)
Victor
On Wed, Nov 21, 2018 at 3:56 PM Matěj Cepl wrote:
On 2018-11-19, 11:59 GMT, Stefan Krah wrote:
In practice people desperately *have* to use whatever is there, including functions with underscores that are not even officially in the C-API.
Yes, there are some functions which evaporated and I have never heard a reason why and how I am supposed to overcome their removal. E.g., when porting M2Crypto to Python3 I had to reimplement my own (bad) version of `FILE* PyFile_AsFile(PyObject *pyfile)` function (https://is.gd/tgQGDw). I think it is obvious why it is necessary for C bindings, and I have never found a way how to get the underlying FILE handle from the Python File object properly.
Just my €0.02.
Matěj -- https://matej.ceplovi.cz/blog/, Jabber: mcepl@ceplovi.cz GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8
All of us could take a lesson from the weather. It pays no attention to criticism. -- somewhere on the Internet
participants (9)

- Antoine Pitrou
- Benjamin Peterson
- Glenn Linderman
- Jeff Allen
- Matěj Cepl
- Nathaniel Smith
- Petr Viktorin
- Stefan Krah
- Victor Stinner