Add a PyObject_VaCallFunction to the C API??

The C API has a function PyObject_CallFunction(PyObject*, const char* fmt, ...). It is a variadic function, so I can't pass a va_list to it to invoke the call. My question is: is it technically possible to provide a companion PyObject_VaCallFunction that takes a va_list, just as Py_VaBuildValue does for Py_BuildValue?

Here's what I have found so far. I read Objects/call.c, looking at PyObject_CallFunction there, and actually found a function _PyObject_CallFunctionVa, which is close to what I want, although it has two additional parameters. Looking at the implementation of PyObject_CallFunction, I naively thought a va_list version could have been implemented similarly (I'm ignoring the size_t issue). I'd appreciate it if someone could enlighten me on this subject.

Question rephrased: is there a technical reason why there is no PyObject_VaCallFunction in the API? If not, would it be possible to add one?

Motivation: I write scientific simulation code to be run on large clusters. The data generated can be huge. Although I know how to parallelize in C++, I don't know how to do it in Python. A direct consequence is formidable data-processing time in Python, which is driving me nuts. I have two options: either parallelize the Python code or embed Python in C++. I favor the latter, not just because I don't know about Python parallelization, but more importantly because by embedding Python in C++ I get to keep using my highly specialized C++ classes and calculation routines without duplicating essentially the same thing in Python, which is very time-saving.

I could have just used the C API, but here is the perennial drawback of linking with the Python libraries --- it messes up the linker's runtime search path. A Python installation usually has, under its lib/, many common libraries such as libz.so and libhdf5.so (I probably should have said earlier that I primarily work on Linux). Their versions differ from, say, the libhdf5.so I link against in my own code. The upshot is that by linking my program with both the main library, where HDF5 resolves to one version, and the Python libraries, where HDF5 resolves to another, the runtime search path always gets mixed up, which is not safe.

So what I thought about doing is creating a C++ wrapper around Python, one that doesn't expose the raw Python API to the client code at all. (For example, there would be no #include <Python.h> in any header of that library. Plus I could use C++ OOP to automate away keeping track of Py_INCREF/Py_DECREF.) In fact it's very doable. Everything is straightforward except the variadic functions, such as PyObject_CallFunction, Py_BuildValue, and so on.

Let me use PyObject_CallFunction to illustrate the problem. Say I want to provide a wrapper for PyObject_CallFunction to the client, like this (I'm making the return type void for simplicity):

    void MyCallFunction(PyObject* obj, const char* fmt, ...);

I put this declaration in the header "my_python_wrapper.h". Then in the "my_python_wrapper.c" file I would like to write the following implementation:

    void MyCallFunction(PyObject* obj, const char* fmt, ...)
    {
        va_list args;
        va_start(args, fmt);
        PyObject_VaCallFunction(obj, fmt, args);  // This function doesn't exist in the API (yet?)
        va_end(args);
    }

This way, only "my_python_wrapper.c" needs to link with the Python libraries. Users of my_python_wrapper don't need to, which seems nice.
(In CMake lingo, I only need target_link_libraries(my_python_wrapper PRIVATE ${PYTHON_LIBRARIES}) instead of PUBLIC.)

As one can see, the crux is that not all variadic functions in the API have a companion va_list version. So far I have only found Py_VaBuildValue. I've worked out MyCallFunction() in my actual code in the same manner described above, but with Py_VaBuildValue: I forward all variadic arguments to a MyBuildValue(PyObject*, const char*, ...); in the .c file, MyBuildValue builds a va_list and passes it on to Py_VaBuildValue; then I force the outcome to be a tuple and pass it to PyObject_CallObject, and it works! Nonetheless, this approach seems less straightforward than having a PyObject_VaCallFunction, so I'm guessing it may carry a performance penalty.

I really appreciate whoever takes the time to read this essay of mine! I apologize if I failed to use the idiomatic mark-up. Any comments or questions on this subject are welcome!
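For concreteness, here is a minimal sketch of the workaround just described (MyCallFunction going through Py_VaBuildValue and PyObject_CallObject). It is an illustration with error handling abbreviated, not the actual project code:

    #include <Python.h>
    #include <stdarg.h>

    /* Sketch of the Py_VaBuildValue + PyObject_CallObject workaround.
       MyCallFunction is the hypothetical wrapper from the post above. */
    PyObject *MyCallFunction(PyObject *callable, const char *fmt, ...)
    {
        va_list va;
        va_start(va, fmt);
        PyObject *built = Py_VaBuildValue(fmt, va);   /* build arguments from the va_list */
        va_end(va);
        if (built == NULL)
            return NULL;

        /* Py_VaBuildValue("(i)", ...) yields a tuple, but Py_VaBuildValue("i", ...)
           yields a bare int, so force the result into a 1-tuple when needed.
           (A real implementation would also special-case an empty format,
           which should mean "call with no arguments".) */
        PyObject *args;
        if (PyTuple_Check(built)) {
            args = built;
        }
        else {
            args = PyTuple_Pack(1, built);
            Py_DECREF(built);
            if (args == NULL)
                return NULL;
        }

        PyObject *result = PyObject_CallObject(callable, args);  /* new reference */
        Py_DECREF(args);
        return result;
    }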

I'm wondering if the PyCXX project would help you. "PyCXX is a set of classes to help create extensions of Python in the C++ language. The first part encapsulates the Python C API taking care of exceptions and ref counting. The second part supports the building of Python extension modules in C++." Project page: https://sourceforge.net/projects/cxx/ Python2 docs: http://cxx.sourceforge.net/PyCXX-Python3.html Let me know if I can help, I maintain PyCXX. Barry

Thanks for the suggestion. I haven't looked at PyCXX yet; I'll definitely check it out sometime. In light of your and other people's replies, I will probably postpone the development until I've thought it through, especially after another day of trying and failing. I do have one question for you, and for anyone who is kind enough to read this: how can I know whether a PyObject_Call(...) or PyObject_GetAttr(...) returns a new ref or a borrowed ref? I read online (for Python 3) that they return new references. But does everyone comply with that? If so, it seems safe for me to capture the returned PyObject* in a wrapper class that automatically calls Py_DECREF() upon destruction. But if not, what documentation should I refer to for the behavior of a specific Python module (for example, matplotlib)? This is the first time I have ever looked at the Python C API, so I guess I'm in need of some conventions here. Your input is much appreciated.
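For concreteness, the kind of owning wrapper described above might look like the following minimal C++ sketch. The class name OwnedRef is hypothetical, and it assumes the pointer it is handed is a new (owned) reference, which is exactly the property being asked about:

    #include <Python.h>
    #include <utility>

    // Hypothetical RAII holder: takes ownership of a NEW reference and
    // releases it automatically when the object goes out of scope.
    class OwnedRef {
    public:
        explicit OwnedRef(PyObject *p = nullptr) : p_(p) {}                  // takes ownership of p
        ~OwnedRef() { Py_XDECREF(p_); }                                      // release on destruction
        OwnedRef(const OwnedRef &other) : p_(other.p_) { Py_XINCREF(p_); }   // copy shares ownership
        OwnedRef &operator=(OwnedRef other) { std::swap(p_, other.p_); return *this; }
        PyObject *get() const { return p_; }
        explicit operator bool() const { return p_ != nullptr; }
    private:
        PyObject *p_;
    };

    // Usage: PyObject_GetAttrString returns a new reference, so OwnedRef
    // will Py_DECREF it automatically.
    //   OwnedRef attr(PyObject_GetAttrString(module, "plot"));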

On 2020-01-07 20:22, hrfudan@gmail.com wrote:
The docs will say if it returns a new reference or a borrowed reference. The general rule is that those functions that store an object in a container, e.g. PyList_Append, will incref in order to retain the object, but there are a few, such as PyList_SetItem, which don't incref, but instead "steal" the reference. A careful read of the docs is always advised. There's an occasional exception!
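Concretely, the two conventions mentioned above look like this (a small sketch; error checks omitted for brevity):

    #include <Python.h>

    static void reference_conventions_demo(void)
    {
        PyObject *list = PyList_New(0);
        PyObject *item = PyLong_FromLong(42);   /* new reference: we own it */

        PyList_Append(list, item);              /* the list increfs item for itself... */
        Py_DECREF(item);                        /* ...so we still release our own reference */

        PyObject *other = PyList_New(1);        /* list with one (NULL) slot */
        PyObject *zero = PyLong_FromLong(0);    /* new reference */
        PyList_SetItem(other, 0, zero);         /* SetItem "steals" it: do NOT Py_DECREF(zero) */

        Py_DECREF(list);
        Py_DECREF(other);
    }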

There are important and interesting discussions taking place both on
Discourse (https://discuss.python.org) and on this email list (Mailman?). Discourse has email integration (https://discourse.gnome.org/t/interacting-with-discourse-via-email/46). I prefer the rich-text and thorough threading provided by Discourse. Could we switch the email discussions to Discourse email? Has this been considered earlier and rejected? Cheers, -- Juancarlo Añez

Juancarlo Añez writes:
Could we switch the email discussions to Discourse email? Has this been considered earlier and rejected?
Yes, of course we can. It has been brought up several times. It hasn't been rejected, but there's strong opposition. Eg, based on my experience with Discourse and its mail capabilities, I'd probably just stop participating (with no great loss to either Python-Ideas or to me ;-). Others are more invested, and have expressed their opposition in "oh gawd please NO!" terms. For the practical implications of Discourse vs. Mailman for people like me who are heavily invested in email-based workflows, you'll have to find those earlier threads. Steve

06.01.20 00:20, hrfudan@gmail.com writes:
I've worked out MyCallFunction() in my actual code in the same manner described above, but with Py_VaBuildValue: I forward all variadic arguments to a MyBuildValue(PyObject*, const char*, ...); in the .c file, MyBuildValue builds a va_list and passes it on to Py_VaBuildValue; then I force the outcome to be a tuple and pass it to PyObject_CallObject, and it works! Nonetheless, this approach seems less straightforward than having a PyObject_VaCallFunction, so I'm guessing it may carry a performance penalty.
You should not worry much about the performance penalty of creating a tuple if you use a format string and variable arguments, because the latter already have a large overhead. In common cases, using Py_VaBuildValue() should be enough. In performance-critical code, use PyObject_CallObject() or PyObject_Call(), or even private C API such as _PyObject_FastCallDict(), at your own risk.
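As an illustration of that advice, performance-critical code might pre-build the argument tuple once and reuse it with PyObject_CallObject(). This is a sketch with placeholder names (func, x), not code from the thread:

    #include <Python.h>

    /* Build the argument tuple once, outside the hot loop, and call repeatedly. */
    static void call_in_hot_loop(PyObject *func, PyObject *x)
    {
        PyObject *args = PyTuple_Pack(1, x);                  /* new reference */
        if (args == NULL)
            return;
        for (int i = 0; i < 1000; ++i) {
            PyObject *res = PyObject_CallObject(func, args);  /* new reference, or NULL on error */
            Py_XDECREF(res);
        }
        Py_DECREF(args);
    }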

For the most part, people find it more productive to call C++ from Python than the other way around. That is: the “management” code is in Python and the core computation in C or C++ or FORTRAN.

Keep in mind that Python has the GIL, so if you are calling into the interpreter from multi-processing code, it will all hit that bottleneck anyway. Unless you are calling separate Python processes from separate C++ processes, in which case it’s hard for me to imagine that would be easier to do (or high enough performance) than to manage your multiple processes from Python.

And do take a look at tools that help you with the C:Python bridge: Cxx was mentioned, and there are also Cython, Boost Python, etc. I only have direct experience with Cython, but it does help a lot, and even provides parallelism for tight loops.

-CHB

On Mon, Jan 6, 2020 at 1:32 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

I fully agree that the workflow should be Python to C. Thanks for the suggested tools. I heard about Cython at almost the same time I heard about Python; I will look to it for wisdom! My situation is exactly the "Unless" one you mentioned. Since the dataset can be huge, multiple compute nodes will be needed to crunch the numbers and produce something (hopefully much smaller) to be visualized in Python (via matplotlib). I could save a dataset (say, in HDF5) after all the nodes finish their calculations and read it into Python. But I was thinking that since the processed dataset is already in memory, maybe a direct call into Python for matplotlib functionality could be achieved?

I see! That definitely helps. What about a call to PyObject_CallMethodObjArgs()? Under the hood, is it really the same as a first call to PyObject_GetAttr() to get an attribute, followed by another call to PyObject_Call()? I read Objects/call.c and it seems to me this is the case. I'm asking because with Py_VaBuildValue() the only way to simulate PyObject_CallMethodObjArgs() is the two-step call described above.
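For reference, here is a rough sketch of the equivalence being asked about, i.e. what a call such as PyObject_CallMethodObjArgs(obj, name, a, b, NULL) roughly reduces to, ignoring CPython's internal fast-call optimizations:

    #include <Python.h>

    static PyObject *call_method_two_args(PyObject *obj, PyObject *name,
                                          PyObject *a, PyObject *b)
    {
        PyObject *method = PyObject_GetAttr(obj, name);  /* new reference */
        if (method == NULL)
            return NULL;
        PyObject *args = PyTuple_Pack(2, a, b);          /* new reference */
        if (args == NULL) {
            Py_DECREF(method);
            return NULL;
        }
        PyObject *result = PyObject_Call(method, args, NULL);
        Py_DECREF(args);
        Py_DECREF(method);
        return result;                                   /* new reference, or NULL on error */
    }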

On 2020-01-07 20:37, Rui Hu wrote:
I see! That definitely helps. What about a call to PyObject_CallMethodObjArgs()? Under the hood, is it really the same as a first call to PyObject_GetAttr() to get an attribute, followed by another call to PyObject_Call()? I read Objects/call.c and it seems to me this is the case. I'm asking because with Py_VaBuildValue() the only way to simulate PyObject_CallMethodObjArgs() is the two-step call described above.
It returns a new reference because it's returning a result, which might be a newly-created object. Consider, say, PyList_GetItem. The object it returns is in the list, so doing an incref would be wasteful (you're not expecting the object to disappear unexpectedly). It's a borrowed reference, and that's good enough. Now consider PyUnicode_Substring. The result is a substring, so some copying must take place. A new string is created and returned. It returns a new reference. Except that it doesn't always create a new string. If the substring happens to include all of the characters, there's no need to make a copy, because strings are immutable. It'll just do an incref and return the original string, and your code will be none the wiser.
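In code, the two cases described above translate to the following (sketch; error handling omitted):

    #include <Python.h>

    static void borrowed_vs_new(PyObject *list, PyObject *text)
    {
        PyObject *elem = PyList_GetItem(list, 0);          /* borrowed reference: no Py_DECREF */
        PyObject *sub  = PyUnicode_Substring(text, 0, 3);  /* new reference: must be released */
        (void)elem;
        Py_XDECREF(sub);
    }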

participants (10)
- Barry Scott
- Christopher Barker
- David Mertz
- Greg Ewing
- hrfudan@gmail.com
- Juancarlo Añez
- MRAB
- Rui Hu
- Serhiy Storchaka
- Stephen J. Turnbull