[capi-sig] Learning from JNI [was: Opaque handle API]
On 2019-02-28, Carl Shapiro wrote:
Because of all of the accumulated experience with handles in other systems, I think CPython is positioned to do much better than its predecessors.
I spent some time last night reading about JNI and I see that it solves many of the problems we are trying to solve. Certainly we should learn from it.
You can download a PDF copy of the JNI book here:
http://java.sun.com/docs/books/jni/
This looks like a useful article as well, outlining common mistakes when using the JNI:
https://www.ibm.com/developerworks/library/j-jni/index.html
The JNI book is pretty old so I'm not sure if JNI has evolved a lot since then. However, after only skimming the book last night, I find lots of interesting ideas.
First, JNI passes a JNIEnv pointer as the first argument of all native methods. I think it is similar to our threadstate structure. Explicitly passing it avoids some problems. Since Java doesn't have a GIL, smoothly handling threading is a big deal. I don't know if we should emulate that and pass threadstate (or something similar) as well.
At a recent core sprint, I recall discussing an idea like that with Dino and Carl, e.g. a new flag for extension modules that would make CPython pass the threadstate to extension functions. I'm pretty ignorant when it comes to multi-threading, but I think those guys thought looking it up in thread-local storage might be quick enough, rather than explicitly passing it everywhere.
Rather than the JNI API being functions you can call, like PyObject_Something(x), they are implemented as a vtable on the JNIEnv structure. So, you do something like:
void
Java_do_something(JNIEnv *env, jobject obj)
{
    (*env)->DoSomething(env, obj);
}
JNI provides strict binary compatibility so you really can't have macros or inlined functions as part of the API. This vtable idea has some nice advantages. You can start the JVM with different command line parameters and a different vtable can be used. CPython does something like this for tracemalloc. The JNI way seems cleaner and maybe more powerful.
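To make the vtable idea concrete, here is a minimal sketch in C. All of the names (PyEnv, PyEnvFuncs, EnvState, Obj) are made up for illustration and are not part of any CPython or JNI header. It assumes the per-thread state begins with the table pointer, so the VM can swap in a "checked" table at startup while the extension code stays unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a managed object. */
typedef struct Obj { long value; } Obj;

/* Like JNIEnv: the env type is a pointer to a table of functions. */
typedef const struct PyEnvFuncs *PyEnv;

struct PyEnvFuncs {
    long (*GetValue)(PyEnv *env, Obj *obj);
};

/* Per-thread state. The table pointer is the first member, so a
   pointer to it can be converted back to the state it lives in. */
typedef struct EnvState {
    PyEnv funcs;
    long checked_calls;
} EnvState;

/* Normal implementation: no extra work. */
static long fast_get_value(PyEnv *env, Obj *obj)
{
    (void)env;
    return obj->value;
}

/* Debug implementation: validates arguments and counts calls. */
static long checked_get_value(PyEnv *env, Obj *obj)
{
    EnvState *state = (EnvState *)env;
    state->checked_calls++;
    assert(obj != NULL);
    return obj->value;
}

static const struct PyEnvFuncs fast_table = { fast_get_value };
static const struct PyEnvFuncs checked_table = { checked_get_value };

/* Chosen once at startup, e.g. from a command-line flag. */
static void env_init(EnvState *state, int checked)
{
    state->funcs = checked ? &checked_table : &fast_table;
    state->checked_calls = 0;
}

/* Extension code: compiled once, unaware of which table is active. */
static long extension_read(PyEnv *env, Obj *obj)
{
    return (*env)->GetValue(env, obj);
}
```

The extension would be called as extension_read(&state.funcs, &obj). Because every call goes through the table, the same extension binary works with either implementation.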
Using a macro would seem cleaner to me, e.g.
#define DoSomething(env, obj) ((*env)->DoSomething(env, obj))
void
Java_do_something(JNIEnv *env, jobject obj)
{
    DoSomething(env, obj);
}
Maybe we could have it both ways (binary compatibility or lower overhead). Use an inline function like the following:
static inline void
DoSomething(JNIEnv *env, jobject obj)
{
#ifdef STABLE_BINARY_INTERFACE
    (*env)->DoSomething(env, obj);
#else
    ... inline implementation of env->DoSomething
#endif
}
There are three kinds of opaque object references (handles): local references, global references and weak global references. As I understand it, a local reference is a handle that gets closed when your native method returns. That sounds useful and makes life easier for extension authors (it is harder to leak memory by forgetting to close handles). You are limited in the number of local references you can use (default 16?) but the limit can be increased. You can also explicitly close local handles so you don't run out or so you free large chunks of memory. E.g.
lref = ... /* a large Java object */
...
(*env)->DeleteLocalRef(env, lref);
Local references sound very much like what Carl Shapiro and Larry Hastings were suggesting as a way to deal with borrowed references in the CPython API. I.e. make them a local reference and then close them when the native function returns.
Global references are what I was thinking of for the PyHandle API. They would live beyond your native function call and you have to remember to close them. Weak global references are pretty obvious. We would want to provide them too.
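A rough sketch of the local-reference behavior in C, assuming a refcounted object and a fixed-size frame. The names (Obj, LocalFrame, local_new and so on) are invented for this example, and the capacity of 16 is arbitrary. The point is only that the VM-side wrapper closes whatever the native function left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object. */
typedef struct Obj { int refcount; } Obj;

enum { LOCAL_CAPACITY = 16 };   /* arbitrary limit for this sketch */

typedef struct LocalFrame {
    Obj *slots[LOCAL_CAPACITY];
    int count;
} LocalFrame;

static void frame_push(LocalFrame *f)
{
    f->count = 0;
}

/* Create a local reference; returns a slot index, or -1 if full.
   Slots freed early are not reused in this simple version. */
static int local_new(LocalFrame *f, Obj *obj)
{
    if (f->count == LOCAL_CAPACITY) {
        return -1;
    }
    obj->refcount++;
    f->slots[f->count] = obj;
    return f->count++;
}

/* Explicit early close, in the spirit of DeleteLocalRef. */
static void local_delete(LocalFrame *f, int ref)
{
    assert(ref >= 0 && ref < f->count);
    if (f->slots[ref] != NULL) {
        f->slots[ref]->refcount--;
        f->slots[ref] = NULL;
    }
}

/* Run by the VM when the native function returns. */
static void frame_pop(LocalFrame *f)
{
    for (int i = 0; i < f->count; i++) {
        if (f->slots[i] != NULL) {
            f->slots[i]->refcount--;
        }
    }
    f->count = 0;
}

/* A native function that closes one handle and forgets the other. */
static void careless_native(LocalFrame *f, Obj *a, Obj *b)
{
    int ra = local_new(f, a);
    (void)local_new(f, b);
    local_delete(f, ra);
}

/* VM-side wrapper: the frame guarantees nothing leaks. */
static void call_native(Obj *a, Obj *b)
{
    LocalFrame frame;
    frame_push(&frame);
    careless_native(&frame, a, b);
    frame_pop(&frame);
}
```

A global reference would be the same idea without the frame: it stays alive until the extension closes it explicitly.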
JNI uses a similar scheme to CPython to deal with errors. I.e. JNI methods typically return NULL on error and set something inside the JNIEnv structure to record the details of the error. They have a method that is like PyErr_Occurred(), e.g.
if ((*env)->ExceptionCheck(env)) {
    return NULL; // error case
}
They spell out explicitly which JNI methods are safe to call when an error has occurred. In the JNI book, they say:
It is extremely important to check, handle, and clear a pending
exception before calling any subsequent JNI functions.
I gather this is a source of many bugs. I wonder if it would be better to return an object that enforces correct error handling. One example I found was the LLVM Error class:
https://llvm.org/doxygen/classllvm_1_1Error.html#details
I don't know how you would implement something like that in C. Maybe returning NULL is okay as it is working for JNI and matches what CPython does internally.
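One modest thing that is possible in plain C is a runtime check: the env remembers the pending error, and API functions refuse to run (and record the misuse) while one is pending. This is not the LLVM approach and it only catches mistakes when the code actually executes. The names (Env, get_name, misuse_count) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Env {
    const char *pending;   /* NULL when no error is pending */
    int misuse_count;      /* API calls made while an error was pending */
} Env;

/* Analogous to ExceptionCheck() or PyErr_Occurred(). */
static int exception_check(const Env *env)
{
    return env->pending != NULL;
}

static void exception_clear(Env *env)
{
    env->pending = NULL;
}

/* An API function that is not safe to call with an error pending.
   Returns NULL on error and records the details in the env. */
static const char *get_name(Env *env, int id)
{
    if (env->pending != NULL) {
        env->misuse_count++;   /* a debug build could abort here */
        return NULL;
    }
    if (id < 0) {
        env->pending = "invalid id";
        return NULL;
    }
    return "spam";
}
```

A debug build could turn the misuse counter into an immediate abort, which would at least make this class of bug loud during testing.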
The JNI has to solve a similar problem to Python and provide a rich set of accessor functions for object handles. The JNI approach works no matter how the Java virtual machine represents objects internally. This abstraction has a cost and so they provide a faster way for repeated access to primitive data types, such as arrays and strings. E.g. a function that gets a "pinned" version of the array elements.
JNI provides native access to fields and methods of Java objects. The JNI identifies methods and fields by their symbolic names and type descriptors. A two-step process factors out the cost of locating the field or method from its name and descriptor. For example, to read an integer instance field i in class cls, native code first obtains a field ID, as follows:
jfieldID fid = (*env)->GetFieldID(env, cls, "i", "I");
The native code can then use the field ID repeatedly, without the cost of field lookup, as follows:
jint value = (*env)->GetIntField(env, obj, fid);
There are rules about how the field ID can be cached. The advantage of this design is that JNI does not impose any restrictions on how field and method IDs are implemented internally.
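The two-step lookup can be sketched without a JVM. In this toy version the field ID happens to be a pointer to a descriptor holding a byte offset, but callers treat it as opaque, so the VM could represent it any other way. Class, FieldDesc, FieldID and Point are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct FieldDesc {
    const char *name;
    const char *sig;
    size_t offset;
} FieldDesc;

typedef struct Class {
    const FieldDesc *fields;
    size_t nfields;
} Class;

/* Opaque to callers; only the "VM" knows what it points to. */
typedef const FieldDesc *FieldID;

typedef struct Point { int x; int y; } Point;

static const FieldDesc point_fields[] = {
    { "x", "I", offsetof(Point, x) },
    { "y", "I", offsetof(Point, y) },
};
static const Class point_class = { point_fields, 2 };

/* Step 1: the slow symbolic lookup by name and type descriptor. */
static FieldID get_field_id(const Class *cls, const char *name,
                            const char *sig)
{
    for (size_t i = 0; i < cls->nfields; i++) {
        if (strcmp(cls->fields[i].name, name) == 0 &&
            strcmp(cls->fields[i].sig, sig) == 0) {
            return &cls->fields[i];
        }
    }
    return NULL;
}

/* Step 2: the fast access, with no string comparison. */
static int get_int_field(const void *obj, FieldID fid)
{
    int value;
    memcpy(&value, (const char *)obj + fid->offset, sizeof value);
    return value;
}

/* Look the field up once, then reuse the ID for every element. */
static int sum_y(const Point *pts, size_t n)
{
    FieldID fid = get_field_id(&point_class, "y", "I");
    int total = 0;
    if (fid == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        total += get_int_field(&pts[i], fid);
    }
    return total;
}
```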
Regards,
Neil
On 2019-03-01, Neil Schemenauer wrote:
You can download a PDF copy of the JNI book here:
http://java.sun.com/docs/books/jni/
Sorry, broken link, you can use this one (found on the Wikipedia page):
https://web.archive.org/web/20120728074805/http://java.sun.com/docs/books/jni/
The Android JNI tips page that Carl provided also has a lot of useful information:
https://developer.android.com/training/articles/perf-jni
I just started reading the spec and it is quite readable as well. I assume it is up to date whereas the book might not be.
https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/jniTOC.html
The objectives for the JNI match closely what I would like to achieve with a revised Python C API:
http://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/intro.html#wp16410
If we implement PyHandle in CPython in a simple way (e.g. as a typecast from PyObject), I think we will have similar problems to what the Android runtime encountered around the release of Ice Cream Sandwich. See:
https://android-developers.googleblog.com/2011/11/jni-local-reference-changes-in-ics.html
Again, a possible solution might be a debug build option for CPython that uses indirect pointers for the handles and checks that the API is used correctly. Or, if we went with a JNIEnv*-like vtable, you could have a command-line flag that switched CPython to use indirect pointers for handles.
On Sat, 2 Mar 2019 at 12:08, Neil Schemenauer <nas-python@arctrix.com> wrote:
If we implement PyHandle in CPython in a simple way (e.g. as a typecast from PyObject), I think we will have similar problems to what the Android runtime encountered around the release of Ice Cream Sandwich. See:
https://android-developers.googleblog.com/2011/11/jni-local-reference-changes-in-ics.html
Again, a possible solution might be a debug build option for CPython that uses indirect pointers for the handles and checks that the API is used correctly. Or, if we went with a JNIEnv*-like vtable, you could have a command-line flag that switched CPython to use indirect pointers for handles.
Given the long history of the existing CPython C API, I think any new handle-based design is still going to have to support the invariant that "c_handle_X == c_handle_Y" <=> "python_object_X is python_object_Y".
Anything else will be far too error prone (Consider that JNI has *never* worked that way, and people still get it wrong repeatedly based on assumptions about how object identity should work in C/C++. How much worse would the problem be for us given that Python's C API *has* worked that way for the past ~30 years?)
That invariant doesn't require that Python level object IDs actually be C level memory addresses (even if CPython implements it that way), but it does require that there be a 1:1 mapping between live handles and the live objects they reference. While that does bring in the issues that Carl mentions with recycling of object IDs creating surprising false equivalences, those aren't new problems either (CPython's generous use of free lists for builtin types means that IDs get recycled all the time in the existing APIs).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 2019-03-03, Nick Coghlan wrote:
Given the long history of the existing CPython C API, I think any new handle-based design is still going to have to support the invariant that "c_handle_X == c_handle_Y" <=> "python_object_X is python_object_Y".
What is not clear to me yet is how difficult it would be for other Python VMs to implement a handle API that gives that invariant. Based on some discussion with Armin, it sounds like PyPy takes a significant performance hit to do that. If your GC can move objects, how do you efficiently implement that invariant?
I guess different people have different ideas on the goals of a new API. To me, the goals are quite similar to what led to Java's Native Interface. I want an API so extension modules (native code, in Java speak) work efficiently with multiple Python VMs. The API for extensions should not unduly constrain the implementation choices of the VM. One critical point is that the VM should be able to use an object-moving GC.
For the new API to displace the existing one, there are some further requirements:
- must be easy to convert existing extension modules to new API
- must be possible to use converted extension with older CPython versions (provide shim layer)
- new API should not be lower performance or less capable than existing API
- existing API is still available for modules that are not converted
Anything else will be far too error prone (Consider that JNI has *never* worked that way, and people still get it wrong repeatedly based on assumptions about how object identity should work in C/C++. How much worse would the problem be for us given that Python's C API *has* worked that way for the past ~30 years?)
You might be correct but I wonder if the problem is so bad for Python. You already have to be very careful when comparing objects by id. E.g. interned strings compare by id but non-interned ones don't. Or, small integers compare by id but not if they are larger. So, maybe extension code would actually get better if we more strongly enforce the rule that you can't assume handles can be compared like that.
I worry that if we decide up front that the new API must work like the existing API in these ways, we are going to force ourselves into a design that is too CPython specific. Then, there will be no benefit to other VMs with the new API vs the existing one.
Regards,
Neil
On 03Mar.2019 1206, Neil Schemenauer wrote:
On 2019-03-03, Nick Coghlan wrote:
Given the long history of the existing CPython C API, I think any new handle-based design is still going to have to support the invariant that "c_handle_X == c_handle_Y" <=> "python_object_X is python_object_Y".
What is not clear to me yet is how difficult it would be for other Python VMs to implement a handle API that gives that invariant. Based on some discussion with Armin, it sounds like PyPy takes a significant performance hit to do that. If your GC can move objects, how do you efficiently implement that invariant?
FWIW, COM solves this issue by requiring that you cast differently typed pointers to the same interface (typically IUnknown) before comparing for identity.
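A toy model of that rule in C, assuming an object that exposes two interfaces at different addresses. This is not real COM: the identity() slot is a made-up stand-in for querying the canonical interface, and Widget and Iface are hypothetical names. The raw interface pointers differ, so identity is only compared after asking each one for its canonical pointer.

```c
#include <assert.h>
#include <stddef.h>

/* A minimal "interface": one slot that returns the canonical pointer. */
typedef struct Iface {
    void *(*identity)(struct Iface *self);
} Iface;

/* One object exposing two interfaces at different addresses. */
typedef struct Widget {
    Iface readable;
    Iface writable;
} Widget;

static void *readable_identity(Iface *self)
{
    return (char *)self - offsetof(Widget, readable);
}

static void *writable_identity(Iface *self)
{
    return (char *)self - offsetof(Widget, writable);
}

static void widget_init(Widget *w)
{
    w->readable.identity = readable_identity;
    w->writable.identity = writable_identity;
}

/* Comparing a == b directly would be wrong; canonicalize first. */
static int same_object(Iface *a, Iface *b)
{
    return a->identity(a) == b->identity(b);
}
```

The cost is one indirect call per side of the comparison, and in exchange the runtime is free to hand out different pointers for the same object.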
In my opinion, you can dislike many of the Windows-specific "enhancements" around COM (like DCOM, etc., and I do dislike them), but the core concepts are very well proven, including being used from JavaScript and .NET (fully GC languages). Perhaps more so than JNI?
Cheers, Steve
On 2019-03-03, Neil Schemenauer wrote:
What is not clear to me yet is how difficult it would be for other Python VMs to implement a handle API that gives that [one-to-one] invariant.
Some more details on why I think this could be hard to implement. Maybe I'm missing a fast way to do it.
In the case that the VM moves objects, something like handles are necessary. Otherwise, when the GC moves an object, there is no way for it to update the pointers it has given out via the API. Handles fix that with another level of indirection.
To do handles without the one-to-one invariant is easy and fast. You can have a table of handles (lookup is O(1) based on the handle value) and you allocate by just using the next free value in the table. You can have a free-list like malloc uses if you want to fill holes. The Android indirect_reference code I linked is a fancier version that allowed the local reference behavior (stack-like deallocation of handles).
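Here is a minimal sketch of such a table in C, assuming a fixed size and integer handles. The names (HandleTable, handle_new and so on) are invented for the example. Two handles to the same object get different values, so identity has to be tested through a helper rather than with ==.

```c
#include <assert.h>
#include <stddef.h>

enum { TABLE_SIZE = 8, NO_HANDLE = -1 };

typedef struct HandleTable {
    void *object[TABLE_SIZE];    /* handle value -> current address */
    int next_free[TABLE_SIZE];   /* free-list links */
    int free_head;
} HandleTable;

static void table_init(HandleTable *t)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        t->object[i] = NULL;
        t->next_free[i] = (i + 1 < TABLE_SIZE) ? i + 1 : NO_HANDLE;
    }
    t->free_head = 0;
}

/* O(1): pop the next free slot. No check for an existing handle. */
static int handle_new(HandleTable *t, void *obj)
{
    int h = t->free_head;
    if (h == NO_HANDLE) {
        return NO_HANDLE;
    }
    t->free_head = t->next_free[h];
    t->object[h] = obj;
    return h;
}

/* O(1): index into the table. */
static void *handle_deref(const HandleTable *t, int h)
{
    assert(h >= 0 && h < TABLE_SIZE);
    return t->object[h];
}

/* O(1): push the slot back on the free list. */
static void handle_close(HandleTable *t, int h)
{
    assert(h >= 0 && h < TABLE_SIZE);
    t->object[h] = NULL;
    t->next_free[h] = t->free_head;
    t->free_head = h;
}

/* GC hook: the table is scanned like any other set of roots. */
static void table_object_moved(HandleTable *t, void *from, void *to)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->object[i] == from) {
            t->object[i] = to;
        }
    }
}

/* Identity test the API would have to provide instead of ==. */
static int handle_same_object(const HandleTable *t, int a, int b)
{
    return handle_deref(t, a) == handle_deref(t, b);
}
```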
If you want the handles to correspond one-to-one with the managed objects, how can you do that? An obvious approach would be to use a hash table or similar O(1) data structure based on the pointer address of the managed object. I.e. if you need a new handle, look up in the table whether there is one already and, if so, use that.
A moving GC has no problem updating the indirect references in the handle table. Just treat it as another set of GC roots. However, since the managed object is moving, you can no longer use the pointer address to look up the handle (e.g. the hash value has changed). When moving, you could remove it and then add it back after the move. That would make the GC process a lot slower though.
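A small sketch of the hash-table variant in C, to show where the extra GC work comes from. It uses fixed-size chained buckets keyed by address. All names (UniqueHandles, unique_handle_for and so on) are hypothetical, and handles are never closed in this version. The move hook has to unlink and relink the entry because the address is the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_HANDLES = 8, NUM_BUCKETS = 4, NONE = -1 };

typedef struct UniqueHandles {
    void *object[MAX_HANDLES];     /* handle -> current address */
    int chain_next[MAX_HANDLES];   /* next handle in the same bucket */
    int bucket[NUM_BUCKETS];       /* first handle per bucket */
    int used;
} UniqueHandles;

static size_t bucket_of(const void *addr)
{
    return (size_t)(((uintptr_t)addr >> 4) % NUM_BUCKETS);
}

static void unique_init(UniqueHandles *u)
{
    for (int b = 0; b < NUM_BUCKETS; b++) {
        u->bucket[b] = NONE;
    }
    u->used = 0;
}

static void link_handle(UniqueHandles *u, int h)
{
    size_t b = bucket_of(u->object[h]);
    u->chain_next[h] = u->bucket[b];
    u->bucket[b] = h;
}

static void unlink_handle(UniqueHandles *u, int h)
{
    int *p = &u->bucket[bucket_of(u->object[h])];
    while (*p != h) {
        assert(*p != NONE);
        p = &u->chain_next[*p];
    }
    *p = u->chain_next[h];
}

/* Return the single handle for obj, creating it only if needed. */
static int unique_handle_for(UniqueHandles *u, void *obj)
{
    for (int h = u->bucket[bucket_of(obj)]; h != NONE;
         h = u->chain_next[h]) {
        if (u->object[h] == obj) {
            return h;
        }
    }
    if (u->used == MAX_HANDLES) {
        return NONE;
    }
    int h = u->used++;
    u->object[h] = obj;
    link_handle(u, h);
    return h;
}

/* GC hook: the address is the key, so every move is a rehash. */
static void unique_object_moved(UniqueHandles *u, int h, void *new_addr)
{
    unlink_handle(u, h);
    u->object[h] = new_addr;
    link_handle(u, h);
}
```

Every object with a live handle pays the unlink and relink on every move, which is the slowdown described above.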
You could give every managed object an ID field, which could also be used to make the Python id() function return a stable value. The bad news is that this doubles the storage size of small objects like floats and fixed ints (assuming the VM can store them unboxed). It seems one-to-one handles could use whatever solution the VM uses to implement id(). In the PyPy docs, they say this about id():
https://pypy.readthedocs.io/en/latest/cpython_differences.html
Using the default GC (called minimark), the built-in function
id() works like it does in CPython. With other GCs it returns
numbers that are not real addresses (because an object can move
around several times) and calling it a lot can lead to
performance problem.
Regards,
Neil
On 2019-03-04, Neil Schemenauer wrote:
It seems one-to-one handles could use whatever solution the VM uses to implement id().
I forgot about hash() as well. We might not like id() but I don't think we can consider messing with hash(). I found this on the PyPy site:
https://pypy.readthedocs.io/en/release-2.4.x/garbage_collection.html
Minimark GC
...
The objects move once only, so we can use a trick to implement
id() and hash(). If the object is not in the nursery, it won’t
move any more, so its id() and hash() are the object’s address,
cast to an integer. If the object is in the nursery, and we ask
for its id() or its hash(), then we pre-reserve a location in
the old stage, and return the address of that location. If the
object survives the next minor collection, we move it there, and
so its id() and hash() are preserved. If the object dies then
the pre-reserved location becomes free garbage, to be collected
at the next major collection.
It is a clever way to implement id() and hash(). However, if the API requires stable IDs for objects, it is as if you are calling id/hash on all of the objects passed over the extension API. I think that has the effect of copying all those objects out of the GC nursery. GC performance would be quite bad if you did it for a lot of objects.
Regards,
Neil
On 2019-03-01, Neil Schemenauer wrote:
Maybe we could have it both ways (binary compatibility or lower overhead). Use an inline function like the following:
static inline void
DoSomething(JNIEnv *env, jobject obj)
{
#ifdef STABLE_BINARY_INTERFACE
    (*env)->DoSomething(env, obj);
#else
    ... inline implementation of env->DoSomething
#endif
}
I was thinking about this and had a refinement idea. Can we have an extension API with most of the benefits of inline code but still retain binary compatibility if the extension is used with a different VM or a different version of the VM? Keep a flag in the 'env' struct to record if the inline code can be used (e.g. abi_flags). That flag can be set when the extension initializes itself. If the extension was compiled against an incompatible VM, the flag would be clear. Then, the API functions can do the following:
static inline void
DoSomething(JNIEnv *env, jobject obj)
{
    if ((*env)->abi_flags & ABI_INLINE_OKAY) {
        ... inline implementation of env->DoSomething
    }
    else {
        (*env)->DoSomething(env, obj);
    }
}
I think this could provide a performance boost. The value of abi_flags should be in L1 cache or maybe even a register. The branch predictor should have no trouble predicting the 'if'. We would need to write some benchmarks to determine if this is really a win.
As a concrete example of a function like PyList_GET_ITEM():
// note that 'list' must be a list object, no type checking
static inline PyHandle
PyHandle_ListGetItem(PyEnv *env, PyHandle list, ssize_t i)
{
    if ((*env)->abi_flags & ABI_INLINE_OKAY) {
        PyObject *op = ((PyListObject *)list)->ob_item[i];
        Py_INCREF(op); // unlike PyList_GET_ITEM(), not borrowed
        return (PyHandle)op;
    }
    else {
        return ((*env)->ListGetItem(env, list, i));
    }
}
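For anyone who wants to experiment, here is a self-contained version of the same dispatch that compiles without Python.h. Everything in it is a stand-in: Obj, ListObj, Env and EnvTable are hypothetical, and both tables share one object layout only so the sketch can run. It is a starting point for a benchmark, not a proposal for the real structs.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PyObject and PyListObject. */
typedef struct Obj { long refcount; } Obj;
typedef struct ListObj { Obj base; Obj **items; } ListObj;
typedef void *Handle;

#define ABI_INLINE_OKAY 0x1u

typedef const struct EnvTable *Env;

struct EnvTable {
    unsigned abi_flags;
    Handle (*ListGetItem)(Env *env, Handle list, ptrdiff_t i);
};

/* Counts calls that went through the table (for the demo only). */
static long slow_path_calls = 0;

/* Out-of-line implementation, always reachable through the table. */
static Handle slow_list_get_item(Env *env, Handle list, ptrdiff_t i)
{
    (void)env;
    slow_path_calls++;
    Obj *item = ((ListObj *)list)->items[i];
    item->refcount++;
    return item;
}

/* The VM the extension was compiled against sets the flag... */
static const struct EnvTable same_vm = {
    ABI_INLINE_OKAY, slow_list_get_item
};
/* ...any other VM leaves it clear. */
static const struct EnvTable other_vm = { 0, slow_list_get_item };

/* What the extension header would provide. */
static inline Handle list_get_item(Env *env, Handle list, ptrdiff_t i)
{
    if ((*env)->abi_flags & ABI_INLINE_OKAY) {
        Obj *item = ((ListObj *)list)->items[i];
        item->refcount++;   /* new reference, not borrowed */
        return item;
    }
    return (*env)->ListGetItem(env, list, i);
}
```

With same_vm the table is never touched, and with other_vm every call goes through it, so timing the two cases in a loop would give a first estimate of what the flag check costs and saves.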
Regards,
Neil
participants (3)
- Neil Schemenauer
- Nick Coghlan
- Steve Dower