Patching builtin_id to allow for C proxy objects?
Hi all.

I'm writing a module to proxy C++ objects into Python for a large C++ application. There are hundreds of thousands of C++ objects, some of which are temporary while others are very long lived. Currently, every time one of these objects is accessed from Python, a new "myproxy" instance is created. So if I were to access the same field of an object twice, I would receive two Python objects proxying the same underlying C++ object. This messes up "id" and "is", and is causing me issues when, for example, I run into circular references when encoding JSON or otherwise attempt to determine whether two proxy objects refer to the same C++ object.

I can't see how to cache the "myproxy" objects instead of returning new instances: due to the architecture of the C++ application, there's no weak reference support at all, and the number of objects is very large.

My current plan would be to override the id builtin to return the underlying C++ object instance pointer instead of the PyObject instance pointer in the case of the "myproxy" object type, probably using a new type method slot, tp_id or similar. The old behaviour would be unchanged for all other types, naturally. I'd also need to alter ceval.c to use builtin_id instead of the raw pointer for comparison when using PyCmp_IS and PyCmp_IS_NOT. I can see that there could very well be many other sites throughout the C source where the pointer is compared directly, and these would cause interesting issues for me down the line.

I'm just not sure what else to try. I'd like to know if I'm being laughably naive or not before I go ahead with this plan, and whether it'd be worthwhile contributing the patch back, considering the number of potentially overridden-id-unaware areas throughout the rest of the Python source base.

Thanks.
Tom.
Tom Whittock, 27.06.2011 12:48:
I'm writing a module to proxy C++ objects into Python for a large C++ application. There are hundreds of thousands of C++ objects, some of which are temporary while others are very long lived.
Currently every time one of these objects is accessed from Python, a new "myproxy" instance is created. So if I were to access the same field of an object twice, I would receive two python objects proxying the same underlying C++ object. This messes up "id" and "is"
Note that "is" actually compares the addresses, not the id().
and is causing me issues when, for example, I run into circular references when encoding json or otherwise attempt to determine whether two proxy objects refer to the same C++ object.
I can't see how to cache the "myproxy" objects instead of returning new instances - due to the architecture of the C++ application, there's no weak reference support at all, and the number of objects is very large.
My current plan would be for me to override the id builtin to return the underlying C++ object instance pointer instead of the PyObject instance pointer in the case of the "myproxy" object type
Where would you get the proxy object from anyway?

IMHO, there are two obvious ways to get what you want: map the C++ object address (integer!) to a proxy object using a dict, or use a backpointer from the C++ object to its proxy. The second is substantially faster, but may require changes to the C++ class struct.

I don't see how changes to CPython's core can help you here.

Stefan
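The first of Stefan's two options can be sketched roughly as follows. This is only an illustration under stated assumptions: `MyProxy`, `get_proxy` and `release` are hypothetical names that do not appear in the thread, the C++ address is modelled as a plain integer, and `release` stands in for whatever notification the C++ side could give when an object is destroyed.

```python
class MyProxy:
    """Hypothetical stand-in for the extension's proxy type."""

    def __init__(self, address):
        # Integer address of the underlying C++ object (assumed).
        self.address = address


# Maps C++ object address -> proxy. These are strong references, so a
# proxy stays alive until it is explicitly released.
_proxies = {}


def get_proxy(address):
    """Return the single proxy for this address, creating it on first use."""
    proxy = _proxies.get(address)
    if proxy is None:
        proxy = _proxies[address] = MyProxy(address)
    return proxy


def release(address):
    """Assumed hook: called when the C++ object is destroyed."""
    _proxies.pop(address, None)


a = get_proxy(0x1000)
b = get_proxy(0x1000)
same_object = a is b  # both names refer to the one cached proxy
```

Because the dict holds strong references, the cache grows with every object ever touched unless something like `release` is called, which is the cost to weigh against its simplicity.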
Tom Whittock wrote:
Currently every time one of these objects is accessed from Python, a new "myproxy" instance is created. So if I were to access the same field of an object twice, I would receive two python objects proxying the same underlying C++ object.
Perhaps you could use a WeakValueDictionary to keep a mapping from a C++ object address to its Python proxy. Then as long as a proxy object is alive, accessing the same C++ object again will get you the same proxy object.

When there are no longer any references to the proxy object from Python, it will go away. The next time you access that C++ object you'll get a new proxy, but that won't matter, because the original proxy is no longer around to compare it with.

This depends on there being some way for the proxy object to ensure that the C++ object remains alive as long as it does.

It also won't solve the problem of keeping the id of the proxy for longer than the proxy exists, but that's probably not something you should be relying on anyway. The id of *any* Python object is only valid while the object lives, and if it's still alive you have a real reference somewhere that you can use instead of the id for identity testing.

--
Greg
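A minimal sketch of Greg's suggestion, assuming a hypothetical pure-Python `MyProxy` class and `get_proxy` helper (neither name comes from the thread). A proxy type implemented in C would additionally have to support weak references, which this sketch does not show.

```python
import gc
import weakref


class MyProxy:
    """Hypothetical stand-in for the extension's proxy type."""

    def __init__(self, address):
        # Integer address of the underlying C++ object (assumed).
        self.address = address


# Maps C++ object address -> proxy, without keeping the proxy alive.
_live = weakref.WeakValueDictionary()


def get_proxy(address):
    """Return the live proxy for this address, or create a new one."""
    proxy = _live.get(address)
    if proxy is None:
        proxy = MyProxy(address)
        _live[address] = proxy
    return proxy


first = get_proxy(0x2000)
second = get_proxy(0x2000)
shared = first is second  # the same proxy while any reference exists

# Once the last reference from Python is dropped, the entry disappears.
del first, second
gc.collect()  # not needed with CPython's reference counting, but harmless
remaining = len(_live)
```

The cache only ever holds proxies that Python code is still using, so short-lived temporaries do not accumulate in it.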
Hi Greg, thanks for your quick reply.
Perhaps you could use a WeakValueDictionary to keep a mapping from a C++ object address to its Python proxy.
Thank you, I'll implement this and see whether it works out. I'll certainly be better off if it does. I was avoiding holding weak references due to perhaps unfounded concerns about increasing the lifetime and speed/memory impact of certain temporary objects which are created at very high frequency. I'll test it and see before diving into messing with id. But now I'm thinking about it again, I can see a plan for not needing to affect that pathway at all. Seems I fell into the trap of making things too complicated for myself.
It also won't solve the problem of keeping the id of the proxy for longer than the proxy exists, but that's probably not something you should be relying on anyway. The id of *any* Python object is only valid while the object lives, and if it's still alive you have a real reference somewhere that you can use instead of the id for identity testing.
Thanks, yes. I'm actually kind of concerned about the use of id in the markers set which the json library uses for circular reference checks, for exactly this reason. It seems to assume that each object's lifetime lasts for the duration of the encoding operation. I have no idea if that's actually the case in my situation, where data members are property getters producing probably very short-lived proxies generated from C++. I guess I'll find out :)

Thanks again,
Tom.
Hi again.

Just to let you know that Greg's suggestion worked beautifully - I guess my id idea was just me trying to make life hard for myself. My concerns over the json module's use of id seem unjustified, as circular references are detected now that the weak reference dictionary is in place.

Thanks for your help, and sorry for bothering dev with something which was a regular Python programming issue after all.

Tom.
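The outcome Tom reports can be reproduced in a small sketch. Everything here is an assumption made for illustration: the `MyProxy` class, the `peer` property, the `encode_proxy` hook and the two-object graph in `_peers` are hypothetical and only mimic property getters that hand back proxies for C++ objects which refer to each other.

```python
import json
import weakref

# Hypothetical C++ object graph: object 1 refers to 2, and 2 back to 1.
_peers = {1: 2, 2: 1}

# Address -> live proxy, as in the WeakValueDictionary approach.
_live = weakref.WeakValueDictionary()


class MyProxy:
    """Hypothetical stand-in for the extension's proxy type."""

    def __init__(self, address):
        self.address = address

    @property
    def peer(self):
        # Property getter returning a proxy for the referenced object.
        return get_proxy(_peers[self.address])


def get_proxy(address):
    """Return the live proxy for this address, or create a new one."""
    proxy = _live.get(address)
    if proxy is None:
        proxy = MyProxy(address)
        _live[address] = proxy
    return proxy


def encode_proxy(obj):
    """json `default` hook: turn a proxy into a plain dict."""
    if isinstance(obj, MyProxy):
        return {"address": obj.address, "peer": obj.peer}
    raise TypeError(f"cannot serialise {type(obj).__name__}")


root = get_proxy(1)
try:
    json.dumps(root, default=encode_proxy)
    detected = False
except ValueError:
    # The encoder meets the very same proxy object for address 1 again.
    detected = True
```

With the cache in place, following `peer` twice leads back to the identical `root` object, so the encoder's identity-based check can recognise the cycle and raise `ValueError`.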
participants (3)
- Greg Ewing
- Stefan Behnel
- Tom Whittock