[C++-sig] Make threading support official (was: Re: Conversion of python files to C++ ostreams)

Sat Apr 10 16:14:41 CEST 2010

On 6 Apr 2010 at 12:14, troy d. straszheim wrote:

My apologies for the delay in replying.

> > I think, with hindsight, that we all were going at it wrong at that 
> > time. What we *ought* to have been doing is an upcall mechanism such 
> > that routine X is always called just before entering C++ space and 
> > routine Y is always called just before entering Python space. That 
> > way you get the best of all worlds.
> 
> Makes perfect sense to me, but I'm very new to the problem.  Help me out 
> a bit...  What code of yours should I be looking at?  I found this:
> 
> http://aspn.activestate.com/ASPN/Mail/Message/c++-sig/1865844

Yeah, that's old. Try 
http://github.com/ned14/tnfox/blob/master/Python/BoostPatches.zip 
which is about two years fresher.

> In that patch, why do you (un)lock in invoke.hpp instead of in
> 
>    static PyObject* function_call(...)
> 
> in function.cpp and the various other static C linkage functions that 
> are registered directly with the Python C/API via PyTypeObjects?  At a 
> glance, these seem the closest points to the language boundary.  

I am afraid that I do not remember why. It could have been out of 
ignorance, or simply because the patch I used as a base did so there 
as well. It could also have been that I had a very good reason such 
as forcing an ABI incompatibility to prevent accidental mixing of 
incompatible binaries or something, or perhaps it was a maintenance 
issue or some question of exception safety. It's old code I am happy 
to see replaced with something much better anyway.

> What 
> is supposed to happen when python calls cpp, which calls python again?

In my patch, at least, the GIL gets locked and unlocked as necessary 
at every boundary crossing, however deep you go, including when an 
exception gets thrown or something is being iterated. This isn't 
fast, and I was never happy about it, but it meant behaviour 
consistent with people's expectations.
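
As a rough illustration of that nesting (a sketch only, not the code 
from the patch): a pair of RAII guards makes each boundary crossing 
undo itself on scope exit, including during exception unwinding. 
MockGil, EnterCpp, EnterPython and run_nested are hypothetical names, 
and MockGil merely stands in for the real lock so that the example 
builds without Python; real guards would wrap calls such as 
PyEval_SaveThread() and PyEval_RestoreThread().

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the interpreter lock, so the sketch builds
// without Python headers. Real guards would wrap CPython calls such as
// PyEval_SaveThread() / PyEval_RestoreThread().
struct MockGil {
    bool held = true;     // we start in Python space, holding the lock
    int transitions = 0;  // number of lock/unlock operations so far
    void release() { assert(held);  held = false; ++transitions; }
    void acquire() { assert(!held); held = true;  ++transitions; }
};

// Entering C++ space: release on construction, reacquire on destruction.
class EnterCpp {
    MockGil& gil_;
public:
    explicit EnterCpp(MockGil& g) : gil_(g) { gil_.release(); }
    ~EnterCpp() { gil_.acquire(); }
    EnterCpp(const EnterCpp&) = delete;
    EnterCpp& operator=(const EnterCpp&) = delete;
};

// Entering Python space from C++: acquire on construction, release on
// destruction.
class EnterPython {
    MockGil& gil_;
public:
    explicit EnterPython(MockGil& g) : gil_(g) { gil_.acquire(); }
    ~EnterPython() { gil_.release(); }
    EnterPython(const EnterPython&) = delete;
    EnterPython& operator=(const EnterPython&) = delete;
};

// Python calls C++, which calls back into Python; the callback may throw.
// Returns 0 on success, 1 if the callback threw.
int run_nested(MockGil& gil, bool callback_throws) {
    EnterCpp in_cpp(gil);        // Python -> C++
    try {
        EnterPython in_py(gil);  // C++ -> Python
        if (callback_throws) throw std::runtime_error("callback failed");
        return 0;
    } catch (const std::runtime_error&) {
        // in_py was destroyed during unwinding, so we are back in C++
        // space with the lock released.
        assert(!gil.held);
        return 1;
    }
}
```

Each level costs two lock operations, which is where the slowness 
comes from, but the lock state is correct at every depth whether or 
not anything throws.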

>   How about with multiple interpreters? 

Last time I checked this worked fine, though this was some time ago. 
Running multiple interpreters is one of the TnFOX test suite tests 
anyway.

> Would you also need to lock in e.g. object_protocol.cpp:
> 
> void setattr(object const& target, object const& key, object const& value)
> {
>      if (PyObject_SetAttr(target.ptr(), key.ptr(), value.ptr()) == -1)
>          throw_error_already_set();
> }

Maybe I am missing your point, but surely all accesses to Python must 
hold the GIL first, not least because the GIL also specifies the 
current interpreter to use? (I know that you can get away with some 
calls, but relying on this seems hardly prudent).

> If you could indeed use those C linkage functions, how about having 
> boost.python send boost::signals that cppland has been entered/left. 
> (sanity check?)  This would support multiple receivers, 
> connect/disconnect, all that stuff that comes with boost::signals.  This 
> could compile out for singlethreaded versions.   You could even send, 
> say, an enum value with the signal to indicate what was happening 
> (instance_get,  instance_new, function_call) and get some kind of 
> tracing ability out of it.  Thoughts?

I would take issue with boost::signals2 solely because it uses 
mutexes when a lock free implementation is not only entirely 
possible, but highly desirable given the typical use cases of a 
signals and slots implementation. It may however be possible to use 
boost::signals and the Python GIL to serialise around it - in this 
situation then yes, using boost::signals would be useful, assuming 
that its overhead is minimal. Obviously enough, if you're firing a 
signal in something as minor as your setattr() function then in the 
ideal case you want no code being executed if there are no slots 
registered for that particular signal.
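
To make the "no slots, no work" point concrete, here is a minimal 
sketch of a signal whose emit path is a single emptiness test when 
nothing is connected. It is not boost::signals; BoundarySignal and 
BoundaryEvent are hypothetical names (the enumerators are the ones 
suggested above), disconnect is omitted, and it takes no lock of its 
own on the assumption that every caller already holds the GIL.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical event tags, taken from the tracing idea quoted above.
enum class BoundaryEvent { function_call, instance_get, instance_new };

// Minimal signal. It has no mutex of its own: the assumption is that
// connect() and emission only ever happen while the caller holds the GIL.
class BoundarySignal {
    std::vector<std::function<void(BoundaryEvent)>> slots_;
public:
    void connect(std::function<void(BoundaryEvent)> slot) {
        slots_.push_back(std::move(slot));
    }

    bool empty() const { return slots_.empty(); }

    // Emit: with no slots connected this is one branch and nothing else.
    void operator()(BoundaryEvent ev) const {
        if (slots_.empty()) return;
        for (const auto& slot : slots_) slot(ev);
    }
};
```

Even this leaves one branch on the hot path per emission, which is 
the cost a compile-time scheme could remove altogether.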

In TnFOX I have a metaprogramming construct which assembles inline a 
jump table of class specialisations, any one of which can be chosen 
dynamically at run time. Fairly amazingly, all major compilers 
correctly elide table entries which cannot be chosen, such that they 
will remove the choice logic entirely if there is just one possible 
choice, or the whole construct if there are none. This particular 
construct is really useful in BPL actually, because it lets you fake 
things like installing arbitrary Python code at run time as the sort 
function for a C (not C++) API.
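
One plausible reading of that trick, sketched under assumptions (this 
is not the TnFOX construct itself): a C API such as qsort() takes a 
bare function pointer with no user-data argument, so a fixed table of 
template-generated thunks, each bound to its own slot, lets a 
callable chosen at run time hide behind a plain pointer. The names 
kSlots, slots, thunk, make_table and bind_comparator are all 
hypothetical, the slot count is arbitrary, and the sketch needs C++14 
for std::index_sequence.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>

using CCompare   = int (*)(const void*, const void*);  // what qsort() takes
using Comparator = std::function<int(const void*, const void*)>;

constexpr std::size_t kSlots = 4;  // arbitrary capacity for this sketch

// Run-time storage: one callable per thunk.
std::array<Comparator, kSlots>& slots() {
    static std::array<Comparator, kSlots> s;
    return s;
}

// One distinct function per index; each forwards to its own slot.
template <std::size_t I>
int thunk(const void* a, const void* b) { return slots()[I](a, b); }

// Build the table of thunk pointers at compile time.
template <std::size_t... I>
std::array<CCompare, sizeof...(I)> make_table(std::index_sequence<I...>) {
    return {{ &thunk<I>... }};
}

// Store `c` in the first free slot and return the matching plain function
// pointer, or nullptr if every slot is taken. Not thread-safe, and slots
// are never freed in this sketch.
CCompare bind_comparator(Comparator c) {
    static const auto table = make_table(std::make_index_sequence<kSlots>{});
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (!slots()[i]) {
            slots()[i] = std::move(c);
            return table[i];
        }
    }
    return nullptr;
}
```

In a binding layer the stored callable would be the one that invokes 
the Python comparison; here any std::function will do.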

Hence it may well be that a static signals and slots implementation 
could be more appropriate in this situation. I guess I wouldn't know 
until I run benchmarks. Your thoughts?
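
For comparison, a static signals and slots arrangement might look 
like the sketch below, in which the set of slots is fixed at compile 
time as template arguments. StaticSignal, Boundary, CountingHook and 
wrapped_call are hypothetical names, and whether this beats a 
run-time signal in practice is exactly what benchmarks would have to 
show.

```cpp
#include <cassert>

// Hypothetical boundary events.
enum class Boundary { enter_cpp, leave_cpp };

// A hook is any type with a static void on(Boundary). The hook list is
// fixed at compile time, so emit() is a series of direct calls, and with
// an empty list there is nothing left to call.
template <class... Hooks>
struct StaticSignal {
    static void emit(Boundary b) {
        // Pack expansion in an array initialiser (works in C++11);
        // the leading 0 keeps the array non-empty when Hooks is empty.
        int expand[] = {0, (Hooks::on(b), 0)...};
        (void)expand;
        (void)b;
    }
};

// Example hook: counts how many times it was notified.
struct CountingHook {
    static int& count() { static int n = 0; return n; }
    static void on(Boundary) { ++count(); }
};

// A wrapped call that announces both boundary crossings.
template <class Signal>
int wrapped_call(int x) {
    Signal::emit(Boundary::enter_cpp);
    int result = x * 2;  // stand-in for the real wrapped function
    Signal::emit(Boundary::leave_cpp);
    return result;
}
```

A single-threaded build would then use StaticSignal<> and a threaded 
build would list its GIL hook, with no run-time registration on 
either path.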

Cheers,
Niall