I'm not quite sure which approach is state-of-the-art as of 2016. How would you do it if you had to make a C/C++ library available in Python right now?
In my case, I have a C library with some scientific functions on matrices and vectors. You will typically call a few functions to configure the computation, then hand over some pointers to existing buffers containing vector data, then start the computation, and finally read back the data. The library also can use MPI to parallelize.