My first impression is that making things faster and hiding implementation details in the ABI are contrary goals. I agree with hiding implementation details in the API but not in the ABI.
For example, you mention that you want to make Py_INCREF() a function call instead of a macro. But since Py_INCREF() is used so often, I would guess that this would make performance worse (maybe not by much, but surely measurably).
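To make the trade-off concrete, here is a rough sketch using a toy object type. This is not the real PyObject or the real Py_INCREF(); the names toy_object, TOY_INCREF_MACRO, toy_incref_func and toy_demo are made up for illustration, and the cost comment assumes the function lives in a separate shared library where the compiler cannot inline it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for PyObject; NOT the real CPython layout. */
typedef struct toy_object {
    ptrdiff_t refcnt;
} toy_object;

/* Macro variant: the field access is compiled into every caller,
 * so the struct layout effectively becomes part of the ABI. */
#define TOY_INCREF_MACRO(op) ((void)(++(op)->refcnt))

/* Function variant: callers only see a call, so the layout can change
 * without recompiling them. The price is one call per increment
 * (assuming the call crosses a library boundary and is not inlined). */
void toy_incref_func(toy_object *op)
{
    assert(op != NULL);
    ++op->refcnt;
}

/* Apply both variants n times each, starting from a count of 1,
 * and return the resulting reference count. */
ptrdiff_t toy_demo(ptrdiff_t n)
{
    toy_object obj = { 1 };
    for (ptrdiff_t i = 0; i < n; i++) {
        TOY_INCREF_MACRO(&obj);
        toy_incref_func(&obj);
    }
    return obj.refcnt;
}
```

Both variants do the same work; the difference is only in what ends up compiled into the extension module, which is exactly where the API versus ABI distinction matters.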
Jeroen.