Re: Best approach for opaque PyObject
On Sat, Jul 4, 2020, at 11:59, William Pickard wrote:
CPython PR #21262, GitHub username is Wildcard65.
OK, looking at that code, I'm not sure I understand how it addresses the PyObject_HEAD thing; unless there's something I'm not seeing, I'll ask directly: how would an extension library define a type, and how would it access its own data at the end of objects of that type?

I'm also skeptical that a static library would provide any savings vs treating the functions like every other function in the API... a precompiled header might, but while I don't fully understand how precompiled headers are used, I don't *think* it would allow extension libraries to ship as a dll/so file as they currently can.
For backwards compatibility, PyTypeObject will eventually have the flag Py_TPFLAG_OMIT_PYOBJECT_SIZE, but from what I can tell, eventually all extensions should be using PyType_FromSpec/its variants. The members I added are internal use only, with external public API for extension users to grab their type's data structure (example: PyTupleObject for tuples) as well as to properly allocate memory for the type.

PyTypeObject.tp_obj_size is a member that holds the total size of the type's object structure (including the size of its immediate base type's structure); this will allow Python to allocate the correct amount of memory to contain the data. PyTypeObject.tp_obj_offset is a member that holds the offset from the 'PyObject *' pointer to that type's internal data structure. A Python PEP has added the term "defining class" for C types, which can make utilizing this member simpler, but its scope will need to be expanded to include all possible C method definitions (it currently only supports fastcall). I am planning on replacing the two members with a private struct called "struct _type_memdata" to simplify readability of code using the members.

I'm now leaning towards replacing the static library with a precompiled header (those will not prevent users from building DLLs, as they're just plain headers compiled to machine code used differently from obj files, I BELIEVE at least; I haven't tested inlinability yet).
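A minimal sketch of how I picture those two members being used (tp_obj_size/tp_obj_offset are the proposed members described above, and CustomData_GET is a hypothetical accessor; none of this is existing CPython API):

    #include <Python.h>

    /* Illustrative only: the extension's own fields, with no
     * PyObject_HEAD at the front anymore. */
    typedef struct {
        PyObject *first;
        int number;
    } CustomData;

    /* Hypothetical accessor: tp_obj_offset would be the start of this
     * type's data relative to the 'PyObject *', i.e. right past the
     * base type's structure, and tp_obj_size the total allocation
     * including the base's part. */
    static inline CustomData *
    CustomData_GET(PyObject *op, PyTypeObject *defining_class)
    {
        return (CustomData *)((char *)op + defining_class->tp_obj_offset);
    }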
On Sat, Jul 4, 2020, at 12:48, William Pickard wrote:
For backwards compatibility, PyTypeObject will eventually have the flag Py_TPFLAG_OMIT_PYOBJECT_SIZE, but from what I can tell, eventually all extensions should be using PyType_FromSpec/its variants.
Er... I don't mean how do they create the PyType. I mean how do they create the actual data type they use? https://docs.python.org/3/extending/newtypes_tutorial.html#adding-data-and-m... What does the definition of CustomObject look like? How do I access the "number" member, given a PyObject *? Can I still cast the PyObject * to CustomObject *, or do I have to go through some conversion function?
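For reference, the CustomObject from that tutorial is currently:

    #include <Python.h>

    /* As defined in the newtypes tutorial (current, non-opaque API): */
    typedef struct {
        PyObject_HEAD
        PyObject *first;  /* first name */
        PyObject *last;   /* last name */
        int number;
    } CustomObject;

    /* and access is just a cast plus a member read: */
    static int
    get_number(PyObject *op)
    {
        return ((CustomObject *)op)->number;
    }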
I'm now leaning towards replacing the static library with a precompiled header (those will not prevent users from building DLLs, as they're just plain headers compiled to machine code used differently from obj files, I BELIEVE at least; I haven't tested inlinability yet).
My understanding is that you can't combine the precompiled header with a dll or even an obj file; you have to combine it with source code, which means requiring a precompiled header prevents extension libraries from being shipped as binaries. Requiring modules to be rebuilt from source code to support any changed definition of PyObject, as far as I can tell, defeats the entire purpose of making it opaque.
A precompiled header is a combination of a plain header with a companion special file (.pch/.gch). The companion file is generated from a source file that is designated as the Precompiled Header creator ('/Yc' on MSVC). Every other source file is told to use the special file ('/Yu' on MSVC); compilation of those source files will fail if the special file is missing. CPython/third-party runtimes will only need to ship this special file with the compiled code; the only downside is the burden of checking the file's checksum before it's used in a compile process.
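Concretely, the flow is roughly this (the /Yc and /Yu flags and gcc's -x c-header are real options; the file names and the choice of Python.h as the precompiled header are just for illustration):

    /* pch_stub.c -- the designated Precompiled Header creator
     * (illustrative file name).
     *
     * MSVC: cl /c /YcPython.h pch_stub.c   -> writes the .pch file
     *       cl /c /YuPython.h mymodule.c   -> consumes the .pch file,
     *                                        failing if it's missing
     * GCC:  gcc -x c-header Python.h       -> writes Python.h.gch,
     *       which gcc then uses automatically wherever Python.h is
     *       #included.
     */
    #include <Python.h>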
Oh, and the other question: yes, a conversion function will be required to utilize the value of the offset member. The conversion is sadly a one-way affair, BUT CPython's C API is handy enough that a sacrificial 'PyObject *' stack variable can exist; most compilers will probably just optimize the variable away anyhow.
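In practice I picture something like this minimal sketch (reusing the hypothetical CustomData/CustomData_GET names from my earlier message; none of this is real API):

    /* The sacrificial 'PyObject *' keeps the runtime type reachable,
     * since the data pointer alone cannot recover it. */
    static PyObject *
    Custom_getnumber(PyObject *self, void *closure)
    {
        /* NOTE: if self is an instance of a subtype, Py_TYPE(self) is
         * not the defining class -- that is the "defining class" gap
         * mentioned above. */
        CustomData *data = CustomData_GET(self, Py_TYPE(self));
        return PyLong_FromLong(data->number);
    }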
On Sat, Jul 4, 2020, at 13:19, William Pickard wrote:
A precompiled header is a combination of a plain header with a companion special file (.pch/.gch). The companion file is generated from a source file that is designated as the Precompiled Header creator ('/Yc' on MSVC).
Every other source file is told to use the special file ('/Yu' on MSVC); compilation of those source files will fail if the special file is missing.
CPython/third-party runtimes will only need to ship this special file with the compiled code; the only downside is the burden of checking the file's checksum before it's used in a compile process.
But this file can only be used *with the source code* of extension libraries, so shipping this file isn't any better than shipping an ordinary text-based header file (which is, obviously, already done). You seem to believe that the pch/gch file can somehow be used to adapt already-compiled extension libraries to each implementation. I do not think this is true at all; certainly it doesn't seem to be implied by anything in the documentation at https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html .
On Sat, Jul 4, 2020, at 13:22, William Pickard wrote:
Oh, and the other question:
Yes, a conversion function will be required to utilize the value of the offset member. The conversion is sadly a one-way affair, BUT CPython's C API is handy enough that a sacrificial 'PyObject *' stack variable can exist; most compilers will probably just optimize the variable away anyhow.
Hmm... if only single inheritance is still allowed, there's no reason you couldn't support the other direction by subtracting the offset, as in the sketch below. Supporting multiple inheritance [of two different base classes that both define structures, anyway - something that's not currently allowed in CPython, but might be interesting to add support for] would require knowing the runtime type of the object, but I don't think your code is currently doing anything to support that case.
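i.e., a hypothetical sketch, reusing the tp_obj_offset name from earlier in the thread:

    /* Reverse conversion for the single-inheritance case: subtract the
     * same offset the forward conversion added. Hypothetical; only
     * valid when 'type' really matches the layout 'data' came from. */
    static inline PyObject *
    object_from_data(void *data, PyTypeObject *type)
    {
        return (PyObject *)((char *)data - type->tp_obj_offset);
    }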
My understanding of how tp_bases/tp_base is utilized is that it takes the best type object to fill tp_base. But thinking about it now, there's another issue with offsets that is inherited from CPython as it stands: C-based types and objects aren't inheritance friendly; more specifically, they're only really usable as the LAST base type in a bases tuple. This offset system could be redone to make it friendlier, but as it stands, it mimics the existing behaviour faithfully.

The one-way conversion only applies if you discard the initial "PyObject *" stack variable, as it contains the runtime type of the object. This is a problem if the runtime type is not the C-based type object, BUT a PEP accepted for Python 3.9 lays the framework that solves this issue.
There are only 2 ways an extension is distributed to people in the Python universe: as a SOURCE CODE distribution OR a COMPILED BINARY distribution. Wheels are generally the "compiled" distribution; these are also generally built for a specific runtime + Python version. For Python 3.10 wheels, these would already be compiled against the precompiled header. Source distributions require "building" before being deployed; for Python 3.10, this will include the precompiled header.

I'm aiming at least to not increase the API overhead when invoking stuff like "Py_TYPE", "Py_INCREF", "Py_DECREF", etc. Py_INCREF and Py_DECREF are the worst offenders, as they are HIGHLY INVOKED and are also the functions that are ALWAYS INLINED away by modern compilers.
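For context, in current (3.9-era) CPython headers Py_INCREF boils down to roughly this, which compilers inline to a single increment; routing it through an exported pythoncore function would replace that increment with a call on every use:

    /* Simplified from CPython's Include/object.h (the real version
     * also handles Py_REF_DEBUG accounting): */
    static inline void _Py_INCREF(PyObject *op)
    {
        op->ob_refcnt++;
    }
    #define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op))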
On Sat, Jul 4, 2020, at 14:27, William Pickard wrote:
There are only 2 ways an extension is distributed to people in the Python universe: as a SOURCE CODE distribution OR a COMPILED BINARY distribution.
Wheels are generally the "compiled" distribution; these are also generally built for a specific runtime + Python version. For Python 3.10 wheels, these would already be compiled against the precompiled header.
Source distributions require "building" before being deployed; for Python 3.10, this will include the precompiled header.
My point was that I still don't understand what the benefit of the precompiled header is, if extensions still have to distribute source to get the benefit of it. This is why I assumed you *didn't* think extensions would have to distribute source. So, how is it better than a normal header? If the point of an opaque PyObject isn't to allow the same compiled distribution of an extension to be used with different versions of python that have different implementations of PyObject then... well, what *is* the point? Why not just recompile normally, with the textual header?
I'm aiming at least to not increase the API overhead when invoking stuff like "Py_TYPE", "Py_INCREF", "Py_DECREF", etc. Py_INCREF and Py_DECREF are the worst offenders, as they are HIGHLY INVOKED and are also the functions that are ALWAYS INLINED away by modern compilers.
If we do go with making functions like Py_TYPE exported from pythoncore, every extension (even first-party ones) will end up paying import and call overhead. That can get noisy and costly REAL FAST. Py_VISIT is a supreme example of this problem, as it can noisily fill a stack trace.
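For reference, Py_VISIT expands entirely inline inside tp_traverse today (shown roughly as defined in CPython's headers), so under an exported-functions scheme each visited member would become a cross-DLL call:

    /* Roughly as defined in CPython's headers: */
    #define Py_VISIT(op)                                        \
        do {                                                    \
            if (op) {                                           \
                int vret = visit(_PyObject_CAST(op), arg);      \
                if (vret)                                       \
                    return vret;                                \
            }                                                   \
        } while (0)

    /* Typical tp_traverse (from the newtypes tutorial); each Py_VISIT
     * is pure inline code today: */
    static int
    Custom_traverse(CustomObject *self, visitproc visit, void *arg)
    {
        Py_VISIT(self->first);
        Py_VISIT(self->last);
        return 0;
    }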
participants (2): Random832, William Pickard