Python initialization and embedded Python
Hi,
The CPython internals evolved during the Python 3.7 cycle. I would like to know whether we broke the C API or not.
Nick Coghlan and Eric Snow are working on cleaning up the Python initialization with the ongoing PEP 432: https://www.python.org/dev/peps/pep-0432/
Many global variables used by the "Python runtime" were moved to a single new "_PyRuntime" variable (a big structure made of sub-structures). See Include/internal/pystate.h.
A side effect of moving variables from random files into header files is that it's no longer possible to fully initialize _PyRuntime at "compilation time". For example, previously it was possible to refer to local C functions (functions declared with "static", so only visible in the current file). Now a new "initialization function" must be called.
In short, it means that using the "Python runtime" before it's initialized by _PyRuntime_Initialize() is now likely to crash. For example, calling PyMem_RawMalloc() before calling _PyRuntime_Initialize() now calls a NULL function pointer, and so immediately crashes with a segmentation fault.
I'm writing this email to ask whether this change is an issue for embedded Python and the Python C API. Is it still possible to call "all" functions of the C API before calling Py_Initialize()?
I was bitten by the bug while reworking the Py_Main() function to split it into subfunctions and clean up the code that handles the command line arguments and environment variables. I fixed the issue in main() by calling _PyRuntime_Initialize() as soon as possible: it's now the first instruction of main() :-) (See Programs/python.c)
To give a more concrete example: Py_DecodeLocale() is the recommended function to decode bytes from the operating system, but this function calls PyMem_RawMalloc(), which crashes before _PyRuntime_Initialize() is called. Is Py_DecodeLocale() used to initialize Python?
For example, "void Py_SetProgramName(wchar_t *);" expects a text string, whereas main() gives argv as bytes. Calling Py_SetProgramName() from argv requires decoding bytes... So use Py_DecodeLocale()...
Should we do something in Py_DecodeLocale()? Maybe crash if _PyRuntime_Initialize() wasn't called yet?
Maybe the minimum change is to expose _PyRuntime_Initialize() in the public C API?
Victor
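The failure mode described in this message can be reproduced in miniature without CPython. The following is only a sketch under stated assumptions: every name in it (toy_runtime, toy_runtime_initialize(), toy_raw_malloc() and so on) is hypothetical and is not a CPython internal. It shows a global structure whose function pointers start out NULL because the file-local functions they should point at cannot be named from a shared static initializer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a table of allocator entry points. */
typedef struct {
    void *(*malloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} toy_allocator;

/* Hypothetical stand-in for one runtime-wide state structure.
   As a zero-initialized global, both function pointers start out NULL. */
typedef struct {
    int initialized;
    toy_allocator raw;
} toy_runtime_state;

toy_runtime_state toy_runtime;

/* File-local helpers: visible only in this translation unit, so an
   initializer written in a shared header could not refer to them. */
static void *toy_malloc_impl(size_t size) { return malloc(size ? size : 1); }
static void toy_free_impl(void *ptr) { free(ptr); }

/* Returns 1 if allocating is safe, 0 if it would call through NULL. */
int toy_raw_malloc_is_usable(void)
{
    return toy_runtime.raw.malloc_fn != NULL;
}

/* The explicit initialization step that now has to run first. */
void toy_runtime_initialize(void)
{
    if (toy_runtime.initialized) {
        return;
    }
    toy_runtime.raw.malloc_fn = toy_malloc_impl;
    toy_runtime.raw.free_fn = toy_free_impl;
    toy_runtime.initialized = 1;
    assert(toy_raw_malloc_is_usable());
}

void *toy_raw_malloc(size_t size)
{
    /* Before toy_runtime_initialize(), this calls a NULL pointer. */
    return toy_runtime.raw.malloc_fn(size);
}

void toy_raw_free(void *ptr)
{
    toy_runtime.raw.free_fn(ptr);
}
```

Calling toy_raw_malloc() before toy_runtime_initialize() is the analogue of the crash reported above; calling it afterwards works as expected.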
On 17Nov2017 1601, Victor Stinner wrote:
In short, it means that using the "Python runtime" before it's initialized by _PyRuntime_Initialize() is now likely to crash. For example, calling PyMem_RawMalloc(), before calling _PyRuntime_Initialize(), now calls the function NULL: dereference a NULL pointer, and so immediately crash with a segmentation fault.
I'm writing this email to ask if this change is an issue or not to embedded Python and the Python C API. Is it still possible to call "all" functions of the C API before calling Py_Initialize()?
I thought it was never possible to call most of the C API without initializing, except for certain APIs that are documented as being safe. I've certainly crashed many times calling C APIs before initialization.
My intuition was that the only safe ones before were those that were used to initialize the runtime (Py_SetPath and such), which are also the ones being "upgraded" as part of this work.
If we have a good idea of which ones are [un]safe now, perhaps we should tag them explicitly in the docs? Do we know which ones are [un]safe?
Cheers, Steve
18.11.17 02:01, Victor Stinner wrote:
Many global variables used by the "Python runtime" were moved to a single new "_PyRuntime" variable (a big structure made of sub-structures). See Include/internal/pystate.h.
A side effect of moving variables from random files into header files is that it's no longer possible to fully initialize _PyRuntime at "compilation time". For example, previously it was possible to refer to local C functions (functions declared with "static", so only visible in the current file). Now a new "initialization function" must be called.
In short, it means that using the "Python runtime" before it's initialized by _PyRuntime_Initialize() is now likely to crash. For example, calling PyMem_RawMalloc() before calling _PyRuntime_Initialize() now calls a NULL function pointer, and so immediately crashes with a segmentation fault.
Wouldn't it be better to revert (part of) the moving of global variables? I still don't see a benefit of it.
To give a more concrete example: Py_DecodeLocale() is the recommended function to decode bytes from the operating system, but this function calls PyMem_RawMalloc(), which crashes before _PyRuntime_Initialize() is called. Is Py_DecodeLocale() used to initialize Python?
For example, "void Py_SetProgramName(wchar_t *);" expects a text string, whereas main() gives argv as bytes. Calling Py_SetProgramName() from argv requires decoding bytes... So use Py_DecodeLocale()...
Should we do something in Py_DecodeLocale()? Maybe crash if _PyRuntime_Initialize() wasn't called yet?
I think Py_DecodeLocale() should be usable before calling Py_Initialize(). In the example in Doc/extending/extending.rst it is used before Py_Initialize(). If the third-party code is based on this example, it will crash now.
On 18 November 2017 at 10:01, Victor Stinner
I'm writing this email to ask if this change is an issue or not to embedded Python and the Python C API. Is it still possible to call "all" functions of the C API before calling Py_Initialize()?
It isn't technically permitted to call any of them, unless their documentation specifically says that calling them before `Py_Initialize` is permitted (and that permission is only given for a select few configuration APIs in https://docs.python.org/3/c-api/init.html).
While it's still PEP 432's intention to eventually expose a public multi-phase start-up API, it's *also* the case that we're not actually ready to do that yet - we're not sure we have the data model right, and we don't want to commit to a supported API until that's resolved.
So for Python 3.7, I'd suggest pursuing one of the following options:
1. Add a variant of Py_DecodeLocale that accepts a memory allocation function directly and reports back both the allocated pointer and its size (allowing the calling program to manage that memory); or
2. Offer a new `Py_SetProgramNameFromString` API that accepts a `char *` directly. That way, CPython can take care of lazily decoding it after the decoding machinery has been fully set up, rather than expecting the embedding application to always do it.
(While we could also make the promise that PyMem_RawMalloc and Py_DecodeLocale will be callable before Py_Initialize, I don't think we're far enough into the startup refactoring process to be making those kinds of promises.)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
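To illustrate the first option, here is a rough sketch of a decoder that takes the allocation function from the caller and reports the size of the buffer it produced. It is not a proposed CPython API: sketch_alloc_fn and sketch_decode() are hypothetical names, and the conversion relies on the standard C mbsrtowcs() in the current locale instead of the error handling that the real Py_DecodeLocale() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical callback type: the caller decides where memory comes from. */
typedef void *(*sketch_alloc_fn)(size_t size);

/* Decode a locale-encoded byte string into a wide string using the
   caller's allocator. On success, returns the buffer and stores its size
   in bytes (including the terminating L'\0') in *out_size. Returns NULL
   if the input cannot be decoded or the allocation fails. */
wchar_t *sketch_decode(const char *bytes, sketch_alloc_fn alloc, size_t *out_size)
{
    assert(bytes != NULL && alloc != NULL && out_size != NULL);

    /* First pass: count the wide characters needed (no output buffer). */
    mbstate_t state;
    const char *src = bytes;
    memset(&state, 0, sizeof state);
    size_t count = mbsrtowcs(NULL, &src, 0, &state);
    if (count == (size_t)-1) {
        return NULL;  /* invalid multibyte sequence */
    }

    size_t size = (count + 1) * sizeof(wchar_t);
    wchar_t *result = alloc(size);
    if (result == NULL) {
        return NULL;
    }

    /* Second pass: convert for real, including the terminator. */
    src = bytes;
    memset(&state, 0, sizeof state);
    mbsrtowcs(result, &src, count + 1, &state);
    *out_size = size;
    return result;
}
```

A caller could pass plain malloc as the allocator and release the result with free, so no runtime state is needed before initialization; that is the design point of the option, and the rest of the sketch is incidental.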
18.11.17 16:17, Nick Coghlan wrote:
It isn't technically permitted to call any of them, unless their documentation specifically says that calling them before `Py_Initialize` is permitted (and that permission is only given for a select few configuration APIs in https://docs.python.org/3/c-api/init.html).
The Py_Initialize() documentation is not complete. It mentions only Py_SetProgramName(), Py_SetPythonHome() and Py_SetPath(). But in other places it is documented that Py_SetStandardStreamEncoding(), PyImport_AppendInittab() and PyImport_ExtendInittab() should be called before Py_Initialize(). And the embedding examples call Py_DecodeLocale() before Py_Initialize(). PyMem_RawMalloc(), PyMem_RawFree() and PyInitFrozenExtensions() are called before Py_Initialize() in Py_FrozenMain(). Also these functions call _PyMem_RawStrdup().
Hence, the minimal set of functions that can be called before Py_Initialize() is:
* Py_SetProgramName()
* Py_SetPythonHome()
* Py_SetPath()
* Py_SetStandardStreamEncoding()
* PyImport_AppendInittab()
* PyImport_ExtendInittab()
* Py_DecodeLocale()
* PyMem_RawMalloc()
* PyMem_RawFree()
* PyInitFrozenExtensions()
On 19 November 2017 at 01:45, Serhiy Storchaka
The Py_Initialize() documentation is not complete. It mentions only Py_SetProgramName(), Py_SetPythonHome() and Py_SetPath(). But in other places it is documented that Py_SetStandardStreamEncoding(), PyImport_AppendInittab() and PyImport_ExtendInittab() should be called before Py_Initialize(). And the embedding examples call Py_DecodeLocale() before Py_Initialize(). PyMem_RawMalloc(), PyMem_RawFree() and PyInitFrozenExtensions() are called before Py_Initialize() in Py_FrozenMain(). Also these functions call _PyMem_RawStrdup().
Hence, the minimal set of functions that can be called before Py_Initialize() is:
* Py_SetProgramName()
* Py_SetPythonHome()
* Py_SetPath()
* Py_SetStandardStreamEncoding()
* PyImport_AppendInittab()
* PyImport_ExtendInittab()
* Py_DecodeLocale()
* PyMem_RawMalloc()
* PyMem_RawFree()
* PyInitFrozenExtensions()
OK, in that case I think the answer to Victor's question is:
1. Breaking calling Py_DecodeLocale() before calling Py_Initialize() is a compatibility break with the API implied by our own usage examples, and we'll need to revert the breakage for 3.7, and ensure at least one release's worth of DeprecationWarning before requiring either the use of an alternative API (where the caller controls the memory management), or else a new lower level pre-initialization API (i.e. making `_PyRuntime_Initialize` a public API)
2. We should provide a consolidated list of these functions in the C API initialization docs
3. We should add more test cases to _testembed.c that ensure they all work correctly prior to Py_Initialize (some of them are already tested there, but definitely not all of them)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
19.11.17 04:17, Nick Coghlan wrote:
1. Breaking calling Py_DecodeLocale() before calling Py_Initialize() is a compatibility break with the API implied by our own usage examples, and we'll need to revert the breakage for 3.7, and ensure at least one release's worth of DeprecationWarning before requiring either the use of an alternative API (where the caller controls the memory management), or else a new lower level pre-initialization API (i.e. making `PyRuntime_Initialize` a public API)
There is a way to control the memory manager. The caller should just define their own PyMem_RawMalloc(), PyMem_RawFree(), etc. It seems to me that the reasons for introducing these functions were:
1. Get around the implementation detail that malloc(0) could return NULL. PyMem_RawMalloc() should always return a unique address (unless there is an error).
2. Allow the caller to control the memory management by providing their own implementations.
Let's use existing possibilities and not expand the API. I don't think the deprecation and breaking compatibility are needed here.
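The first reason given above, the malloc(0) corner case, can be sketched in a few lines of plain C. This is an illustration of the technique only, with hypothetical names (sketch_raw_malloc(), sketch_raw_free(), sketch_zero_size_demo()); it is not the CPython implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: never forwards a zero-byte request to malloc(),
   because the C standard lets malloc(0) return either NULL or a unique
   pointer. Asking for one byte instead means that a successful call
   always yields a distinct, non-NULL address. */
void *sketch_raw_malloc(size_t size)
{
    if (size == 0) {
        size = 1;
    }
    return malloc(size);
}

void sketch_raw_free(void *ptr)
{
    free(ptr);
}

/* Demonstrates the guarantee for two zero-byte requests. */
void sketch_zero_size_demo(void)
{
    void *a = sketch_raw_malloc(0);
    void *b = sketch_raw_malloc(0);
    /* Barring out-of-memory, both succeed and the addresses differ. */
    assert(a != NULL && b != NULL && a != b);
    sketch_raw_free(a);
    sketch_raw_free(b);
}
```

With such a wrapper, a NULL return can only mean an allocation failure, which is what makes the result usable as an error signal.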
Maybe we can find a compromise: revert the change on memory allocators.
They are too special to require calling _PyRuntime_Initialize() first.
Currently, you cannot call PyMem_SetAllocator() before _PyRuntime_Initialize().
Victor
On 19 November 2017 at 18:52, Victor Stinner
Maybe we can find a compromise: revert the change on memory allocators. They are too special to require calling _PyRuntime_Initialize() first.
Currently, you cannot call PyMem_SetAllocator() before _PyRuntime_Initialize().
At least the raw allocators, anyway - that way, the developer facing documentation/comments can just say that the raw allocators can't have any prerequisites that aren't shared by regular malloc/calloc/realloc/free calls.
If that's enough to get Py_DecodeLocale working again prior to _PyRuntime_Initialize(), then I'd suggest officially adding that to the "must work prior to Py_Initialize" list, otherwise we can re-examine it based on whatever's still broken after reverting the raw allocator changes.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
To not lose track of the issue, I created this issue on bpo:
https://bugs.python.org/issue32086
Victor
On Nov 18, 2017 19:20, "Nick Coghlan"
2017-11-20 16:31 GMT+01:00 Eric Snow
That Py_DecodeLocale() can use PyMem_RawMalloc() pre-init is an implementation detail.
Py_DecodeLocale() uses PyMem_RawMalloc(), and so its result must be freed by PyMem_RawFree(). It's part of the documentation.
I'm not sure that I understood correctly. Do you agree to move the "PyMem" globals back to Objects/obmalloc.c (to allow calling PyMem_RawMalloc() before Py_Initialize())?
Victor
On Mon, Nov 20, 2017 at 8:43 AM, Victor Stinner
Py_DecodeLocale() uses PyMem_RawMalloc(), and so its result must be freed by PyMem_RawFree(). It's part of the documentation.
Ah, I'd missed that. Thanks for pointing it out.
I'm not sure that I understood correctly. Do you agree to move "PyMem" globals back to Objects/obmalloc.c? (to allow to call PyMem_RawMalloc() before Py_Initialize())
I'm okay with that if we can't find another way. However, shouldn't we be able to statically initialize the raw allocator in _PyRuntime, much as we were doing before in obmalloc.c? I have a rough PR up:
https://github.com/python/cpython/pull/4481
Also, I opened https://bugs.python.org/issue32096 for the regression. Thanks for bringing it up.
-eric
2017-11-20 22:35 GMT+01:00 Eric Snow
I'm okay with that if we can't find another way. However, shouldn't we be able to statically initialize the raw allocator in _PyRuntime, much as we were doing before in obmalloc.c? I have a rough PR up:
https://github.com/python/cpython/pull/4481
Also, I opened https://bugs.python.org/issue32096 for the regression. Thanks for bringing it up.
To statically initialize PyMemAllocatorEx fields, you need to export a lot of allocator functions. I would prefer to not do that.
static void* _PyMem_DebugRawMalloc(void *ctx, size_t size);
static void* _PyMem_DebugRawCalloc(void *ctx, size_t nelem, size_t elsize);
static void* _PyMem_DebugRawRealloc(void *ctx, void *ptr, size_t size);
static void _PyMem_DebugRawFree(void *ctx, void *ptr);
static void* _PyMem_DebugMalloc(void *ctx, size_t size);
static void* _PyMem_DebugCalloc(void *ctx, size_t nelem, size_t elsize);
static void* _PyMem_DebugRealloc(void *ctx, void *ptr, size_t size);
static void _PyMem_DebugFree(void *ctx, void *p);
static void* _PyObject_Malloc(void *ctx, size_t size);
static void* _PyObject_Calloc(void *ctx, size_t nelem, size_t elsize);
static void _PyObject_Free(void *ctx, void *p);
static void* _PyObject_Realloc(void *ctx, void *ptr, size_t size);
The rules to choose the allocator for each domain are also complex, depending on whether pymalloc is enabled, whether debug hooks are enabled by default, etc. The memory allocator is also linked to _PyMem_Debug, which is currently not in Include/internal/ but in Objects/obmalloc.c.
I understand that moving global variables to _PyRuntime helps to clarify how these variables are initialized and then finalized, but memory allocators are a complex corner case. main(), Py_Main() and _PyRuntime_Initialize() now have to temporarily change the allocators to make sure that their initialization and finalization use the same allocator.
I prefer to revert the change on memory allocators, and retry later to fix it, once other initialization issues are fixed ;-)
Victor
On Mon, Nov 20, 2017 at 3:03 PM, Victor Stinner
To statically initialize PyMemAllocatorEx fields, you need to export a lot of allocator functions. I would prefer to not do that.
[snip]
The rules to choose the allocator to each domain are also complex depending if pymalloc is enabled, debug hooks are enabled by default, etc. The memory allocator is also linked to _PyMem_Debug which is not currently in Include/internals/ but Objects/obmalloc.c.
I'm not suggesting supporting the full machinery. Rather, as my PR demonstrates, we can statically initialize the minimum needed to support pre-init use of PyMem_RawMalloc() and PyMem_RawFree(). The allocators will be fully initialized once the runtime is initialized (i.e. once Py_Initialize() is called), just as they are now. FWIW, I'm not sure that's the best approach. See my notes in https://bugs.python.org/issue32096.
I understand that moving global variables to _PyRuntime helps to clarify how these variables are initialized and then finalized, but memory allocators are a complex corner case.
Agreed. I spent a large portion of my time getting the allocators right when working on the original _PyRuntime patch. It's tricky code. -eric
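The idea of statically initializing only the minimum needed for pre-init use of the raw allocator can be sketched as follows. This is an illustration and not the code of the PR mentioned above: all names (sketch_runtime, SKETCH_RUNTIME_INIT, sketch_raw_malloc() and so on) are hypothetical, and the "full" initialization is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    void *(*malloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} sketch_allocator;

typedef struct {
    int initialized;
    sketch_allocator raw;
} sketch_runtime_state;

/* Minimal raw allocator: thin wrappers around the C library. They are
   deliberately not "static", so that a static initializer can name them. */
void *sketch_default_malloc(size_t size) { return malloc(size ? size : 1); }
void sketch_default_free(void *ptr) { free(ptr); }

/* Static initializer: the raw allocator is usable from program start. */
#define SKETCH_RUNTIME_INIT \
    { 0, { sketch_default_malloc, sketch_default_free } }

sketch_runtime_state sketch_runtime = SKETCH_RUNTIME_INIT;

/* Full initialization; the complete allocator setup would go here. */
void sketch_runtime_initialize(void)
{
    sketch_runtime.initialized = 1;
}

/* Finalization reverts to the static state, so memory obtained before
   initialization can still be released after finalization. */
void sketch_runtime_finalize(void)
{
    sketch_runtime_state reset = SKETCH_RUNTIME_INIT;
    sketch_runtime = reset;
}

void *sketch_raw_malloc(size_t size)
{
    assert(sketch_runtime.raw.malloc_fn != NULL);
    return sketch_runtime.raw.malloc_fn(size);
}

void sketch_raw_free(void *ptr)
{
    assert(sketch_runtime.raw.free_fn != NULL);
    sketch_runtime.raw.free_fn(ptr);
}
```

The cost noted earlier in the thread is visible even here: the default allocator functions have to be visible outside their own file so that the initializer can refer to them.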
2017-11-21 16:57 GMT+01:00 Eric Snow
I understand that moving global variables to _PyRuntime helps to clarify how these variables are initialized and then finalized, but memory allocators are a complex corner case.
Agreed. I spent a large portion of my time getting the allocators right when working on the original _PyRuntime patch. It's tricky code.
Oh, I forgot to notify you: when I worked on Py_Main(), I got crashes because PyMem_RawMalloc() wasn't usable before calling Py_Initialize(). This is what I call a regression, and that's why I started this thread :-)
I fixed the issue by calling _PyRuntime_Initialize() as the very first function in main().
I also had to add _PyMem_GetDefaultRawAllocator() to get a deterministic memory allocator, rather than depending on the allocator set by an application embedding Python: we must be sure that the same allocator is used to initialize and finalize Python.
Victor
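The "same allocator for initialization and finalization" constraint can be sketched like this. The code is a toy model and does not reproduce CPython's internals: sketch_get_default_allocator() merely plays the role that the message assigns to _PyMem_GetDefaultRawAllocator(), and sketch_app_malloc() / sketch_app_free() stand for an allocator installed by a hypothetical embedding application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    void *(*malloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} sketch_allocator;

static void *sketch_default_malloc(size_t size) { return malloc(size ? size : 1); }
static void sketch_default_free(void *ptr) { free(ptr); }

/* Current raw allocator; an embedding application may replace it. */
static sketch_allocator sketch_current = { sketch_default_malloc, sketch_default_free };

/* Example application allocator that counts how often it is called. */
int sketch_app_calls = 0;
void *sketch_app_malloc(size_t size) { sketch_app_calls++; return malloc(size ? size : 1); }
void sketch_app_free(void *ptr) { sketch_app_calls++; free(ptr); }

void sketch_get_allocator(sketch_allocator *out) { *out = sketch_current; }
void sketch_set_allocator(const sketch_allocator *alloc) { sketch_current = *alloc; }
void sketch_get_default_allocator(sketch_allocator *out)
{
    out->malloc_fn = sketch_default_malloc;
    out->free_fn = sketch_default_free;
}

/* Memory owned by the runtime itself. */
static void *sketch_runtime_buffer = NULL;

void sketch_runtime_init(void)
{
    sketch_allocator saved, defaults;
    assert(sketch_runtime_buffer == NULL);
    sketch_get_allocator(&saved);
    sketch_get_default_allocator(&defaults);
    sketch_set_allocator(&defaults);   /* force the default allocator */
    sketch_runtime_buffer = sketch_current.malloc_fn(32);
    sketch_set_allocator(&saved);      /* restore the application's choice */
}

void sketch_runtime_fini(void)
{
    sketch_allocator saved, defaults;
    sketch_get_allocator(&saved);
    sketch_get_default_allocator(&defaults);
    sketch_set_allocator(&defaults);   /* same allocator as in init */
    sketch_current.free_fn(sketch_runtime_buffer);
    sketch_runtime_buffer = NULL;
    sketch_set_allocator(&saved);
}
```

Because both sketch_runtime_init() and sketch_runtime_fini() switch to the default allocator around their own work, the runtime's buffer is never handed to the application's free function, whatever the application installed in between.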
On Wed, 22 Nov 2017 10:38:32 +0100
Victor Stinner
I fixed the issue by calling _PyRuntime_Initialize() as the very first function in main().
I also had to add _PyMem_GetDefaultRawAllocator() to get a deterministic memory allocator, rather than depending on the allocator set by an application embedding Python: we must be sure that the same allocator is used to initialize and finalize Python.
This is a bit worrying. Do Python embedders have to go through the same dance? IMHO this really needs a simple solution documented somewhere. Also, hopefully when you do the wrong thing, you get a clear error message to know how to fix your code? Regards Antoine.
2017-11-22 12:04 GMT+01:00 Antoine Pitrou
IMHO this really needs a simple solution documented somewhere. Also, hopefully when you do the wrong thing, you get a clear error message to know how to fix your code?
Right now, calling PyMem_RawMalloc() before calling _PyRuntime_Initialize() calls the function at address NULL, so you get a segmentation fault.
Documenting the new requirements is part of the discussion; it's one option for fixing this issue.
Victor
On Wed, 22 Nov 2017 12:12:32 +0100
Victor Stinner
Right now, calling PyMem_RawMalloc() before calling _PyRuntime_Initialize() calls the function at address NULL, so you get a segmentation fault.
Can we get something more readable? For example:
FATAL ERROR: PyMem_RawMalloc(): malloc function is NULL, did you call _PyRuntime_Initialize?
Regards
Antoine.
On 22 November 2017 at 21:12, Victor Stinner
Right now, calling PyMem_RawMalloc() before calling _PyRuntime_Initialize() calls the function at address NULL, so you get a segmentation fault.
Documenting the new requirements is part of the discussion; it's one option for fixing this issue.
My own recommendation is that we add Eric's new test case to the embedding test suite and just make sure it works:
    wchar_t *program = Py_DecodeLocale("spam", NULL);
    Py_SetProgramName(program);
    Py_Initialize();
    Py_Finalize();
    PyMem_RawFree(program);
It does place some additional constraints on us in terms of handling static initialization of the allocator state, and ensuring we revert back to that state in Py_Finalize, but I think it's the only way we're going to be able to reliably replace all calls to malloc & free with PyMem_RawMalloc and PyMem_RawFree without causing weird problems.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 21 November 2017 at 01:31, Eric Snow
On Nov 18, 2017 19:20, "Nick Coghlan"
wrote: OK, in that case I think the answer to Victor's question is:
1. Breaking calling Py_DecodeLocale() before calling Py_Initialize() is a compatibility break with the API implied by our own usage examples, and we'll need to revert the breakage for 3.7,
+1
The break was certainly unintentional. :/ Fortunately, Py_DecodeLocale() should be the only "Process-wide parameter" needing repair. I suppose, PyMem_RawMalloc() and PyMem_RawFree() *could* be considered too, but my understanding is that they aren't really intended for direct use (especially pre-init).
PyMem_RawFree will need to continue working pre-initialize as well, since it's the specified cleanup function for Py_DecodeLocale. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 18.11.2017 01:01, Victor Stinner wrote:
Hi,
The CPython internals evolved during Python 3.7 cycle. I would like to know if we broke the C API or not.
Nick Coghlan and Eric Snow are working on cleaning up the Python initialization with the "on going" PEP 432: https://www.python.org/dev/peps/pep-0432/
Many global variables used by the "Python runtime" were moved to a single new "_PyRuntime" variable (a big structure made of sub-structures). See Include/internal/pystate.h.
A side effect of moving variables from random files into header files is that it's no longer possible to fully initialize _PyRuntime at "compilation time". For example, previously it was possible to refer to local C functions (functions declared with "static", so only visible in the current file). Now a new "initialization function" must be called.
In short, it means that using the "Python runtime" before it's initialized by _PyRuntime_Initialize() is now likely to crash. For example, calling PyMem_RawMalloc() before calling _PyRuntime_Initialize() now calls a NULL function pointer, and so immediately crashes with a segmentation fault.
To prevent a complete crash, would it be possible to initialize the struct entries to a generic function (or set of such functions with the right signatures), which then issue a message to stderr hinting to the missing call to _PyRuntime_Initialize() before terminating ?
-- Marc-Andre Lemburg, eGenix.com
On Thu, 23 Nov 2017 10:37:59 +0100
"M.-A. Lemburg"
+1. This sounds like a good idea. Regards Antoine.
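The suggestion of pre-filling the structure with generic functions that report the problem, instead of leaving NULL pointers, could look roughly like the sketch below. All names are hypothetical (sketch_raw, sketch_uninit_malloc(), sketch_runtime_initialize()), and the message text is an example, not what CPython prints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void *(*malloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} sketch_allocator;

/* Placeholder entries with the right signatures: instead of a NULL
   pointer call, the user gets a message on stderr before termination. */
void *sketch_uninit_malloc(size_t size)
{
    (void)size;
    fprintf(stderr, "FATAL ERROR: raw malloc called before the runtime "
                    "was initialized\n");
    abort();
}

void sketch_uninit_free(void *ptr)
{
    (void)ptr;
    fprintf(stderr, "FATAL ERROR: raw free called before the runtime "
                    "was initialized\n");
    abort();
}

/* The table starts out pointing at the placeholders, never at NULL. */
sketch_allocator sketch_raw = { sketch_uninit_malloc, sketch_uninit_free };

static void *sketch_real_malloc(size_t size) { return malloc(size ? size : 1); }
static void sketch_real_free(void *ptr) { free(ptr); }

int sketch_runtime_is_initialized(void)
{
    return sketch_raw.malloc_fn != sketch_uninit_malloc;
}

/* Initialization swaps in the working functions. */
void sketch_runtime_initialize(void)
{
    assert(sketch_raw.malloc_fn != NULL && sketch_raw.free_fn != NULL);
    sketch_raw.malloc_fn = sketch_real_malloc;
    sketch_raw.free_fn = sketch_real_free;
}
```

This keeps the "must initialize first" rule but turns a silent segmentation fault into a diagnosable error; it does not by itself restore the ability to call the raw allocator before initialization.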
Hi,
We are close to the 3.7a3 release and the bug is not fixed yet. I
propose to revert the changes on memory allocators right now, and take
time to design a proper fix which will respect all constraints.
https://github.com/python/cpython/pull/4532
Today, someone came to me on IRC to complain that calling
Py_DecodeLocale() now crashes on Python 3.7. He is doing tests to
embed Python on Android. Later he asked me about
PyImport_AppendInittab(), but I don't know this function. He told me
that it crashes in PyMem_Realloc()... But PyImport_AppendInittab()
must be called before Py_Initialize()...
It confirms that Python is embedded and that the C API is used before
Py_Initialize().
We don't know yet exactly how the C API is used, or which functions
are called before Py_Initialize(). Moreover, the PEP 432 implementation
is still incomplete, and calling _PyRuntime_Initialize() is just not
possible, since it's a private API which is not exported...
Victor
On 24 November 2017 at 09:19, Victor Stinner
Hi,
We are close to the 3.7a3 release and the bug is not fixed yet. I propose to revert the changes on memory allocators right now, and take time to design a proper fix which will respect all constraints.
https://github.com/python/cpython/pull/4532
Today, someone came to me on IRC to complain that calling Py_DecodeLocale() now crashes on Python 3.7. He is testing embedding Python on Android. Later he asked me about PyImport_AppendInittab(), but I don't know this function. He told me that it crashes in PyMem_Realloc()... But PyImport_AppendInittab() must be called before Py_Initialize()...
It confirms that Python is embedded and that the C API is used before Py_Initialize().
We don't know yet exactly how the C API is used, or which functions are called before Py_Initialize().
We do note some of them explicitly at https://docs.python.org/3/c-api/init.html (search for "before Py"). What we've been missing is a test case that ensures https://docs.python.org/3/extending/embedding.html#very-high-level-embedding actually works reliably (hence how we managed to break it by way of the internal state management refactoring). Once that core regression has been fixed, we can review the docs and the test suite and come up with:
- a consolidated list of *all* the APIs that can safely be called before Py_Initialize
- one or more new or updated test cases to ensure that any not yet tested pre-initialization APIs actually work as intended
Moreover, the PEP 432 implementation is still incomplete, and calling _PyRuntime_Initialize() is just not possible, since it's a private API which is not exported...
Even after we reach the point of exposing the more fine-grained initialisation API (which I'm now thinking we may be able to do for 3.8 given Eric & Victor's work on it for 3.7), we're still going to have to ensure the existing configuration API keeps working as expected. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 24 November 2017 at 12:21, Glenn Linderman
On 11/23/2017 5:31 PM, Nick Coghlan wrote:
- a consolidated list of *all* the APIs that can safely be called before Py_Initialize
So it is interesting to know that list, of course, but the ones that are to be supported and documented might be a smaller list. Or might not.
Ah, sorry - "safely" was a bit ambiguous there. By "safely" I meant "CPython has a regression test that ensures that particular API will keep working before Py_Initialize(), regardless of any changes we may make to the way we handle interpreter initialization". We've long had a lot of other APIs that happen to work well enough for CPython itself to get away with using them during the startup process, but the official position on those is "Don't count on these APIs working prior to Py_Initialize() in the general case - we only get away with it because we can adjust the exact order in which we do things in order to account for any other changes that break it". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I proposed a PR to explicitly list functions safe to be called before Py_Initialize():
https://bugs.python.org/issue32124
https://github.com/python/cpython/pull/4540
I found more than 11 functions. I also found variables ;-)
Victor
24.11.17 04:21, Glenn Linderman wrote:
On 11/23/2017 5:31 PM, Nick Coghlan wrote:
- a consolidated list of *all* the APIs that can safely be called before Py_Initialize
So it is interesting to know that list, of course, but the ones that are to be supported and documented might be a smaller list. Or might not.
This is a small list, 11 functions.
participants (8)
- Antoine Pitrou
- Eric Snow
- Glenn Linderman
- M.-A. Lemburg
- Nick Coghlan
- Serhiy Storchaka
- Steve Dower
- Victor Stinner