Trying to embed Py3.8 w frozen modules -- correct place to seek help?
Hello. I'm trying to embed a Py3.8 interpreter into a C++ application, to use as the "glue" for all the other components. We've come across some complexities -- revolving around the use of frozen modules and multiple interpreter states -- and I'm wondering whether this is the correct list on which to seek some assistance?
We can get a single interpreter running. We can add our stream objects and capture the output, we can load our modules (giving access to our API) into it, and generally everything seems fairly happy in single-interpreter mode.
When we try to run Py_NewInterpreter() to get more interpreters, for encapsulation between multiple scripts, it fails to launch (it fails to get the filesystem's codec). The problem seems to be that the frozen modules added to the launched interpreter are not cached after use. So the cloned interpreter fails to start (it wants either an encoding search path, which is null since there is no filesystem, OR to have the encodings module available). The modules don't seem to be being cached, because when the import machinery runs them with PyEval_EvalCode, they return nothing.
I suspect this is something I've missed about how to configure the original interpreter, but documentation and usage examples for the new split-phase initialisation are rather hard to find.
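For context, here's roughly the two-phase (PEP 587) startup shape I've pieced together so far -- a minimal sketch with illustrative names (not our actual code), and with the frozen-table install and stream capture left out:

    #include <Python.h>

    int start_runtime(void)
    {
        /* Phase 1: pre-initialisation -- locale/UTF-8 decisions happen here. */
        PyPreConfig preconfig;
        PyPreConfig_InitIsolatedConfig(&preconfig);
        preconfig.utf8_mode = 1;

        PyStatus status = Py_PreInitialize(&preconfig);
        if (PyStatus_Exception(status)) {
            Py_ExitStatusException(status);
        }

        /* Phase 2: full initialisation from an explicit configuration. */
        PyConfig config;
        PyConfig_InitIsolatedConfig(&config);
        config.site_import = 0;   /* keep the embedded runtime self-contained */
        status = PyConfig_SetString(&config, &config.program_name, L"embedded_app");
        if (!PyStatus_Exception(status)) {
            status = Py_InitializeFromConfig(&config);
        }
        PyConfig_Clear(&config);
        if (PyStatus_Exception(status)) {
            Py_ExitStatusException(status);
        }
        return 0;
    }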
Cheers for any help you can provide, Katie
Hi Katie,
This is probably as good a list for this as any. :)
On Thu, Feb 13, 2020, 01:31 Katie Lucas via capi-sig <capi-sig@python.org> wrote:
Hello. I'm trying to embed a Py3.8 interpreter into a C++ application to use as the "glue" for all the other components. We've come across some complexities -- revolving around the use of frozen modules and multiple interpreter states
Oh, that sounds quite interesting! What modules are you freezing?
We can get a single interpreter running. We can add our stream objects and capture the output, we can load our modules (giving access to our API) into it, and generally everything seems fairly happy in single-interpreter mode.
When we try to run Py_NewInterpreter() to get more interpreters, for encapsulation between multiple scripts, it fails to launch (it fails to get the filesystem's codec).
Is there some code you could point us at?
The problem seems to be that the frozen modules added to the launched interpreter are not cached after use. So the cloned interpreter fails to start (it wants either an encoding search path, which is null since there is no filesystem, OR to have the encodings module available). The modules don't seem to be being cached, because when the import machinery runs them with PyEval_EvalCode, they return nothing.
I suspect this is something I've missed about how to configure the original interpreter, but documentation and usage examples for the new split-phase initialisation are rather hard to find.
It may help you to look at how CPython freezes importlib.
-eric
Hi Katie,
Frozen modules have their bytecode converted to C arrays, so they don't need to be cached by Python. Instead, the OS transparently maps these arrays into memory when loading the binary (or shared modules).
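For reference, the relevant declarations are roughly the following (paraphrased from Python 3.8's import.h -- check your headers for the exact form):

    struct _frozen {
        const char *name;           /* dotted module name */
        const unsigned char *code;  /* marshalled code object as a C array */
        int size;                   /* a negative size marks a package */
    };

    /* The import machinery walks this table (terminated by a NULL name). */
    extern const struct _frozen *PyImport_FrozenModules;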
I guess the experts on multiple interpreter loading will need to help out here and perhaps also Victor and Nick, who have (IIRC) worked on the staged module imports. I've put them on CC.
Thanks,
Marc-Andre Lemburg eGenix.com
"What modules are you freezing?"
Quite a lot of them -- our plan was to make the system as disk-less as possible. The troublesome ones (currently) are the encoders.
We're also missing things like stdout, but actually that's OK, because I've replaced them with a device which captures the output, which needed doing anyway.
"transparently maps these arrays into memory when loading the binary"
Yes, I've got that bit working. A single interpreter comes up successfully and we can execute simple test workloads sequentially.
The problem is that when I start a new interpreter (so that we can execute workloads in parallel), during initialisation it tries to load the filesystem codecs based on the configuration data from the original interp... which it then can't find, because it's looking for them as ordinary modules rather than frozen modules. There's no split stage when cloning the interpreter in which to perform the ImportFrozen.
"Is there some code you could point us at?"
There can be... I have to set up a new repo and clone some bits because the lawyers won't let me make this actual repo public due to IP restrictions.
I've pulled out what I hope is enough code to be legible and the main bit is at https://github.com/engineerkatie/infra-ml2/blob/master/python_engine/python_...
What we have is a core object which acts as a factory and should make evaluation contexts, in which we can run Python code. We'd like each of them to run their own interpreter context (hopefully so we can then multithread these).
So the core should set up all the basic stuff and this seems to work.
Creating the context works as well and we can run the tasks in parallel.
When we add in line 209, which calls Py_NewInterpreter(), that fails with "failed to get the Python codec of the filesystem encoding", having done a PyImport_ImportModule which returns NULL.
I've been tracing this through the Python code, and it seems that when PyImport_ImportFrozenModule imports encodings originally (in the core setup), it gets the module code and runs it with exec_code_in_module. The result is NULL, so in import.c at ~L990, remove_module() is called, and the codec isn't in the PyCodecRegistry for later.
I have a suspicion that my actual setup process with the frozen modules is wrong-headed to start with, but I was trying to adapt the code from the various lifecycle variants in the Py3.8 code in lieu of finding any real examples.
Our real basic requirement is: we have a pile of Python tasks. We have a pile of threads waiting to do work. We want to throw the work at the threads and have them execute it in parallel, in isolated interpreters, without an on-disk execution environment. Data I/O will be provided by an interface object I want to inject into the tasks at startup -- I'm doing the same to capture stdout.
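For the curious, the per-task flow I'm aiming for is roughly this (a sketch only, single-threaded for clarity; it assumes the main interpreter is already initialised and that the calling thread holds its GIL, and run_task_in_subinterpreter is just an illustrative helper name):

    #include <Python.h>

    static void run_task_in_subinterpreter(const char *source)
    {
        PyThreadState *main_tstate = PyThreadState_Get();

        PyThreadState *sub = Py_NewInterpreter();  /* makes the new tstate current */
        if (sub == NULL) {
            /* The failure has already been reported; just restore and bail out. */
            PyThreadState_Swap(main_tstate);
            return;
        }

        PyRun_SimpleString(source);   /* runs inside the subinterpreter */

        Py_EndInterpreter(sub);       /* 'sub' must be the current thread state */
        PyThreadState_Swap(main_tstate);  /* back to the main interpreter */
    }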
(When this DOES actually work, I'm happy to publish it as open source for other people to use, so they don't have to do the same thing; the lawyers are looking at all the machine-learning stuff, and working Python incantations aren't really in their remit.)
Thank you for volunteering to look at this -- I apologise in advance for the state of the code. It's been a bit hacky getting even this far.
Hello,
Before I find time to dive into C++ and Boost, I'd like to make sure you know the caveats:
- In current CPython, all interpreters share one global interpreter lock. Subinterpreters won't give you performance improvements over threading.
- Many extension modules have poor support for subinterpreters. If you use complex machine-learning libraries with subinterpreters, you might need to spend significant time debugging and improving them. (In particular, a module compiled with Cython can currently only be loaded in a single interpreter in a process.)
There are efforts to change the situation. You're very welcome to help with error reporting, debugging and improving subinterpreter support, and this list is the perfect place for that. But if you need a production-ready system soon, threading or multiprocessing will be a better choice.
On Wed, 19 Feb 2020 at 01:56, Katie Lucas <katie.lucas@fetch.ai> wrote:
I've been tracing this through the Python code, and it seems that when PyImport_ImportFrozenModule imports encodings originally (in the core setup), it gets the module code and runs it with exec_code_in_module. The result is NULL, so in import.c at ~L990, remove_module() is called, and the codec isn't in the PyCodecRegistry for later.
While Petr is unfortunately correct that subinterpreters probably aren't mature enough today to deliver the outcome you're looking for, we'd still expect at least this initial setup part to be working. We just wouldn't expect you to get true parallel execution with the current state of the implementation - we'd expect you to end up blocking on the GIL, and hence still be restricted to one core despite the use of multiple subinterpreters. (While having what you're trying to do work the way you expect is an active design goal, I think we're still at least a few releases away from achieving it.)
One possibility that *could* potentially work for you today is the Cython "nogil" section marker, where code that is able to compile down to pure C library calls can be told to release the GIL, and hence spread out across all available CPUs. Failing that, the new shared memory support in the multiprocessing module is designed to allow the use of multiple processes to get around the GIL, without the overhead of having to serialise potentially large datasets to pass information between processes (potentially in combination with pickle protocol 5's out-of-band buffer support).
For your reported start-up problem, though, looking through the code and your list of frozen modules at https://github.com/engineerkatie/infra-ml2/blob/master/python_engine/frozen_..., my best guesses as to potential sources of problems:
- encodings is being given a positive size, so it will be processed as a module rather than as a package (make the size negative to say "this can have frozen submodules" [1]; see the rough table sketch after the footnote)
- encodings.aliases isn't being frozen, but the encodings __init__ contains a "from . import aliases" statement (I'm guessing this is where your exec_code_in_module is failing)
- encodings.latin_1 isn't being frozen, so that eager import will fail later on (both it and utf-8 get imported eagerly during interpreter initialization)
Cheers, Nick.
[1] I have no idea if that is actually documented anywhere, as I got it from the comments in https://github.com/python/cpython/blob/3.8/Python/frozen.c#L31
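To make the negative-size convention concrete, the table would look something like this (a rough sketch -- the code arrays and sizes are placeholders for whatever your freeze step emits, and a real table also needs the interpreter's own entries such as _frozen_importlib, which Tools/freeze includes for you):

    #include <Python.h>

    /* Placeholder arrays; in a real build these hold marshalled code objects. */
    static const unsigned char code_encodings_init[]    = {0};
    static const unsigned char code_encodings_aliases[] = {0};
    static const unsigned char code_encodings_utf_8[]   = {0};
    static const unsigned char code_encodings_latin_1[] = {0};

    static const struct _frozen app_frozen[] = {
        /* Negative size => package, so frozen submodules can be found. */
        {"encodings",         code_encodings_init,    -(int)sizeof(code_encodings_init)},
        {"encodings.aliases", code_encodings_aliases,  (int)sizeof(code_encodings_aliases)},
        {"encodings.utf_8",   code_encodings_utf_8,    (int)sizeof(code_encodings_utf_8)},
        {"encodings.latin_1", code_encodings_latin_1,  (int)sizeof(code_encodings_latin_1)},
        {0, 0, 0}  /* sentinel */
    };

    /* Install before any Py_PreInitialize()/Py_Initialize*() call so the
     * start-up imports (codecs, encodings, ...) resolve against this table. */
    void install_frozen_table(void)
    {
        PyImport_FrozenModules = app_frozen;
    }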
P.S. An aspect of this that looks like it has potentially regressed since Python 3.6 is that whereas Py_NewInterpreter used to consistently print the exception and return NULL on failure [2], there are now several cases where we will instead generate an initialisation status code internally, and exit immediately [3] (even though the parent interpreter is presumably still in a valid state). However, that shouldn't matter too much in your case, as you should at least still be getting the full chained tracebacks for the subinterpreter initialisation failures.
[2] https://github.com/python/cpython/blob/3.6/Python/pylifecycle.c#L845 [3] https://github.com/python/cpython/blob/3.8/Python/pylifecycle.c#L1562
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, 23 Feb 2020 at 15:06, Nick Coghlan <ncoghlan@gmail.com> wrote:
P.S. An aspect of this that looks like it has potentially regressed since Python 3.6 is that whereas Py_NewInterpreter used to consistently print the exception and return NULL on failure [2], there are now several cases where we will instead generate an initialisation status code internally, and exit immediately [3] (even though the parent interpreter is presumably still in a valid state).
Py_NewInterpreter() still displays an error and returns NULL when it can.
An extract of the Py_NewInterpreter() code on the master branch:
    PyStatus status = new_interpreter(&tstate);
    if (_PyStatus_EXCEPTION(status)) {
        Py_ExitStatusException(status);
    }
Py_ExitStatusException() is only called when new_interpreter() fails in a way that cannot be handled as a regular Python exception. I added PyStatus as part of the PEP 587 implementation.
In Python 3.6, *many* of the helper functions called by Py_NewInterpreter() called Py_FatalError() instead... which also exits the process immediately. Example:
    static void
    import_init(PyInterpreterState *interp, PyObject *sysmod)
    {
        ...
        /* Import _importlib through its frozen version, _frozen_importlib. */
        if (PyImport_ImportFrozenModule("_frozen_importlib") <= 0) {
            Py_FatalError("Py_Initialize: can't import _frozen_importlib");
        }
        ...
    }
It would be "nice" to not exit the process if the status is an exception, but that would require to refactor a lot of code :-/ Or a new variant of Py_NewInterpreter() should be added which would return a PyStatus instead. So the caller is free to decide how to handle a PyStatus exception.
Python initialization is quite complex :-/
Victor
Night gathers, and now my watch begins. It shall not end until my death.
On Sun, 1 Mar 2020 at 08:08, Victor Stinner <vstinner@python.org> wrote:
In Python 3.6, *many* of the helper functions called by Py_NewInterpreter() called Py_FatalError() instead... which also exits the process immediately.
Ah, I didn't remember noticing this getting any worse in any of the PR reviews, and that's why - the subinterpreter setup was already prone to exiting immediately when "this should never fail" operations failed.
So the appearance of a regression was just due to the fact that when I read the code again now, it wasn't side by side with the old Py_FatalError code, so I'd forgotten how it used to behave :)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Obviously, there is always room for improvement, to let the Py_NewInterpreter() caller handle the error instead of exiting the process ;-)
Victor
I've finally made it back to this (after spending a bunch of time fiddling with websocket systems for some other elements). We've removed this from our critical path so it's less vital we get it working, but it would still be nice to have the option...
I've added the encodings modules and fixed the sizes to be negative to mark the packages, and so on. Boringly, I'm still seeing the encodings module return null, and subsequent modules fail to unmarshal. Interestingly, the failure mode is now slightly different -- it's failing to get the codec for stdio... I'm going to try configuring that to be something meaningful.
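What I'm planning to try is roughly this (a sketch, assuming UTF-8 is acceptable for the captured streams; the field names are from the 3.8 PyConfig struct, and init_with_fixed_stdio is just an illustrative helper name):

    #include <Python.h>

    static PyStatus init_with_fixed_stdio(void)
    {
        PyConfig config;
        PyConfig_InitIsolatedConfig(&config);

        /* Pin the stdio codec so start-up doesn't have to go looking for one. */
        PyStatus status = PyConfig_SetString(&config, &config.stdio_encoding, L"utf-8");
        if (!PyStatus_Exception(status)) {
            status = PyConfig_SetString(&config, &config.stdio_errors, L"strict");
        }
        if (!PyStatus_Exception(status)) {
            status = Py_InitializeFromConfig(&config);
        }
        PyConfig_Clear(&config);
        return status;
    }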
participants (8): Daniel Scott, Eric Snow, Katie Lucas, M.-A. Lemburg, Nick Coghlan, Petr Viktorin, Robert Steckroth, Victor Stinner