Mailman 3 Improving sys.executable for embedded Python scenarios - Python-ideas

1 May 2021

The way it works today, if you have an application embedding Python, your
sys.argv[0] is (likely) your main executable and sys.executable is probably
None or the empty string (per the stdlib docs which say not to set
sys.executable if there isn't a path to a known `python` executable).

Unfortunately, since sys.executable is a str, the executable it points to
must behave as `python` does. This means that your application embedding
and distributing its own Python must provide a `python` or `python`-like
standalone executable and use it for sys.executable and this executable
must be independent from your main application because the run-time
behavior is different. (Yes, you can employ symlink hacks and your
executable can sniff argv[0] and dispatch to your app or `python`
accordingly. But symlinks aren't reliable on Windows and this still
requires multiple files/executables.) **This limitation effectively
prevents the existence of single-file application binaries that also
expose a full `python`-like environment, as there's no standard way to
advertise a mechanism for invoking `python` other than a standalone
executable with no arguments.**

While applications embedding Python may not have an explicit `python`
executable, they do likely have the capability to instantiate a
`python`-like environment at run-time: they have the interpreter after all,
they "just" need to provide a mechanism to invoke Py_RunMain() with an
interpreter config initialized using the "python" profile.

**I'd like to propose a long-term replacement to sys.executable that
enables applications embedding Python to advertise a mechanism for invoking
the same executable such that they get a `python` experience.**

The easiest way to do this is to introduce a list[str] variant. Let's call
it sys.python_interpreter. Here's how it would work.

Say I've produced myapp.exe, a Windows application. If you run `myapp.exe
python --`, the executable behaves like `python`. e.g. `myapp.exe python --
-c 'print("hello, world")'` would be equivalent to `python -c
'print("hello, world")'`. The app would set `sys.python_interpreter =
["myapp.exe", "python", "--"]`. Then Python code wanting to invoke a Python
interpreter would do something like
`subprocess.run(sys.python_interpreter)` and automatically dispatch through
the same executable.
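
To make the dispatch side concrete, here is a minimal sketch of the argument
handling myapp.exe would need. It is written in Python purely for
illustration; a real embedder would do this in its host language before
handing the remaining arguments to the interpreter. The names
`PYTHON_MODE_PREFIX` and `split_dispatch` and the mode labels are
hypothetical, not part of any existing API.

```python
# Marker arguments from the myapp.exe example; the exact spelling is up to the app.
PYTHON_MODE_PREFIX = ["python", "--"]


def split_dispatch(argv):
    """Classify a raw argv as interpreter mode or normal application mode.

    Returns (mode, remaining_args), where mode is "python" or "app".
    """
    if argv[1:3] == PYTHON_MODE_PREFIX:
        # Everything after the marker is what `python` itself would receive.
        return "python", argv[3:]
    # No marker: treat all arguments as belonging to the application.
    return "app", argv[1:]
```

With this, `split_dispatch(["myapp.exe", "python", "--", "-c", "print(1)"])`
selects interpreter mode and passes `["-c", "print(1)"]` through.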

Applications not wanting to expose a `python`-like capability
would simply set sys.python_interpreter to None or [], just like they do
with sys.executable today. In fact, I imagine Python's initialization would
automatically set sys.python_interpreter to [sys.executable] by default and
applications would have to opt in to a more advanced PyConfig field to make
sys.python_interpreter different. This would make sys.python_interpreter
behaviorally backwards compatible, so code bases could use
sys.python_interpreter as a modern substitute for sys.executable, if
available, without that much risk.
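
As a rough sketch of what that substitution could look like in calling code:
the helper below prefers the proposed attribute and falls back to
sys.executable. Note that sys.python_interpreter does not exist today, so it
is looked up with getattr(); the helper names `python_command` and
`run_python` are made up for this example.

```python
import subprocess
import sys


def python_command():
    """Return the argv prefix that launches a Python interpreter, or None."""
    # Proposed attribute from this post; absent on current Pythons.
    prefix = getattr(sys, "python_interpreter", None)
    if prefix:
        return list(prefix)
    # Fall back to today's behavior when sys.executable is usable.
    if sys.executable:
        return [sys.executable]
    # None or empty in both places: no interpreter is advertised.
    return None


def run_python(args, **kwargs):
    """Run a Python interpreter with the given arguments."""
    prefix = python_command()
    if prefix is None:
        raise RuntimeError("this process does not advertise a Python interpreter")
    return subprocess.run(prefix + list(args), **kwargs)
```

Code written this way keeps working on interpreters that only have
sys.executable and would pick up the list form automatically if it appeared.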

Some applications may want more advanced mechanisms than command line
arguments to dispatch off of. For example, maybe you want to key off an
environment variable to activate "Python mode."  This scenario is a bit
harder to implement, as it would require yet another advertisement on how
to invoke `python`. If subprocess had a "builder" interface for iteratively
constructing a process invocation, we could expose a stdlib function to
return a builder preconfigured to invoke `python`. But since such an
interface doesn't exist, there's not as clean a solution for cases that
require something more advanced than additional process arguments. Maybe we
could make sys.python_interpreter a tuple[list[str], dict[str, str]] where
that dict is environment variables to set. Doable. But I'm unconvinced the
complexity is warranted, especially since the application has full control
over interpreter initialization and can set most of the settings that
they'd want to set through environment variables (e.g. PYTHONHOME) as part
of initializing the `python`-like environment.
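
For what it's worth, consuming the tuple variant would not be much code. The
sketch below assumes the advertisement is a (list[str], dict[str, str]) pair
as described above; the function name `build_invocation` and the
MYAPP_PYTHON_MODE variable in the usage note are hypothetical.

```python
import os


def build_invocation(advert, args, base_env=None):
    """Combine an (argv_prefix, extra_env) advertisement with caller arguments.

    Returns (argv, env) suitable for subprocess.run(argv, env=env).
    """
    argv_prefix, extra_env = advert
    # Start from the current environment unless the caller supplies one.
    env = dict(os.environ if base_env is None else base_env)
    # Variables from the advertisement override inherited values.
    env.update(extra_env)
    return list(argv_prefix) + list(args), env
```

A caller would then do something like
`argv, env = build_invocation((["myapp.exe"], {"MYAPP_PYTHON_MODE": "1"}), ["-c", "pass"])`
followed by `subprocess.run(argv, env=env)`.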

Yes, there will be a long tail of applications needing to adapt to the
reality that sys.python_interpreter exists and is a list. Checks like `if
sys.executable == sys.argv[0]` will need to become more complicated. Maybe
we could expose a simple "am I a Python interpreter process" in the stdlib?
(The inverse "am I not a Python interpreter executable" question could also
benefit from stdlib standardization, as there are unofficial mechanisms
like sys.frozen and sys._MEIPASS attempting to answer this question.)
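
For reference, the kind of ad hoc check people write today looks roughly
like the sketch below. It relies only on the unofficial conventions (a
`frozen` attribute set on sys by freezing tools, and PyInstaller's
`sys._MEIPASS`), so it is a best-effort guess rather than a reliable answer.
The function name `looks_frozen` is made up for this example.

```python
import sys


def looks_frozen():
    """Best-effort guess that we run inside a packaged app, not plain `python`."""
    # Freezing tools conventionally set sys.frozen to a truthy value.
    if getattr(sys, "frozen", False):
        return True
    # PyInstaller additionally sets sys._MEIPASS to its bundle directory.
    return hasattr(sys, "_MEIPASS")
```

A stdlib-provided answer would let code stop depending on which packaging
tool happened to set which attribute.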

Anyway, as it stands, sys.executable just doesn't work for applications
embedding Python that want to expose a full `python`-like environment from
single-executable distributions. I think the introduction of a new API to
allow applications to "self-dispatch" to a Python interpreter could
eventually lead to significant ergonomic wins for embedded Python
applications. This would make Python a more attractive target for
embedding, which benefits the larger Python ecosystem.

Thoughts?

(I rarely post here. So if this idea is actionable, please inform me of
next steps to make it become a reality.)


Gregory Szorc

Gregory P. Smith
