[Python-ideas] Re: Improving sys.executable for embedded Python scenarios

2 May 2021

On Sat, May 1, 2021 at 10:49 AM Gregory Szorc <gregory.szorc@gmail.com> wrote:
The way it works today, if you have an application embedding Python, your
sys.argv[0] is (likely) your main executable and sys.executable is probably
None or the empty string (per the stdlib docs which say not to set
sys.executable if there isn't a path to a known `python` executable).
Unfortunately, since sys.executable is a str, the executable it points to
must behave as `python` does. This means that your application embedding
and distributing its own Python must provide a `python` or `python`-like
standalone executable and use it for sys.executable and this executable
must be independent from your main application because the run-time
behavior is different. (Yes, you can employ symlink hacks and your
executable can sniff argv[0] and dispatch to your app or `python`
accordingly. But symlinks aren't reliable on Windows and this still
requires multiple files/executables.) **This limitation effectively
prevents the existence of single-file application binaries that also want to
expose a full `python`-like environment, as there's no standard way to
advertise a mechanism to invoke `python` that isn't a standalone executable
with no arguments.**

Minor nit: I wouldn't use the words "must behave" above. We have been using
sys.executable = None at work for the past five years. The issues we run
into are predominantly in unit tests that try to launch an interpreter via
subprocess using sys.executable, and the bulk of those are in CPython's own
test suite (which I vote "doesn't really count").

Regardless, not needing to tweak even those would be a convenience, and it
could open doors for more application frameworks that make such
environment assumptions and are thus hard to distribute stand-alone. For
example, it would open the door to multiprocessing's spawn start method
within stand-alone embedded binaries.
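
A minimal sketch of the kind of guard such tests need today, assuming an embedded build leaves sys.executable as None or the empty string. The helper names `interpreter_command` and `run_snippet` are hypothetical, not stdlib APIs.

```python
import subprocess
import sys


def interpreter_command():
    """Return an argv prefix for launching Python, or None if unavailable.

    Hypothetical helper: embedded builds may set sys.executable to None
    or "", in which case there is nothing to launch.
    """
    exe = sys.executable
    return [exe] if exe else None


def run_snippet(code):
    """Run `code` in a child interpreter; return its stdout, or None."""
    cmd = interpreter_command()
    if cmd is None:
        return None  # caller should skip, e.g. by raising unittest.SkipTest
    result = subprocess.run(
        cmd + ["-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A test would call `run_snippet(...)` and skip itself when the result is None, rather than failing inside subprocess.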

While applications embedding Python may not have an explicit `python`
executable, they do likely have the capability to instantiate a
`python`-like environment at run-time: they have the interpreter after all,
they "just" need to provide a mechanism to invoke Py_RunMain() with an
interpreter config initialized using the "python" profile.
**I'd like to propose a long-term replacement to sys.executable that
enables applications embedding Python to advertise a mechanism for invoking
the same executable such that they get a `python` experience.**
The easiest way to do this is to introduce a list[str] variant. Let's call
it sys.python_interpreter. Here's how it would work.
Say I've produced myapp.exe, a Windows application. If you run `myapp.exe
python --`, the executable behaves like `python`. e.g. `myapp.exe python --
-c 'print("hello, world")'` would be equivalent to `python -c
'print("hello, world")'`. The app would set `sys.python_interpreter =
["myapp.exe", "python", "--"]`. Then Python code wanting to invoke a Python
interpreter would do something like
`subprocess.run(sys.python_interpreter)` and automatically dispatch through
the same executable.

Yep, that seems reasonable. Unfortunately, command line arguments are a
global namespace, but choosing, at build time, a unique "launch me as a
standalone Python interpreter" argument that will never conflict with the
application's own arguments is doable when building a standalone Python
executable app. Nobody's application wants this specific, unique-per-build
---$(uuid) flag in argv[1], right? ;) ...
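
The dispatch decision described above can be sketched roughly as follows. This is only an illustration in Python of the argv check; a real embedder would make the same decision in its native entry point and then call Py_RunMain() for the Python branch. The marker value and the function name `split_dispatch` are assumptions.

```python
# Hypothetical marker chosen at build time; any token sequence the
# application itself never accepts at the start of its arguments would do.
PYTHON_MODE_ARGS = ["python", "--"]


def split_dispatch(argv, marker=PYTHON_MODE_ARGS):
    """Classify argv as ("python", rest) or ("app", rest).

    `rest` is what would be handed to the interpreter (Python mode) or
    to the application's own argument parser (app mode).
    """
    n = len(marker)
    if argv[1:1 + n] == list(marker):
        return "python", argv[1 + n:]
    return "app", argv[1:]
```

For example, `split_dispatch(["myapp.exe", "python", "--", "-c", "print(1)"])` returns `("python", ["-c", "print(1)"])`, while `split_dispatch(["myapp.exe", "--verbose"])` returns `("app", ["--verbose"])`.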

There's still an API challenge to decide on here: people using
sys.executable also expect to pass flags to the python interpreter.  Do we
make an API guarantee that the final flag in sys.python_interpreter is
always a terminator that separates python flags from application flags (--
or otherwise)?

For applications not wanting to expose a `python`-like capability, they
would simply set sys.python_interpreter to None or [], just like they do
with sys.executable today.

Yep. Though that should be decided at build time of the stand-alone Python
application, so that no command line of the binary can launch it as a plain
interpreter. (This isn't a security measure: anyone with read access to the
stand-alone executable can figure out how to construct a raw interpreter
usable in their environment from it.)
...
In fact, I imagine Python's initialization would automatically set
sys.python_interpreter to [sys.executable] by default and applications
would have to opt in to a more advanced PyConfig field to make
sys.python_interpreter different. This would make sys.python_interpreter
behaviorally backwards compatible, so code bases could use
sys.python_interpreter as a modern substitute for sys.executable, if
available, without that much risk.
+1
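
A rough sketch of how calling code might adopt the proposal while staying backwards compatible. sys.python_interpreter does not exist in CPython today; it is the attribute proposed in this thread, and `python_command` is a hypothetical helper name.

```python
import sys

_MISSING = object()


def python_command(*args):
    """Build an argv list that launches a Python interpreter.

    Prefers the proposed sys.python_interpreter (not a real attribute
    today). If the attribute is absent, falls back to [sys.executable].
    Raises RuntimeError when no interpreter is advertised (None, [] or "").
    """
    prefix = getattr(sys, "python_interpreter", _MISSING)
    if prefix is _MISSING:
        # Older Python or an application that has not opted in.
        prefix = [sys.executable] if sys.executable else None
    if not prefix:
        raise RuntimeError("no Python interpreter is advertised")
    return [*prefix, *args]
```

With `sys.python_interpreter = ["myapp.exe", "python", "--"]`, the call `python_command("-c", "pass")` yields `["myapp.exe", "python", "--", "-c", "pass"]`, which could be passed to subprocess.run().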

-gps
...
Some applications may want more advanced mechanisms than command line
arguments to dispatch off of. For example, maybe you want to key off an
environment variable to activate "Python mode."  This scenario is a bit
harder to implement, as it would require yet another advertisement on how
to invoke `python`. If subprocess had a "builder" interface for iteratively
constructing a process invocation, we could expose a stdlib function to
return a builder preconfigured to invoke `python`. But since such an
interface doesn't exist, there's not as clean a solution for cases that
require something more advanced than additional process arguments. Maybe we
could make sys.python_interpreter a tuple[list[str], dict[str, str]] where
that dict is environment variables to set. Doable. But I'm unconvinced the
complexity is warranted, especially since the application has full control
over interpreter initialization and can set most of the settings that
they'd want to set through environment variables (e.g. PYTHONHOME) as part
of initializing the `python`-like environment.
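
For illustration only, the tuple[list[str], dict[str, str]] variant floated above might be consumed like this. The helper name `run_python` and the environment variable MYAPP_PYTHON_MODE are hypothetical.

```python
import os
import subprocess


def run_python(spec, args, **kwargs):
    """Launch Python from a hypothetical (argv_prefix, extra_env) pair.

    `spec` mirrors the tuple[list[str], dict[str, str]] shape; extra_env
    is layered over the current process environment.
    """
    argv_prefix, extra_env = spec
    env = {**os.environ, **extra_env}
    return subprocess.run([*argv_prefix, *args], env=env, **kwargs)
```

An application keyed off an environment variable could then advertise something like `(["myapp.exe"], {"MYAPP_PYTHON_MODE": "1"})`, at the cost of every caller having to handle the environment half of the pair.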
Yes, there will be a long tail of applications needing to adapt to the
reality that sys.python_interpreter exists and is a list. Checks like `if
sys.executable == sys.argv[0]` will need to become more complicated. Maybe
we could expose a simple "am I a Python interpreter process" check in the
stdlib? (The inverse "am I not a Python interpreter executable" question
could also benefit from stdlib standardization, as there are unofficial
mechanisms like sys.frozen and sys._MEIPASS attempting to answer this
question.)
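
A best-effort sketch of what such a check looks like today, assuming the unofficial conventions mentioned above (sys.frozen, and the sys._MEIPASS attribute set by PyInstaller). Both function names are hypothetical, and the heuristic is not authoritative.

```python
import sys


def looks_frozen():
    """Best-effort check for a frozen/bundled application.

    sys.frozen and sys._MEIPASS are unofficial conventions set by some
    freezing tools; neither is guaranteed by CPython.
    """
    return bool(getattr(sys, "frozen", False)) or hasattr(sys, "_MEIPASS")


def is_python_interpreter_process():
    """Heuristic: not frozen, and sys.executable is a non-empty path."""
    return not looks_frozen() and bool(sys.executable)
```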
Anyway, as it stands, sys.executable just doesn't work for applications
embedding Python that want to expose a full `python`-like environment from
single-executable distributions. I think the introduction of a new API to
allow applications to "self-dispatch" to a Python interpreter could
eventually lead to significant ergonomic wins for embedded Python
applications. This would make Python a more attractive target for
embedding, which benefits the larger Python ecosystem.
Thoughts?
(I rarely post here. So if this idea is actionable, please inform me of
next steps to make it become a reality.)

Gregory P. Smith