[Python-ideas] Move more parts of interpreter core to stdlib

Draic Kin drekin at gmail.com
Mon Aug 26 16:36:53 CEST 2013

Hello, it would be nice if a reference pure-Python implementation existed for
more parts of the interpreter core, and if the core actually used it. This was
done for the import machinery in Python 3.3, which produced the importlib library.

One potential target would be the logic of what python.exe does: parsing its
arguments, finding the script to run, compiling its code and running it as the
__main__ module, and then running the REPL if it is in interactive mode.
There could be a stdlib module exposing this logic, using argparse to parse
the arguments, an overhauled version of runpy to run the __main__ module, and
an overhauled version of code to run the REPL. Python.exe would be just a thin
wrapper which bootstraps the interpreter and runs this runner module.
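
As a rough illustration of what such a runner module might look like, here is
a minimal sketch. It handles only a tiny, made-up subset of the command line
("[-i] script [args...]"); the names parse_command_line and main are
hypothetical, and today's runpy and code modules stand in for the overhauled
versions.

```python
import argparse
import code
import runpy
import sys


def parse_command_line(argv):
    """Parse a tiny subset of the command line: [-i] script [args...]."""
    parser = argparse.ArgumentParser(prog="runner")
    parser.add_argument("-i", dest="inspect", action="store_true",
                        help="enter the REPL after the script finishes")
    parser.add_argument("script", help="path of the script to run as __main__")
    # Everything after the script belongs to the script, not to the runner.
    parser.add_argument("script_args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)


def main(argv, interact=code.interact):
    """Run the script as __main__, then optionally start a REPL."""
    options = parse_command_line(argv)
    saved_argv = sys.argv
    # The script should see its own path and arguments in sys.argv.
    sys.argv = [options.script] + options.script_args
    try:
        namespace = runpy.run_path(options.script, run_name="__main__")
    finally:
        sys.argv = saved_argv
    if options.inspect:
        # Like "python -i": the REPL starts in the script's namespace.
        interact(banner="", local=namespace)
    return namespace
```

The interact parameter is there only so that a different REPL can be plugged
in; by default it is code.interact.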

What brought me to this idea: I just wanted to use Unicode in the REPL on
Windows. It doesn't work because sys.stdin.buffer.raw.read just doesn't
read Unicode characters on Windows, and similarly for
sys.stdout.buffer.raw.write. See http://bugs.python.org/issue1602 . There
is a workaround: one can write custom sys.stdin and sys.stdout objects
which use the WinAPI functions ReadConsoleW and WriteConsoleW, called via
ctypes. Setting these objects seems to solve the problem: input() and
print() work during execution of a script. There is, however, a problem in
interactive mode, since the Python REPL actually doesn't use sys.stdin for
input. It takes input from the real STDIN but uses the encoding of sys.stdin,
which doesn't seem to make sense; see http://bugs.python.org/issue17620 .
So I have written my own REPL based on the stdlib module 'code', but I needed
some hook which runs it just after the execution of a script and before
the standard REPL starts. There is just the PYTHONSTARTUP environment variable,
which works only for a bare Python console, not when running a script. So I
needed a script run.py such that "py <options> run.py [<somescript>
[<args>]]" did almost the same thing as "py <options> [<somescript>
[<args>]]". It would run my REPL at the right time.
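
For illustration, a REPL built on the 'code' module can be made to read
through whatever object is currently installed as sys.stdin by overriding
raw_input(). This is only a sketch: the class name StdinConsole is
hypothetical, and the ReadConsoleW-based stream objects are not shown.

```python
import code
import sys


class StdinConsole(code.InteractiveConsole):
    """REPL that always reads its input through sys.stdin.readline()."""

    def raw_input(self, prompt=""):
        # Write the prompt to the current sys.stdout and read one line from
        # the current sys.stdin, so custom stream objects are respected.
        sys.stdout.write(prompt)
        sys.stdout.flush()
        line = sys.stdin.readline()
        if not line:
            # An empty read means end of input; the console exits on EOFError.
            raise EOFError
        return line.rstrip("\n")
```

A wrapper like run.py could then call StdinConsole(namespace).interact() with
the namespace left behind by the script.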

Writing things like run.py is difficult since there are many details one
must handle so that the inner script behaves the same way as if it
were run directly. It is also difficult to test (e.g.
http://bugs.python.org/issue18838). It would be easy if there were a
reference implementation of how Python itself does it. A script like run.py
has more use cases; for example, Ned Batchelder's coverage.py implements its
own version.
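
Two of those details, as an illustration: a script run directly sees its own
path and arguments in sys.argv and its own directory at the front of sys.path.
A minimal sketch that mimics just these two follows; the helper names are
hypothetical, and many other details are ignored.

```python
import contextlib
import os
import runpy
import sys


@contextlib.contextmanager
def script_environment(script, args):
    """Temporarily mimic the sys.argv and sys.path a direct run would see."""
    saved_argv, saved_path = sys.argv[:], sys.path[:]
    sys.argv[:] = [script] + list(args)
    # A directly run script finds its own directory at sys.path[0]. The
    # wrapper's own entry is left in place here; a stricter wrapper would
    # replace it instead of inserting in front of it.
    sys.path.insert(0, os.path.dirname(os.path.abspath(script)))
    try:
        yield
    finally:
        sys.argv[:] = saved_argv
        sys.path[:] = saved_path


def run_as_main(script, args=()):
    """Run the script as __main__ and return its resulting globals."""
    with script_environment(script, args):
        return runpy.run_path(script, run_name="__main__")
```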

Generally, if more parts of the interpreter core were exposed via the stdlib,
issues like the ones mentioned could be handled more easily. Another example:
there are some issues when one hits Ctrl-C during input on Windows; it seems
that one should detect the condition and wait for the signal to arrive (see
http://bugs.python.org/issue18597 , http://bugs.python.org/issue17619 ). I
thought that input() was just a thin wrapper around sys.stdin.readline(), so
it could easily be implemented in pure Python (there was even an idea that
input() wouldn't be in Python 3000). So it surprises me that input() is
implemented in a very different way and provides an alternative codepath to
the low-level reading function, different from the codepath sys.stdin ->
sys.stdin.buffer -> sys.stdin.buffer.raw. If there were only one path, it
would be easier to fix issues like that.
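
To make the single-path idea concrete, here is a sketch of what a pure-Python
input() could look like if it were only a thin wrapper around
sys.stdin.readline(). The name input_via_stdin is hypothetical; this is not how
the builtin is implemented, and it ignores readline support and Ctrl-C
handling.

```python
import sys


def input_via_stdin(prompt=""):
    """Hypothetical input() with a single code path through sys.stdin."""
    sys.stdout.write(str(prompt))
    sys.stdout.flush()
    line = sys.stdin.readline()
    if not line:
        # Like the builtin, signal end of input with EOFError.
        raise EOFError("EOF when reading a line")
    # Strip exactly one trailing newline, if present.
    return line[:-1] if line.endswith("\n") else line
```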

Thank you for your response, Drekin.