It seems that the tokenizer code is about to be refactored. In that context, I'd like to bring up my proposal on readline hooks again. It would be nice if the char*-based PyOS_Readline API were replaced by a hook that works with Python str and can be customized from Python code. I propose adding a function sys.readlinehook that accepts an optional prompt and returns a line read interactively from the user. There would also be sys.__readlinehook__ holding the original value of sys.readlinehook (similar to sys.(__)displayhook(__), sys.(__)excepthook(__) and sys.(__)std(in/out/err)(__)).
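To make the proposal concrete, here is a minimal sketch of what a default str-based hook might look like. Neither sys.readlinehook nor sys.__readlinehook__ exists in current CPython; the name default_readlinehook and its stdin/stdout keyword arguments are made up for illustration and are not part of the proposal.

```python
import io
import sys


def default_readlinehook(prompt="", stdin=None, stdout=None):
    """Hypothetical default hook: show the prompt, return one line as str."""
    # Look the streams up at call time, so that replacing sys.stdin or
    # sys.stdout takes effect (compare issue17620).
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stdout.write(prompt)
    stdout.flush()
    return stdin.readline()  # a str; "" signals end of input


# Under the proposal (these attributes do NOT exist today), one would have:
#     sys.readlinehook = sys.__readlinehook__ = default_readlinehook

# Demo with in-memory streams standing in for a console.
fake_in = io.StringIO("x = 1\nrest")
fake_out = io.StringIO()
line = default_readlinehook(">>> ", stdin=fake_in, stdout=fake_out)
```

Because the hook returns a str, no byte encoding has to be agreed upon between the hook and its consumers.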
Currently, the input is read from C stdin even if sys.stdin is changed (see
http://bugs.python.org/issue17620). This complicates fixing
http://bugs.python.org/issue1602: the standard sys.std* streams cannot communicate in Unicode with the Windows console, and replacing the streams with custom ones is not enough. One also has to install a custom readline hook, which is currently complicated. And even after installing a custom readline hook, one finds out that the Python tokenizer cannot handle UTF-16, so one has to wrap the custom stream objects just to give their encoding attribute a different value, because the readline hook currently returns char* rather than a Python string. For more details see the documentation of my package:
https://github.com/Drekin/win-unicode-console.
The pyreadline package also sets up a custom readline hook, so it would benefit if doing so were easier. Moreover, the two consumers of the PyOS_Readline API, the input function and the tokenizer, assume different encodings for the bytes returned by the readline hook. Effectively, one assumes sys.stdout.encoding and the other sys.stdin.encoding, so if the two differ, there is no way to implement a correct readline hook.
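The encoding problem can be illustrated in a few lines. This is only a sketch: the two encodings below (utf-8 and cp1252) are arbitrary stand-ins for whatever sys.stdin.encoding and sys.stdout.encoding happen to be, chosen just to show that a single byte string cannot satisfy two consumers that decode it differently.

```python
# The hook has to hand back bytes, so it must pick one encoding.
text = "é"
raw = text.encode("utf-8")

# Two consumers decode the same bytes under different assumptions.
# The encodings here are illustrative assumptions, not what CPython uses.
as_utf8 = raw.decode("utf-8")      # the consumer that guessed right
as_cp1252 = raw.decode("cp1252")   # the consumer that guessed wrong
```

Whichever encoding the hook picks, one of the two consumers ends up with mangled text. A hook that returns str avoids the question entirely.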