[Python-Dev] Generalised String Coercion

M.-A. Lemburg mal at egenix.com
Mon Aug 8 13:06:31 CEST 2005

Michael Hudson wrote:
> "M.-A. Lemburg" <mal at egenix.com> writes:
>>Set the external encoding for stdin, stdout, stderr:
>>(also an example for adding encoding support to an
>>existing file object):
>>import codecs, sys
>>def set_sys_std_encoding(encoding):
>>    # Load encoding support
>>    (encode, decode, streamreader, streamwriter) = codecs.lookup(encoding)
>>    # Wrap using stream writers and readers
>>    sys.stdin = streamreader(sys.stdin)
>>    sys.stdout = streamwriter(sys.stdout)
>>    sys.stderr = streamwriter(sys.stderr)
>>    # Add .encoding attribute for introspection
>>    sys.stdin.encoding = encoding
>>    sys.stdout.encoding = encoding
>>    sys.stderr.encoding = encoding
>>Example session:
>>>>>print 'hello'
>>Note that the interactive session bypasses the sys.stdin
>>redirection, which is why you can still enter Python
>>commands in ASCII - not sure whether there's a reason
>>for this, or whether it's just a missing feature.
> Um, I'm not quite sure how this would be implemented.  Interactive
> input comes via PyOS_Readline which deals in FILE*s... this area of
> the code always confuses me :(

Me too.

It appears that this part of the Python code
has undergone so many iterations and patches that its
structure has suffered a lot. For example, the main() function
calls PyRun_AnyFileFlags(stdin, "<stdin>", &cf),
but the fp argument stdin is subsequently
ignored if the tok_nextc() function finds that
a prompt is set.

Anyway, hacking along the same lines, I think
the above can be had by changing tok_stdin_decode()
to use a possibly available sys.stdin.decode()
method for decoding the data read by
PyOS_Readline(). This would then return Unicode,
which tok_stdin_decode() could in turn encode to
UTF-8, the encoding that the tokenizer
can work on.
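
As a rough illustration of that decode-then-re-encode step, here is a
minimal sketch in Python rather than C. It is not the tokenizer's
actual code: the helper name recode_line_to_utf8() and its
stdin_encoding parameter are made up for this example, and the codec
lookup stands in for the "possibly available sys.stdin.decode()"
method mentioned above. It is written for a current Python 3, where
bytes and text are distinct types.

```python
import codecs


def recode_line_to_utf8(raw_line: bytes, stdin_encoding: str) -> bytes:
    """Hypothetical stand-in for the proposed tok_stdin_decode() step.

    raw_line plays the role of the data returned by PyOS_Readline();
    stdin_encoding plays the role of the encoding attached to sys.stdin.
    """
    # Look up the decoder for the external encoding.
    decode = codecs.getdecoder(stdin_encoding)
    # Decoders return a (text, bytes_consumed) pair.
    text, _consumed = decode(raw_line)
    # Re-encode to UTF-8, the encoding the tokenizer can work on.
    return text.encode("utf-8")


# A line typed in a Latin-1 terminal, containing one non-ASCII character.
latin1_line = "print 'h\xe9llo'\n".encode("latin-1")
utf8_line = recode_line_to_utf8(latin1_line, "latin-1")
```

Pure ASCII input passes through unchanged, so existing sessions would
not notice the difference; only non-ASCII bytes get rewritten.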

