Faulty encoding settings

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Tue Oct 17 13:24:26 EDT 2006


In <slrnej9ogv.nk.horpner at FIAD06.norwich.edu>, Neil Cerutti wrote:

> I'm writing an application that needs all internal character data
> to be stored in iso-8859-1. It also must allow input and output
> using stdin and stdout.
> 
> This works just fine with the Windows binary of Python.
> sys.stdin.encoding is correctly set to the encoding of the
> current terminal ('cp437').
> 
> import sys
>
> # readline() returns a byte string in the terminal's encoding.
> s = sys.stdin.readline()
> # Convert to iso-8859-1.
> s = s.decode(sys.stdin.encoding).encode('iso-8859-1')
> 
> Granted, users are constrained to entering characters in the
> cp437 charset, but that's better than the following.
> 
> The Cygwin binary I have (2.4.3) reports sys.stdin.encoding as
> 'US-ASCII', which is quite wrong. A Cygwin terminal uses, as far
> as I can tell, iso-8859-1. This renders the above construction
> useless if the user enters any character codes above 127.
> Using raw_input instead of readline addresses the problem by making
> it impossible to enter non-ASCII text.
> 
> Please advise.

Let the user specify the encoding explicitly.  Relying on the encoding
attribute of files is quite fragile.  For example, if you redirect stdin
or stdout, the encoding is set to None because the interpreter can't tell
what encoding the "other side" of the redirection produces or expects.
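
A minimal sketch of that idea, not a definitive implementation.  The
names pick_encoding and to_latin1 are made up for illustration, and the
ISO-8859-1 fallback is only an assumption taken from the Cygwin case in
the question.  The explicit value would come from wherever you let the
user set it, such as a command line option or a config file.

```python
import sys


def pick_encoding(explicit=None, stream=None, fallback="iso-8859-1"):
    """Choose the encoding for terminal input or output.

    Order of preference: the user's explicit choice, then whatever the
    stream reports, then the fallback.  The stream's encoding attribute
    can be None (for example when redirected) or missing entirely.
    """
    if explicit:
        return explicit
    reported = getattr(stream, "encoding", None)
    return reported or fallback


def to_latin1(raw, encoding):
    """Decode a byte string from `encoding` and re-encode it as ISO-8859-1."""
    return raw.decode(encoding).encode("iso-8859-1")


# Hypothetical usage: the user asked for cp437, so that wins over
# whatever sys.stdin reports.
encoding = pick_encoding("cp437", sys.stdin)
```

With this in place the conversion from the question becomes
to_latin1(s, encoding), and it keeps working when the stream reports
nothing useful.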

BTW, US-ASCII isn't wrong, just limiting: everything in the ASCII range
is encoded the same way in ISO-8859-1.
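
A quick way to check this yourself; same_in_both is just an illustrative
helper name, not something from the standard library.

```python
def same_in_both(raw):
    """Return True if `raw` decodes to the same text as ASCII and ISO-8859-1."""
    try:
        return raw.decode("ascii") == raw.decode("iso-8859-1")
    except UnicodeDecodeError:
        # A byte above 127 is not valid ASCII at all.
        return False


print(same_in_both(b"plain text"))  # True
print(same_in_both(b"caf\xe9"))     # False, 0xE9 is outside ASCII
```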

Ciao,
	Marc 'BlackJack' Rintsch


