Proposal: new environment variable PYTHONSTDOUTENCODING
I'd like to propose a new environment variable PYTHONSTDOUTENCODING. This is meant to solve various problems that people have had with Python not detecting their terminal encoding correctly; it would override any detection that Python would use for determining the encoding of stdout (and stdin, but that's less relevant in 2.x).

In particular, setting this environment variable would also disable the detection of whether stdout is a terminal. This is desirable for cases such as the PyDev Eclipse plugin, where Python currently fails to detect that the output is a terminal (and technically, what Eclipse provides is not a terminal but just a pipe, as you can't do pseudo-terminals in Java).

This would have the additional effect that the encoding also takes effect when redirecting stdout to a file. Whether or not this is a good thing might be debatable; giving the user control over it (to set or clear that variable) is a good thing, IMO.

Naming contest: it would probably be the longest of the PYTHON* variables. I would not want to call it PYTHONENCODING or PYTHONSTDENCODING, though, because people might infer that it affects sys.getdefaultencoding(), which it shouldn't.

Regards, Martin
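The precedence Martin proposes can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not CPython's startup code. The helper name choose_stdout_encoding is hypothetical, and the locale encoding is passed in rather than queried. The "not a terminal" branch mirrors 2.x, where sys.stdout.encoding stays None for pipes and files.

```python
from typing import Mapping, Optional


def choose_stdout_encoding(environ: Mapping[str, str],
                           stdout_is_tty: bool,
                           locale_encoding: Optional[str]) -> Optional[str]:
    """Hypothetical helper sketching the proposed precedence for stdout."""
    override = environ.get("PYTHONSTDOUTENCODING")
    if override:
        # An explicit setting wins and skips the isatty() check entirely,
        # so it also applies to pipes (e.g. an IDE console) and files.
        return override
    if stdout_is_tty:
        # Current 2.x behaviour: only a terminal gets the locale's encoding.
        return locale_encoding
    # Pipe or file without an override: no encoding is set (None in 2.x).
    return None


# A pipe, as in the Eclipse case, without and with the variable set.
print(choose_stdout_encoding({}, False, "UTF-8"))
print(choose_stdout_encoding({"PYTHONSTDOUTENCODING": "UTF-8"}, False, None))
```

Redirecting stdout to a file goes through the same override branch. This is the side effect Martin mentions for redirected output.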
On 2008-05-20 10:22, Martin v. Löwis wrote:
I'd like to propose a new environment variable PYTHONSTDOUTENCODING. This is meant to solve various problems that people had with Python not detecting their terminal encoding correctly; it would override any detection that Python would use for determining the encoding of stdout (and stdin - but that's less relevant in 2.x).
How is this relevant for 2.x? In 2.x, stdin and stdout are just files without any I/O wrappers around them. Writing Unicode to stdout will still use the default encoding (ASCII) to convert it to an 8-bit string. All other 8-bit strings will be passed to stdout as-is.

For 3.x, I'd like to see a PYTHONSTDINENCODING, because the current way of relying on the terminal encoding does not always work well. If that fails, it falls back to ASCII, which prevents entering e.g. German umlauts.
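In Python 3, stdin is an io.TextIOWrapper around a binary buffer, so an explicit stdin encoding amounts to choosing the wrapper's encoding. This is a minimal sketch with several assumptions. It uses the PYTHONSTDINENCODING name suggested here, which is hypothetical. The helper open_text_stdin is illustrative. It falls back to UTF-8 rather than ASCII, which is also an assumption.

```python
import io
import os
from typing import BinaryIO, Mapping


def open_text_stdin(raw: BinaryIO,
                    environ: Mapping[str, str] = os.environ) -> io.TextIOWrapper:
    """Wrap a binary stdin buffer, decoding it with an explicit encoding."""
    # Hypothetical variable name taken from the discussion; the UTF-8
    # fallback is an assumption chosen so that umlauts still decode.
    encoding = environ.get("PYTHONSTDINENCODING", "utf-8")
    return io.TextIOWrapper(raw, encoding=encoding)


# Simulate a Latin-1 terminal typing German umlauts.
fake_stdin = io.BytesIO("Grüße\n".encode("latin-1"))
line = open_text_stdin(fake_stdin, {"PYTHONSTDINENCODING": "latin-1"}).readline()
print(line)
```

With a real process, the call would be open_text_stdin(sys.stdin.buffer).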
In particular, setting this environment variable would also disable the detection of whether stdout is a terminal. This is desirable for cases such as the PyDev Eclipse plugin, where Python currently fails to detect that the output is a terminal (and technically, what Eclipse provides is not a terminal but just a pipe, as you can't do pseudo-terminals in Java).
This would have the additional effect that the encoding also takes effect when redirecting stdout to a file. Whether or not this is a good thing might be debatable; giving the user control over it (to set or clear that variable) is a good thing, IMO.
Naming contest: it probably would be the longest of the PYTHON* variables. I would not want to call it PYTHONENCODING, or PYTHONSTDENCODING, though, because people might infer that it affects sys.getdefaultencoding(), which it shouldn't.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 20 2008)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611
On Tue, May 20, 2008 at 10:41 AM, M.-A. Lemburg <mal@egenix.com> wrote:
On 2008-05-20 10:22, Martin v. Löwis wrote:
I'd like to propose a new environment variable PYTHONSTDOUTENCODING. This is meant to solve various problems that people had with Python not detecting their terminal encoding correctly; it would override any detection that Python would use for determining the encoding of stdout (and stdin - but that's less relevant in 2.x).
How is this relevant for 2.x?
In 2.x, stdin and stdout are just files without any io wrappers around them.
Writing Unicode to stdout will still use the default encoding ASCII to convert it to an 8-bit string. All other 8-bit strings will be passed to stdout as-is.
You're forgetting about print; in Python 2.x, when stdout is connected to a terminal, the locale settings (typically the LANG, LC_ALL and LC_CTYPE environment variables) are taken into account when 'print' writes to sys.stdout. -- Thomas Wouters <thomas@python.org> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
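Which of those variables wins follows the POSIX rule: LC_ALL overrides LC_CTYPE, which in turn overrides LANG. The sketch below only mirrors that lookup order for the character-type category. In reality, Python asks the C library via setlocale() and nl_langinfo(CODESET). The helper name effective_ctype_locale and the "C" default are illustrative assumptions.

```python
from typing import Mapping


def effective_ctype_locale(environ: Mapping[str, str]) -> str:
    """Return the locale name that governs LC_CTYPE, using POSIX precedence."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:  # unset and empty variables are both ignored
            return value
    return "C"  # assumed default when nothing is set


print(effective_ctype_locale({"LANG": "en_US.UTF-8", "LC_CTYPE": "de_DE.ISO-8859-1"}))
```

Suppose the terminal's real encoding differs from what these variables claim. That mismatch is exactly the case the proposed override is aimed at.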
On Tue, May 20, 2008 at 12:16 PM, Thomas Wouters <thomas@python.org> wrote:
You're forgetting about print; in Python 2.x, when stdout is connected to a terminal, the locale settings (typically the LANG, LC_ALL and LC_CTYPE environment variables) are taken into account when 'print' writes to sys.stdout.
Isn't it then enough to make sure your locale settings are correct? (I've never had any problems myself; it works great in Ubuntu.) -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64
You're forgetting about print; in Python 2.x, when stdout is connected to a terminal, the locale settings (typically the LANG, LC_ALL and LC_CTYPE environment variables) are taken into account when 'print' writes to sys.stdout.
Isn't it then enough to make sure your locale settings are correct? (I've never had any problems myself; it works great in Ubuntu.)
It's much more difficult on OS X, for example, which doesn't really support the concept of locales (at least prior to 10.5). There are other odd cases, like the Eclipse one I mentioned. Setting the locale doesn't help there. Regards, Martin
On 2008-05-20 12:16, Thomas Wouters wrote:
On Tue, May 20, 2008 at 10:41 AM, M.-A. Lemburg <mal@egenix.com> wrote:
On 2008-05-20 10:22, Martin v. Löwis wrote:
I'd like to propose a new environment variable PYTHONSTDOUTENCODING. This is meant to solve various problems that people had with Python not detecting their terminal encoding correctly; it would override any detection that Python would use for determining the encoding of stdout (and stdin - but that's less relevant in 2.x).
How is this relevant for 2.x?
In 2.x, stdin and stdout are just files without any io wrappers around them.
Writing Unicode to stdout will still use the default encoding ASCII to convert it to an 8-bit string. All other 8-bit strings will be passed to stdout as-is.
You're forgetting about print; in Python 2.x, when stdout is connected to a terminal, the locale settings (typically the LANG, LC_ALL and LC_CTYPE environment variables) are taken into account when 'print' writes to sys.stdout.
Thanks for reminding me. I had forgotten about that special case. So "sys.stdout.write(unicode)" will always use the default encoding, whereas "print unicode" uses sys.stdout.encoding, correct? Hmm, wouldn't it be better to always use .encoding and also make it adjustable from Python (it is adjustable from C)?! PYTHONSTDOUTENCODING could then provide the default for sys.stdout.encoding.
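For comparison, Python 3 (not the 2.x discussed in this thread) already behaves the way Marc-Andre suggests. There, sys.stdout is an io.TextIOWrapper, and both write() and print() encode through the wrapper's encoding. The small self-contained check below uses an in-memory buffer in place of the real stdout. The helper name render is illustrative.

```python
import io


def render(text: str, encoding: str) -> bytes:
    """Send text through write() and print() on one text stream; return the bytes."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the bytes are platform-independent.
    out = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    out.write(text)          # plain write() ...
    print(text, file=out)    # ... and print() both use out.encoding
    out.flush()
    return raw.getvalue()


print(render("äöü", "utf-8"))

try:
    render("äöü", "ascii")
except UnicodeEncodeError as exc:
    # The error comes from write(); print() would hit the same encoder.
    print("ascii:", exc.reason)
```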
On 2008-05-20 20:23, Martin v. Löwis wrote:
Writing Unicode to stdout will still use the default encoding ASCII to convert it to an 8-bit string.
That's not true.
Are you sure?
setenv LC_ALL de_DE.utf8
python2.5
Python 2.5 (r25:51908, May 9 2007, 00:53:06)
>>> u = u'äöü'
>>> sys.stdout.write(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print u
äöü
Only "print" sets the Py_PRINT_RAW flag, which triggers the conversion from Unicode to 8-bit strings using .encoding in PyFile_WriteObject(). If the flag is not set, the default encoding is used. I'm not exactly sure why, since using .encoding would be useful in all cases.
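The 2.x rule described here can be modelled in a few lines. This is a Python 3 model written for illustration, not CPython's PyFile_WriteObject(). In the model, str stands in for 2.x unicode and bytes for the 2.x 8-bit str. The parameter print_raw stands in for the Py_PRINT_RAW flag, and "ascii" for the stock sys.getdefaultencoding(). The name py2_output_bytes is hypothetical.

```python
from typing import Optional, Union

DEFAULT_ENCODING = "ascii"  # what sys.getdefaultencoding() returns in a stock 2.x


def py2_output_bytes(obj: Union[str, bytes],
                     file_encoding: Optional[str],
                     print_raw: bool) -> bytes:
    """Model of how 2.x turns an object into the bytes written to a file."""
    if isinstance(obj, bytes):
        # 8-bit strings are passed through unchanged.
        return obj
    if print_raw and file_encoding is not None:
        # 'print' sets Py_PRINT_RAW, so the file's .encoding is honoured.
        return obj.encode(file_encoding)
    # file.write(), or a file without .encoding: the default encoding applies.
    return obj.encode(DEFAULT_ENCODING)
```

Calling py2_output_bytes("äöü", "utf-8", True) succeeds. The same call with print_raw=False raises UnicodeEncodeError, matching the interactive session in this message.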
On 2008-05-20 21:34, Martin v. Löwis wrote:
I'm not exactly sure why, since using .encoding would be useful in all cases.
Right, I think it should use the file's encoding also for .write.
Could you add that to the proposal?! Please also add the ability to set .encoding on file objects. This would make adjusting e.g. sys.stdout.encoding from within Python easier. Thanks,
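Marc-Andre asks for two things: write() should honour .encoding, and .encoding should be assignable from Python. Both can be sketched as a thin file-like wrapper. EncodedWriter is a hypothetical name, not a proposed API. It simply encodes text with whatever .encoding currently holds and passes bytes through.

```python
import io
from typing import BinaryIO, Union


class EncodedWriter:
    """Hypothetical file-like wrapper whose write() always uses .encoding."""

    def __init__(self, raw: BinaryIO, encoding: str = "ascii",
                 errors: str = "strict") -> None:
        self.raw = raw
        self.encoding = encoding  # a plain attribute, so it can be reassigned
        self.errors = errors

    def write(self, data: Union[str, bytes]) -> int:
        if isinstance(data, str):
            # Text is encoded with the *current* encoding on every call.
            data = data.encode(self.encoding, self.errors)
        return self.raw.write(data)


buf = io.BytesIO()
out = EncodedWriter(buf)
out.encoding = "utf-8"     # adjusted from Python, as requested
out.write("äöü")
out.encoding = "latin-1"
out.write("ä")
print(buf.getvalue())
```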
On Tue, May 20, 2008 at 10:22:37AM +0200, "Martin v. Löwis" wrote:
I'd like to propose a new environment variable PYTHONSTDOUTENCODING. This is meant to solve various problems that people had with Python not detecting their terminal encoding correctly; it would override any detection that Python would use for determining the encoding of stdout (and stdin - but that's less relevant in 2.x).
Is it meant to override the locale settings in case the user wants a different encoding, for cases such as redirected stdout or the Windows console (which has an "OEM" encoding that differs from the locale encoding)?
Naming contest: it probably would be the longest of the PYTHON* variables. I would not want to call it PYTHONENCODING, or PYTHONSTDENCODING, though, because people might infer that it affects sys.getdefaultencoding(), which it shouldn't.
PYTHONIOENCODING? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
On Tue, May 20, 2008 at 12:48 PM, Oleg Broytmann <phd@phd.pp.ru> wrote:
On Tue, May 20, 2008 at 10:22:37AM +0200, "Martin v. Löwis" wrote:
I'd like to propose a new environment variable PYTHONSTDOUTENCODING. This is meant to solve various problems that people had with Python not detecting their terminal encoding correctly; it would override any detection that Python would use for determining the encoding of stdout (and stdin - but that's less relevant in 2.x).
Is it meant to override the locale settings in case the user wants a different encoding, for cases such as redirected stdout or the Windows console (which has an "OEM" encoding that differs from the locale encoding)?
Naming contest: it probably would be the longest of the PYTHON* variables. I would not want to call it PYTHONENCODING, or PYTHONSTDENCODING, though, because people might infer that it affects sys.getdefaultencoding(), which it shouldn't.
PYTHONIOENCODING?
What about PYTHONLANG? Or something that reflects which environment variables are used for this (LC_CTYPE -> PYTHONCTYPE, if the code uses only LC_CTYPE)? http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html#tag_0... Just for my own knowledge: why does it have to be one word? Can't it be PYTHON_LANG? Tarek
-- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/
What about PYTHONLANG?
Or something that reflects which environment variables are used for this?
(LC_CTYPE -> PYTHONCTYPE, if the code uses only LC_CTYPE)
It's not meant to name a locale, but an encoding. In fact, tying the encoding to the locale is, IMO, a misconception in the POSIX locale machinery.
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html#tag_0...
Just for my own knowledge: why does it have to be one word? Can't it be PYTHON_LANG?
No technical reason - just because PYTHONPATH, PYTHONHOME, PYTHONUNBUFFERED, PYTHONVERBOSE, PYTHONSTARTUP, and PYTHONCASEOK don't have underscores, either. Regards, Martin
Is it meant to override the locale settings in case the user wants a different encoding, for cases such as redirected stdout or the Windows console (which has an "OEM" encoding that differs from the locale encoding)?
On Windows, the setlocale mechanism isn't used at all, since Windows doesn't support nl_langinfo (let alone CODESET). And yes, the variable is meant to override whatever determination Python would make on its own.
Naming contest: it probably would be the longest of the PYTHON* variables. I would not want to call it PYTHONENCODING, or PYTHONSTDENCODING, though, because people might infer that it affects sys.getdefaultencoding(), which it shouldn't.
PYTHONIOENCODING?
Imprecise in a different way (it is meant to apply only to stdout, not to all I/O), but shorter. Regards, Martin
On Tue, May 20, 2008 at 10:22:03PM +0200, "Martin v. Löwis" wrote:
PYTHONIOENCODING?
Imprecise in a different way (it is meant to apply only to stdout, not to all I/O), but shorter.
I don't think you can make it both precise and short. If you want to be precise and keep both PYTHON and STDOUT, shorten ENCODING to ENC. If you agree to sacrifice PYTHON, make it PYSTDOUTENCODING. Oleg.
Hi Martin, On Tue, May 20, 2008 at 10:22:37AM +0200, "Martin v. Löwis" wrote:
In particular, setting this environment variable would also disable the detection of whether stdout is a terminal.
In this case, it seems to me that existing programs that start Python as a non-interactive subprocess, via a mechanism like os.popen2() or its equivalent in other languages, would receive bogus data (the Python banner) and/or hang in unexpected ways (the subprocess waiting for more input after the prompt). See you soon, Armin
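Armin's concern turns on how the interpreter decides whether to run interactively (banner and prompt): it checks whether stdin is a terminal. The Python 3 sketch below checks what a pipe-driven child currently sees. run_piped is an illustrative helper. That 2.x makes the same isatty-based choice is an assumption, stated rather than shown here.

```python
import subprocess
import sys


def run_piped(source: str) -> subprocess.CompletedProcess:
    """Run a fresh interpreter with stdin, stdout and stderr all on pipes."""
    return subprocess.run(
        [sys.executable],            # no script argument: read code from stdin
        input=source.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=30,                  # guard against exactly the hang described
    )


result = run_piped("print(1 + 1)\n")
# With a piped stdin the child runs the code and exits at EOF; a banner or
# '>>>' prompt appearing here would be the breakage Armin describes.
print(result.returncode, result.stdout, result.stderr)
```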
participants (7)
- "Martin v. Löwis"
- Armin Rigo
- Lennart Regebro
- M.-A. Lemburg
- Oleg Broytmann
- Tarek Ziadé
- Thomas Wouters