[Pythonmac-SIG] Getting Terminal input and output encoding

Nir Soffer nirs at freeshell.org
Thu Feb 16 14:00:56 CET 2006


I'm trying to find a way to get the encoding used for user input, for  
example for command line arguments:

	# creating a Hebrew file name...
	touch עברית
	./foo.py *

From my experience with Mac OS X 10.0 through 10.3, I know that foo.py  
will always get the Hebrew file name encoded in UTF-8.

You can also see this when you type non-ASCII in the Terminal:

	$ touch \327\242\327\221\327\250\327\231\327\252

This creates a file named "עברית".
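
The relationship between the file name and the octal escapes can be checked with a short sketch. It assumes a current Python 3 interpreter (not the Python 2.4 used elsewhere in this message); the variable names are only illustrative.

```python
# The Hebrew word from the example, written with escapes so the source stays ASCII.
name = "\u05e2\u05d1\u05e8\u05d9\u05ea"

# Encode to UTF-8: each of the five letters becomes two bytes.
utf8_bytes = name.encode("utf-8")

# Render every byte as a backslash followed by its octal value,
# the same notation the shell command above uses.
octal_form = "".join("\\%o" % byte for byte in utf8_bytes)

print(len(name), len(utf8_bytes))
print(octal_form)
```

If the printed escapes match the ones typed in the Terminal, the bytes the shell received are the UTF-8 encoding of the name.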

I noticed that it does not matter which encoding you set in the Terminal  
Window Settings; anything you type is encoded in UTF-8.

Anyway, I could not find any documentation about this issue, except  
this:

	"All BSD system functions expect their string parameters to be in
	UTF-8 encoding and nothing else. Code that calls BSD system routines
	should ensure that the contents of all const *char parameters are in
	canonical UTF-8 encoding."
	<http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html#//apple_ref/doc/uid/20002137-DontLinkElementID_4>


On Linux, people get the encoding with:

	import locale
	locale.getpreferredencoding()

But on OS X getpreferredencoding() returns useless results, at least  
for decoding command line arguments or printing readable output. For  
example:

	1. Choose "Window Settings..." in the Terminal and set the Character
	   Set Encoding to Unicode (UTF-8).
	2. Try:
	>>> import locale
	>>> locale.getpreferredencoding()
	'mac-roman'
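
To illustrate why 'mac-roman' is useless for decoding such arguments, here is a small sketch that decodes the bytes from the touch example both ways. It assumes a current Python 3 interpreter and its standard "mac_roman" codec; the variable names are only illustrative.

```python
# The bytes typed in the Terminal example, as a Python bytes literal.
raw_name = b"\327\242\327\221\327\250\327\231\327\252"

# Decoded as UTF-8, the ten bytes form five Hebrew letters.
as_utf8 = raw_name.decode("utf-8")

# Decoded as Mac OS Roman, every byte becomes its own unrelated character.
as_mac_roman = raw_name.decode("mac_roman")

print(len(as_utf8), repr(as_utf8))
print(len(as_mac_roman), repr(as_mac_roman))
```

The second result is ten unrelated symbols, which is what a program would show if it trusted the value returned by getpreferredencoding() here.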

I found this code, which tries to correct the behavior (from bzrlib):

	# work around egregious python 2.4 bug
	>>> import sys
	>>> sys.platform = 'posix'
	>>> import locale
	>>> locale.getpreferredencoding()
	'US-ASCII'
	>>> sys.platform = 'darwin'

Obviously this workaround does not work around this problem :-)

So my conclusion is that Mac OS X always uses UTF-8 for input to the  
shell. Unless I am missing something?
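
If that conclusion holds, a program could special-case the platform instead of trusting the locale. The following is only a sketch under that assumption, written for a current Python 3 interpreter; guess_argument_encoding() and decode_argument() are hypothetical helpers, not part of Python or bzrlib.

```python
import locale
import sys


def guess_argument_encoding(platform=None):
    """Hypothetical helper: guess the encoding of raw command line bytes.

    Assumes, as concluded above, that Mac OS X always passes UTF-8.
    """
    if platform is None:
        platform = sys.platform
    if platform == "darwin":
        return "utf-8"
    # Elsewhere, fall back to whatever the locale reports.
    return locale.getpreferredencoding() or "ascii"


def decode_argument(raw, platform=None):
    """Decode one raw argument, replacing bytes that do not fit."""
    return raw.decode(guess_argument_encoding(platform), "replace")


if __name__ == "__main__":
    raw = b"\327\242\327\221\327\250\327\231\327\252"
    print(repr(decode_argument(raw, platform="darwin")))
```

The platform parameter exists only so the darwin branch can be exercised on other systems.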

Next, how can you get the Terminal output encoding? For example, if a  
user changes the Character Set Encoding to Western (Mac OS Roman), how  
can you detect this setting from Python?
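
As a starting point for experiments, the sketch below collects the encoding hints that Python itself exposes. It assumes a current Python 3 interpreter; describe_output_encoding() is a hypothetical helper. Nothing in this message establishes that any of these values follows the Terminal's Character Set Encoding setting, so it only shows what Python reports.

```python
import locale
import os
import sys


def describe_output_encoding(stream=None):
    """Hypothetical helper: gather the encoding hints Python exposes."""
    if stream is None:
        stream = sys.stdout
    return {
        # Encoding Python associates with the stream; may be None.
        "stream": getattr(stream, "encoding", None),
        # Encoding derived from the locale settings.
        "locale": locale.getpreferredencoding(),
        # Locale-related environment variables, if they are set.
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
    }


if __name__ == "__main__":
    for key, value in describe_output_encoding().items():
        print(key, value)
```

Running it once with each Terminal setting would show whether any of the reported values changes.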


Best Regards,

Nir Soffer

