[Tutor] sys.getfilesystemencoding()

Albert-Jan Roskam fomcl at yahoo.com
Tue Dec 18 14:13:58 CET 2012


Hi,
 
I am trying to write a file with a 'foreign' Unicode name (I am aware that this is a highly Western-centric way of putting it). On Linux, I can encode the name to UTF-8 and it is displayed correctly. On Windows XP, however, the characters apparently cannot be represented in the file system encoding, which is called 'mbcs'. How can I write file names that are encoded correctly on any platform? Or is this a shortcoming of Windows?
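The core of the problem can be checked directly by asking which encodings can represent the Telugu name at all. This is a minimal Python 3 sketch, not part of the original code: the helper name `representable_in` is hypothetical, and it assumes that 'mbcs' on a Western-language Windows XP install corresponds to the ANSI code page cp1252.

```python
# Hypothetical helper: report which codecs can encode a given file name.
# Assumption: 'mbcs' on a Western Windows XP install is the ANSI code page cp1252.

def representable_in(name, encodings):
    """Return the subset of `encodings` that can encode `name` losslessly."""
    ok = []
    for enc in encodings:
        try:
            name.encode(enc)  # strict errors: raises if any character is unmappable
        except UnicodeEncodeError:
            continue
        ok.append(enc)
    return ok


telugu_name = "\u0c0f\u0c2e\u0c02\u0c21\u0c40.txt"
print(representable_in(telugu_name, ["utf-8", "cp1252", "ascii"]))
# → ['utf-8']
```

UTF-8 can encode every Unicode character, whereas a single-byte ANSI code page such as cp1252 has no Telugu letters, so any approach that first turns the name into ANSI bytes cannot preserve it.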
 
# Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
import sys

 
def _encodeFileName(fn):
    """Helper function to encode unicode file names into system file names.
    http://effbot.org/pyref/sys.getfilesystemencoding.htm"""
    isWindows = sys.platform.startswith("win")
    if isinstance(fn, unicode):  # and not isWindows
        # 'mbcs' on Windows; on Linux it follows the locale (typically 'UTF-8')
        # and may be None, in which case fall back to UTF-8
        encoding = sys.getfilesystemencoding() or "utf-8"
        return fn.encode(encoding)
    return fn
 
fn = u'\u0c0f\u0c2e\u0c02\u0c21\u0c40' + u'.txt'   # Telugu
# On Windows XP the FILE NAME shows up as squares (tofu): its characters
# cannot be represented in the 'mbcs' encoding
with open(_encodeFileName(fn), "wb") as w:
    w.write("yaay!\n")
    print "written: ", w.name
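One way around this is not to encode the name by hand at all. The sketch below assumes Python 3, where `open()` accepts a `str` file name and Python performs the conversion for the operating system itself (the wide-character API on Windows, the file system encoding on POSIX). For Python 2, PEP 277 gives comparable behaviour on Windows NT-family systems, including XP, when the `unicode` object is passed to `open()` unencoded. The function name `write_with_text_name` and the use of a temporary directory are illustrative choices, not part of the original code.

```python
import os
import tempfile


def write_with_text_name(directory, name, payload):
    """Create `name` inside `directory` without encoding the name by hand."""
    path = os.path.join(directory, name)  # str in, str out
    # No manual .encode(): open() converts the str name for the OS itself
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload)
    return path


with tempfile.TemporaryDirectory() as tmp:
    fn = "\u0c0f\u0c2e\u0c02\u0c21\u0c40" + ".txt"  # Telugu, as in the question
    path = write_with_text_name(tmp, fn, "yaay!\n")
    print("written:", path)
    # os.listdir() with a str argument returns str names, already decoded
    print(fn in os.listdir(tmp))
    # The name encoded with sys.getfilesystemencoding(), for comparison
    print(os.fsencode(fn))
```

The design choice is simply to keep the name as text until the last moment, so the platform-specific conversion happens in one place (inside Python) rather than in a helper like `_encodeFileName`.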
 
Thank you very much in advance!

Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a 
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


More information about the Tutor mailing list