[Python-ideas] Introduce some obvious way to encode and decode filenames from Python code

Sven Marnach sven at marnach.net
Mon Jul 16 16:49:52 CEST 2012

Currently, there is no obvious way to encode a filename in the default
filesystem encoding.  To pipe some filenames to the stdin of a
subprocess, I effectively used

    encoded_name = file_name.encode(sys.getfilesystemencoding())

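which mostly worked.  There are cases where this fails, though: on
Linux with LANG=C and filenames that contain non-ASCII characters, for
example, or in any situation where the default filesystem encoding
can't decode a filename.  In those cases Python decodes the
undecodable bytes to lone surrogates, which the default "strict"
error handler refuses to encode.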
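
To make the failure concrete, here is a minimal sketch.  It hard-codes
"ascii" as a stand-in for the filesystem encoding under LANG=C, and the
byte string b"caf\xe9" is just a made-up example name, not something
from a real directory listing.

```python
# Stand-in for a name that os.listdir() might hand back when the
# filesystem encoding (here: ASCII, as under LANG=C) cannot decode
# the byte 0xE9.  The example bytes are hypothetical.
raw = b"caf\xe9"
file_name = raw.decode("ascii", "surrogateescape")

# The default error handler is "strict", which rejects the lone
# surrogate that stands in for the undecodable byte.
try:
    file_name.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With "surrogateescape" the original bytes come back unchanged.
round_tripped = file_name.encode("ascii", "surrogateescape")
```

The strict call raises UnicodeEncodeError, while the surrogateescape
call restores the original bytes.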

The correct way to do this seems to be something like

    # Windows file names are native Unicode, so no escaping is needed there.
    if sys.platform == "win32":
        errors = "strict"
    else:
        errors = "surrogateescape"
    encoded_name = file_name.encode(sys.getfilesystemencoding(), errors)
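
Wrapped into a pair of helpers, this might look roughly as follows.
The names encode_filename(), decode_filename() and _fs_errors() are
hypothetical, and the check against "win32" for Windows is my
assumption; treat it as a sketch of the idea rather than a vetted
implementation.

```python
import sys


def _fs_errors():
    # Hypothetical helper: choose the error handler per platform,
    # assuming sys.platform == "win32" identifies Windows.
    return "strict" if sys.platform == "win32" else "surrogateescape"


def encode_filename(file_name):
    # str -> bytes, using the default filesystem encoding.
    return file_name.encode(sys.getfilesystemencoding(), _fs_errors())


def decode_filename(raw_name):
    # bytes -> str, the inverse of encode_filename().
    return raw_name.decode(sys.getfilesystemencoding(), _fs_errors())
```

With these, decode_filename(encode_filename(name)) should give back
name for any name obtained from the os module.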

I think there should be (1) some documentation on the issue and (2) a
more obvious way to encode filenames.

1. The most useful reference I could find in the docs is


   and there is a short paragraph at


   The filename encoding applies to basically all Python library
   functions (including built-ins like `open()`) and should probably
   be documented at a more prominent spot.  The "surrogateescape"
   error handler isn't mentioned here


2. There should be some way to access the C API functions for decoding
   and encoding filenames from Python.  I don't have a good idea how
   to do this – maybe by adding a meta-encoding "filesystem", or by
   adding functions to the standard library.
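
   One candidate for the "functions" variant: the os module has
   os.fsencode() and os.fsdecode() since Python 3.2.  Whether they
   match the C API functions in every corner case is something I have
   not verified, so take the following as a sketch only.  The file
   names are made up.

```python
import os

# str -> bytes and back, using the filesystem encoding and the
# platform's filesystem error handler.
raw = os.fsencode("report.txt")
name = os.fsdecode(raw)

# On POSIX, bytes that the filesystem encoding cannot decode should
# survive a round trip as well.
if os.name == "posix":
    odd = b"caf\xe9"
    odd_round_trip = os.fsencode(os.fsdecode(odd))
```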

Did I miss something?  Any thoughts?

