[Python-ideas] Introduce some obvious way to encode and decode filenames from Python code

Sven Marnach sven at marnach.net
Mon Jul 16 16:49:52 CEST 2012


Currently, there is no obvious way to encode a filename in the default
filesystem encoding.  To pipe some filenames to the stdin of a
subprocess, I effectively used

    encoded_name = file_name.encode(sys.getfilesystemencoding())

which mostly worked.  It fails, though, whenever a filename contains
bytes that the default filesystem encoding can't decode: on Linux
with LANG=C and filenames that contain non-ASCII characters, for
example.  Python represents such bytes as lone surrogates in the
string, and the default "strict" error handler refuses to encode them.

The correct way to do this seems to be something like

    import sys

    # sys.platform is "win32" on Windows ("nt" is the value of os.name)
    if sys.platform == "win32":
        errors = "strict"
    else:
        errors = "surrogateescape"
    encoded_name = file_name.encode(sys.getfilesystemencoding(),
                                    errors=errors)
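
To illustrate the difference between the two error handlers, here is
a minimal sketch.  It hard-codes "ascii" as a stand-in for what
sys.getfilesystemencoding() would report under LANG=C (an assumption
made only so the result doesn't depend on the locale), and the byte
string is a made-up example name:

```python
# Bytes as the OS might report them: "café" encoded in Latin-1.
raw_name = b"caf\xe9"

# Stand-in for sys.getfilesystemencoding() under LANG=C (assumption).
fs_encoding = "ascii"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to the
# lone surrogate U+DCE9 instead of raising an error.
file_name = raw_name.decode(fs_encoding, errors="surrogateescape")

# A plain encode() uses errors="strict" and rejects the lone surrogate.
try:
    file_name.encode(fs_encoding)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With surrogateescape the original bytes come back unchanged.
round_tripped = file_name.encode(fs_encoding, errors="surrogateescape")
```

The strict encode fails, while the surrogateescape encode returns the
original bytes.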

I think there should be (1) some documentation on the issue and (2) a
more obvious way to encode filenames.

1. The most useful reference I could find in the docs is

       http://docs.python.org/dev/c-api/unicode.html#file-system-encoding

   and there is a short paragraph at

       http://docs.python.org/dev/library/os.html#file-names-command-line-arguments-and-environment-variables

   The filename encoding applies to basically all Python library
   functions (including built-ins like `open()`) and should probably
   be documented at a more prominent spot.  The "surrogateescape"
   error handler isn't mentioned here:

       http://docs.python.org/dev/howto/unicode.html#unicode-filenames

2. There should be some way to access the C API functions for decoding
   and encoding filenames from Python.  I don't have a good idea how
   to do this – maybe by adding a meta-encoding "filesystem", or by
   adding functions to the standard library.
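
   One candidate worth checking here: as far as I can tell, the os
   module has had os.fsencode() and os.fsdecode() since Python 3.2,
   and they appear to apply the filesystem encoding with the
   platform's error handler.  A minimal sketch, assuming a POSIX
   system (the byte string is a made-up example name):

```python
import os

# Bytes that are not valid ASCII or UTF-8 (Latin-1 encoded "café").
raw_name = b"caf\xe9"

# bytes -> str using the filesystem encoding; on POSIX, undecodable
# bytes are kept as lone surrogates rather than raising an error.
file_name = os.fsdecode(raw_name)

# str -> bytes, reversing the step above.
encoded_name = os.fsencode(file_name)
```

   On POSIX the round trip should give back the original bytes
   whatever the locale is; I haven't checked the behaviour on Windows.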

Did I miss something?  Any thoughts?

Cheers,
    Sven


