[Python-ideas] Introduce some obvious way to encode and decode filenames from Python code
Sven Marnach
sven at marnach.net
Mon Jul 16 16:49:52 CEST 2012
Currently, there is no obvious way to encode a filename in the default
filesystem encoding. To pipe some filenames to the stdin of a
subprocess, I effectively used
    encoded_name = file_name.encode(sys.getfilesystemencoding())
which mostly worked. There are cases where this fails, though: on
Linux with LANG=C and filenames that contain non-ASCII characters, for
example, or in any other situation where a filename contains bytes
that the default filesystem encoding can't decode.
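
To make the failure concrete, here is a minimal sketch. It hard-codes
"ascii" as a stand-in for the filesystem encoding under LANG=C, and the
filename bytes are made up for illustration:

```python
# Bytes of a Latin-1 encoded filename, e.g. as os.listdir(b".") could
# return them. The name itself is a made-up example.
raw_name = b"caf\xe9.txt"

# Decoding as ASCII with "surrogateescape" maps the undecodable byte
# 0xE9 to the lone surrogate U+DCE9 instead of raising an error.
file_name = raw_name.decode("ascii", errors="surrogateescape")

# Encoding with the default "strict" handler rejects that surrogate.
try:
    file_name.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Encoding with "surrogateescape" restores the original bytes.
round_tripped = file_name.encode("ascii", errors="surrogateescape")
```

Only the last call gets back to the bytes the filesystem actually
stores.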
The correct way to do this seems to be something like
    if sys.platform == "win32":
        errors = "strict"
    else:
        errors = "surrogateescape"
    encoded_name = file_name.encode(sys.getfilesystemencoding(),
                                    errors=errors)
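
For the subprocess use case, this could be wrapped up roughly as
follows. This is only a sketch: encode_filename() and pipe_names() are
hypothetical names, the NUL separator is an arbitrary choice, and the
platform check compares sys.platform against "win32", which is the
value it has on Windows.

```python
import subprocess
import sys


def encode_filename(file_name):
    """Hypothetical helper: encode a str filename for the filesystem."""
    errors = "strict" if sys.platform == "win32" else "surrogateescape"
    return file_name.encode(sys.getfilesystemencoding(), errors=errors)


def pipe_names(file_names, command):
    """Hypothetical helper: send NUL-separated names to command's stdin."""
    payload = b"\0".join(encode_filename(name) for name in file_names)
    result = subprocess.run(command, input=payload,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


# Usage: a child process that simply echoes its stdin back.
echo_child = [sys.executable, "-c",
              "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
echoed = pipe_names(["a.txt", "b.txt"], echo_child)
```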
I think there should be (1) some documentation on the issue and (2) a
more obvious way to encode filenames.
1. The most useful reference I could find in the docs is
http://docs.python.org/dev/c-api/unicode.html#file-system-encoding
and there is a short paragraph at
http://docs.python.org/dev/library/os.html#file-names-command-line-arguments-and-environment-variables
The filename encoding applies to basically all Python library
functions (including built-ins like `open()`) and should probably
be documented at a more prominent spot. The "surrogateescape"
error handler isn't mentioned here
http://docs.python.org/dev/howto/unicode.html#unicode-filenames
2. There should be some way to access the C API functions for decoding
and encoding filenames from Python. I don't have a good idea how
to do this – maybe by adding a meta-encoding "filesystem", or by
adding functions to the standard library.
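
To illustrate the meta-encoding idea, here is a rough sketch using
codecs.register(). Nothing like this exists in the standard library as
far as I know; the codec name "filesystem" and the search function are
purely hypothetical, and the choice of error handler just mirrors the
platform check above (with "win32" as the Windows value of
sys.platform).

```python
import codecs
import sys


def _filesystem_search(name):
    """Search function for a hypothetical "filesystem" meta-encoding."""
    if name != "filesystem":
        return None
    encoding = sys.getfilesystemencoding()
    errors = "strict" if sys.platform == "win32" else "surrogateescape"

    def encode(text, _errors="strict"):
        # The caller's error handler is deliberately ignored.
        return text.encode(encoding, errors), len(text)

    def decode(data, _errors="strict"):
        data = bytes(data)
        return data.decode(encoding, errors), len(data)

    return codecs.CodecInfo(encode=encode, decode=decode, name="filesystem")


codecs.register(_filesystem_search)

encoded_name = "report.txt".encode("filesystem")
decoded_name = encoded_name.decode("filesystem")
```

With something like this, the call site would no longer need to know
about the error handler at all.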
Did I miss something? Any thoughts?
Cheers,
Sven