
Sorry, I notice I didn't answer your specific question:
> Also, what would os.listdir() return? Unicode strings or 8-bit strings?
This would not change. This is what my testing shows:

* I can switch to a German locale and create a file using the keystrokes "`atest`o". The "`" is a dead key, so I get a grave accent over the first and last characters ("àtestò").
* os.listdir() returns '\xe0test\xf2' for this file.
* That same string can be passed to open() etc. to open the file.
* The only way to convert that string to a Unicode object is to use the "Latin1" or "mbcs" encodings. Of the two, "mbcs" is probably the safer, as it at least has a hope of handling non-Latin characters :)

So, assume I am passed a Unicode object that represents this filename. At the moment, passing that Unicode object to open() simply raises an exception, because it is converted using the default "ascii" encoding. I am proposing that "mbcs" be used in this case instead.

If nothing else, my idea could be considered a short-term solution. If it is ever found to be a problem, we can simply move to the Unicode APIs, and nothing would break; just possibly more things _would_ work :)

Mark.
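A minimal sketch of the behaviour described above, written in modern Python 3 (`bytes` standing in for the 8-bit string, `str` for the Unicode object). The helper names `pick_filename_encoding` and `encode_filename` are hypothetical, and the "latin-1" fallback for platforms without "mbcs" is this sketch's own assumption, not part of the proposal. "mbcs" is the Windows ANSI code page (cp1252 for a German locale, where 0xE0 and 0xF2 are also à and ò), and the codec is only registered on Windows.

```python
import codecs


def pick_filename_encoding() -> str:
    """Return "mbcs" where the codec exists, else "latin-1".

    Assumption: the latin-1 fallback is only for this sketch; "mbcs"
    is registered on Windows alone.
    """
    try:
        codecs.lookup("mbcs")
        return "mbcs"
    except LookupError:
        return "latin-1"


def encode_filename(name: str, encoding: str) -> bytes:
    """Encode a Unicode filename into the 8-bit form a byte-string API expects."""
    return name.encode(encoding)


# The 8-bit name os.listdir() returned in the test described above.
raw = b"\xe0test\xf2"
name = raw.decode("latin-1")  # the Unicode object we might be passed

# Current behaviour: the default "ascii" codec cannot represent the name.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Proposed behaviour: an 8-bit codec round-trips the original bytes.
roundtrip = encode_filename(name, "latin-1")

print("ascii works:", ascii_ok)
print("round-trip ok:", roundtrip == raw)
print("would use:", pick_filename_encoding())
```

The point of the round-trip check is the one made in the post: encoding the Unicode name with an 8-bit codec reproduces exactly the bytes that open() already accepts, whereas "ascii" fails on the first accented character.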