
Hello, Harry!

I just noticed this thread. I opened a ticket for this a while back:

https://twistedmatrix.com/trac/ticket/5203
#5203: FilePath.children() should return FilePath objects with unicodes in them instead of strs

There is some discussion on that ticket.

For what it is worth, I agree with Itamar that porting to Python 3 shouldn't be combined with changing the functionality or API. I also agree with Harry (at least with what Harry originally said) that FilePath objects should not carry around a "path" that is just bytes and doesn't specify what encoding those bytes are in.

I know this is a subtle topic. I can see the argument on the other side, too, and I don't think either approach can satisfy all users. I still think it is a better idea to require unicode-only, so I'd like to try to explain why, below, in addition to the discussion that is recorded on #5203.

Here's my basic argument: a sequence of bytes without an accompanying encoding is an *insufficiently typed* thing. That is, there is no way to use it safely without first restoring a type, and that type has to be the *correct* one. The traditional way to handle pathnames on Linux has been to let them be under-typed and then restore the type heuristically. This traditionally worked most of the time, because the most common thing you would do with such a sequence of bytes is plug it back into the same filesystem from which it came. However, I make two claims:

1. In the modern world, it is very common to send it over the network instead of plugging it back into the same filesystem from which it came, and
2. there's not very much need for this "forget what type it was, guess the type later, and guess correctly" hack! We can instead *require* the user to supply a type with the bytestring originally, and then remember the type that the user supplied.
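To make the "insufficiently typed" point concrete, here is a minimal Python 3 sketch. `guess_decode` and `TextPath` are hypothetical names made up for this illustration; they are not part of Twisted's FilePath API or of Tahoe-LAFS. The sketch shows the two ways a heuristic restore can go wrong (a loud UnicodeDecodeError, or silently wrong text), and then the alternative: require an explicit encoding once, at the boundary, and store the result as text from then on.

```python
# Hypothetical illustration only; not Twisted's FilePath API.


def guess_decode(raw: bytes, guessed_encoding: str) -> str:
    """The heuristic restore: decode under-typed bytes with a guessed encoding."""
    return raw.decode(guessed_encoding)


class TextPath:
    """A path that always remembers its type: stored as str, never as bare bytes."""

    def __init__(self, path: str) -> None:
        if not isinstance(path, str):
            raise TypeError("TextPath requires str; use TextPath.from_bytes(raw, encoding)")
        self.path = path

    @classmethod
    def from_bytes(cls, raw: bytes, encoding: str) -> "TextPath":
        # The caller must state the encoding up front. Decoding is strict (the
        # default), so bytes that are invalid in that encoding fail here, at the boundary.
        return cls(raw.decode(encoding))

    def to_wire(self) -> bytes:
        # For sending over the network: the encoding (UTF-8) is fixed and known
        # to both ends, so the receiver never has to guess.
        return self.path.encode("utf-8")


# A filename stored as ISO-8859-1 on an "ill-typed" filesystem ...
latin1_bytes = "café".encode("iso-8859-1")  # b'caf\xe9'
# ... and the same name stored as UTF-8.
utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Wrong guess, case 1: the bytes are not valid UTF-8, so decoding fails loudly.
try:
    guess_decode(latin1_bytes, "utf-8")
    latin1_as_utf8_failed = False
except UnicodeDecodeError:
    latin1_as_utf8_failed = True

# Wrong guess, case 2: every byte value is valid ISO-8859-1, so decoding
# "succeeds" but silently yields the wrong text.
mojibake = guess_decode(utf8_bytes, "iso-8859-1")  # 'cafÃ©'

# Supplying the correct type once gives the right text, and the bytes sent
# over the wire have an unambiguous encoding.
p = TextPath.from_bytes(latin1_bytes, "iso-8859-1")
restored = p.path  # 'café'
wire = p.to_wire()  # b'caf\xc3\xa9'
```

The trade-off, as sketched: bytes are accepted only together with an encoding, and nothing downstream ever has to guess.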
Requiring the type up front breaks only a few use cases, which are probably very rare and might be unfixable anyway, but it prevents a very common failure: guessing the wrong type during the restore. This is what we've done in Tahoe-LAFS, and we've had few or no complaints from users about it. If there were any, they came in the early days of Tahoe-LAFS, around 5 years ago, when ill-typed Linux filesystems hadn't quite finished dying out (i.e. filesystems where the bytes are actually encoded in ISO-8859, but sys.getfilesystemencoding() returns 'utf-8').

We wrote unit tests and did careful code review when we converted Tahoe-LAFS from bytes to unicode-only a few years ago, so I'd be happy to share the knowledge I gleaned from that experience.

Regards,
Zooko