On Wed, Mar 11, 2020 at 7:19 AM Christopher Barker <pythonchb@gmail.com> wrote:
Getting a bit OT, but I *think* this is the story:
I've heard it argued, by folks that want to write Python software that uses bytes for filenames, that:
A file path on a *nix system can be any string of bytes, except two special values:
b'\x00' : null
b'\x2f' : slash
(consistent with this SO post, among many other sources: https://unix.stackexchange.com/questions/39175/understanding-unix-file-name-...)
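Here is a minimal sketch of that rule in Python, since that's what the thread is about. The helper name `is_valid_posix_name` is hypothetical. It checks a single path component, so a slash is rejected outright. It ignores other constraints such as length limits (NAME_MAX) and the reserved entries `.` and `..`.

```python
def is_valid_posix_name(name: bytes) -> bool:
    """Hypothetical helper: can `name` be used as one path component?"""
    # NUL terminates C strings, so nothing after it would reach the kernel;
    # 0x2F is the separator, so it cannot appear inside a single component.
    # Every other byte value is allowed, whatever encoding it came from.
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


# b"caf\xe9" is Latin-1 for "café" and is not valid UTF-8, yet it is a legal name.
print(is_valid_posix_name(b"caf\xe9"))      # True
print(is_valid_posix_name(b"a/b"))          # False
print(is_valid_posix_name(b"nul\x00byte"))  # False
```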
So any encoding will work, as long as those two values mean the right thing. Practically, null is always null, so that leaves the slash.
So any encoding that uses b'\x2f' for the slash would work, which seems to include, for instance, UTF-16:
In [31]: "/".encode('utf-16')
Out[31]: b'\xff\xfe/\x00'
Nope, see above about b'\x00' :)
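To make the UTF-16 problem concrete, here is a rough check built around a hypothetical helper, `safe_for_posix_paths`. It requires that the encoded text contain no NUL byte, and that the number of 0x2F bytes match the number of '/' characters. The second demo uses `utf-16-le` so the output doesn't depend on the machine's byte order. Plain `utf-16` prepends a BOM in native order, which is where the `\xff\xfe` above comes from.

```python
def safe_for_posix_paths(text: str, encoding: str) -> bool:
    """Hypothetical check: would `text`, encoded this way, survive as a path?"""
    encoded = text.encode(encoding)
    # Any NUL byte would truncate the path at the C level.
    if b"\x00" in encoded:
        return False
    # Rough proxy: the count of 0x2F bytes must equal the count of '/'
    # characters; a mismatch means some other character's encoding
    # produced a separator byte (or '/' itself did not).
    return encoded.count(b"/") == text.count("/")


print(safe_for_posix_paths("dir/file", "utf-8"))   # True
print(safe_for_posix_paths("dir/file", "utf-16"))  # False: ASCII chars carry a 0x00 byte

# Even without a NUL, UTF-16 can smuggle in a separator:
# U+412F encodes as the two bytes 0x2F 0x41, i.e. b"/A".
print(chr(0x412F).encode("utf-16-le"))                 # b'/A'
print(safe_for_posix_paths(chr(0x412F), "utf-16-le"))  # False
```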
In practice, maybe knowing that it's ASCII-compatible for the first 128 byte values (0x00 to 0x7F) will get you pretty far...
That's exactly what "ASCII compatible" means. Since ASCII is a seven-bit encoding, an encoding is ASCII-compatible if (a) every ASCII character is represented by the corresponding byte value, and (b) every seven-bit byte value represents the corresponding ASCII character and never forms part of a multibyte character.

ChrisA
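The two conditions can be tested mechanically. This is a sketch under stated assumptions. `is_ascii_compatible` is a hypothetical helper that takes Python codec names. Condition (b) is checked only over the Basic Multilingual Plane to keep the loop short, so a True result is strong evidence rather than proof.

```python
def is_ascii_compatible(encoding: str) -> bool:
    """Hypothetical helper: test conditions (a) and (b) for a Python codec."""
    # (a) each ASCII character encodes to the single byte of the same value
    for i in range(128):
        try:
            if chr(i).encode(encoding) != bytes([i]):
                return False
        except UnicodeEncodeError:
            return False
    # (b) no non-ASCII character's encoding contains a byte below 0x80,
    # so a seven-bit byte always means the corresponding ASCII character.
    # Only the BMP is scanned, for speed.
    for cp in range(128, 0x10000):
        try:
            encoded = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable (or a lone surrogate): skip it
        if any(b < 0x80 for b in encoded):
            return False
    return True


for enc in ("utf-8", "latin-1", "utf-16", "shift_jis"):
    print(enc, is_ascii_compatible(enc))
# utf-8 True
# latin-1 True
# utf-16 False
# shift_jis False
```

Shift_JIS is the classic counterexample for (b). The second byte of many double-byte characters falls in the ASCII range. For example, '表' encodes as 0x95 0x5C, and 0x5C is the backslash.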