Andrew Barnert wrote:
On May 11, 2020, at 00:40, Steve Jorgensen stevej@stevej.name wrote:
Proposal: Add a new function (possibly os.path.sanitizepart) to sanitize a value for use as a single component of a path. In the default case, the value must also not be a reference to the current or parent directory ("." or "..") and must not contain control characters.

<snip>

If not: the result can contain the path separator, illegal characters that aren’t control characters, nonprinting characters that aren’t control characters, and characters whose bytes (in the filesystem’s encoding) are ASCII control characters? And it can be a reserved name, or even something like C:; as long as it’s not the Unix . or ..?
Are there non-printing characters outside of those in the Unicode general category of "C" that make sense to omit? There are combining characters and such that do not have glyphs of their own but are visible in the sense that they modify the glyphs displayed for the characters that they combine with.

Regarding names like "C:", you are absolutely right to point that out. When the platform is Windows, "<letter>:" should certainly not be allowed, and perhaps a colon should not be allowed at all. I'll need to research that a bit. This matters because if the path part is used without an explicit "./" prefixed to it, it will be interpreted relative to that drive rather than as a name in the intended directory, which is the same problem as allowing a name starting with "/" on *NIX. That should be unconditionally disallowed in the case of WIN or GENERAL systems.
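
To make the checks under discussion concrete, here is a rough sketch in Python. It is not the proposed os.path.sanitizepart, which does not exist. The function name is_safe_part, the system flag with its "POSIX"/"WIN"/"GENERAL" values, and the rejection of the empty string are all assumptions made for illustration. It only validates and does not rewrite the name, and it ignores Windows reserved names such as CON or NUL.

```python
import unicodedata


def is_safe_part(name: str, system: str = "GENERAL") -> bool:
    """Return True if name looks usable as a single path component.

    Hypothetical helper: `system` is an illustrative flag taking
    "POSIX", "WIN", or "GENERAL" (the strictest, for portable names).
    """
    # "." and ".." refer to the current/parent directory. Rejecting the
    # empty string as well is an assumption, not part of the proposal.
    if name in ("", ".", ".."):
        return False

    # Reject every character whose Unicode general category starts with
    # "C": Cc (control), Cf (format), Cs (surrogate), Co (private use),
    # Cn (unassigned). Combining marks are category M and pass through.
    if any(unicodedata.category(ch).startswith("C") for ch in name):
        return False

    # "/" separates path components on every system considered here.
    if "/" in name:
        return False

    if system in ("WIN", "GENERAL"):
        # On Windows a backslash is also a separator, and a colon can
        # introduce a drive such as "C:", so reject both outright.
        if "\\" in name or ":" in name:
            return False

    return True
```

With this sketch, is_safe_part("C:") is False under the default GENERAL rules but True with system="POSIX", and a name containing a combining accent such as "e\u0301" is accepted. One caveat: category Cf includes the zero-width joiner, which some scripts and emoji sequences depend on, so rejecting all of "C" may be stricter than is actually wanted.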