On Sun, 10 May 2020 00:34:43 -0000 "Steve Jorgensen" <stevej@stevej.name> wrote:
> I believe the Python standard library should include a means of sanitizing a filesystem entry, and this should not be something requiring a 3rd party package.

I'm not disagreeing.

> What I am envisioning is a function (presumably in `os.path`) with a signature roughly like {{{ sanitizepart(name, permissive=False, mode=ESCAPE, system=None) }}}
>
> When `permissive` is `False`, characters that are generally unsafe are rejected. When `permissive` is `True`, only path separator characters are rejected. Generally unsafe characters besides path separators would include things like a leading ".", any non-printing character, any wildcard, piping and redirection characters, etc.
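The reject-on-unsafe behaviour described in the proposal can be sketched roughly as follows. This is a hypothetical illustration only: `reject_unsafe_part` is a made-up name, neither it nor `os.path.sanitizepart` exists in the standard library, the proposal's `mode=ESCAPE` and `system=` parameters are not modelled, and the exact wildcard and redirection character sets are assumptions (the proposal only says "any wildcard" and "piping and redirection characters, etc.").

```python
import os

# Hypothetical sketch: neither reject_unsafe_part nor os.path.sanitizepart
# exists in the standard library. Only the "reject" behaviour is modelled;
# the proposal's mode=ESCAPE and system= parameters are left out.

# Assumed character sets; the proposal only says "any wildcard" and
# "piping and redirection characters, etc."
WILDCARDS = frozenset("*?[]")
SHELL_METACHARS = frozenset("|<>&;")


def reject_unsafe_part(name: str, permissive: bool = False) -> str:
    """Return name unchanged if it is acceptable, else raise ValueError."""
    # Path separators are rejected in both modes.
    separators = {os.sep}
    if os.altsep:
        separators.add(os.altsep)
    if any(sep in name for sep in separators):
        raise ValueError(f"path separator in {name!r}")
    if permissive:
        return name
    # Strict mode: the "generally unsafe" cases listed in the proposal.
    if name.startswith("."):
        raise ValueError(f"leading '.' in {name!r}")
    if not name.isprintable():
        raise ValueError(f"non-printing character in {name!r}")
    bad = (WILDCARDS | SHELL_METACHARS) & set(name)
    if bad:
        raise ValueError(f"unsafe characters {sorted(bad)} in {name!r}")
    return name


if __name__ == "__main__":
    for candidate in ["report.txt", ".profile", "a|b", "a/b"]:
        for permissive in (False, True):
            try:
                reject_unsafe_part(candidate, permissive)
                print(f"{candidate!r} permissive={permissive}: ok")
            except ValueError as exc:
                print(f"{candidate!r} permissive={permissive}: rejected ({exc})")
```

Under these assumptions, strict mode rejects a name like ".profile" or "a|b" outright, while permissive mode accepts both and still rejects "a/b". That strict-mode rejection of ordinary names is exactly the behaviour questioned in the reply below.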
Okay, now I'm disagreeing. ;-)

I know what sanitize means (in English and in the technical sense I believe you intend here), but can you provide some context and actual use cases? Sanitize on input so that your application code doesn't "accidentally" spit out the contents of /etc/shadow? Sanitize on output so that your code doesn't produce syntactically broken links in an HTML document or weird results in an xterm? Sanitize in both directions for safe round-tripping to a database server? All of those use cases potentially require separate handling, especially in terms of quoting and escaping.

For another example, suppose I'm writing a command-line utility on a POSIX system to compute a hash of the contents of a file. There's nothing wrong with ".profile" as a file name. Why are you rejecting leading "." characters? What about leading "-"s, or embedded "|"s? Yes, certain shells and shell commands can make them "difficult" to deal with in one way or another, but they're not "generally unsafe."

A very, very, very long time ago, we wrote some software for a customer who liked "editing" our data files to make minor corrections instead of using our software. Our solution was to use "illegal" filenames that the shell rejected, but that an application could access directly anyway.

I guess the point is that "sanitize" can mean different things to different parts of a system.

Dan

-- 
“Atoms are not things.” – Werner Heisenberg
Dan Sommers, http://www.tombstonezero.net/dan