Steve Jorgensen wrote:
Steve Jorgensen wrote:
Andrew Barnert wrote:
On May 9, 2020, at 17:35, Steve Jorgensen stevej@stevej.name wrote:

I believe the Python standard library should include a means of sanitizing a filesystem entry, and this should not be something requiring a 3rd party package. One of the reasons I think this should be in the standard lib is that it provides a common, simple means for code reviewers and static analysis services such as Veracode to recognize that a value is sanitized in an accepted manner.

This does seem like a good idea. People who do this themselves get it wrong all the time, occasionally with disastrous consequences, so if Python can solve that, that would be great. But, at least historically, this has been more complicated than what you’re suggesting here. For example, don’t you have to catch things like directories named “Con” or files whose 8.3 representation has “CON” as the 8 part? I don’t think you can hang an entire Windows system by abusing those anymore, but you can still produce filenames that some APIs, and some tools (possibly including Explorer, cmd, powershell, Cygwin, mingw/native shells, Python itself…) can’t access (or can only access if the user manually specified a .\ absolute path, or whatever).

Yes. I am aware of some of the unsafe names in DOS and older Windows. As I mentioned in my other reply, there is a distinction between the ones that are merely invalid and those that are actually unsafe. In researching existing Linux tools just now, I was reminded that a leading dash is frequently unsafe because many tools will treat an argument starting with a dash as an option argument.

Is there an established algorithm/rule that lots of people in the industry trust that Python can just reference, instead of having to research or invent it? Because otherwise, we run the risk of making things worse instead of better.

An excellent point! I just started digging into that and found references to detox and Glindra. Neither of those seems to be well maintained though.
The documentation pages for Glindra no longer exist, and detox is not in the standard package repositories for CentOS later than 6 (and only in EPEL for that). Still digging.

Extremely apropos to the question of what characters might be problematic and/or unsafe: https://dwheeler.com/essays/fixing-unix-linux-filenames.html
That article links to another by the same author that is specific to vulnerabilities caused by file names. https://dwheeler.com/secure-programs/Secure-Programs-HOWTO/file-names.html
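
To make the categories raised in this thread concrete (reserved Windows device names such as "Con", a leading dash, control characters), here is a minimal sketch of a checker. The function name `find_filename_problems` and the rule set are hypothetical, chosen for illustration only. This is not an established algorithm or a proposed standard library API, and it deliberately ignores 8.3 short names and other cases mentioned above.

```python
import re

# Hypothetical rule set for illustration only; not an established standard.
# Device names reserved by Windows, matched case-insensitively and
# regardless of any extension (so "con.txt" is treated like "CON").
_WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}

# ASCII control characters, including NUL, newline, and DEL.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def find_filename_problems(name: str) -> list:
    """Return reasons why a single path component looks risky (empty if none)."""
    problems = []
    if name in ("", ".", ".."):
        problems.append("empty or special directory entry")
    if "/" in name or "\\" in name:
        problems.append("contains a path separator")
    if name.startswith("-"):
        problems.append("leading dash may be parsed as an option")
    if _CONTROL_CHARS.search(name):
        problems.append("contains a control character")
    # Compare only the part before the first dot, ignoring trailing spaces.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    if stem in _WINDOWS_RESERVED:
        problems.append("reserved Windows device name")
    return problems
```

Called on an ordinary name such as `"report.txt"` the function returns an empty list, while `"-rf"` and `"Con"` each produce one reason. Reporting problems rather than silently rewriting the name keeps the distinction between merely invalid and actually unsafe names visible to the caller.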