(Is it almost always better to just use a hash of the provided filename (maybe in a p/a/ir/tree234 implementation to avoid the max-files-per-directory limit of whichever filesystem) instead of the user-supplied filename string?)

On Mon, May 11, 2020 at 4:48 PM Wes Turner <wes.turner@gmail.com> wrote:
FWIW, here are some of the CWE codes for related vulnerabilities/weaknesses in implementations:
CWE-73: External Control of File Name or Path https://cwe.mitre.org/data/definitions/73.html
CWE-707: Improper Neutralization https://cwe.mitre.org/data/definitions/707.html
CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') https://cwe.mitre.org/data/definitions/22.html
Because this behavior of os.path.join is documented, it's not a vuln in Python; it's a vuln in every downstream component that (1) uses os.path.join with user-supplied input and (2) doesn't strip a leading '/' from path parts before joining them with os.path.join.
[...] If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
https://docs.python.org/3/library/os.path.html#os.path.join
[quoting from "part 2"] What does sanitizepart do with a leading slash?
assert os.path.join("a", "/b") == "/b"
A new safejoin() or joinsafe() or join(safe=True) could call sanitizepart() such that:
assert joinsafe("a\n", "/b") == "a\\n/b"
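A minimal sketch of the two approaches mentioned in this message, assuming POSIX-style paths. The names sanitizepart and joinsafe come from the thread, but their behavior here is only an assumption (strip leading slashes, escape ASCII control characters, refuse ".." components), not an agreed design. hashed_relpath and its levels/width parameters are hypothetical and illustrate the hash-of-the-filename question asked at the top of this message.

```python
import hashlib
import posixpath


def sanitizepart(part):
    """Hypothetical: make one untrusted path part safe to join (assumed rules)."""
    # Drop leading slashes so the part can never restart the path at the root.
    part = part.lstrip("/")
    # Escape ASCII control characters, e.g. a newline becomes backslash + "n".
    return "".join(
        ch.encode("unicode_escape").decode("ascii")
        if ord(ch) < 0x20 or ord(ch) == 0x7F
        else ch
        for ch in part
    )


def joinsafe(first, *rest):
    """Hypothetical: join parts after sanitizing each one; reject '..'."""
    cleaned = [sanitizepart(p) for p in (first, *rest)]
    joined = posixpath.join(*cleaned)
    # Assumption: refuse parent-directory components instead of rewriting them.
    if ".." in joined.split("/"):
        raise ValueError("'..' components are not allowed: %r" % joined)
    return joined


def hashed_relpath(filename, levels=2, width=2):
    """Hypothetical: ignore the supplied name and store under its SHA-256 hash.

    The first `levels` chunks of `width` hex digits become subdirectories,
    which spreads files out and avoids very large single directories.
    """
    data = filename.encode("utf-8", "surrogateescape")
    digest = hashlib.sha256(data).hexdigest()
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return posixpath.join(*shards, digest)


# The behavior asked for in the thread:
assert joinsafe("a\n", "/b") == "a\\n/b"
```

With the hashed layout, no character of the user-supplied string ever reaches the filesystem, but the original name is lost, so it would have to be recorded elsewhere (for example in a database row keyed by the digest) if it needs to be shown to users again.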
On Sun, May 10, 2020 at 5:36 AM Steve Jorgensen <stevej@stevej.name> wrote:
Steve Jorgensen wrote:
Steve Jorgensen wrote:
Andrew Barnert wrote:
On May 9, 2020, at 17:35, Steve Jorgensen stevej@stevej.name wrote:

I believe the Python standard library should include a means of sanitizing a filesystem entry, and this should not be something requiring a 3rd party package. One of the reasons I think this should be in the standard lib is that it provides a common, simple means for code reviewers and static analysis services such as Veracode to recognize that a value is sanitized in an accepted manner.

This does seem like a good idea. People who do this themselves get it wrong all the time, occasionally with disastrous consequences, so if Python can solve that, that would be great. But, at least historically, this has been more complicated than what you’re suggesting here. For example, don’t you have to catch things like directories named “Con” or files whose 8.3 representation has “CON” as the 8 part? I don’t think you can hang an entire Windows system by abusing those anymore, but you can still produce filenames that some APIs, and some tools (possibly including Explorer, cmd, powershell, Cygwin, mingw/native shells, Python itself…) can’t access (or can only access if the user manually specified a .\ absolute path, or whatever).

Yes. I am aware of some of the unsafe names in DOS and older Windows. As I mentioned in my other reply, there is a distinction between the ones that are merely invalid and those that are actually unsafe. In researching existing Linux tools just now, I was reminded that a leading dash is frequently unsafe because many tools will treat an argument starting with a dash as an option argument.

Is there an established algorithm/rule that lots of people in the industry trust that Python can just reference, instead of having to research or invent it? Because otherwise, we run the risk of making things worse instead of better.

An excellent point! I just started digging into that and found references to detox and Glindra. Neither of those seems to be well maintained, though. The documentation pages for Glindra no longer exist, and detox is not in standard package repositories for CentOS later than 6 (and only in EPEL for that). Still digging.

Extremely apropos to the question of what characters might be problematic and/or unsafe: https://dwheeler.com/essays/fixing-unix-linux-filenames.html
That article links to another by the same author that is specific to vulnerabilities caused by file names. https://dwheeler.com/secure-programs/Secure-Programs-HOWTO/file-names.html

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FDZOXS...
Code of Conduct: http://python.org/psf/codeofconduct/