[issue40763] zipfile.extractall is safe by now
New submission from Va <d.python.dc54@indigo.re>: In documentation of all Python 3 versions, [ZipFile.extractall](https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.extractall) states with a big red warning:
Warning Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with "/" or filenames with two dots "..". This module attempts to prevent that. See extract() note.
However, when looking at the implementation, it calls _extract_member() which seems to sanitize filenames. So the warning might not be relevant anymore. Furthermore, when looking at [Python 2](https://docs.python.org/2/library/zipfile.html#zipfile.ZipFile.extractall) documentation, we can see the same warning, along with a change note:
Changed in version 2.7.4: The zipfile module attempts to prevent that. See extract() note.
So, the big red warning in Python 3 documentation might be relevant only for Python < 2.7.4, not for any Python 3 version. ---------- assignee: docs@python components: Documentation messages: 369854 nosy: VA, docs@python priority: normal severity: normal status: open title: zipfile.extractall is safe by now type: behavior versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9 _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue40763> _______________________________________
Hi, On Mon, May 25, 2020 at 10:18 AM Va <report@bugs.python.org> wrote:
So, the big red warning in Python 3 documentation might be relevant only for Python < 2.7.4, not for any Python 3 version.
You may be on to something. It does appear to be what was discussed in msg181646 on issue6972. What I see is that from CPython 3.4 (https://docs.python.org/3.4/library/zipfile.html#zipfile.ZipFile.extractall) while the security warning is still there they add the following line in it:
This module attempts to prevent that. See extract() note.
The extract() note goes into some detail to explain what and how they attempt to prevent it. It is not obvious to me that zipfile._extract_member() together with (for windows) zipfile._sanitize_windows_name() have handled everything that could happen. May I suggest that out of caution we leave it as it is?
It is not obvious to me that zipfile._extract_member() together with (for windows) zipfile._sanitize_windows_name() have handled everything
Va <d.python.dc54@indigo.re> added the comment: that could happen. What hasn't been handled then? What is the safe way to use it? I think documenting "this function is unsafe" without suggesting a replacement or a safe way to use it isn't very constructive: as a developer, I want to extract a zip archive, but the only function supposed to do the job tells me "this is unsafe". Ok, so what am I supposed to do to be safe? That's what documentation should tell me, not let me puzzled with doubt.
May I suggest that out of caution we leave it as it is?
I don't think the situation should stay like this. - either the documentation should be more precise on what are the problems that can occur, and how to handle those problems - or better, the function should be fixed and made fully safe, so all programs using it are safe (and the warning can be removed) ---------- _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue40763> _______________________________________
Ama Aje My Fren <amaajemyfren@gmail.com> added the comment: On Tue, May 26, 2020 at 2:47 PM Va <report@bugs.python.org> wrote:
What hasn't been handled then?
The rules for naming files in Windows is long (https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file). It is e.g. possible to create files under WSL within Windows that break these rules. In my case it was to add the colon (:) to a file name. In Python for windows this would fail because the underlying system API would stop it from happening (and in zip, it will be changed to an underscore (_)) - but it is unclear what would actually happen if you do so. In the old days just trying to open C:\Con\Con (which did not exist) caused a BSOD.
What is the safe way to use it?
The Security message suggests _with care_ - to wit - "Never extract archives from untrusted sources without prior inspection." There may be no absolutely safe way if the zipfile was crafted maliciously. Just like there are inherent vulnerabilities in using XML ... (https://docs.python.org/3/library/xml.html#xml-vulnerabilities). If a zipped file had a tree starting at C:\ and replaced a dll in C:\Windows (and was running as Admin), a lot of caveats I know, but it could be a problem.
I think documenting "this function is unsafe" without suggesting a replacement or a safe way to use it isn't very constructive: as a developer, I want to extract a zip archive, but the only function supposed to do the job tells me "this is unsafe". Ok, so what am I supposed to do to be safe?
Does it say that unzipping a file is unsafe? It looks to me like it says that in special conditions the extraction of a zipped file tree may be unsafe and it is important to use caution. It is the case in a lot of programming, is it not, that there are instances of security vulnerabilities entering ordinary looking code? It happens in sql (https://xkcd.com/327/) and many places within Python's Standard Library (https://hackernoon.com/10-common-security-gotchas-in-python-and-how-to-avoid...) even something as innocuous as using the new-style string format (https://lucumr.pocoo.org/2016/12/29/careful-with-str-format/).
That's what documentation should tell me, not let me puzzled with doubt.
This is an interesting point. What is the scope of Python Library Documentation? I disagree with your view on scope. In my view the Library Documentation should focus on what is exposed in the library for ordinary use. So e.g. implementation details may not be expected to be shown in the Documentation (like there is no documentation for zipfile._extract_member()). It does have a duty of care - especially to well known gotchas - but it is _not_ security documentation. I think (this is my view, it is not god given) that in many cases it is fair to assume that if one told a developer to be careful with her code it is enough in so far as library documentation is concerned. Thanks. ---------- title: zipfile.extractall is safe by now? -> zipfile.extractall is safe by now _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue40763> _______________________________________
Change by Va <d.python.dc54@indigo.re>: ---------- components: +Library (Lib) title: zipfile.extractall is safe by now -> zipfile.extractall is safe by now? type: behavior -> security _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue40763> _______________________________________
Gregory P. Smith <greg@krypto.org> added the comment: amaajemyfren is correct (and thanks for the pointers to the original issue and discussion). The warning remains out of caution. ---------- nosy: +gregory.p.smith resolution: -> not a bug stage: -> resolved status: open -> closed _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue40763> _______________________________________
participants (4)
-
Ama Aje My Fren
-
Ama Aje My Fren
-
Gregory P. Smith
-
Va