One specific pain point with zipfile is that if you zip a directory that contains the target zip file you end up trying to add the target file to itself which leads to a rapidly growing archive.
From: Python-ideas <python-ideas-bounces+gadgetsteve=live.co.uk@python.org>
On Behalf Of Robert Collins
Sent: 03 June 2019 21:12
To: Daniel Hillier <dhilliercode@gmail.com>
Cc: Python-Ideas <python-ideas@python.org>
Subject: Re: [Python-ideas] zipfile refactor and AES
This sounds like a valuable refactoring to me.
Is it API compatible with the current zipfile module docs?
On Mon, 3 Jun 2019, 20:23 Daniel Hillier, <dhilliercode@gmail.com> wrote:
Hi,
I've written a package that can read and write zip files encrypted with
Winzip's AES encryption scheme (https://github.com/danifus/pyzipper/). It is
based on Python's zipfile module which I refactored to facilitate adding the
AES code as subclasses of the classes defined in the zipfile module.
I would like to explore integrating some of the refactoring effort into Python
if they are wanted. While the AES specfic code probably isn't suitable for
integration due to its dependence on a crypto package, my hope is that the
refactor of the zipfile project may be beneficial and, in particular, help
development on other features of the zip spec. I'm happy to rework the
changes for inclusion in Python.
The general goals of the refactor were:
- Keep the AES implementation changes in a separate file so changes to
zipfile.py could potentially be merged into Python (the cryptically named
"minimal_pep8" branch contains minimal cosmetic changes to the code
unless the lines around it were changed. The master branch contains all the
changes in the minimal_pep8 plus pep8 and a few other changes).
- Add hooks for extending the way zipfile works to enable the addition of AES
encryption without having to duplicate most of the zipfile module. This
included adding hooks to:
- Select and call encrypt and decrypt methods.
- Read and write new "extra" data records in the central file directory
and local header records.
- Provide a mechanism to substitute ZipInfo, ZipExtFile and ZipWriteFile
classes used in a subclass of ZipFile to ease use of subclassed ZipInfo,
ZipExtFile or ZipWriteFile. This avoids having to rewrite large parts of
the zipfile module if we only want to change the behaviour of a small
part of one of those classes.
- Contain all code that reads the header, contents and tail of a file in the
archive to within ZipExtFile. Previously reading the header and some other
things were done in the ZipFile class before handing the rest of the
processing to ZipExtFile.
- Contain all code that writes the header, contents and tail of a file in the
archive to within ZipWriteFile. Previously reading the header and some other
things were done in the ZipFile class before handing the rest of the
processing to ZipExtFile.
- Move generation of local file header and central directory record content to
the ZipInfo class to be alongside the data that it is packing.
- Add comments to provide context from the zip spec. Replace explicit numbers
to variables with explanatory names or adding comments.
Concerns:
- The change set is not small. I've attempted to keep each commit in the
minimal_pep8 branch focused on a single change to simplify review and kept
the unrelated changes, like pep8, to a minimum. I'm happy to put in
the work to
get these changes into a patch for the Python project, if it is deemed useful.
- This could move the internals of zipfile towards a public API (maybe it
would be?) which brings additional complexity in managing future changes.
- Are the hooks inline with Python's coding style? Is there a different
approach to extensibility preferred within the Python project?
Let me know any improvements, suggestions or concerns to this refactoring
approach you may have. Happy for feedback even if it isn't about integrating the
code into Python.
Thanks,
Dan
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/