[issue40301] zipfile module: new feature (two lines of code), useful for test, security and forensics

Sat Apr 18 10:35:44 EDT 2020

Massimo Sala <massimo.sala.71 at gmail.com> added the comment:

I chose to use the internal variable *concat* because
- if I recollect correctly, it is calculated before the subsequent routines run;
- I didn't see your solution (!); there is a very nice computed variable right
in front of my eyes.

Mmh
1) Reliability
We cannot be sure this always runs on malformed files:
        for zinfo in zf.infolist():

We can use try / except, but then we lose the computation.
If *concat* is already computed (unless completely damaged files), IMHO my
solution is better.
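
To make the try / except trade-off concrete, here is a minimal sketch. It assumes that a malformed archive surfaces as zipfile.BadZipFile when the ZipFile is opened; the helper name earliest_header_offset and the test data are only for illustration and are not part of the patch.

```python
import io
import zipfile


def earliest_header_offset(data):
    """Return the smallest local-header offset, or None if it is unavailable."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # default=None covers an archive that has no entries at all.
            return min((zi.header_offset for zi in zf.infolist()), default=None)
    except zipfile.BadZipFile:
        # The central directory could not be parsed, so nothing was computed.
        return None


# Build a small archive in memory and prepend 16 junk bytes to it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
prefixed = b"J" * 16 + buf.getvalue()

offset_prefixed = earliest_header_offset(prefixed)
offset_broken = earliest_header_offset(b"not a zip file")
print(offset_prefixed, offset_broken)
```

On CPython I would expect the first call to report the length of the prepended data, and the second to return None because the file cannot be opened at all. In that second case the except branch has nothing to fall back on, which is the loss I mean above.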

2) Performance
What is the performance for big files?
Are there file seeks due to traversing zf.infolist() ?
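
One way to check the seek question is to count the I/O calls made while the loop runs. The sketch below is only an experiment, not a statement about every implementation: CountingBytesIO is a hypothetical helper, and the archive is built in memory. As far as I know, CPython reads the central directory when the ZipFile is opened, so the loop itself should only touch the ZipInfo objects already in memory.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """BytesIO that counts seek() and read() calls (hypothetical helper)."""

    def __init__(self, *args):
        super().__init__(*args)
        self.seeks = 0
        self.reads = 0

    def seek(self, *args):
        self.seeks += 1
        return super().seek(*args)

    def read(self, *args):
        self.reads += 1
        return super().read(*args)


# Build an archive with 100 small members in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(100):
        zf.writestr("file%d.txt" % i, "x" * 10)

fp = CountingBytesIO(buf.getvalue())
zf = zipfile.ZipFile(fp)      # the central directory is parsed here
fp.seeks = fp.reads = 0       # reset the counters once the archive is open

offsets = [zi.header_offset for zi in zf.infolist()]
seeks_during_loop, reads_during_loop = fp.seeks, fp.reads
zf.close()

print(len(offsets), seeks_during_loop, reads_during_loop)
```

If both counters stay at zero, the cost of the loop is proportional to the number of entries and does not depend on the size of the archive on disk.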

> Daniel wrote:
> the advantage is that it already works in python 2.7 so there is no need
> to patch Python

Yes, indeed.

If I am right about the pros of my patch, I stand by it.

Many thanks for your attention.

On Sat, 18 Apr 2020 at 15:45, Daniel Hillier <report at bugs.python.org> wrote:

>
> Daniel Hillier <daniel.hillier at gmail.com> added the comment:
>
> Hi Massimo,
>
> Unless I'm missing something about your requirements, the advantage is that
> it already works in python 2.7 so there is no need to patch Python. Just
> bundle the above function with your analysis tool and you're good to go.
>
> Cheers,
> Dan
>
> On Sat, Apr 18, 2020 at 11:36 PM Massimo Sala <report at bugs.python.org>
> wrote:
>
> >
> > Massimo Sala <massimo.sala.71 at gmail.com> added the comment:
> >
> > Hi Daniel
> >
> > Could you please elaborate on the advantages of your loop versus my two
> > lines of code?
> > I don't grasp...
> >
> > Thanks, Massimo
> >
> > On Sat, 18 Apr 2020 at 03:26, Daniel Hillier <report at bugs.python.org>
> > wrote:
> >
> > >
> > > Daniel Hillier <daniel.hillier at gmail.com> added the comment:
> > >
> > > Could something similar be achieved by looking for the earliest file
> > > header offset?
> > >
> > > def find_earliest_header_offset(zf):
> > >     earliest_offset = None
> > >     for zinfo in zf.infolist():
> > >         if earliest_offset is None:
> > >             earliest_offset = zinfo.header_offset
> > >         else:
> > >             earliest_offset = min(zinfo.header_offset, earliest_offset)
> > >     return earliest_offset
> > >
> > >
> > > You could also adapt this using
> > >
> > >     zinfo.compress_size + len(zinfo.FileHeader())
> > >
> > > to see if there were any sections inside the archive which were not
> > > referenced from the central directory. Not sure if zip files with
> > > arbitrary bytes inside the archive would be valid everywhere, but I
> > > think they are using zipfile.
> > >
> > > You can also have zipped content inside an archive which has a valid
> > > fileheader but no reference from the central directory. Those entries
> > > are discoverable by implementations which process content serially from
> > > the start of the file but not implementations which rely on the central
> > > directory.
> > >
> > > ----------
> > > nosy: +dhillier
> > >
> > > _______________________________________
> > > Python tracker <report at bugs.python.org>
> > > <https://bugs.python.org/issue40301>
> > > _______________________________________
> > >

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40301>
_______________________________________