Inconsistencies between zipfile and tarfile APIs

Corey Richardson kb1pkl at
Fri Jul 22 09:19:33 CEST 2011

Excerpts from rantingrick's message of Fri Jul 22 02:40:51 -0400 2011:
> On Jul 22, 12:45am, Terry Reedy <tjre... at> wrote:
> > On 7/22/2011 12:48 AM, rantingrick wrote:
> > > On Jul 21, 11:13 pm, Corey Richardson<kb1... at> wrote:
> > Hmm. Archives are more like directories than files. Windows, at least,
> > seems to treat zipfiles more or less as such.
> Yes but a zipfile is just a file not a directory. This is not the
> first time Microsoft has "misled" people you know. ;-)

Ehh...yes and no. Physically, it is a file and nothing more. But its actual
use and contents could reflect that of a directory. Are files and directories
that different, after all? I don't believe so. They are both an expression
of the same thing. Both contain data, one just contains others of itself.
Of course, treating a zipfile as a directory will certainly have a performance
cost. But here in Linux-land (and elsewhere I'm sure) I can mount, for example,
a disk image to a mountpoint anywhere. It's a useful thing to do!
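
To make the "archive as a directory" idea concrete, here is a rough sketch
using zipfile.Path, which newer Pythons (3.8 and later) provide. The archive
contents and the walk() helper are made up for illustration; this is just one
way it could look, not a claim about how it ought to be done.

```python
import io
import zipfile

# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/readme.txt", "hello")
    zf.writestr("docs/notes.txt", "world")
    zf.writestr("top.txt", "top level")


def walk(path, depth=0):
    """Hypothetical helper: list an archive the way you would list a directory."""
    lines = []
    for child in sorted(path.iterdir(), key=lambda p: p.name):
        suffix = "/" if child.is_dir() else ""
        lines.append("  " * depth + child.name + suffix)
        if child.is_dir():
            lines.extend(walk(child, depth + 1))
    return lines


with zipfile.ZipFile(buf) as zf:
    root = zipfile.Path(zf)          # directory-like view of the archive
    listing = walk(root)
    text = (root / "docs" / "readme.txt").read_text()

print("\n".join(listing))
```

Note that "docs/" never had to be written as an explicit entry; the
directory is implied by the member names, much like a mounted image.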

> > Certainly, 7zip
> > presents a directory interface. So opening a zipfile/tarfile would be
> > like opening a directory, which we normally do not do. On the other
> > hand, I am not sure I like python's interface to directories that much.
> I don't think we should make comparisons between applications and
> API's.

Ehh...yes and no again. Maybe the applications are on to something? Whether
the filesystem is physically on disk or is just a representation of a
filesystem on a file in a filesystem on disk, treating them both as a
filesystem is a useful abstraction (NOT the only one available?)

> > It would be more sensible to open files within the archives. Certainly,
> > it would be nice to have the result act like file objects as much as
> > possible.
> Well you still need to start at the treetop (which is the zip/tar
> file) because lots of important information is exposed at that level:
>  * compressed file listing
>  * created, modified times
>  * adding / deleting
>  * etc.
> I'll admit you could think of it as a directory but i would not want
> to do that. People need to realize that tar and zip files are FILES
> and NOT folders.
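
For reference, here is roughly what "listing, modified times, and opening a
member" look like in the two modules today. It's a minimal sketch with
in-memory archives; the member name and timestamps are arbitrary values I
picked for the example.

```python
import io
import tarfile
import zipfile

payload = b"hello archive"

# One zip and one tar, each holding the same single member.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zinfo = zipfile.ZipInfo("a.txt", date_time=(2011, 7, 22, 9, 19, 32))
    zf.writestr(zinfo, payload)
zip_buf.seek(0)

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    tinfo = tarfile.TarInfo("a.txt")
    tinfo.size = len(payload)
    tinfo.mtime = 1311319173          # arbitrary timestamp for the example
    tf.addfile(tinfo, io.BytesIO(payload))
tar_buf.seek(0)

with zipfile.ZipFile(zip_buf) as zf:
    zip_names = zf.namelist()                     # listing
    zip_mtime = zf.getinfo("a.txt").date_time     # a 6-tuple
    with zf.open("a.txt") as member:              # file-like object
        zip_data = member.read()

with tarfile.open(fileobj=tar_buf, mode="r") as tf:
    tar_names = tf.getnames()                     # listing
    tar_mtime = tf.getmember("a.txt").mtime       # seconds since the epoch
    tar_data = tf.extractfile("a.txt").read()     # file-like object
```

Same three tasks, three different spellings each: namelist() vs getnames(),
a date_time tuple vs an mtime number, open() vs extractfile().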

I think it's a useful abstraction to think of an archive as a directory.
They ARE files, yes. But must their physical representation impact their
semantics? I think not! It doesn't matter if Python's list object is a
linked-list down under or if it isn't. Or any sequence, for that matter!
It's a useful abstraction to treat them all as sequences, uniform interface
etc, even though one sequence might be a linked list in a C module, or
a row from a database, or whatever!
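
A quick sketch of what I mean by a uniform interface. ZipView, TarView, and
total_bytes are hypothetical names invented for this example (nothing like
them exists in the stdlib); the point is only that the caller doesn't need to
care which format sits underneath.

```python
import io
import tarfile
import zipfile


class ZipView:
    """Hypothetical adapter: a ZipFile behind a two-method interface."""

    def __init__(self, fileobj):
        self._archive = zipfile.ZipFile(fileobj)

    def names(self):
        return self._archive.namelist()

    def read(self, name):
        return self._archive.read(name)


class TarView:
    """Hypothetical adapter: a TarFile behind the same interface."""

    def __init__(self, fileobj):
        self._archive = tarfile.open(fileobj=fileobj, mode="r")

    def names(self):
        return self._archive.getnames()

    def read(self, name):
        # extractfile() returns None for non-regular members (directories,
        # for instance); this sketch assumes regular files only.
        return self._archive.extractfile(name).read()


def total_bytes(view):
    """Works on anything with names() and read(), whatever is underneath."""
    return sum(len(view.read(name)) for name in view.names())


# Put the same two members into a zip and a tar, both in memory.
members = {"a.txt": b"spam", "b.txt": b"eggs and ham"}

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    for name, data in members.items():
        zf.writestr(name, data)
zip_buf.seek(0)

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    for name, data in members.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
tar_buf.seek(0)

zip_total = total_bytes(ZipView(zip_buf))
tar_total = total_bytes(TarView(tar_buf))
```

total_bytes() never mentions zip or tar, which is the whole appeal of the
abstraction.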

> > Searching open issues for 'tarfile' or 'zipfile' returns about 40 issues
> > each. So I think some people would care more about fixing bugs than
> > adjusting the interfaces. Of course, some of the issues may be about the
> > interface and increasing consistency where it can be done without
> > compatibility issues.
> Yes i agree! If we can at least do something as meager as this it
> would be a step forward. However i still believe the current API is
> broken beyond repair so we must introduce a new "archive" module.
> That's my opinion anyway.

Checking whether such a thing already exists may be more useful. I think I saw
someone mention a similar project?

> > However, I do not think there are any active
> > developers focused on those two modules.
> We need some fresh blood infused into Python-dev. I have been trying
> to get involved for a long time. We as a community need to realize
> that this community is NOT a homogeneous block. We need to be a little
> more accepting of new folks and new ideas. I know this language would
> evolve much quicker if we did.

> > > Rick: But what about Python 3000?
> > > PTB: "Oh, well, umm, let's see. Well, that was then and this is now!"
> >
> > The changes made for 3.0 were more than enough for some people to
> > discourage migration to Py3. And we *have* made additional changes
> > since. So the resistance to incompatible feature changes has increased.
> Yes i do understand these changes have been very painful for some
> folks (me included). However there is but one constant in this
> universe and that constant is change. I believe we can improve many of
> these API's starting with zip/tar modules. By the time Python 4000
> gets here (and it will be much sooner than you guys realize!) we need
> to have this stdlib in pristine condition. That means:
>  * Removing style guide violations.
>  * Removing inconsistencies in existing API's.
>  * Making sure doc strings and comments are everywhere.
>  * Cleaning up the IDLE library (needs a complete re-write!)
>  * Cleaning up Tkinter.
>  * And more

All noble goals. I think the fact that everyone* knows that the stdlib is
a mess and not the epitome of Good Python is kinda sad...

* for some definition of "everyone"
Corey Richardson
  "Those who deny freedom to others, deserve it not for themselves"
     -- Abraham Lincoln