[PyWart 1001] Inconsistencies between zipfile and tarfile APIs

Lars Gustäbel lars at gustaebel.de
Fri Jul 22 10:26:11 CEST 2011

On Thu, Jul 21, 2011 at 08:46:05PM -0700, rantingrick wrote:
> I may have found the mother of all inconsistency warts when comparing
> the zipfile and tarfile modules. Not only are the API's different, but
> the entry and exits are different AND zipfile/tarfile do not behave
> like proper file objects should.

There is a reason why these two APIs are different. When I wrote tarfile,
zipfile had already existed for maybe 8 years and I didn't like its
interface very much. So, I came up with a different one for tarfile that in my
opinion was more general and better suited the format and the kind of things I
wanted to do with it. In the meantime the zipfile API got a lot of attention
and some portions of tarfile's API were ported to zipfile.

> As you can see, the tarfile modules exports an open function and
> zipfile does not. Actually i would prefer that neither export an open
> function and instead only expose a class for instantiation.

So that is your preference.
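For reference, a minimal sketch against the current Python 3 stdlib (not the 2011 versions discussed in this thread): zipfile is driven by instantiating ZipFile, while the module-level tarfile.open is the TarFile.open classmethod, and uncompressed tar archives can also be created by calling the TarFile class directly. The in-memory buffers and the member name hello.txt are illustrative choices only.

```python
import io
import tarfile
import zipfile

payload = b"hello"

# zipfile: the ZipFile class itself is the entry point.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, mode="w") as zf:
    zf.writestr("hello.txt", payload)

# tarfile: the TarFile class can be instantiated directly as well
# (uncompressed archives only; compression goes through TarFile.open).
tbuf = io.BytesIO()
with tarfile.TarFile(fileobj=tbuf, mode="w") as tf:
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# The module-level tarfile.open is the TarFile.open classmethod.
same_function = tarfile.open.__func__ is tarfile.TarFile.open.__func__

# Read both archives back through their classes.
zbuf.seek(0)
tbuf.seek(0)
with zipfile.ZipFile(zbuf) as zf, tarfile.TarFile(fileobj=tbuf) as tf:
    zip_names = zf.namelist()
    tar_names = tf.getnames()
```

In other words, "only a class" is already possible for both modules; tarfile.open is a convenience that also picks the compression method.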

> Since a zipfile object is a file object then asking for the tf object
> after the object after the file is closed should show a proper
> message!

It is no file object.

> Tarfile is missing the attribute "fp" and instead exposes a boolean
> "closed". This mismatching API is asinine! Both tarfile and zipfile
> should behave EXACTLY like file objects.

No, they shouldn't, because they have little in common with file objects. I am
not sure what you are trying to prove here. And although I must admit that you
have a point overall, you seem to get the details wrong. If tarfile and zipfile
objects behaved "EXACTLY" like file objects, what would the read() method
return? What would seek() do? And readline()?
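To make the point concrete, here is a minimal sketch assuming current Python 3 zipfile and tarfile: the archive objects are containers, so stream methods such as seek() and readline() live on the per-member objects returned by ZipFile.open() and TarFile.extractfile(), and ZipFile.read() takes a member name rather than a byte count. The member name notes.txt and its contents are made up for illustration.

```python
import io
import tarfile
import zipfile

payload = b"line one\nline two\n"

# Build one in-memory archive of each kind holding a single member.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, mode="w") as zf:
    zf.writestr("notes.txt", payload)

tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w") as tf:
    info = tarfile.TarInfo("notes.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

zbuf.seek(0)
tbuf.seek(0)
with zipfile.ZipFile(zbuf) as zf, tarfile.open(fileobj=tbuf) as tf:
    # Neither archive object offers seek(); they are not byte streams.
    archive_has_seek = hasattr(zf, "seek") or hasattr(tf, "seek")
    # File-like objects are obtained per member instead.
    with zf.open("notes.txt") as member:
        zip_first_line = member.readline()
    tar_first_line = tf.extractfile("notes.txt").readline()
    # ZipFile.read() returns one member's bytes, selected by name.
    zip_whole = zf.read("notes.txt")
```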

What does it prove to point out that tarfile has no "fp" attribute? You're not
supposed to use tarfile's internal file object; there is nothing productive
you could do with it.

> As you can see, unlike tarfile, zipfile cannot handle a passed path.

Hm, I don't know what you mean.

> zf.namelist() -> tf.getnames()
> zf.getinfo(name) -> tf.getmember(name)
> zf.infolist() -> tf.getmembers()
> zf.printdir() -> tf.list()
> Would it have been too difficult to make these names match? Really?

As I already stated above, I didn't want to adopt the zipfile API because I
found it unsuitable. So I came up with an entirely new one. I thought that
being incompatible was better than using an API that did not fit exactly.
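Code that has to accept either format can bridge the naming differences listed above with a thin dispatch layer. The sketch below assumes current Python 3; member_names() and member_size() are hypothetical helpers, not part of either module, and the sample members are invented.

```python
import io
import tarfile
import zipfile

def member_names(archive):
    """Hypothetical helper: ZipFile.namelist() vs TarFile.getnames()."""
    if isinstance(archive, zipfile.ZipFile):
        return archive.namelist()
    if isinstance(archive, tarfile.TarFile):
        return archive.getnames()
    raise TypeError(f"unsupported archive type: {type(archive).__name__}")

def member_size(archive, name):
    """Hypothetical helper: ZipFile.getinfo() vs TarFile.getmember()."""
    if isinstance(archive, zipfile.ZipFile):
        return archive.getinfo(name).file_size   # uncompressed size
    if isinstance(archive, tarfile.TarFile):
        return archive.getmember(name).size
    raise TypeError(f"unsupported archive type: {type(archive).__name__}")

files = {"a.txt": b"alpha", "b.txt": b"bravo!"}

zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, mode="w") as zf:
    for name, data in files.items():
        zf.writestr(name, data)

tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

zbuf.seek(0)
tbuf.seek(0)
with zipfile.ZipFile(zbuf) as zf, tarfile.open(fileobj=tbuf) as tf:
    zip_names, tar_names = member_names(zf), member_names(tf)
    zip_sizes = {n: member_size(zf, n) for n in zip_names}
    tar_sizes = {n: member_size(tf, n) for n in tar_names}
```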

> Note the inconsistencies in naming conventions of the zipinfo methods.
> Not only is modified time named different between zipinfo and tarinfo,
> they even return completely different values of time.

See above.
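The time values differ because the formats differ: ZipInfo.date_time is a (year, month, day, hour, minute, second) tuple taken from the DOS date/time fields (2-second resolution, no time zone), while TarInfo.mtime is seconds since the epoch. A minimal sketch of normalizing both to datetime, assuming current Python 3; treating the zip wall-clock time as UTC is purely an assumption made so the two can be compared.

```python
import io
import tarfile
import zipfile
from datetime import datetime, timezone

# Even seconds, because zip stores seconds divided by two.
stamp = datetime(2011, 7, 22, 10, 26, 10)

zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, mode="w") as zf:
    zinfo = zipfile.ZipInfo("x.txt", date_time=stamp.timetuple()[:6])
    zf.writestr(zinfo, b"x")

tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w") as tf:
    tinfo = tarfile.TarInfo("x.txt")
    tinfo.size = 1
    # Assumption: interpret the same wall-clock time as UTC for tar.
    tinfo.mtime = int(stamp.replace(tzinfo=timezone.utc).timestamp())
    tf.addfile(tinfo, io.BytesIO(b"x"))

zbuf.seek(0)
tbuf.seek(0)
with zipfile.ZipFile(zbuf) as zf, tarfile.open(fileobj=tbuf) as tf:
    zip_raw = zf.getinfo("x.txt").date_time   # 6-tuple, no time zone
    tar_raw = tf.getmember("x.txt").mtime      # seconds since the epoch

# Normalize both to naive datetimes (zip value assumed to be UTC).
zip_dt = datetime(*zip_raw)
tar_dt = datetime.fromtimestamp(tar_raw, tz=timezone.utc).replace(tzinfo=None)
```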

> It is very obvious that these modules need some consistency between
> not only themselves but also collectively. People, when emulating a
> file type always be sure to emulate the built-in python file type as
> closely as possible.

See above.

> PS: I will be posting more warts very soon. This stdlib is a gawd
> awful mess!

I do not agree. Although I come across one or two odd things myself from time
to time, I think the stdlib as a whole is great, usable and powerful.

The stdlib surely needs our attention. Instead of answering your post, I should
have been writing code and fixing bugs ...

Lars Gustäbel
lars at gustaebel.de

Seek simplicity, and distrust it.
(Alfred North Whitehead)

More information about the Python-list mailing list