Inconsistencies between zipfile and tarfile APIs

rantingrick rantingrick at
Fri Jul 22 13:11:21 EDT 2011

On Jul 22, 3:26 am, Lars Gustäbel <l... at> wrote:

> There is a reason why these two APIs are different. When I wrote tarfile
> zipfile had already been existing for maybe 8 years and I didn't like its
> interface very much. So, I came up with a different one for tarfile that in my
> opinion was more general and better suited the format and the kind of things I
> wanted to do with it. In the meantime the zipfile API got a lot of attention
> and some portions of tarfile's API were ported to zipfile.

Well, i'll admit that i do like the tarfile API much better; so
kudos to you, kind sir.

> > As you can see, the tarfile modules exports an open function and
> > zipfile does not. Actually i would prefer that neither export an open
> > function and instead only expose a class for instantiation.
> So that is your preference.

Wrong! It is more than just a MERE preference. Tarfile and zipfile
are BOTH archive modules and as such should present a consistent API.
I really don't care so much about the actual details AS LONG AS THE

> > Since a zipfile object is a file object then asking for the tf object
> > after the file is closed should show a proper
> > message!
> It is no file object.

Then why bother to open and close it like a file object? If we are not
going to treat it as a file object then we should not have API methods
open and close.

> > Tarfile is missing the attribute "fp" and instead exposes a boolean
> > "closed". This mismatching API is asinine! Both tarfile and zipfile
> > should behave EXACTLY like file objects
> If tarfile and zipfile
> objects behave "EXACTLY" like file objects, what does the read() method return?
> What does seek() do? And readline()?

I am not suggesting that these methods become available. What i was
referring to is the fact that the instance does not return its current
state like a true file object would. But just for academic sake we
could apply these three methods in the following manner:

 * read() -> extract the entire archive.
 * readline() -> extract the N'th archive member.
 * seek() -> move to the N'th archive member.

Not that i think we should however.
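
For the curious, the three-method mapping above could be sketched roughly
like this. ArchiveCursor is a made-up name, not anything in the stdlib, and
the sketch assumes Python 3 and a zip archive held in memory. It only
illustrates the idea; it is not a proposal for either module.

```python
import io
import zipfile


class ArchiveCursor:
    """Hypothetical wrapper; NOT part of zipfile or tarfile."""

    def __init__(self, archive):
        self._archive = archive
        self._names = archive.namelist()
        self._pos = 0

    def seek(self, n):
        # Move to the N'th archive member (0-based).
        if not 0 <= n <= len(self._names):
            raise ValueError("member index out of range")
        self._pos = n

    def readline(self):
        # Return the bytes of the current member and advance by one.
        # Returns b"" once every member has been consumed.
        if self._pos >= len(self._names):
            return b""
        data = self._archive.read(self._names[self._pos])
        self._pos += 1
        return data

    def read(self):
        # Return the contents of every member, keyed by member name.
        return {name: self._archive.read(name) for name in self._names}


# Build a two-member zip archive in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "first")
    zf.writestr("b.txt", "second")

with zipfile.ZipFile(buf) as archive:
    cursor = ArchiveCursor(archive)
    cursor.seek(1)
    second = cursor.readline()   # contents of b.txt
    at_end = cursor.readline()   # nothing left after the last member
    everything = cursor.read()   # all members at once
```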

> What do you prove when you say that tarfile has no "fp" attribute?

My point is that the APIs of tarfile and zipfile should be
consistent. "fp" is another example of inconsistency. If we are going
to have an "fp" attribute in one, we should have it in the other.
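
A small sketch of how the "closed" state has to be checked differently on
each object, assuming Python 3 and archives built in memory (the member name
is invented for the example). It relies on TarFile setting its boolean
"closed" attribute and ZipFile resetting "fp" to None on close.

```python
import io
import tarfile
import zipfile

# Build a tiny zip archive in memory; the with-block closes it.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("hello.txt", "hi")

# Build a tiny tar archive in memory; the with-block closes it.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    data = b"hi"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

# Two different spellings for the same question: "are you closed?"
tar_is_closed = tf.closed        # TarFile exposes a boolean
zip_is_closed = zf.fp is None    # ZipFile drops its file pointer instead
```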

> > As you can see, unlike tarfile zipfile cannot handle a passed path.
> Hm, I don't know what you mean.

Sorry, that comment was placed in the wrong position. I also apologize
for sending the message three times; it seems my finger was a little
shaky that night. What i was referring to is that tarfile does not
allow a path to be passed to the constructor whereas zipfile does:

 >>> import tarfile, zipfile
 >>> tf = tarfile.TarFile('c:\\tar.tar')
 Traceback (most recent call last):
   File "<pyshell#1>", line 1, in <module>
     tf = tarfile.TarFile('c:\\tar.tar')
   File "C:\Python27\lib\", line 1572, in __init__
     self.firstmember =
   File "C:\Python27\lib\", line 2335, in next
     raise ReadError(str(e))
 ReadError: invalid header
 >>> zf = zipfile.ZipFile('C:\\')
 >>> zf
 <zipfile.ZipFile instance at 0x02C6CE18>
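
Anyone who wants to repeat the comparison with known-good archives could use
something like the sketch below. It assumes Python 3 and an uncompressed tar;
the file names are invented. In this setup both constructors accept a path,
so the ReadError in the session above may depend on the contents of that
particular file (the bare TarFile constructor does not detect compression,
while tarfile.open does).

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway file to put into both archives.
    member = os.path.join(tmp, "hello.txt")
    with open(member, "w") as fh:
        fh.write("hi")

    tar_path = os.path.join(tmp, "demo.tar")
    zip_path = os.path.join(tmp, "demo.zip")

    # Create one uncompressed tar and one zip with the same member.
    with tarfile.open(tar_path, "w") as tf:
        tf.add(member, arcname="hello.txt")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(member, arcname="hello.txt")

    # Pass a path straight to each class constructor.
    tf = tarfile.TarFile(tar_path)
    tar_names = tf.getnames()
    tf.close()

    zf = zipfile.ZipFile(zip_path)
    zip_names = zf.namelist()
    zf.close()
```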

> > zf.namelist() -> tf.getnames()
> > zf.getinfo(name) -> tf.getmember(name)
> > zf.infolist() -> tf.getmembers()
> > zf.printdir() -> tf.list()
> > Would it have been too difficult to make these names match? Really?
> As I already stated above, I didn't want to adopt the zipfile API because I
> found it unsuitable. So I came up with an entirely new one. I thought that
> being incompatible was better than using an API that did not fit exactly.

I agree with you. Now if we can ONLY change the zipfile API to match
then we would be golden!
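
Until that happens, the name mapping quoted above can be papered over with a
thin subclass. This is only a sketch: TarStyleZipFile is a hypothetical name,
nothing like it ships with the stdlib, and the aliases still return zipfile's
own ZipInfo objects rather than TarInfo objects.

```python
import io
import zipfile


class TarStyleZipFile(zipfile.ZipFile):
    """Hypothetical subclass that adds tarfile-style method names."""

    def getnames(self):
        # tarfile spelling of namelist()
        return self.namelist()

    def getmember(self, name):
        # tarfile spelling of getinfo(); returns a ZipInfo, not a TarInfo
        return self.getinfo(name)

    def getmembers(self):
        # tarfile spelling of infolist()
        return self.infolist()

    def list(self):
        # tarfile spelling of printdir(); prints a table of contents
        self.printdir()


# Build a one-member archive in memory, then read it back
# using only the tarfile-style names.
buf = io.BytesIO()
with TarStyleZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "first")

with TarStyleZipFile(buf) as zf:
    names = zf.getnames()
    info = zf.getmember("a.txt")
    members = zf.getmembers()
```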

> > PS: I will be posting more warts very soon. This stdlib is a gawd
> > awful mess!
> I do not agree. Although I come across one or two odd things myself from time
> to time, I think the stdlib as a whole is great, usable and powerful.

And that's why we find ourselves in this current dilemma. This stdlib
IS a mess, and your denials and everyone else's are not
helping the situation.

> The stdlib surely needs our attention. Instead of answering your post, I should
> have been writing code and fixing bugs ...

Will you be starting with the zipfile API migration?
