[PyWart 1001] Inconsistencies between zipfile and tarfile APIs

rantingrick rantingrick at gmail.com
Thu Jul 21 23:46:05 EDT 2011


I may have found the mother of all inconsistency warts when comparing
the zipfile and tarfile modules. Not only are the APIs different, but
the entry and exit points are different AND zipfile/tarfile do not
behave like proper file objects should.

>>> import zipfile, tarfile
>>> import os
>>> os.path.exists('C:\\zip.zip')
True
>>> os.path.exists('C:\\tar.tar')
True
>>> tarfile.is_tarfile('C:\\tar.tar')
True
>>> zipfile.is_zipfile('C:\\zip.zip')
True
>>> ZIP_PATH = 'C:\\zip.zip'
>>> TAR_PATH = 'C:\\tar.tar'

--------------------------------------------------
1. Zipfile and tarfile entry exit.
--------------------------------------------------
>>> zf = zipfile.open(ZIP_PATH)

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    zf = zipfile.open(ZIP_PATH)
AttributeError: 'module' object has no attribute 'open'
>>> tf = tarfile.open(TAR_PATH)
>>> tf
<tarfile.TarFile object at 0x02B3B850>
>>> tf.close()
>>> tf
<tarfile.TarFile object at 0x02B3B850>

*COMMENT*
As you can see, the tarfile module exports an open function and
zipfile does not. Actually I would prefer that neither export an open
function and instead only expose a class for instantiation.

*COMMENT*
Since a tarfile object is a file object, asking for the tf object
after the file is closed should show a proper "closed" message!
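For anyone who just wants a single entry point today, here is a minimal sketch, assuming Python 3. The names open_archive and make_samples are hypothetical helpers made up for this illustration, not part of either module; only is_zipfile, is_tarfile, ZipFile and tarfile.open come from the stdlib.

```python
import os
import tarfile
import tempfile
import zipfile


def open_archive(path):
    """Hypothetical helper: open a zip or tar archive for reading."""
    if zipfile.is_zipfile(path):
        return zipfile.ZipFile(path)
    if tarfile.is_tarfile(path):
        # tarfile.open() detects compression on its own.
        return tarfile.open(path)
    raise ValueError("not a zip or tar archive: %r" % (path,))


def make_samples(directory):
    """Hypothetical helper: create one tiny zip and one tiny tar."""
    src = os.path.join(directory, "hello.txt")
    with open(src, "w") as f:
        f.write("hello\n")
    zip_path = os.path.join(directory, "sample.zip")
    tar_path = os.path.join(directory, "sample.tar")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(src, arcname="hello.txt")
    with tarfile.open(tar_path, "w") as tf:
        tf.add(src, arcname="hello.txt")
    return zip_path, tar_path


with tempfile.TemporaryDirectory() as tmp:
    opened_types = []
    for path in make_samples(tmp):
        archive = open_archive(path)
        opened_types.append(type(archive).__name__)
        archive.close()

print(opened_types)
```

The caller still gets back two different object types with two different APIs, so this only papers over the entry point, not the rest of the mismatch.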

>>> tf = tarfile.TarFile(TAR_PATH)
Traceback (most recent call last):
  File "<pyshell#72>", line 1, in <module>
    tf = tarfile.TarFile(TAR_PATH)
  File "C:\Python27\lib\tarfile.py", line 1572, in __init__
    self.firstmember = self.next()
  File "C:\Python27\lib\tarfile.py", line 2335, in next
    raise ReadError(str(e))
ReadError: invalid header
>>> tf = tarfile.TarFile.open(TAR_PATH)
>>> tf
<tarfile.TarFile object at 0x02C251D0>
>>> tf.fp
Traceback (most recent call last):
  File "<pyshell#75>", line 1, in <module>
    tf.fp
AttributeError: 'TarFile' object has no attribute 'fp'
>>> tf
<tarfile.TarFile object at 0x02C251D0>
>>> tf.close()
>>> tf
<tarfile.TarFile object at 0x02C251D0>
>>> tf.fileobj
<bz2.BZ2File object at 0x02C24458>
>>> tf.closed
True

*COMMENT*
Tarfile is missing the attribute "fp" and instead exposes a boolean
"closed". This mismatching API is asinine! Both tarfile and zipfile
should behave EXACTLY like file objects.
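The closed-state mismatch can be hidden behind one function. This is only a sketch, assuming Python 3; is_closed is a hypothetical helper, and it relies on ZipFile.fp, which the sessions in this post use but which is an undocumented attribute, so treat that part as an assumption rather than a guaranteed API.

```python
import io
import tarfile
import zipfile


def is_closed(archive):
    """Hypothetical helper: report closed state for either archive type."""
    if isinstance(archive, tarfile.TarFile):
        return archive.closed
    if isinstance(archive, zipfile.ZipFile):
        # Assumption: fp is reset to None on close(), as the sessions show.
        return archive.fp is None
    raise TypeError("unsupported archive type: %r" % (type(archive),))


# In-memory archives, so nothing is written to disk.
archives = [
    zipfile.ZipFile(io.BytesIO(), "w"),
    tarfile.open(fileobj=io.BytesIO(), mode="w"),
]

states = {}
for archive in archives:
    before = is_closed(archive)
    archive.close()
    states[type(archive).__name__] = (before, is_closed(archive))

print(states)
```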

>>> f = open('C:\\text.txt', 'r')
>>> f.read()
''
>>> f
<open file 'C:\text.txt', mode 'r' at 0x02B26F98>
>>> f.close()
>>> f
<closed file 'C:\text.txt', mode 'r' at 0x02B26F98>

--------------------------------------------------
2. Zipfile SPECIFIC entry exit
--------------------------------------------------
>>> zf
<zipfile.ZipFile instance at 0x02B2C6E8>
>>> zf.fp
>>> zf = zipfile.ZipFile(ZIP_PATH)
>>> zf
<zipfile.ZipFile instance at 0x02B720A8>
>>> zf.fp
<open file 'C:\zip.zip', mode 'rb' at 0x02B26F98>
>>> zf.close()
>>> zf
<zipfile.ZipFile instance at 0x02B720A8>
>>> zf.fp
>>> print repr(zf.fp)
None

*COMMENT*
As you can see, unlike tarfile, zipfile cannot handle a passed path.
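On the path handling seen in the sessions above: tf.fileobj was a bz2.BZ2File, which suggests the test archive was bz2-compressed. The TarFile constructor reads plain tar data only, while tarfile.open() detects compression, and that would explain the ReadError. The sketch below tries to reproduce this, assuming Python 3 with the bz2 module available; the file names are made up for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.txt")
    with open(src, "w") as f:
        f.write("hello\n")

    # A bz2-compressed tar that still carries a plain ".tar" extension.
    tar_path = os.path.join(tmp, "sample.tar")
    with tarfile.open(tar_path, "w:bz2") as tf:
        tf.add(src, arcname="hello.txt")

    # The bare constructor does no compression detection.
    try:
        tarfile.TarFile(tar_path)
    except tarfile.ReadError as exc:
        constructor_error = str(exc)
    else:
        constructor_error = None

    # tarfile.open() works out the compression by itself.
    with tarfile.open(tar_path) as tf:
        names = tf.getnames()

print(constructor_error)
print(names)
```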

--------------------------------------------------
 3. Zipfile and Tarfile obj API differences.
--------------------------------------------------

zf.namelist() -> tf.getnames()
zf.getinfo(name) -> tf.getmember(name)
zf.infolist() -> tf.getmembers()
zf.printdir() -> tf.list()

*COMMENT*
Would it have been too difficult to make these names match? Really?
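Until the names match, a thin adapter can cover the mapping listed above. A minimal sketch, assuming Python 3: ArchiveView and its method names are hypothetical and chosen only for this illustration, and build_zip/build_tar are throwaway helpers that create in-memory samples.

```python
import io
import tarfile
import zipfile


class ArchiveView:
    """Hypothetical read-only adapter giving both archive types one API."""

    def __init__(self, archive):
        self._archive = archive
        self._is_zip = isinstance(archive, zipfile.ZipFile)

    def names(self):
        a = self._archive
        return a.namelist() if self._is_zip else a.getnames()

    def info(self, name):
        a = self._archive
        return a.getinfo(name) if self._is_zip else a.getmember(name)

    def infos(self):
        a = self._archive
        return a.infolist() if self._is_zip else a.getmembers()


def build_zip():
    """Return a readable in-memory zip holding a.txt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "alpha")
    buf.seek(0)
    return zipfile.ZipFile(buf)


def build_tar():
    """Return a readable in-memory tar holding a.txt."""
    buf = io.BytesIO()
    data = b"alpha"
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo("a.txt")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf)


views = [ArchiveView(build_zip()), ArchiveView(build_tar())]
listing = [
    (v.names(), len(v.infos()), type(v.info("a.txt")).__name__)
    for v in views
]
print(listing)
```

The info objects that come back are still ZipInfo and TarInfo, so their attribute names still differ.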

--------------------------------------------------
 4. Zipfile and Tarfile infoobj API differences.
--------------------------------------------------

zInfo.filename -> tInfo.name
zInfo.file_size -> tInfo.size
zInfo.date_time -> tInfo.mtime

*COMMENT*
Note the inconsistencies in naming conventions of the zipinfo
attributes ("filename" has no underscore, "file_size" does).

*COMMENT*
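Not only is modified time named differently between zipinfo and
tarinfo, they even hold completely different kinds of time value:
date_time is a (year, month, day, hour, minute, second) tuple while
mtime is a number of seconds since the epoch.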
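The two time values can at least be normalized to the same type. A minimal sketch, assuming Python 3 and assuming the zip tuple is meant as local time (the tuple itself carries no timezone); zip_modified and tar_modified are hypothetical helper names, and the date is just an example.

```python
import tarfile
import zipfile
from datetime import datetime


def zip_modified(info):
    """ZipInfo.date_time is a 6-tuple; read it as naive local time."""
    return datetime(*info.date_time)


def tar_modified(info):
    """TarInfo.mtime is seconds since the epoch; convert to local time."""
    return datetime.fromtimestamp(info.mtime)


# One example moment, stored in each module's native form.
stamp = datetime(2011, 7, 21, 23, 46, 4)

zinfo = zipfile.ZipInfo("a.txt", date_time=stamp.timetuple()[:6])
tinfo = tarfile.TarInfo("a.txt")
tinfo.mtime = int(stamp.timestamp())

print(zinfo.date_time, tinfo.mtime)
print(zip_modified(zinfo), tar_modified(tinfo))
```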

--------------------------------------------------
 Conclusion:
--------------------------------------------------
It is very obvious that these modules need some consistency between
not only themselves but also collectively. People, when emulating a
file type, always be sure to emulate the built-in python file type as
closely as possible.

PS: I will be posting more warts very soon. This stdlib is a gawd
awful mess!


