Problem with tarfile module to open *.tar.gz files - unreliable?
Peter Otten
__peter__ at web.de
Fri Aug 20 13:55:54 EDT 2010
m_ahlenius wrote:
> I am using Python 2.6.5.
>
> Unfortunately I don't have other versions installed, so it's hard to
> test with a different version.
>
> As for the log compression, it's a bit hard to test. Right now I may
> process 100+ of these logs per night, and will get maybe 5 which are
> reported as corrupt (typically a bad CRC) and 2 which are reported as
> bad tar archives. This morning I checked each of the 7 reported
> problem files by manually opening them with "tar -xzvof" and they were
> all indeed corrupt. Sigh.
So many corrupted files? I'd say you have to address the problem with your
infrastructure first.
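The "bad CRC" reports come from gzip's own integrity check: each gzip member ends with a CRC-32 of the uncompressed data followed by its length (RFC 1952), and the decompressor refuses the stream when the recomputed CRC disagrees. A minimal Python 3 sketch, not part of the original thread, that flips a single bit of the stored CRC; the helper names flip_bit and gzip_accepts are hypothetical, made up for illustration:

```python
import gzip
import zlib


def flip_bit(blob, pos, bit):
    """Return a copy of blob with bit number `bit` of byte `pos` inverted."""
    corrupted = bytearray(blob)
    corrupted[pos] ^= 1 << bit
    return bytes(corrupted)


def gzip_accepts(blob):
    """Return True if blob decompresses as gzip without an error."""
    try:
        gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile, e.g. "CRC check failed".
        return False
    return True


payload = b"log line\n" * 1000
good = gzip.compress(payload)
# A gzip member ends with the CRC-32 of the uncompressed data (4 bytes)
# followed by its length (4 bytes), so offset len(good) - 8 is the first
# byte of the stored CRC.
bad_crc = flip_bit(good, len(good) - 8, 0)

print(gzip_accepts(good))     # True
print(gzip_accepts(bad_crc))  # False
```

A single flipped bit anywhere in the file is enough to fail this check, which is consistent with the corrupted files also being rejected by "tar -xzvof".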
> Unfortunately, due to the nature of our business, I can't post the data
> files online; I hope you can understand. But I really appreciate your
> suggestions.
>
> The thing that gets me is that it seems to work just fine for most
> files, but then not others. Labeling normal files as corrupt hurts us
> as we then skip getting any log data from those files.
>
> I appreciate all your help.
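One way to avoid throwing away all log data from a damaged archive is to keep whatever members were read before the first error. A minimal Python 3 sketch of that idea, not from the original thread; read_members is a hypothetical helper, and it assumes the archives are plain or gzipped tar files that tarfile's "r:*" mode can detect:

```python
import tarfile
import zlib


def read_members(path):
    """Read the data of every regular file in a (possibly gzipped) tar archive.

    Returns (contents, error). contents maps member names to bytes for every
    member read before the first failure; error is None when the whole
    archive was read, otherwise the exception that stopped the scan.
    """
    contents = {}
    try:
        with tarfile.open(path, "r:*") as archive:  # "r:*" autodetects gzip
            for member in archive:
                if member.isfile():
                    contents[member.name] = archive.extractfile(member).read()
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        # Keep whatever was recovered instead of discarding the whole file.
        return contents, exc
    return contents, None
```

Logging the returned error next to the recovered member names would at least show how much of each rejected archive was readable; whether partial log data is usable is a separate decision.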
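I've written an autocorruption script that flips each bit of a good archive in turn and stops at the first variant that tarfile rejects but tar still extracts: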
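import sys
import subprocess
import tarfile

def process(dest, data):
    # Flip each bit of the archive in turn, write the result to dest, and
    # return the first (byte position, bit) that tarfile rejects but the
    # tar command still extracts without error.
    for pos in range(len(data)):
        for bit in range(8):
            new_data = bytearray(data)
            new_data[pos] ^= 1 << bit
            assert len(data) == len(new_data)
            out = open(dest, "wb")  # binary mode: archives are not text
            out.write(new_data)
            out.close()
            try:
                t = tarfile.open(dest)
                for member in t:
                    if member.isfile():
                        # read the data, not just the headers
                        t.extractfile(member).read()
                t.close()
            except Exception:
                # tarfile gave up; does tar accept the same file?
                if subprocess.call(["tar", "-xf", dest]) == 0:
                    return pos, bit
    return None

if __name__ == "__main__":
    source, dest = sys.argv[1:]
    data = open(source, "rb").read()
    print(process(dest, data))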
With it I can indeed construct an archive that is rejected by tarfile but not
by tar. My working hypothesis is that the Python library is a bit stricter
than tar in what it accepts...
Peter