[issue21872] LZMA library sometimes fails to decompress a file

Sat Jun 28 15:05:57 CEST 2014

Esa Peuha added the comment:

This code

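import _lzma

# Read the whole compressed file into memory.
with open('22h_ticks_bad.bi5', 'rb') as f:
    infile = f.read()

# Feed the data to a fresh decompressor in two pieces, split at byte i.
for i in range(8191, 8195):
    decompressor = _lzma.LZMADecompressor()
    first_out = decompressor.decompress(infile[:i])
    first_len = len(first_out)
    last_out = decompressor.decompress(infile[i:])
    last_len = len(last_out)
    # Columns: split point, bytes from first call, total bytes, EOF reached.
    print(i, first_len, first_len + last_len, decompressor.eof)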

prints this

8191 36243 45480 True
8192 36251 45473 False
8193 36253 45475 False
8194 36260 45480 True

The splits at 8191 and 8194 bytes produce 45480 bytes in total and reach the end of the stream, but the splits at 8192 and 8193 bytes produce only 45473 and 45475 bytes and never set eof. It seems to me that this is a subtle bug in liblzma: if the input stream to the incremental decompressor is broken at the wrong place, the internal state of the decompressor is corrupted. lzma.py happens to use a buffer of 8192 bytes, which is why it fails on this particular file. There is nothing wrong with the compressed file itself, since lzma.py decompresses it correctly if the buffer size is set to almost any other value.
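
The same check can be run over any range of split points without the original file. The following is only a minimal sketch, not the code from this report: it uses the public lzma.LZMADecompressor rather than _lzma, the name find_bad_splits is hypothetical, and the payload is synthetic data invented for illustration, so it is not expected to trigger the failure seen with 22h_ticks_bad.bi5.

```python
import lzma


def find_bad_splits(compressed, splits):
    """Return (split, output_length, eof) for each split point where
    two-piece decompression disagrees with one-shot decompression.

    Hypothetical helper for illustration; not part of the lzma module.
    """
    expected = lzma.decompress(compressed)
    bad = []
    for i in splits:
        decompressor = lzma.LZMADecompressor()
        out = decompressor.decompress(compressed[:i])
        # Calling decompress() again after EOF raises EOFError, so guard it.
        if not decompressor.eof:
            out += decompressor.decompress(compressed[i:])
        if out != expected or not decompressor.eof:
            bad.append((i, len(out), decompressor.eof))
    return bad


# Synthetic test data (an assumption, unrelated to the file in this report).
payload = bytes(range(256)) * 200
compressed = lzma.compress(payload)

# Try every interior split point of the compressed stream.
print(find_bad_splits(compressed, range(1, len(compressed))))
```

To test a real file, pass its contents as compressed and a range around the suspect buffer size, for example range(8191, 8195), as splits. Any tuple in the result marks a split point where the incremental decompressor lost data or never reached the end of the stream.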

----------
nosy: +Esa.Peuha

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21872>
_______________________________________