[Python-Dev] LZMA compression support in 3.3

Sat Aug 27 17:37:52 CEST 2011

On Sat, Aug 27, 2011 at 5:15 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> As for file formats, these are handled by liblzma itself; the extension module
>> just selects which compressor/decompressor initializer function to use depending
>> on the value of the "format" argument. Our code won't contain anything along the
>> lines of GzipFile; all of that work is done by the underlying C library. Rather,
>> the LZMAFile class will be like BZ2File - just a simple filter that passes the
>> read/written data through a LZMACompressor or LZMADecompressor as appropriate.
>
> This is exactly what I worry about. I think adding file I/O to bz2 was a
> mistake, as this doesn't integrate with Python's IO library (it used
> to, but now after dropping stdio, they were incompatible). Indeed, for
> Python 3.2, BZ2File has been removed from the C module, and lifted to
> Python.
>
> IOW, the _lzma C module must not do any I/O, neither directly nor
> indirectly (through liblzma). The approach of gzip.py (doing IO
> and file formats in pure Python) is exactly right.

It is not my intention for the _lzma C module to do I/O - that will be done by
the LZMAFile class, which will be written in Python. My comparison with bz2 was
in reference to the state of the module after it was rewritten for issue 5863.

Saying "anything along the lines of GzipFile" was a bad choice of wording; what
I meant is that the LZMAFile class won't handle the problem of picking apart the
.xz and .lzma container formats. That is handled by liblzma (operating entirely
on in-memory buffers). LZMAFile will do _only_ I/O, in a similar fashion to the
BZ2File class (as of changeset 2cb07a46f4b5, to avoid ambiguity ;) ).
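
To make the split concrete, here is a rough sketch of the kind of filter I have
in mind. It is written against the lzma module as it later shipped in the
standard library (LZMACompressor, LZMADecompressor, FORMAT_XZ, FORMAT_ALONE,
FORMAT_AUTO); the names SketchLZMAWriter and read_all are hypothetical and
exist only for illustration. This is not the actual LZMAFile code. The Python
side only moves bytes to and from a file object, and the container format is
left entirely to the compressor/decompressor objects.

```python
import io
import lzma


class SketchLZMAWriter:
    """Hypothetical write-only filter: performs I/O, knows nothing about .xz/.lzma layout."""

    def __init__(self, fileobj, format=lzma.FORMAT_XZ):
        self._fp = fileobj
        # The "format" argument only selects which encoder liblzma sets up.
        self._comp = lzma.LZMACompressor(format=format)

    def write(self, data):
        # Pass the data through the in-memory compressor, then do the I/O in Python.
        self._fp.write(self._comp.compress(data))
        return len(data)

    def close(self):
        # Flush whatever the compressor still holds; the caller owns the file object.
        self._fp.write(self._comp.flush())


def read_all(fileobj, chunk_size=8192):
    """Hypothetical helper: read a compressed stream chunk by chunk and decompress it."""
    # FORMAT_AUTO lets the decompressor detect .xz versus .lzma on its own.
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_AUTO)
    pieces = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        pieces.append(decomp.decompress(chunk))
    return b"".join(pieces)


def roundtrip(payload, format):
    """Compress payload into an in-memory file, then read it back."""
    raw = io.BytesIO()
    writer = SketchLZMAWriter(raw, format=format)
    writer.write(payload)
    writer.close()
    raw.seek(0)
    return read_all(raw)
```

Calling roundtrip(data, lzma.FORMAT_XZ) or roundtrip(data, lzma.FORMAT_ALONE)
should hand back the original bytes in both cases, without any Python code
looking inside the container.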

Cheers,
Nadeem