On Wed, Nov 28, 2018 at 10:43 AM Gregory P. Smith <greg@krypto.org> wrote:

On Wed, Nov 28, 2018 at 9:52 AM Brett Cannon <brett@python.org> wrote:
Are we getting to the point that we want a compresslib like hashlib if we are going to be adding more compression algorithms?

Let's avoid the "lib" suffix when unnecessary.  I used the name hashlib because the name "hash" was already taken by a builtin that people normally shouldn't be using.  zlib gets a lib suffix because a one-letter name is evil and it matches the project name. ;)  "compress" sounds nicer.

... looking on PyPI to see if that name is taken: https://pypi.org/project/compress/ exists and is already effectively what you are describing.  (I've never used it or seen it used; no idea about its quality.)

I don't think adding lz4 to the stdlib is worthwhile.  It isn't required for core functionality as zlib is (lowest common denominator zip support).  I'd argue that bz2 doesn't even belong in the stdlib, but we shouldn't go removing things.  PyPI makes getting more algorithms easy.

If anything, it'd be nice to standardize on some stdlib namespaces that others could plug their modules into.  Create a compress namespace in the stdlib with zlib and bz2 in it, and a way for extension modules to add themselves in a managed manner instead of requiring a top-level name?  Opening up a designated namespace to third-party modules is not something we've done as a project before, though.  It requires care.  I haven't thought that through.
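No such registration mechanism exists in the stdlib today; a minimal sketch of what "adding themselves in a managed manner" could look like (all names here are hypothetical, invented purely to illustrate the idea):

```python
# Hypothetical sketch of a managed "compress" namespace registry.
# Nothing here exists in the stdlib; names are illustrative only.
import bz2
import zlib

_codecs = {}

def register(name, module):
    """Register a compression module under a managed name."""
    if name in _codecs:
        raise ValueError(f"codec {name!r} already registered")
    # Require the common one-shot compress/decompress interface.
    for attr in ("compress", "decompress"):
        if not hasattr(module, attr):
            raise TypeError(f"{name!r} lacks {attr}()")
    _codecs[name] = module

def get(name):
    """Look up a previously registered compression module."""
    return _codecs[name]

# The stdlib could register its own codecs at startup...
register("zlib", zlib)
register("bz2", bz2)

# ...and a third-party extension could add itself at import time,
# e.g. register("lz4", lz4.frame), without claiming a top-level name.

data = b"hello " * 100
packed = get("bz2").compress(data)
assert get("bz2").decompress(packed) == data
```

The managed part is the gatekeeping in register(): duplicate names are rejected and the module must expose the shared interface before it is admitted.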


While my gut reaction above was to say "no" to adding lz4 to the stdlib...

I'm finding myself reconsidering; I'm not against it.

I just want us to have a good reason if we do. This type of extension module tends to be very easy to maintain (and you are volunteering). A good reason in the past has been that the algorithm is widely used.  That's obviously the case with zlib (gzip and zipfile), bz2, and lzma (.xz).  Those are all slower but tighter-compressing, though.  lz4 is extremely fast, especially at decompression.  It could make a nice addition, as that is an area where our standard library offers nothing.
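The existing stdlib compressors already share a one-shot compress()/decompress() module-level API, which is presumably the convention an lz4 module would follow (a sketch using only what's in the stdlib today; lz4 itself is not included):

```python
# The stdlib's three compressors expose the same one-shot interface:
# module.compress(data) / module.decompress(data).  An lz4 module
# would likely slot into this same pattern.
import bz2
import lzma
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 200

for mod in (zlib, bz2, lzma):
    packed = mod.compress(data)
    # Round-trip must always recover the original bytes.
    assert mod.decompress(packed) == data
    print(f"{mod.__name__}: {len(data)} -> {len(packed)} bytes")
```

The uniform interface is part of why a new compressor is cheap to maintain: the surface area is small and well established.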

So change my -1 to a +0.5.

Q: Are there other popular alternatives to fill that niche that we should strongly consider instead or as well?

5 years ago the answer would've been Snappy.  15 years ago the answer would've been LZO.

I suggest not rabbit-holing this on whether we should adopt a top-level namespace for these, such as "compress".  It's a good question to ask, but we can resolve that larger topic on its own without blocking anything.

lz4 has claimed the global PyPI lz4 module namespace today, so moving it into the stdlib under that name is natural: a pretty transparent transition.  If we do that, the PyPI version of lz4 should remain available for older CPython versions, but effectively be frozen, never gaining new features once lz4 lands in its first actual CPython release.



On Wed, 28 Nov 2018 at 08:44, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 28 Nov 2018 10:28:19 +0000
Jonathan Underwood <jonathan.underwood@gmail.com> wrote:
> Hi,
> I have for some time maintained the Python bindings to the LZ4
> compression library[0, 1]:
> I am wondering if there is interest in having these bindings move to
> the standard library to sit alongside the gzip, lzma etc bindings?
> Obviously the code would need to be modified to fit the coding
> guidelines etc.

Personally I would find it useful indeed.  LZ4 is very attractive
when (de)compression speed is a primary factor, for example when
sending data over a fast network link or a fast local SSD.

Another compressor worth including is Zstandard (by the same author as
LZ4). Actually, Zstandard and LZ4 cover most of the (speed /
compression ratio) range quite well. Informative graphs below:
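The tradeoff itself can be glimpsed even within a single codec: zlib's compression levels trade time for size, and lz4/zstd sit far toward the "fast" end of that same spectrum (a rough stdlib-only illustration; exact numbers will vary by machine and input):

```python
# Rough illustration of the speed vs. compression-ratio tradeoff
# using zlib's levels.  Higher levels spend more time to produce
# smaller output; lz4 prioritizes the opposite end of this curve.
import time
import zlib

data = b"speed versus ratio " * 5000

for level in (1, 6, 9):
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - t0
    print(f"level {level}: {len(packed)} bytes in {elapsed * 1000:.2f} ms")
    assert zlib.decompress(packed) == data
```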



Python-Dev mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-dev/brett%40python.org