[Numpy-discussion] ANN: python-blosc 1.0.2

Francesc Alted faltet at pytables.org
Thu Nov 4 09:58:55 EDT 2010


====================================================
 Announcing python-blosc 1.0.2
 A Python wrapper for the Blosc compression library
====================================================

What is it?
===========

Blosc (http://blosc.pytables.org) is a high-performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() call.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly spaced values, etc.

python-blosc is a Python package that wraps Blosc.
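
As a rough illustration of the low-entropy point above, the following
sketch compares how regularly spaced values and random bytes compress.
It uses zlib from the Python standard library purely as a stand-in, so
the ratios say nothing about Blosc itself; the `ratio` helper and the
sample sizes are made up for this example.

```python
import array
import os
import zlib  # stand-in codec from the standard library; this is NOT Blosc


def ratio(payload):
    """Uncompressed size divided by compressed size."""
    return len(payload) / len(zlib.compress(payload))


# Regularly spaced C int values: low entropy
regular = array.array('i', range(0, 1000 * 1000, 10)).tobytes()
# Random bytes of the same length: high entropy
noisy = os.urandom(len(regular))

print("regular: %.1fx" % ratio(regular))
print("random:  %.1fx" % ratio(noisy))
```

The regular data should shrink noticeably, while the random bytes
should stay at roughly their original size.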

What is new?
============

Updated to Blosc 1.1.2.  Fixes some bugs when dealing with very small
buffers (typically smaller than specified typesizes).  Closes #1.

Basic Usage
===========

[Using IPython shell and a 2-core machine below]

# Create a binary string made of int (32-bit) elements
>>> import array
>>> a = array.array('i', range(10*1000*1000))
>>> bytes_array = a.tostring()

# Compress it
>>> import blosc
>>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize)
>>> len(bytes_array) / len(bpacked)
110      # 110x compression ratio.  Not bad!
# Compression speed?
>>> timeit blosc.compress(bytes_array, typesize=a.itemsize)
100 loops, best of 3: 12.8 ms per loop
>>> len(bytes_array) / 0.0128 / (1024*1024*1024)
2.9103830456733704  # wow, compressing at ~ 3 GB/s, that's fast!

# Decompress it
>>> bytes_array2 = blosc.decompress(bpacked)
# Check whether our data have had a good trip
>>> bytes_array == bytes_array2
True    # yup, it seems so
# Decompression speed?
>>> timeit blosc.decompress(bpacked)
10 loops, best of 3: 21.3 ms per loop
>>> len(bytes_array) / 0.0213 / (1024*1024*1024)
1.7489625814375185  # decompressing at ~ 1.7 GB/s is pretty good too!
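
The session above comes from a Python 2 interpreter.  On Python 3,
`array.tostring()` is gone (use `tobytes()`) and `/` is true division.
The sketch below redoes the same kind of measurement under those
assumptions.  The helper names `throughput_gib_per_s` and
`roundtrip_report` are invented here, zlib stands in for the codec so
the snippet runs without python-blosc installed, and the input is much
smaller than the 10 million elements used above.  Note that dividing by
1024*1024*1024 yields GiB/s; in decimal GB/s the figures quoted above
would be about 3.1 and 1.9.

```python
import array
import timeit
import zlib  # stand-in codec from the standard library; this is NOT Blosc


def throughput_gib_per_s(nbytes, seconds):
    """Throughput in GiB/s, i.e. units of 2**30 bytes per second."""
    return nbytes / seconds / (1024 * 1024 * 1024)


def roundtrip_report(compress, decompress, payload, repeat=3):
    """Check a compress/decompress round trip and time both directions."""
    packed = compress(payload)
    if decompress(packed) != payload:
        raise ValueError("round trip changed the data")
    # Best of `repeat` single runs, in seconds
    t_c = min(timeit.repeat(lambda: compress(payload), number=1, repeat=repeat))
    t_d = min(timeit.repeat(lambda: decompress(packed), number=1, repeat=repeat))
    return {
        "ratio": len(payload) / len(packed),
        "compress_gib_s": throughput_gib_per_s(len(payload), t_c),
        "decompress_gib_s": throughput_gib_per_s(len(payload), t_d),
    }


# Binary string made of C int elements (smaller than in the session above)
a = array.array('i', range(100 * 1000))
payload = a.tobytes()

report = roundtrip_report(zlib.compress, zlib.decompress, payload)
print(report)
```

To time Blosc itself, pass `lambda b: blosc.compress(b, typesize=a.itemsize)`
and `blosc.decompress` in place of the zlib functions.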

More examples showing other features (and using NumPy arrays) are
available on the python-blosc wiki page:

http://github.com/FrancescAlted/python-blosc/wiki

Documentation
=============

Please refer to the docstrings.  Start with the main package:

>>> import blosc
>>> help(blosc)

and then look up the docstrings of the functions referenced there.

Download sources
================

Go to:

http://github.com/FrancescAlted/python-blosc

and download the most recent release from there.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for
details.

Mailing list
============

There is an official mailing list for Blosc at:

blosc at googlegroups.com
http://groups.google.es/group/blosc


----

  **Enjoy data!**

-- 
Francesc Alted


