[Numpy-discussion] ANN: python-blosc 1.0.2
Francesc Alted
faltet at pytables.org
Thu Nov 4 09:58:55 EDT 2010
====================================================
Announcing python-blosc 1.0.2
A Python wrapper for the Blosc compression library
====================================================
What is it?
===========
Blosc (http://blosc.pytables.org) is a high-performance compressor
optimized for binary data. It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() call.
Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly spaced values, etc.
python-blosc is a Python package that wraps it.
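Why low-entropy numerical data compresses well, and why the element size (`typesize`) matters, can be illustrated without Blosc itself. The sketch below is a stand-in under stated assumptions: it uses the standard library's zlib (not Blosc's codec) together with a hypothetical `shuffle_bytes()` helper that regroups the bytes of each element, similar in spirit to the byte-shuffle filter Blosc applies before compressing.

```python
import array
import os
import zlib


def shuffle_bytes(buf: bytes, typesize: int) -> bytes:
    """Put byte 0 of every element first, then byte 1, and so on.

    Hypothetical helper; assumes len(buf) is a multiple of typesize.
    """
    return b"".join(buf[i::typesize] for i in range(typesize))


def ratio(raw: bytes, packed: bytes) -> float:
    """Compression ratio: original size divided by compressed size."""
    return len(raw) / len(packed)


itemsize = array.array("i").itemsize
# Low entropy: consecutive integers, like the ones in the session below.
grid = array.array("i", range(100_000)).tobytes()
# High entropy: random bytes of the same length.
noise = os.urandom(len(grid))

plain = ratio(grid, zlib.compress(grid))
shuffled = ratio(grid, zlib.compress(shuffle_bytes(grid, itemsize)))
random_ratio = ratio(noise, zlib.compress(noise))

print(f"random bytes:            {random_ratio:.2f}x")
print(f"integers, as stored:     {plain:.2f}x")
print(f"integers, byte-shuffled: {shuffled:.2f}x")
```

The exact ratios depend on the zlib build; what the sketch is meant to show is the ordering: random data barely compresses, regularly spaced integers compress, and grouping same-significance bytes together compresses them much further.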
What is new?
============
Updated to Blosc 1.1.2. Fixes some bugs when dealing with very small
buffers (typically smaller than the specified typesize). Closes #1.
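Blosc's shuffle filter rearranges the bytes of `typesize`-byte elements, so a buffer shorter than one element is an edge case. How Blosc 1.1.2 itself handles it is not described here; the following is only a pure-Python illustration, with hypothetical `shuffle()`/`unshuffle()` names, of one safe convention: shuffle the whole elements and pass any trailing partial element (or an entire too-short buffer) through unchanged.

```python
def shuffle(buf: bytes, typesize: int) -> bytes:
    """Byte-shuffle whole elements; copy any trailing partial element as-is."""
    if typesize < 1:
        raise ValueError("typesize must be >= 1")
    nelem = len(buf) // typesize
    body_len = nelem * typesize
    body, tail = buf[:body_len], buf[body_len:]
    # Byte j of every element ends up in the j-th contiguous "plane".
    planes = [body[j::typesize] for j in range(typesize)]
    return b"".join(planes) + tail


def unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of shuffle(): scatter each plane back to strided positions."""
    if typesize < 1:
        raise ValueError("typesize must be >= 1")
    nelem = len(buf) // typesize
    body_len = nelem * typesize
    out = bytearray(body_len)
    for j in range(typesize):
        # Plane j occupies buf[j*nelem:(j+1)*nelem].
        out[j::typesize] = buf[j * nelem:(j + 1) * nelem]
    return bytes(out) + buf[body_len:]


if __name__ == "__main__":
    print(shuffle(bytes(range(1, 10)), 4))  # 2 whole elements + 1 leftover byte
    print(shuffle(b"ab", 4))                # shorter than typesize: unchanged
```

With this convention a buffer smaller than `typesize` simply round-trips untouched, which is the kind of input the fixed bugs concerned.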
Basic Usage
===========
[Using IPython shell and a 2-core machine below]
# Create a binary string made of int (32-bit) elements
>>> import array
>>> a = array.array('i', range(10*1000*1000))
>>> bytes_array = a.tobytes()  # a.tostring() on Python 2
# Compress it
>>> import blosc
>>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize)
>>> len(bytes_array) // len(bpacked)
110 # 110x compression ratio. Not bad!
# Compression speed?
>>> timeit blosc.compress(bytes_array, typesize=a.itemsize)
100 loops, best of 3: 12.8 ms per loop
>>> len(bytes_array) / 0.0128 / (1024*1024*1024)
2.9103830456733704 # wow, compressing at ~ 3 GB/s, that's fast!
# Decompress it
>>> bytes_array2 = blosc.decompress(bpacked)
# Check whether our data have had a good trip
>>> bytes_array == bytes_array2
True # yup, it seems so
# Decompression speed?
>>> timeit blosc.decompress(bpacked)
10 loops, best of 3: 21.3 ms per loop
>>> len(bytes_array) / 0.0213 / (1024*1024*1024)
1.7489625814375185 # decompressing at ~ 1.7 GB/s is pretty good too!
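The throughput figures in the session divide the buffer size by the best time per loop and by 1024**3. Outside IPython, a rough equivalent can be built on the standard `timeit` module; the helper names below are hypothetical, and zlib stands in for `blosc.compress` so the sketch runs without python-blosc installed.

```python
import timeit
import zlib

GIB = 1024 ** 3


def gib_per_s(nbytes: int, seconds: float) -> float:
    """Throughput in GiB/s, the same arithmetic as in the session above."""
    return nbytes / seconds / GIB


def best_time(func, number: int = 10, repeat: int = 3) -> float:
    """Best-of-`repeat` seconds per call, like the "best of 3" figures above."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


if __name__ == "__main__":
    data = bytes(range(256)) * 4000  # ~1 MB stand-in payload (hypothetical)
    t = best_time(lambda: zlib.compress(data, 1))
    print(f"zlib level 1: {gib_per_s(len(data), t):.3f} GiB/s")
```

Taking the minimum over repeats, rather than the mean, reports the least-disturbed run, which is what the "best of 3" numbers in the session reflect.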
More examples showing other features (and using NumPy arrays) are
available on the python-blosc wiki page:
http://github.com/FrancescAlted/python-blosc/wiki
Documentation
=============
Please refer to the docstrings. Start with the main package:
>>> import blosc
>>> help(blosc)
and then call help() on the functions it references for their own docstrings.
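To get a quick overview before calling help() on each function, the first docstring line of every public routine in a module can be collected with the standard `inspect` module. This is a hypothetical helper, not part of python-blosc; `json` stands in for `blosc` so the sketch runs anywhere.

```python
import inspect
import json
from types import ModuleType


def docstring_summary(module: ModuleType) -> dict:
    """Map each public routine in `module` to the first line of its docstring."""
    summary = {}
    for name, obj in inspect.getmembers(module, inspect.isroutine):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or ""
        summary[name] = doc.splitlines()[0] if doc else ""
    return summary


if __name__ == "__main__":
    # With python-blosc installed, you could pass the blosc module instead.
    for name, line in docstring_summary(json).items():
        print(f"{name}: {line}")
```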
Download sources
================
Go to:
http://github.com/FrancescAlted/python-blosc
and download the most recent release from there.
Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for
details.
Mailing list
============
There is an official mailing list for Blosc at:
blosc at googlegroups.com
http://groups.google.es/group/blosc
----
**Enjoy data!**
--
Francesc Alted