[Numpy-discussion] ANN: PyTables 2.2b1 ready for testing

Francesc Alted faltet at pytables.org
Tue Jun 23 14:52:01 EDT 2009


This is to inform you about the first beta release of PyTables 2.2.
It brings some interesting new features, but without question the
most appealing one is the new `tables.Expr` class.  You can think of
it as a powerful evaluator for generic mathematical expressions over
NumPy arrays as well as disk-based datasets.

`tables.Expr` works as a sort of replacement for the `numpy.memmap`
class, but it has the following advantages over the latter:

* It can evaluate any Numexpr expression without the need to take
  care of temporaries.  For example, it can compute expressions like
  "a*b-1" or "(a*arctan2(b,c)*sqrt(d))**2-1", where 'a', 'b', 'c' and
  'd' can be any PyTables homogeneous dataset or NumPy array, in an
  optimal way (i.e. avoiding temporaries and making effective use of
  the computational resources of your machine).

* Contrary to `numpy.memmap`, `tables.Expr` works for *arbitrarily*
  large datasets, regardless of whether your platform is 32-bit or
  64-bit or how much virtual memory is available: if your disk can
  hold your input and output datasets, you will be able to do your
  computations.

* In the PyTables tradition, it can make use of compression
  transparently, so even if your uncompressed datasets do not fit on
  disk, there is still a chance that the compressed ones do.
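
As a rough illustration of the points above, here is a minimal sketch
that evaluates "a*b-1" over two compressed on-disk arrays and writes
the result to a third one.  It assumes the method names of the later
PyTables 3.x API (`open_file`, `create_carray`, `set_output`); the 2.2
series spelled these in camelCase.  The function name
`evaluate_on_disk`, the node names and the compression settings are
made up for the example.

```python
import os
import tempfile

import numpy as np
import tables as tb


def evaluate_on_disk(path, n=100_000):
    """Compute a*b-1 over two on-disk arrays and store it in a third."""
    with tb.open_file(path, mode="w") as h5:
        atom = tb.Float64Atom()
        # Compression settings are an arbitrary choice for this sketch.
        filters = tb.Filters(complevel=1, complib="zlib")
        a = h5.create_carray(h5.root, "a", atom=atom, shape=(n,),
                             filters=filters)
        b = h5.create_carray(h5.root, "b", atom=atom, shape=(n,),
                             filters=filters)
        out = h5.create_carray(h5.root, "out", atom=atom, shape=(n,),
                               filters=filters)

        # Fill the inputs with some sample data.
        a[:] = np.arange(n, dtype="float64")
        b[:] = np.full(n, 2.0)

        # Bind the names used in the expression explicitly.
        expr = tb.Expr("a*b-1", uservars={"a": a, "b": b})
        expr.set_output(out)   # write the result to disk, not to memory
        expr.eval()

        # Read back a few values while the file is still open.
        return out[:5], out[n - 1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        head, last = evaluate_on_disk(os.path.join(tmpdir, "demo.h5"))
        print(head, last)
```

Passing `uservars` explicitly is a choice made here for clarity; when
it is omitted, `Expr` looks the variable names up in the calling
scope.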

Finally, although in most scenarios compression does actually improve
the speed of I/O, it is true that the CPU is still the main bottleneck
when compressing/decompressing.  This is being addressed.

So, for those of you who need to work with datasets that defy your
computer's capabilities, please give `tables.Expr` a try and report
your experience.  I'll be glad to hear back from you!

Keep reading for instructions on finding the new code and documentation.

 Announcing PyTables 2.2b1

PyTables is a library for managing hierarchical datasets, designed to
cope efficiently with extremely large amounts of data, with support
for full 64-bit file addressing.  PyTables runs on top of the HDF5
library and the NumPy package to achieve maximum throughput and
convenient use.

This is the first beta of the PyTables 2.2 series.  Here, you will
find support for NumPy's extended slicing in all `Leaf` objects as
well as an updated Numexpr module (to 1.3.1), which can lead to up to
a 25% improvement in the time for both in-kernel and indexed queries
on unaligned columns in tables (which can be quite a common
situation).

But perhaps the most interesting feature is the introduction of the
`Expr` class, which allows evaluating expressions containing general
array-like objects.  It can evaluate expressions (like '3*a+4*b') that
operate on *arbitrarily large* arrays while optimizing the resources
(basically main memory and CPU cache memory) required to perform them.
It works similarly to the Numexpr package, but in addition to NumPy
objects, it also accepts disk-based homogeneous arrays, like the
`Array`, `CArray`, `EArray` and `Column` PyTables objects.
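
To give an intuition for how an expression can run over arrays larger
than memory, the sketch below evaluates '3*a+4*b' block by block, so
that only one block of each operand is held at a time.  This is only
an illustration of the general idea using plain NumPy, not the actual
implementation inside `Expr`; the helper `eval_blockwise` and its
`block_size` parameter are hypothetical.

```python
import numpy as np


def eval_blockwise(func, inputs, out, block_size=4096):
    """Apply func to aligned slices of the inputs, one block at a time."""
    n = len(out)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Only these slices need to be in memory for this step.
        chunks = [arr[start:stop] for arr in inputs]
        out[start:stop] = func(*chunks)
    return out


a = np.arange(10, dtype="float64")
b = np.ones(10)
result = eval_blockwise(lambda x, y: 3 * x + 4 * y, [a, b],
                        np.empty(10), block_size=4)
print(result)
```

The helper only relies on slicing for reading and writing, which is
why the same pattern suits containers that fetch their data from disk
on demand.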

You can find the documentation about the new `Expr` class at:

In case you want to know more in detail what has changed in this
version, have a look at:

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:

For an on-line version of the manual, visit:


About PyTables:


About the HDF5 library:


About NumPy:



Thanks to the many users who provided feature improvements, patches,
bug reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
especially, a lot of kudos go to the HDF5 and NumPy (and numarray!)
makers.  Without them, PyTables simply would not exist.

Share your experience

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


  **Enjoy data!**

  -- The PyTables Team

Francesc Alted
