ANN: PyTables 0.7.2 - A hierarchical database

Francesc Alted falted@openlc.org
Mon, 22 Sep 2003 19:41:49 +0200


Announcing PyTables 0.7.2
-------------------------

PyTables is a hierarchical database package designed to efficiently
manage very large amounts of data. PyTables is built on top of the
HDF5 library and the numarray package. It features an object-oriented
interface that, combined with natural naming and C code generated from
Pyrex sources, makes it a fast yet extremely easy-to-use tool for
interactively saving and retrieving large amounts of data. In
addition, it provides flexible indexed access on disk to anywhere in
your data.
 
This release does not bring any exciting new features. It is mainly
a maintenance release that addresses the following issues:
       - a memory leak has been fixed
       - memory requirements have been lowered
       - files now open much faster
       - some important optimizations have been made in table reads

In more detail:

What's new
-----------

        - Fixed a nasty memory leak in the C libraries (it happened
          during HDF5 attribute writes). With this fix, memory
          consumption when using large object trees has dropped
          considerably. However, some small leaks remain; they have
          been tracked down to the underlying numarray library. These
          leaks have been reported and will hopefully be fixed sooner
          rather than later.

        - Table buffers are now built dynamically, so no memory is
          reserved for the buffers of Tables that are never accessed
          for reading or writing. This helps reduce memory consumption.

        - Opening files with lots of nodes is now faster by a factor
          of 2 to 3. For example, a file with 10 groups and 3000
          tables that took 9.3 seconds to open with 0.7.1 now opens
          in only 2.8 seconds.

        - The Table.read() method has been refactored and optimized,
          and parts of its code have been moved to Pyrex. In
          particular, in the special case of step=1, reading table
          contents can now be up to 5 times faster (reaching 160 MB/s
          on a Pentium 4 @ 2 GHz).

        - Made some cosmetic changes to the user manual but, as no
          new features have been added, you won't need to read the
          manual again :-)
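The idea behind dynamically built table buffers can be illustrated with
a minimal sketch of lazy allocation in plain Python. This is NOT the
PyTables implementation or API: the class LazyTable, its attributes
row_size and buffer_rows, and the method write_rows are hypothetical
names chosen for this example. The buffer is created only the first
time a read or write needs it, so thousands of untouched tables cost
no buffer memory.

```python
class LazyTable:
    """Hypothetical table whose I/O buffer is allocated only on first use."""

    def __init__(self, name, row_size, buffer_rows=1000):
        self.name = name
        self.row_size = row_size        # bytes per fixed-length record
        self.buffer_rows = buffer_rows  # records held in the buffer at once
        self._buffer = None             # nothing is allocated at creation time

    @property
    def buffer(self):
        # Allocate the buffer the first time a read or write asks for it.
        if self._buffer is None:
            self._buffer = bytearray(self.row_size * self.buffer_rows)
        return self._buffer

    def write_rows(self, data):
        # Hypothetical write path: copy raw record bytes into the buffer.
        buf = self.buffer
        n = min(len(data), len(buf))
        buf[:n] = data[:n]
        return n


# Simulate opening a file with 3000 tables: the table objects exist,
# but no buffer memory has been reserved for any of them yet.
tables = [LazyTable(f"table{i}", row_size=16) for i in range(3000)]
print(sum(t._buffer is not None for t in tables))  # 0

# Touching one table allocates a buffer for that table only.
tables[0].write_rows(b"\x00" * 32)
print(sum(t._buffer is not None for t in tables))  # 1
print(len(tables[0].buffer))  # 16000
```

Using a property keeps the allocation decision in one place, so every
read or write path gets the buffer the same way.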

What is a table?
----------------

A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type. The terms
"fixed-length" and "strict data types" may seem a strange requirement
for a language like Python, which supports dynamic data types, but
they serve a useful function if the goal is to save very large
quantities of data (such as those generated by many scientific
applications) in an efficient manner that reduces demand on CPU time
and I/O resources.
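Here is a minimal sketch of why fixed-length fields pay off. It uses
only Python's standard struct module, not PyTables itself. Every
record occupies the same number of bytes, so record i starts at offset
i * RECORD_SIZE and can be read directly without scanning the records
before it. The record layout (RECORD_FORMAT) and the field names are
assumptions made for this example.

```python
import io
import struct

# Hypothetical record layout: 16-byte name, 32-bit int id, 64-bit float energy.
# "<" means little-endian with no padding, so every record has the same size.
RECORD_FORMAT = "<16sid"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 16 + 4 + 8 = 28 bytes


def write_records(stream, records):
    """Append records as fixed-length binary rows."""
    for name, ident, energy in records:
        # The "16s" code pads (or truncates) the name to exactly 16 bytes.
        stream.write(struct.pack(RECORD_FORMAT, name.encode("ascii"), ident, energy))


def read_record(stream, index):
    """Read record number `index` by seeking straight to its offset."""
    stream.seek(index * RECORD_SIZE)
    name, ident, energy = struct.unpack(RECORD_FORMAT, stream.read(RECORD_SIZE))
    return name.rstrip(b"\x00").decode("ascii"), ident, energy


# An in-memory buffer stands in for a file on disk.
stream = io.BytesIO()
write_records(stream, [(f"particle{i}", i, i * 0.5) for i in range(1000)])

print(RECORD_SIZE)               # 28
print(read_record(stream, 742))  # ('particle742', 742, 371.0)
```

With variable-length records, finding record 742 would require either
walking the first 742 records or keeping a separate offset index.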

What is HDF5?
-------------

For those not familiar with HDF5: it is a general-purpose library
and file format for storing scientific data, developed at NCSA.
HDF5 can store two primary objects: datasets and groups. A
dataset is essentially a multidimensional array of data elements, and
a group is a structure for organizing objects in an HDF5 file. Using
these two basic constructs, one can create and store almost any kind of
scientific data structure, such as images, arrays of vectors, and
structured and unstructured grids. You can also mix and match them in
HDF5 files according to your needs.
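To make the two HDF5 constructs concrete, here is a toy Python model.
It is not the HDF5 or PyTables API. Groups are containers of named
children and datasets are leaves holding data, and any object is
located by a slash-separated path from the root group. The classes
Group and Dataset and the method lookup are hypothetical. A dataset is
simplified here to a flat list rather than a multidimensional array.

```python
class Dataset:
    """Leaf node: holds array data (simplified to a flat list here)."""

    def __init__(self, data):
        self.data = list(data)


class Group:
    """Container node: maps child names to Groups or Datasets."""

    def __init__(self):
        self.children = {}

    def create_group(self, name):
        group = Group()
        self.children[name] = group
        return group

    def create_dataset(self, name, data):
        dataset = Dataset(data)
        self.children[name] = dataset
        return dataset

    def lookup(self, path):
        """Resolve a path such as '/detector/readout' from this group."""
        node = self
        for part in [p for p in path.split("/") if p]:
            if not isinstance(node, Group):
                raise KeyError(f"{part!r}: parent node is not a group")
            node = node.children[part]
        return node


# Build a small hierarchy: the root group holds a subgroup with two datasets.
root = Group()
detector = root.create_group("detector")
detector.create_dataset("readout", [3, 1, 4, 1, 5])
detector.create_dataset("pressure", [1.0, 1.1, 0.9])

print(root.lookup("/detector/readout").data)      # [3, 1, 4, 1, 5]
print(sorted(root.lookup("/detector").children))  # ['pressure', 'readout']
```

Nesting groups inside groups is what gives the file its tree shape;
the path "/" resolves to the root group itself.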

Platforms
---------

I'm using Linux as the main development platform, but PyTables should
be easy to compile and install on other UNIX machines. This package
has also passed all the tests on an UltraSPARC platform with Solaris 7
and Solaris 8. It also compiles and passes all the tests on an SGI
Origin2000 with MIPS R12000 processors running IRIX 6.5.

Regarding Windows platforms, PyTables has been tested with Windows
2000 and Windows XP, but it should also work with other flavors.

An example?
-----------

For online code examples, have a look at

http://pytables.sourceforge.net/tut/tutorial1-1.html

and 

http://pytables.sourceforge.net/tut/tutorial1-2.html

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Have fun!

-- 
Francesc Alted