Announcing PyTables 0.2
-----------------------

What's new
----------

- Numerical Python arrays supported!
- Much improved documentation
- Programming API almost stable
- Improved navigability across the object tree
- Added more unit tests (there are almost 50)
- Dropped HDF5_HL dependency (a tailored version is included in sources now)
- License changed from LGPL to BSD

What is
-------

The goal of PyTables is to enable the end user to easily manipulate
scientific data tables and Numerical Python objects (new in 0.2!) in a
persistent hierarchical structure. The foundation of the underlying
hierarchical data organization is the excellent HDF5 library
(http://hdf.ncsa.uiuc.edu/HDF5).

Right now, PyTables provides only limited support for the HDF5 functions,
but I hope to add the more interesting ones (for PyTables' needs) in the
near future. Nonetheless, this package is not intended to serve as a
complete wrapper for the entire HDF5 API.

A table is defined as a collection of records whose values are stored in
fixed-length fields. All records have the same structure and all values in
each field have the same data type. The terms "fixed-length" and strict
"data types" seem to be quite a strange requirement for an interpreted
language like Python, but they serve a useful function if the goal is to
save very large quantities of data (such as is generated by many scientific
applications) in an efficient manner that reduces demand on CPU time and
I/O.

In order to emulate records (C structs in HDF5) in Python, PyTables
implements a special metaclass that detects errors in field assignments as
well as range overflows. PyTables also provides a powerful interface to
process table data.

Quite a bit of effort has been invested to make browsing the hierarchical
data structure a pleasant experience. PyTables implements just three
(orthogonal) easy-to-use methods for browsing.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general-purpose
library and file format for storing scientific data made at NCSA. HDF5 can
store two primary objects: datasets and groups. A dataset is essentially a
multidimensional array of data elements, and a group is a structure for
organizing objects in an HDF5 file. Using these two basic constructs, one
can create and store almost any kind of scientific data structure, such as
images, arrays of vectors, and structured and unstructured grids. You can
also mix and match them in HDF5 files according to your needs.

How fast is it?
---------------

Despite being an alpha version with a lot of room for improvement (it's
still CPU-bound!), PyTables can read and write tables quite fast. If you
want some (very preliminary) figures, just to know the orders of magnitude,
on an AMD Athlon@900 it can currently read from 40000 up to 60000 records/s
and write from 5000 up to 13000 records/s. Raw data speed in read mode
ranges from 1 MB/s up to 2 MB/s, and it drops to the 200 KB/s - 600 KB/s
range for writes. Go to http://pytables.sf.net/bench.html for a somewhat
more detailed description of this small (and synthetic) benchmark. Anyway,
this is only the beginning (premature optimization is the root of all evil,
you know ;-).
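In case you want to take a rough records/s figure yourself, here is a
minimal sketch of how such a measurement can be done. It only uses calls
that appear in the full example at the bottom of this message (openFile,
createGroup, createTable, appendAsRecord, readAsRecords); the file name,
record layout and row count are invented for illustration, and the exact
numbers will of course depend on your machine:

import time
from tables import IsRecord, openFile

# A deliberately small record; any layout would do for a rough measurement
class Bench(IsRecord):
    grid_i   = 'i'   # integer
    pressure = 'f'   # float (single-precision)

h5file = openFile("bench.h5", mode = "w", title = "Throughput sketch")
group = h5file.createGroup("/", 'bench', 'Benchmark data')
table = h5file.createTable(group, 'rows', Bench(), "Timing table")
row = table.record

# Time the write path: fill the table, flush it, and count records/s
nrows = 50000
start = time.time()
for i in xrange(nrows):
    row.grid_i = i
    row.pressure = float(i)
    table.appendAsRecord(row)
table.flush()
elapsed = time.time() - start
print "wrote %d records in %.2f s (%.0f records/s)" % \
      (nrows, elapsed, nrows / elapsed)

# Time the read path: sweep over all the records just written
start = time.time()
nread = len([ x for x in table.readAsRecords() ])
elapsed = time.time() - start
print "read %d records in %.2f s (%.0f records/s)" % \
      (nread, elapsed, nread / elapsed)

h5file.close()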
Platforms
---------

I'm using Linux as the main development platform, but PyTables should be
easy to compile/install on other UNIX machines. Thanks to Scott Prater,
this package has passed all the tests on an UltraSparc platform with
Solaris 7. It also compiles and passes all the tests on an SGI Origin2000
with MIPS R12000 processors running IRIX 6.5. If you are using Windows and
you get the library to work, please let me know.

An example?
-----------

At the bottom of this message there is some code (less than 100 lines, with
less than half of it being real code) that shows the basic capabilities of
PyTables.

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sf.net/

Final note
----------

This is the second alpha release, and probably the last alpha, so there is
still time to suggest API additions/changes or the addition/change of any
useful missing capability. Let me know of any bugs, suggestions, gripes,
kudos, etc. you may have.

--
Francesc Alted
falted@openlc.org

*-*-*-**-*-*-**-*-*-**-*-*- Small code example *-*-*-**-*-*-**-*-*-**-*-*-*

"""Small but almost complete example showing the PyTables mode of use.

As a result of execution, a 'tutorial1.h5' file is created. You can look at
it with any generic HDF5 utility, such as h5ls, h5dump or h5view.

"""

import sys
from Numeric import *
from tables import *

#'-**-**-**-**-**-**- user record definition -**-**-**-**-**-**-**-'

# Define a user record to characterize some kind of particles
class Particle(IsRecord):
    name     = '16s'  # 16-character string
    idnumber = 'Q'    # unsigned long long (i.e. 64-bit integer)
    TDCcount = 'B'    # unsigned byte
    ADCcount = 'H'    # unsigned short integer
    grid_i   = 'i'    # integer
    grid_j   = 'i'    # integer
    pressure = 'f'    # float (single-precision)
    energy   = 'd'    # double (double-precision)

print
print '-**-**-**-**-**-**- file creation -**-**-**-**-**-**-**-'

# The name of our HDF5 file
filename = "tutorial1.h5"
print "Creating file:", filename

# Open a file in "w"rite mode
h5file = openFile(filename, mode = "w", title = "Test file")

print
print '-**-**-**-**-**-**- group and table creation -**-**-**-**-**-**-**-'

# Create a new group under "/" (root)
group = h5file.createGroup("/", 'detector', 'Detector information')
print "Group '/detector' created"

# Create one table on it
table = h5file.createTable(group, 'readout', Particle(), "Readout example")
print "Table '/detector/readout' created"

# Get a shortcut to the record object in table
particle = table.record

# Fill the table with 10 particles
for i in xrange(10):
    # First, assign the values to the Particle record
    particle.name = 'Particle: %6d' % (i)
    particle.TDCcount = i % 256
    particle.ADCcount = (i * 256) % (1 << 16)
    particle.grid_i = i
    particle.grid_j = 10 - i
    particle.pressure = float(i*i)
    particle.energy = float(particle.pressure ** 4)
    particle.idnumber = i * (2 ** 34)  # This exceeds long integer range
    # Insert a new particle record
    table.appendAsRecord(particle)

# Flush the buffers for table
table.flush()
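# Note on the loop above: every field assignment goes through the special
# IsRecord metaclass described in the "What is" section, so a wrong type or
# a range overflow -- for instance, particle.TDCcount = 1000, which does not
# fit in an unsigned byte ('B') -- should be caught at assignment time
# instead of silently corrupting the table (the exact exception raised is
# not shown here).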
print
print '-**-**-**-**-**-**- table data reading & selection -**-**-**-**-**-'

# Read actual data from the table. We are interested in collecting the
# pressure values of the entries whose TDCcount field is greater than 3
# and whose pressure is less than 50.
pressure = [ x.pressure for x in table.readAsRecords()
             if x.TDCcount > 3 and x.pressure < 50 ]

# The list comprehension variable x is still bound to the last record read
print "Last record read:"
print x
print "Field pressure elements satisfying the cuts ==>", pressure

# Read also the names with the same cuts
names = [ x.name for x in table.readAsRecords()
          if x.TDCcount > 3 and x.pressure < 50 ]

print
print '-**-**-**-**-**-**- array object creation -**-**-**-**-**-**-**-'

print "Creating a new group called '/columns' to hold new arrays"
gcolumns = h5file.createGroup(h5file.root, "columns", "Pressure and Name")

print "Creating a Numeric array called 'pressure' under '/columns' group"
h5file.createArray(gcolumns, 'pressure', array(pressure),
                   "Pressure column selection")

print "Creating another Numeric array called 'name' under '/columns' group"
h5file.createArray('/columns', 'name', array(names), "Name column selection")

# Close the file
h5file.close()
print "File '"+filename+"' created"
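And, although the example above does not show it, here is a quick sketch of
reading the file back. It assumes that mode = "r" works symmetrically to
the mode = "w" call above and that the table can be reached by attribute
access from h5file.root (only mode = "w" and h5file.root itself appear in
the example); readAsRecords() is the same call used for the selections:

from tables import openFile

# Reopen the file; "r"ead mode is assumed to behave like "w"rite mode above
h5file = openFile("tutorial1.h5", mode = "r")

# Assumed natural naming: walk down from the root group by attributes,
# mirroring the "/detector/readout" path created earlier
table = h5file.root.detector.readout

# Iterate over the stored records, just as in the selection step
for particle in table.readAsRecords():
    print particle.name, particle.pressure

h5file.close()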