PyTables 0.5 released
Announcing PyTables 0.5
-----------------------

This is the second public beta release. In this release you will find
a 20% I/O speed improvement over the previous one (0.4), some bugs
have been fixed, support for a couple of compression libraries (LZO
and UCL) has been added, and... a long awaited Windows version is
finally available!

More in detail:

What's new
-----------

- As a consequence of some tweaking, write/read performance has
  improved by 20% overall. One particular case where performance has
  largely increased (0.5 is up to 6 times faster than 0.4) is when
  column elements are unidimensional arrays. This impressive speed-up
  is mainly because of the recent improvements in numarray 0.5
  performance (good work, folks!). With that, the reading speed is
  reaching its theoretical maximum (at least when using the current
  data access schema).

- When reading a Table object, if the user wants to fetch column
  elements which are unidimensional arrays, a copy of the array from
  the I/O buffer is now delivered automatically, so there is no need
  to call the .copy() method of the numarray arrays anymore. I think
  this is more comfortable for the user.

- Compression was enabled by default in version 0.4, despite what was
  stated in the documentation. This has now been corrected and
  compression is *disabled* by default.

- Support for two new compression libraries: LZO and UCL
  (http://www.oberhumer.com/opensource/). These libraries are made by
  Markus F.X.J. Oberhumer, and they stand out for allowing *very* fast
  decompression. Now, if your data is compressible, you can obtain
  better reading speed than when not using compression at all! The
  improvement is even more noticeable if you are dealing with
  extremely large (and compressible) data sets. Read the online
  documentation for more info about that:
  http://pytables.sourceforge.net/html-doc/usersguide-html3.html#subsection3.4...
- A couple of memory leaks have been isolated and fixed (it was hard,
  but I finally did it!).

- A bug with column ordering of tables that happens in some special
  situations has been fixed (thanks to Stan Heckman for reporting this
  and suggesting the patch).

- The File class now has an 'isopen' attribute to check whether a file
  is open or not.

- Updated documentation, especially to give advice about the use of
  the new compression libraries. See the "Compression issues"
  subsection (also on the web:
  http://pytables.sourceforge.net/html-doc/usersguide-html.html).

- Added more unit tests (up to 218 now!).

- PyTables has been tested against the newest numarray 0.5 and it
  works just fine. It even works well with Python 2.3b1.

- And last, but not least, a Windows version is available! Thanks to
  Alan McIntyre for porting it! There is even a binary ready for click
  and install.

What it is
----------

In short, PyTables provides a powerful and very Pythonic interface to
process and organize your table and array data on disk.

Its goal is to enable the end user to easily manipulate scientific
data tables and Numerical and numarray Python objects in a persistent
hierarchical structure. The foundation of the underlying hierarchical
data organization is the excellent HDF5 library
(http://hdf.ncsa.uiuc.edu/HDF5).

A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type. The terms "fixed-length"
and strict "data types" seem to be quite a strange requirement for an
interpreted language like Python, but they serve a useful function if
the goal is to save very large quantities of data (such as is
generated by many scientific applications, for example) in an
efficient manner that reduces demand on CPU time and I/O resources.

Quite a bit of effort has been invested to make browsing the
hierarchical data structure a pleasant experience.
PyTables implements just two (orthogonal) easy-to-use methods for
browsing.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose
library and file format for storing scientific data, developed at
NCSA. HDF5 can store two primary objects: datasets and groups. A
dataset is essentially a multidimensional array of data elements, and
a group is a structure for organizing objects in an HDF5 file. Using
these two basic constructs, one can create and store almost any kind
of scientific data structure, such as images, arrays of vectors, and
structured and unstructured grids. You can also mix and match them in
HDF5 files according to your needs.

Platforms
---------

I'm using Linux as the main development platform, but PyTables should
be easy to compile/install on other UNIX machines. This package has
also passed all the tests on an UltraSparc platform with Solaris 7 and
Solaris 8. It also compiles and passes all the tests on an SGI
Origin2000 with MIPS R12000 processors running IRIX 6.5.

On Windows, PyTables has been tested with Windows 2000 Professional
SP1 and Windows XP, but it should also work with other flavors.

An example?
-----------

For online code examples, have a look at

http://pytables.sourceforge.net/tut/tutorial1-1.html

and

http://pytables.sourceforge.net/tut/tutorial1-2.html

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Have fun!

-- Francesc Alted
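As an illustration of the table definition given under "What it is"
(records made of fixed-length, strictly typed fields), here is a
minimal sketch that uses only Python's standard struct module. It is
not PyTables code and does not use the PyTables API; the record layout
and the names RECORD, pack_records and read_record are made up for
this example. It only shows why a fixed record size helps: any record
can be located by arithmetic on its index, without scanning the ones
before it.

```python
import struct

# Hypothetical record layout (an assumption, not taken from PyTables):
# a 16-byte name, a 32-bit integer and a 64-bit float, little-endian,
# with no padding between the fields.
RECORD = struct.Struct("<16sid")


def pack_records(rows):
    """Serialize (name, count, energy) tuples into one bytes buffer."""
    return b"".join(
        RECORD.pack(name.encode("ascii"), count, energy)
        for name, count, energy in rows
    )


def read_record(buffer, index):
    """Fetch record number `index` directly from its byte offset."""
    name, count, energy = RECORD.unpack_from(buffer, index * RECORD.size)
    # struct pads short names with NUL bytes; strip them on the way out.
    return name.rstrip(b"\x00").decode("ascii"), count, energy


rows = [("alpha", 1, 0.5), ("beta", 2, 1.5), ("gamma", 3, 2.5)]
data = pack_records(rows)

print(RECORD.size)           # bytes occupied by every record
print(read_record(data, 2))  # third record, found by offset alone
```

Because every record occupies the same number of bytes, the buffer
length is always a multiple of RECORD.size, and reading record N costs
the same whether the buffer holds three records or three million.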