Announcing PyTables 0.9
-----------------------

I'm pleased to announce the availability of the latest incarnation of PyTables. In this release you will find a series of quite exciting new features, the most important being the indexing capabilities, in-kernel selections, support for complex datatypes and the possibility to modify values in both tables *and* arrays (yeah, finally :).

What is PyTables?
-----------------

PyTables is a hierarchical database package designed to efficiently manage extremely large amounts of data (it supports full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy-to-use tool for high-performance data storage and retrieval.

It is built on top of the HDF5 library and the numarray package, and provides containers for both heterogeneous data (Table) and homogeneous data (Array, EArray). It also sports a container for keeping lists of variable-length objects in a very efficient way (VLArray). Flexible support for filters allows you to compress your data on the fly using different compressors and compression enablers. Moreover, its powerful browsing and searching capabilities allow you to do data selections over tables exceeding gigabytes of data in just tenths of a second.

Changes in more depth
---------------------

New features:

- Indexing of columns in tables. This makes data selections on tables up to 500 times faster than standard selections (for example, a selection along an indexed column of 100 million rows takes less than 1 second on a modern CPU). Perhaps the most interesting thing about the indexing algorithm implemented by PyTables is that the time taken to index grows *linearly* with the length of the data, which makes the indexing process *scalable* (quite unlike many relational databases). This means that it can index arbitrarily large table columns relatively quickly (for example, indexing a column of 100 million rows takes just 100 seconds, i.e. a rate of 1 Mrow/sec). See more detailed info about this in http://pytables.sourceforge.net/doc/SciPy04.pdf.

- In-kernel selections. This feature makes data selections on tables up to 5 times faster than standard (i.e. pre-0.9) selections, without the need to create an index. As a hint of how fast these selections can be, they are up to 10 times faster than a traditional relational database. Again, see http://pytables.sourceforge.net/doc/SciPy04.pdf for some experiments on that matter.

- Support for complex datatypes in all the data objects (i.e. Table, Array, EArray and VLArray). With that, the complete set of datatypes of the Numeric and numarray packages is supported. Thanks to Tom Hedley for providing the patches for the Array, EArray and VLArray objects, as well as updating the User's Manual and adding unit tests for the new functionality.

- Modification of values. You can modify Table, Array, EArray and VLArray values. See Table.modifyRows(), Table.modifyColumns() and the newly introduced __setitem__() method for Table, Array, EArray and VLArray entities in the Library Reference of the User's Manual.

- A new sub-package called "nodes" has been added. It will include different modules that make it easier to work with different entities (like images, files, ...). The first module added to this sub-package is "FileNode", whose mission is to enable the creation of a database of nodes that can be used like regular open files in Python. In other words, you can store a set of files in a PyTables database, and read and write them as you would any other file in Python. Thanks to Ivan Vilata i Balaguer for contributing this.

Improvements:

- New __len__(self) methods have been added to Arrays, Tables and Columns. Together with __getitem__(self, key), this lets them better emulate sequences.

- Better capabilities to import generic HDF5 files.
  In particular, Table objects (in the HDF5_HL naming schema) with "holes" in their compound type definition are now supported. This allows reading certain files produced by NASA (thanks to Stephen Walton for reporting this).

- Much improved unit tests. More than 2000 different tests have been implemented, accounting for more than 13000 lines of code (twice the size of the PyTables library code itself (!)).

Backward-incompatible API changes:

- The __call__ special method has been removed from the File, Group, Table, Array, EArray and VLArray objects. Now you should use walkNodes() in File and Group, and iterrows() in Table, Array, EArray and VLArray, to get the same functionality. This should also provide better compatibility with IPython.

'nctoh5', a new importing utility:

- Jeff Whitaker has contributed a script to easily convert NetCDF files into HDF5 files using Scientific Python and PyTables. It has been included and documented as a new utility.

Bug fixes:

- A call to File.flush() now invokes H5Fflush() so as to effectively flush all the file contents to disk. Thanks to Shack Toms for reporting this and providing a patch.
- SF #1054683: Security hole in utils.checkNameValidity(). Reported on 2004-10-26 by ivilata
- SF #1049297: Suggestion: new method File.delAttrNode(). Reported on 2004-10-18 by ivilata
- SF #1049285: Leak in AttributeSet.__delattr__(). Reported on 2004-10-18 by ivilata
- SF #1014298: Wrong method call in examples/tutorial1-2.py. Reported on 2004-08-23 by ivilata
- SF #1013202: Cryptic error appending to EArray on RO file. Reported on 2004-08-21 by ivilata
- SF #991715: Table.read(field="var1", flavor="List") fails. Reported on 2004-07-15 by falted
- SF #988547: Wrong file type assumption in File.__new__. Reported on 2004-07-10 by ivilata

Where can PyTables be applied?
------------------------------

PyTables is not designed to work as a relational database competitor, but rather as a teammate.
If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBMS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers) and very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
----------------

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. "Fixed-length" fields and "strict data types" may seem strange requirements for a language like Python, which supports dynamic data types. However, they serve a useful purpose when the goal is to save very large quantities of data (such as those generated by many scientific applications) efficiently, reducing demand on CPU time and I/O resources.

What is HDF5?
-------------

For those who know nothing about HDF5, it is a general-purpose library and file format for storing scientific data, developed at NCSA. HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux (Intel 32-bit) as the main development platform, but PyTables should be easy to compile and install on many other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8.
It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors, with the MIPSPro compiler and running IRIX 6.5. It also runs fine on 64-bit Linux platforms, such as AMD Opteron running GNU/Linux 2.4.21 Server, Intel Itanium (IA64) running GNU/Linux 2.4.21, or PowerPC G5 with Linux 2.6.x in 64-bit mode. It has also been tested on Mac OS X (10.2, but it should also work on newer versions). Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should work with other flavors as well.

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

To learn more about the company behind PyTables development, see:

http://www.carabos.com/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

--
Francesc Altet
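The announcement attributes the speed of indexed selections to avoiding a full scan of the column. The following is a rough, hypothetical illustration only; it is not PyTables' actual indexing code, and the function names are made up. A sorted copy of a column's values, paired with the original row numbers, lets a range query find its matching run by binary search (the standard `bisect` module) instead of testing every row. Note that building this toy index costs a full sort, so it does not reproduce the linear-time indexing described above. It shows only why lookups get cheaper once an index exists.

```python
import bisect

def build_index(column):
    """Hypothetical index: (sorted values, row numbers in the same order)."""
    order = sorted(range(len(column)), key=column.__getitem__)
    return [column[i] for i in order], order

def scan_select(column, lo, hi):
    """Standard selection: test every row for lo <= value < hi."""
    return [i for i, v in enumerate(column) if lo <= v < hi]

def indexed_select(index, lo, hi):
    """Indexed selection: binary-search the sorted values for the matching run."""
    values, rows = index
    start = bisect.bisect_left(values, lo)
    stop = bisect.bisect_left(values, hi)
    return sorted(rows[start:stop])  # return row numbers in table order

column = [7, 3, 9, 1, 5, 3, 8]
idx = build_index(column)
print(scan_select(column, 3, 8))   # [0, 1, 4, 5]
print(indexed_select(idx, 3, 8))   # [0, 1, 4, 5]
```

Both functions return the same rows. The scan does one comparison per row, while the indexed version does about log2(n) comparisons to locate the run.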
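The "What is a table?" section argues that fixed-length fields and strict types pay off for large data. One concrete reason is that when every record has the same byte layout, record i starts at offset i * record_size, so any row can be read without parsing the rows before it. The minimal sketch below uses Python's standard `struct` module. The record layout (a 32-bit int id, a 64-bit float value, an 8-byte name) and the helper names are hypothetical and do not describe PyTables' or HDF5's on-disk format.

```python
import struct

# Hypothetical record layout: little-endian int32 id, float64 value, 8-byte name.
RECORD = struct.Struct("<id8s")

def pack_table(rows):
    """Serialize rows into one contiguous buffer of fixed-size records."""
    return b"".join(RECORD.pack(rid, val, name.encode("ascii"))
                    for rid, val, name in rows)

def read_row(buf, i):
    """Read row i directly: its byte offset is simply i * RECORD.size."""
    rid, val, raw = RECORD.unpack_from(buf, i * RECORD.size)
    return rid, val, raw.rstrip(b"\0").decode("ascii")  # drop null padding

rows = [(1, 0.5, "alpha"), (2, 1.5, "beta"), (3, 2.5, "gamma")]
buf = pack_table(rows)
print(RECORD.size)        # 20
print(read_row(buf, 2))   # (3, 2.5, 'gamma')
```

The price is that names longer than 8 bytes would be truncated by the `8s` field. That is the trade-off "fixed-length" implies.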
On Fri, 2004-11-05 at 19:28 +0100, Francesc Altet wrote:
> Announcing PyTables 0.9
Francesc,

I hit a problem building PyTables 0.9 on FC2 that I didn't see with PyTables 0.8.3: it attempts to link against the lzo libraries even though setup.py correctly detected that I don't have them installed. A short excerpt from the build output:

    Found HDF5 libraries at /usr/lib
    Found HDF5 header files at /usr/include
    Optional lzo libraries or include files not found. Disabling support for them.
    Optional ucl libraries or include files not found. Disabling support for them.
    Found numarray 1.1 package installed
    running build
    running build_py
    creating build
    creating build/lib.linux-i686-2.3
    creating build/lib.linux-i686-2.3/tables
    [...many lines deleted, including a lot of warnings about mismatched pointer types..]
    build/temp.linux-i686-2.3/src/H5TB-opt.o -lhdf5 -llzo -lucl -o build/lib.linux-i686-2.3/tables/hdf5Extension.so
    /usr/bin/ld: cannot find -llzo
    collect2: ld returned 1 exit status
    error: command 'gcc' failed with exit status 1

--
Stephen Walton <stephen.walton@csun.edu>
Physics & Astronomy
CSUN
On Sunday, 07 November 2004 21:03, Stephen Walton wrote:
> On Fri, 2004-11-05 at 19:28 +0100, Francesc Altet wrote:
>
> Francesc,
>
> I hit a problem building PyTables 0.9 on FC2 that I didn't see with PyTables 0.8.3: it attempts to link against the lzo libraries even though setup.py correctly detected that I don't have them installed. A short excerpt from the build output:
>     [...many lines deleted, including a lot of warnings about mismatched pointer types..]
>     build/temp.linux-i686-2.3/src/H5TB-opt.o -lhdf5 -llzo -lucl -o build/lib.linux-i686-2.3/tables/hdf5Extension.so
>     /usr/bin/ld: cannot find -llzo
>     collect2: ld returned 1 exit status
>     error: command 'gcc' failed with exit status 1
Ooops, my bad. I must admit that I don't test the PyTables installation without lzo and ucl present often enough.

The cure is in CVS now. You can also apply this patch:

    --- ../exports/pytables-0.9/setup.py	2004-11-05 16:33:58.000000000 +0100
    +++ setup.py	2004-11-08 11:23:21.000000000 +0100
    @@ -94,6 +94,7 @@
         else:
             if not incdir or not libdir:
                 print "Optional %s libraries or include files not found. Disabling support for them." % (libname,)
    +            return
             else:
                 # Necessary to include code for optional libs
                 def_macros.append(("HAVE_"+libname.upper()+"_LIB", 1))

Cheers,

--
Francesc Altet
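The bug and its one-line fix can be reduced to a small, self-contained sketch. The function `add_optional_lib` and its `fixed` flag are hypothetical simplifications, not the real PyTables setup.py, whose surrounding code the patch does not show. The sketch assumes the original code appended the library to the link list after the if/else. The build log supports that assumption: `-llzo -lucl` appear on the link line even though support was "disabled". The early `return` skips that step.

```python
def add_optional_lib(libname, incdir, libdir, libs, def_macros, fixed=True):
    """Hypothetical simplification of the optional-library check in setup.py.

    Assumes the code after the if/else appends `libname` to the linker list.
    """
    if not incdir or not libdir:
        print("Optional %s libraries or include files not found. "
              "Disabling support for them." % (libname,))
        if fixed:
            return  # the patch: stop here so no -l<libname> is added
    else:
        # Necessary to include code for optional libs
        def_macros.append(("HAVE_" + libname.upper() + "_LIB", 1))
    libs.append(libname)  # without the early return, this runs even when not found

# Without the fix: lzo/ucl end up on the link line (-> "cannot find -llzo").
libs, macros = ["hdf5"], []
for name in ("lzo", "ucl"):
    add_optional_lib(name, None, None, libs, macros, fixed=False)
print(libs)  # ['hdf5', 'lzo', 'ucl']

# With the fix: only hdf5 is linked.
libs, macros = ["hdf5"], []
for name in ("lzo", "ucl"):
    add_optional_lib(name, None, None, libs, macros)
print(libs)  # ['hdf5']
```

Returning early keeps the "not found" path from falling through to the shared linking code. Moving the `append` into the `else` branch would work equally well.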
On Mon, 2004-11-08 at 11:30 +0100, Francesc Altet wrote:
> Ooops, my bad. I must admit that I don't test the PyTables installation without lzo and ucl present often enough.
>
> The cure is in CVS now. You can also apply this patch:
Thanks again, Francesc.

--
Stephen Walton <stephen.walton@csun.edu>
Physics & Astronomy
CSUN
participants (2)

- Francesc Altet
- Stephen Walton