ANN: Pytables 2.0 beta1 (hierarchical datasets) released

Francesc Altet faltet at carabos.com
Fri Mar 9 20:35:09 CET 2007


=========================
 Announcing PyTables 2.0b1
=========================

The PyTables development team is very happy to announce the public
availability of the first *beta* version of PyTables 2.0.  Starting with
this release, both the API and the file format have entered in the stage
of freezing (i.e. nothing will be modified unless something is clearly
*wrong*), so people can use it in order to begin with the migration of
their existing PyTables 1.x scripts as well as start enjoying the new
exciting features in this major version ;)

You can download a source package of the version 2.0b1 with
generated PDF and HTML docs from
http://www.pytables.org/download/preliminary/

[Starting with the beta phase, and in order to make the life easier for
Windows beta-testers, we are delivering binary versions for Win32.]

You can also get the latest sources from the Subversion repository at
http://pytables.org/svn/pytables/trunk/

If you are afraid of Subversion (you shouldn't), you can always download
the latest, daily updated, packed sources from
http://www.pytables.org/download/snapshot/

Please have in mind that some sections in the manual can be obsolete
(specially the "Optimization tips" chapter).  The tutorials chapter is
in the process of being updated as well.  The reference chapter should
be fairly up-to-date though.

You may also want to have an in-deep read of the ``RELEASE-NOTES.txt``
file where you will find an entire section devoted to how to migrate
your existing PyTables 1.x apps to the 2.0 version.  You can find an
HTML version of this document at
http://www.pytables.org/moin/ReleaseNotes/Release_2.0b1


Changes more in depth
=====================

Improvements:

- NumPy is finally at the core!  That means that PyTables no longer
  needs numarray in order to operate, although it continues to be
  supported (as well as Numeric).  This also means that you should be
  able to run PyTables in scenarios combining Python 2.5 and 64-bit
  platforms (these are a source of problems with numarray/Numeric
  because they don't support this combination yet).

- Most of the operations in PyTables have experimented noticeable
  speed-ups (sometimes up to 2x, like in regular Python table
  selections).  This is a consequence of both using NumPy internally and
  a considerable effort in terms of refactorization and optimization of
  the new code.

- Combined conditions are finally supported for in-kernel selections.
  So, now it is possible to perform complex selections like::

      result = [ row['var3'] for row in
                 table.where('(var2 < 20) | (var1 == "sas")') ]

  or::

      complex_cond = '((%s <= col5) & (col2 <= %s)) ' \
                     '| (sqrt(col1 + 3.1*col2 + col3*col4) > 3)'
      result = [ row['var3'] for row in
                 table.where(complex_cond % (inf, sup)) ]

  and run them at full C-speed (or perhaps more, due to the cache-tuned
  computing kernel of numexpr, which has been integrated into PyTables).

- Now, it is possible to get fields of the ``Row`` iterator by
  specifying their position, or even ranges of positions (extended
  slicing is supported).  For example, you can do::

      result = [ row[4] for row in table    # fetch field #4
                 if row[1] < 20 ]
      result = [ row[:] for row in table    # fetch all fields
                 if row['var2'] < 20 ]
      result = [ row[1::2] for row in       # fetch odd fields
                 table.iterrows(2, 3000, 3) ]

  in addition to the classical::

      result = [row['var3'] for row in table.where('var2 < 20')]

- ``Row`` has received a new method called ``fetch_all_fields()`` in
  order to easily retrieve all the fields of a row in situations like::

      [row.fetch_all_fields() for row in table.where('column1 < 0.3')]

  The difference between ``row[:]`` and ``row.fetch_all_fields()`` is
  that the former will return all the fields as a tuple, while the
  latter will return the fields in a NumPy void type and should be
  faster.  Choose whatever fits better to your needs.

- Now, all data that is read from disk is converted, if necessary, to
  the native byteorder of the hosting machine (before, this only
  happened with ``Table`` objects).  This should help to accelerate apps
  that have to do computations with data generated in platforms with a
  byteorder different than the user machine.

- The modification of values in ``*Array`` objects (through __setitem__)
  now doesn't make a copy of the value in the case that the shape of the
  value passed is the same as the slice to be overwritten. This results
  in considerable memory savings when you are modifying disk objects
  with big array values.

- All the leaf constructors have received a new parameter called
  ``byteorder`` that lets the user specify the byteorder of their data
  *on disk*.  This effectively allows to create datasets in other
  byteorders than the native platform.

- Native HDF5 datasets with ``H5T_ARRAY`` datatypes are fully supported
  for reading now.

- The test suites for the different packages are installed now, so you
  don't need a copy of the PyTables sources to run the tests.  Besides,
  you can run the test suite from the Python console by using::

  >>> tables.tests()

Bug fixes:

- As mentioned above, the fact that NumPy is at the core makes that
  certain bizarre interactions between numarray and NumPy scalars don't
  affect the behaviour of table selections anymore.  Fixes
  http://www.pytables.org/trac/ticket/29.

- Did I mention that PyTables 2.0 can be safely used in 64-bit platforms
  in combination with Python 2.5? ;)

Deprecated features:

- Not many, really.  Please see ``RELEASE-NOTES.txt`` file.

Backward-incompatible changes:

- Many.  Please see ``RELEASE-NOTES.txt`` file.


Important note for Windows users
================================

In order to keep PyTables happy, you will need to get the HDF5 library
compiled for MSVC 7.1, aka .NET 2003.  It can be found at
ftp://ftp.hdfgroup.org/HDF5/current/bin/windows/5-165-win-net.ZIP

Please remember that, from PyTables 2.0 on, Python 2.3 (and lesser) is
not supported anymore.


What is PyTables?
=================

**PyTables** is a package for managing hierarchical datasets and
designed to efficiently cope with extremely large amounts of data (with
support for full 64-bit file addressing).  It features an
object-oriented interface that, combined with C extensions for the
performance-critical parts of the code, makes it a very easy-to-use tool
for high performance data storage and retrieval.

PyTables runs on top of the HDF5 library and NumPy package (but numarray
and Numeric are also supported) for achieving maximum throughput and
convenient use.

Besides, PyTables I/O for table objects is buffered, implemented in C
and carefully tuned so that you can reach much better performance with
PyTables than with your own home-grown wrappings to the HDF5 library.
PyTables Pro sports indexing capabilities as well, allowing selections
in tables exceeding one billion of rows in just seconds.


Platforms
=========

This version has been extensively checked on quite a few platforms, like
Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on Intel64
(Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and PowerPC64)
and MacOSX on PowerPC.  For other platforms, chances are that the code
can be easily compiled and run without further issues.  Please contact
us in case you are experiencing problems.


Resources
=========

Go to the PyTables web site for more details:

http://www.pytables.org

About the HDF5 library:

http://hdf.ncsa.uiuc.edu/HDF5/

About NumPy:

http://numpy.scipy.org/

To know more about the company behind the development of PyTables, see:

http://www.carabos.com/


Acknowledgments
===============

Thanks to various users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for a (incomplete) list of contributors.  Many
thanks also to SourceForge who have helped to make and distribute this
package!


Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


----

  **Enjoy data!**

  -- The PyTables Team

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"


More information about the Python-announce-list mailing list