Hi all,

I'm posting this message to announce the availability of the *second
alpha release of PyTables 2.0*, the new and shiny major version of
PyTables.

This release settles the file format used in this major version,
removing the need to use pickled objects in order to store system
attributes, so we expect no further changes to the on-disk format in
future 2.0 releases.  The storage and handling of group filters has
also been streamlined.  The new release also allows running the
complete test suite from within Python, enables new tests and fixes
some problems with test data installation, among other fixes.

We expect to have the documentation revised and the API definitively
settled very soon in order to release the first beta version.

The official announcement follows.  Enjoy data!

::

  Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
         Cárabos Coop. V.  V  V   Enjoy Data
                            ""

----

===========================
 Announcing PyTables 2.0a2
===========================

This is the second *alpha* version of PyTables 2.0.  This release,
although fairly stable in operation, is tagged as alpha because the API
can still change a bit (but hopefully not a great deal), so it is meant
basically for developers and people who want to get a taste of the new
exciting features in this major version.

You can download a source package of version 2.0a2 with generated PDF
and HTML docs from http://www.pytables.org/download/preliminary/

You can also get the latest sources from the Subversion repository at
http://pytables.org/svn/pytables/trunk/

If you are afraid of Subversion (you shouldn't be), you can always
download the latest, daily updated, packed sources from
http://www.pytables.org/download/snapshot/

Please keep in mind that some sections in the manual may be obsolete
(especially the "Optimization tips" chapter).  The reference chapter
should be fairly up-to-date, though.
You may also want to read the ``RELEASE-NOTES.txt`` file in depth,
where you will find an entire section devoted to migrating your existing
PyTables 1.x apps to version 2.0.  You can find an HTML version of this
document at http://www.pytables.org/moin/ReleaseNotes/Release_2.0a2

Changes more in depth
=====================

Improvements:

- NumPy is finally at the core!  That means that PyTables no longer
  needs numarray in order to operate, although numarray continues to be
  supported (as does Numeric).  This also means that you should be able
  to run PyTables in scenarios combining Python 2.5 and 64-bit
  platforms (these are a source of problems with numarray/Numeric
  because they don't support this combination yet).

- Most of the operations in PyTables have experienced noticeable
  speed-ups (sometimes up to 2x, as in regular Python table
  selections).  This is a consequence of both using NumPy internally
  and a considerable effort in refactoring and optimizing the new code.

- Numexpr has been integrated in all in-kernel selections.  So, now it
  is possible to perform complex selections like::

      result = [ row['var3'] for row in
                 table.where('(var2 < 20) | (var1 == "sas")') ]

  or::

      complex_cond = '((%s <= col5) & (col2 <= %s)) ' \
                     '| (sqrt(col1 + 3.1*col2 + col3*col4) > 3)'
      result = [ row['var3'] for row in
                 table.where(complex_cond % (inf, sup)) ]

  and run them at full C speed (or perhaps faster, thanks to the
  cache-tuned computing kernel of Numexpr).

- Now it is possible to get fields of the ``Row`` iterator by
  specifying their position, or even ranges of positions (extended
  slicing is supported).
  For example, you can do::

      result = [ row[4] for row in table        # fetch field #4
                 if row[1] < 20 ]
      result = [ row[:] for row in table        # fetch all fields
                 if row['var2'] < 20 ]
      result = [ row[1::2] for row in           # fetch fields at odd positions
                 table.iterrows(2, 3000, 3) ]

  in addition to the classical::

      result = [row['var3'] for row in table.where('var2 < 20')]

- ``Row`` has received a new method called ``fetch_all_fields()`` in
  order to easily retrieve all the fields of a row in situations like::

      [row.fetch_all_fields() for row in table.where('column1 < 0.3')]

  The difference between ``row[:]`` and ``row.fetch_all_fields()`` is
  that the former returns all the fields as a tuple, while the latter
  returns them as a NumPy void scalar and should be faster.  Choose
  whichever fits your needs better.

- Now all data that is read from disk is converted, if necessary, to
  the native byteorder of the host machine (before, this only happened
  with ``Table`` objects).  This should help to accelerate apps that
  have to do computations with data generated on platforms whose
  byteorder differs from that of the user's machine.

- All the leaf constructors have received a new parameter called
  ``byteorder`` that lets the user specify the byteorder of their data
  *on disk*.  This effectively allows creating datasets in byteorders
  other than the native one of the platform.

- Native HDF5 datasets with ``H5T_ARRAY`` datatypes are now fully
  supported for reading.

- The test suites for the different packages are now installed, so you
  don't need a copy of the PyTables sources to run the tests.  Besides,
  you can run the test suite from the Python console by using::

      import tables
      tables.tests()
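The selection and row-access features above can be mimicked with plain NumPy, which may help when reasoning about what a condition string selects. This is a minimal sketch, not PyTables code. It assumes a small in-memory structured array standing in for a table, with hypothetical columns ``var1`` through ``var5`` and made-up values. It does not call ``Table.where()`` or the ``Row`` API.

```python
import numpy as np

# Hypothetical in-memory stand-in for a table (not a PyTables object).
# Columns var1..var3 follow the announcement; var4/var5 are made up.
rows = np.array(
    [(b"sas", 10, 1.5, 7, 0.1),
     (b"foo", 25, 2.5, 8, 0.2),
     (b"bar", 15, 3.5, 9, 0.3),
     (b"sas", 30, 4.5, 10, 0.4)],
    dtype=[("var1", "S3"), ("var2", "i4"), ("var3", "f8"),
           ("var4", "i4"), ("var5", "f8")],
)

# Same logic as the condition '(var2 < 20) | (var1 == "sas")':
# '|' and '&' work element-wise, so each comparison needs parentheses.
mask = (rows["var2"] < 20) | (rows["var1"] == b"sas")
selected_var3 = rows["var3"][mask].tolist()

# One record of a structured array is a NumPy void scalar, the kind of
# object row.fetch_all_fields() is described as returning.
record = rows[0]
as_tuple = record.item()      # plain tuple, analogous to row[:]
fourth = as_tuple[4]          # positional access, like row[4]
odd_fields = as_tuple[1::2]   # extended slicing, like row[1::2]

print(selected_var3)
print(type(record).__name__, as_tuple, fourth, odd_fields)
```

In this sketch the tuple form is convenient for unpacking and slicing. The void scalar keeps the record's dtype, which is consistent with the announcement's remark that ``fetch_all_fields()`` should be the faster choice.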
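The byteorder handling described above can be illustrated with plain NumPy rather than PyTables itself. The minimal sketch below assumes a hypothetical big-endian integer array standing in for data written on another platform. It shows the kind of conversion to the host's native byteorder that PyTables is said to perform on read; it is not the library's implementation.

```python
import numpy as np

# Hypothetical values stored with an explicit big-endian dtype, standing in
# for a dataset written on (or for) a big-endian platform.
on_disk = np.array([1, 256, 65536], dtype=">i4")

# The raw bytes follow the big-endian layout regardless of the host.
raw = on_disk.tobytes()

# Convert to the host's native byteorder: the values are unchanged,
# only their in-memory representation may be byte-swapped.
native = on_disk.astype(on_disk.dtype.newbyteorder("="))

print(raw[:4], native.dtype.isnative, native.tolist())
```

On a big-endian host the conversion is a plain copy. On a little-endian host each element is byte-swapped once, so later computations run on native data.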
Bug fixes:

- As mentioned above, having NumPy at the core means that certain
  bizarre interactions between numarray and NumPy scalars no longer
  affect the behaviour of table selections.  Fixes
  http://www.pytables.org/trac/ticket/29.

- Did I mention that PyTables 2.0 can be safely used on 64-bit
  platforms in combination with Python 2.5? ;)

Deprecated features:

- Not many, really.  Please see the ``RELEASE-NOTES.txt`` file.

Backward-incompatible changes:

- Many.  Please see the ``RELEASE-NOTES.txt`` file.

Important note for Windows users
================================

In order to keep PyTables happy, you will need to get the HDF5 library
compiled for MSVC 7.1, aka .NET 2003.  It can be found at
ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win-net.ZIP

Please remember that, from PyTables 2.0 on, Python 2.3 (and earlier) is
no longer supported.

What is PyTables?
=================

**PyTables** is a package for managing hierarchical datasets, designed
to efficiently cope with extremely large amounts of data (with support
for full 64-bit file addressing).  It features an object-oriented
interface that, combined with C extensions for the performance-critical
parts of the code, makes it a very easy-to-use tool for high-performance
data storage and retrieval.

PyTables runs on top of the HDF5 library and the NumPy package (but
numarray and Numeric are also supported) to achieve maximum throughput
and convenient use.

Besides, PyTables I/O for table objects is buffered, implemented in C
and carefully tuned so that you can reach much better performance with
PyTables than with your own home-grown wrappings to the HDF5 library.
PyTables Pro sports indexing capabilities as well, allowing selections
in tables exceeding one billion rows in just seconds.
Platforms
=========

This version has been extensively checked on quite a few platforms,
like Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on
Intel64 (Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and
PowerPC64) and MacOSX on PowerPC.  For other platforms, chances are
that the code can be easily compiled and run without further issues.
Please contact us if you experience problems.

Resources
=========

Go to the PyTables web site for more details:

http://www.pytables.org

About the HDF5 library:

http://hdf.ncsa.uiuc.edu/HDF5/

About NumPy:

http://numpy.scipy.org/

To know more about the company behind the development of PyTables, see:

http://www.carabos.com/

Acknowledgments
===============

Thanks to various users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Many
thanks also to SourceForge who have helped to make and distribute this
package!

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team
Hi everybody once again,

We have made a new micro-release of the second alpha of PyTables 2.0,
PyTables 2.0a2a.  It fixes a missing import (thanks to Antonio Valentino
and Steven H. Rogers for reporting it) and missing images in the HTML
version of the manual in the 2.0a2 version released yesterday.

We hope that the next release will be a beta one, and we encourage you
to test it.  Thank you!

As usual, the released files are available at
http://www.pytables.org/download/preliminary/

For more information on PyTables, visit http://www.pytables.org/

Cheers,

::

  Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
         Cárabos Coop. V.  V  V   Enjoy Data
                            ""