
I've been experimenting with pulling quantitative data out of a MySQL table into NumPy arrays via Andy Dustman's excellent MySQLdb module and then calculating various statistics from the data using Gary Strangman's excellent stats.py functions, which when operating on NumPy arrays are lightning-fast. The problem is the speed with which data can be extracted from a column of a MySQL (or any other SQL database) query result set and stuffed into a NumPy array. This inevitably involves forming a Python list and then assigning that to a NumPy array. This is both slow and memory-hungry, especially with large datasets (I have been playing with a few million rows).

I was wondering if it would be feasible to initially add a method to the _mysql class in the MySQLdb module which iterated through a result set using a C routine (rather than a Python routine) and stuffed the data directly into a NumPy array (or arrays - one for each column in the result set) in one fell swoop (or even iterating row-by-row, but in C)? I suspect that such a facility would be much faster than having to move the data into NumPy via a standard Python list (or actually via tuples within a list, which is the way the Python DB-API returns results).

If this direct MySQL-to-NumPy interface worked well, it might be desirable to add it to the Python DB-API specification for optional implementation in the other database modules which conform to the API. There are probably other extensions which would make the DB-API more useful for statistical applications, which tend to be set (column)-oriented rather than row-oriented - I will post to the list as these occur to me.

Cheers,

Tim Churches

PS I will be away for the next week, so apologies in advance for not replying immediately to any follow-ups to this posting. TC
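
To make the overhead concrete, here is a small sketch of the conversion path being described. It uses the stdlib sqlite3 module as a stand-in for MySQLdb (both return DB-API result sets as lists of row tuples), so the table, column names, and the np.fromiter variant are illustrative assumptions, not anything from MySQLdb itself:

```python
# Sketch of the list-of-tuples -> NumPy conversion cost, using sqlite3
# as a stand-in for any DB-API module such as MySQLdb. All names here
# (table "measurements", columns "age"/"weight") are made up for the demo.
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE measurements (age INTEGER, weight REAL)")
cur.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(20 + i % 50, 60.0 + i % 40) for i in range(10000)],
)
conn.commit()

# The path described in the post: the DB-API hands back a Python list
# of row tuples, which must then be traversed at Python speed, once
# per column, to build each NumPy array.
cur.execute("SELECT age, weight FROM measurements")
rows = cur.fetchall()                         # list of tuples in Python memory
age = np.array([r[0] for r in rows])          # extra per-column list, then copy
weight = np.array([r[1] for r in rows])

# A somewhat leaner variant: np.fromiter skips the intermediate
# per-column list, though the row-by-row Python iteration remains --
# the hypothetical C routine proposed above would eliminate that too.
cur.execute("SELECT age FROM measurements")
age2 = np.fromiter((r[0] for r in cur), dtype=np.int64)

assert age.shape == (10000,)
assert np.array_equal(age, age2)
```

Even the np.fromiter variant still crosses the Python layer once per row, which is exactly the cost a C-level fetch-into-array method would avoid.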