
On Fri, 14 Apr 2000, Tim Churches wrote:
Andy Dustman wrote:
Yes, but the problem with mysql_store_result() is the large amount of memory required to store the result set. Couldn't the user be responsible for predetermining the size of the array via a query such as "select count(*) from sometable where...." and then pass this value as a parameter to the executeNumPy() method? In MySQL at least such count(*) queries are resolved very quickly so such an approach wouldn't take twice the time. Then mysql_use_result() could be used to populate the initialised NumPy array row by row, so there is only ever one complete copy of the data in memory, and that copy is in the NumPy array.
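A minimal sketch of the count-then-stream idea described above. It is only an illustration: the standard-library sqlite3 module stands in for MySQL, the table and column names are made up, and a pre-sized list of lists stands in for the NumPy array. Iterating over the cursor plays the role of mysql_use_result(), since rows are copied into place one at a time.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; "sometable", "x" and "y" are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (x REAL, y REAL)")
conn.executemany("INSERT INTO sometable VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(5)])

# Step 1: ask the database how many rows the real query will return.
(nrows,) = conn.execute(
    "SELECT COUNT(*) FROM sometable WHERE x >= 2").fetchone()

# Step 2: allocate the destination once, at its final size.
# (With NumPy this would be a zeros((nrows, 2)) array.)
dest = [[0.0, 0.0] for _ in range(nrows)]

# Step 3: stream the rows and copy each one straight into place,
# so no second full copy of the result set is ever built.
cursor = conn.execute(
    "SELECT x, y FROM sometable WHERE x >= 2 ORDER BY x")
for i, row in enumerate(cursor):
    dest[i][:] = row
```

One caveat with this approach: if other clients can modify the table between the two queries, the count may no longer match the number of rows streamed, so real code would need a transaction, a lock, or a bounds check.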
After some more thought on this subject, and some poking around at NumPy, I came to the following conclusions: Since NumPy arrays are fixed-size, but otherwise sequences (in the multi-dimensional case, sequences of sequences), the best approach would be for the user to pass in a pre-sized array (i.e. from zeros(), and btw, the docstring for zeros is way wrong), and _mysql would simply access it through the Sequence object protocol, and update as many values as it could: If you passed a 100-row array, it would fill 100 rows or as many as were in the result set, whichever is less. Since this requires no special knowledge of NumPy, it could be a standard addition (no conditional compilation required). This method (tentatively _mysql.fetch_rows_into_array(array)) would return the array argument as the result. IndexError would likely be raised if the array was too narrow (too many columns in result set). Probably this would not be a MySQLdb.Cursor method, but perhaps I can have a separate module with a cursor subclass which returns NumPy arrays.
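The behaviour proposed above can be sketched in pure Python. This is a hypothetical analogue, not the _mysql implementation: the function takes an iterable of rows as an extra argument (the real method would read from the connection's result set), and it relies only on the sequence protocol, so a list of lists works the same way a pre-sized NumPy array would.

```python
def fetch_rows_into_array(rows, array):
    """Hypothetical pure-Python analogue of the proposed method.

    Fills `array` (any mutable sequence of mutable sequences) from `rows`,
    stopping at len(array) rows or when `rows` runs out, whichever is first.
    """
    # range() comes first in zip() so that no extra row is consumed
    # from `rows` once the array is full.
    for i, row in zip(range(len(array)), rows):
        target = array[i]
        for j, value in enumerate(row):
            # Raises IndexError if the array has fewer columns than the row.
            target[j] = value
    return array


# Three rows available, but the destination holds only two.
result_set = iter([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)])
a = [[0.0, 0.0] for _ in range(2)]
out = fetch_rows_into_array(result_set, a)
```

Here `out` is the same object as `a`, which now holds the first two rows; the third row is still waiting in `result_set` for a later call.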
Question: Would it be adequate to put all columns returned into the array? If label columns need to be returned, this could pose a problem. They may have to be returned as a separate query. Or else non-numeric columns would be excluded and returned in a list of tuples (this would be harder).
Yes, more thought needed here - my initial thought was one NumPy array per column, particularly since NumPy arrays must be homogeneous wrt data type. Each NumPy array could be named the same as the column from which it is derived.
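A rough sketch of the one-array-per-column idea, again with stand-ins: sqlite3 replaces MySQL, the table and column names are invented, and the standard-library array type (homogeneous, like a NumPy array) replaces NumPy. The column names come from the DB-API cursor.description attribute.

```python
import sqlite3
from array import array

# Assumption: sqlite3 stands in for MySQL; "obs", "height", "weight" are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (height REAL, weight REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)",
                 [(1.5, 50.0), (1.8, 80.0)])

cursor = conn.execute("SELECT height, weight FROM obs ORDER BY height")

# The first item of each cursor.description entry is the column name.
names = [d[0] for d in cursor.description]

# One homogeneous array of doubles per column, keyed by column name.
columns = {name: array("d") for name in names}
for row in cursor:
    for name, value in zip(names, row):
        columns[name].append(value)
```

Unlike the pre-sized approach, these arrays grow as rows arrive; the two ideas could be combined by allocating each column at its final length first.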
Okay, I think I know what you mean here. You are wanting to return each column as a (vertical) vector, whereas I am thinking along the lines of returning the result set as a matrix. Is that correct? Since it appears you can efficiently slice out column vectors as a[:,n], is my idea acceptable? i.e.
>>> import Numeric
>>> a = Numeric.multiarray.zeros((2, 2), 'd')
>>> a[1,1] = 2
>>> a[0,1] = -1
>>> a[1,0] = -3
>>> a
array([[ 0., -1.],
       [-3.,  2.]])
>>> a[:,0]
array([ 0., -3.])
>>> a[:,1]
array([-1.,  2.])
--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage,
come no further, for death awaits you all, with nasty, big, pointy teeth!"