[Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL

Tim Churches tchur at bigpond.com
Fri Apr 28 18:32:44 EDT 2000


Andy Dustman wrote:
[...snip...]
> 
> Okay, I think I know what you mean here. You are wanting to return each
> column as a (vertical) vector, whereas I am thinking along the lines of
> returning the result set as a matrix. Is that correct?

Yes, exactly.

>                                                        Since it appears
> you can efficiently slice out column vectors as a[:,n], is my idea
> acceptable? i.e.
> 
> >>> a=Numeric.multiarray.zeros( (2,2),'d')
> >>> a[1,1]=2
> >>> a[0,1]=-1
> >>> a[1,0]=-3
> >>> a
> array([[ 0., -1.],
>        [-3.,  2.]])
> >>> a[:,0]
> array([ 0., -3.])
> >>> a[:,1]
> array([-1.,  2.])

The only problem is that NumPy arrays must be homogeneous wrt type,
which means that, say, a categorical column containing just a few
distinct values stored as an integer would have to be upcast to a double
in the NumPy matrix if it was part of a query which also returned a
float. 

Would it be possible to extend your idea of passing in an array to the
query? Perhaps the user could pass in a list of pre-existing, pre-sized
sequence objects (which might be rank-1 NumPy arrays of various
appropriate data types or Python tuples) which correspond to the columns
which are to be returned by the SQL query. It would be up to the user to
determine the correct type for each NumPy array and to size the array or
tuples correctly. The reason for needing tuples as well as NumPy arrays
is that, as you mention, NumPy arrays only support numbers. The
intention would be for all of this to be wrapped in a class which may
issue a number of small queries to determine the number of rows to be
returned and the data types of the columns, so the user is shielded from
having to work out these details. The only bit that has to be written in
C is the function which takes the sequence of sequences (NumPy Arrays or
Python tuples) in which to store the query results, column-wise and
stuffs the value for each column for each row of the result set into the
appropriate passed-in sequence object. I would be more than happy to
assist with the Python code, testing and documentation but my C skills
aren't up to helping with the guts of it. In other words, making this
part of the low-level _mysql interface would be sufficient.

Cheers,

Tim C

> 
> --
> andy dustman       |     programmer/analyst     |      comstar.net, inc.
> telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
> "Therefore, sweet knights, if you may doubt your strength or courage,
> come no further, for death awaits you all, with nasty, big, pointy teeth!"




More information about the NumPy-Discussion mailing list