[AstroPy] SQL-like queries against astropy.table objects

Eric Jensen ejensen1 at swarthmore.edu
Sat Feb 8 09:00:32 EST 2014


Hi Brian,

I think that your use case is actually pretty straightforward in an astropy table, since you just want to pull out a subset of the table.  It seems that what's tricky in the stackoverflow example you linked to is the JOIN and GROUPBY parts of the query, not the selection.

You can easily select rows that meet multiple criteria, and use the resulting boolean array to create a new table with the subset of data you want.  Here's an example, not with your variables but just using a text file of interferometry complex visibilities I had around (U, V, W baseline lengths, real and imaginary parts of the visibility): 

In [32]:

from astropy.table import Table, Column
import numpy as np
In [33]:

t = Table.read('test_data.txt', format='ascii.commented_header')
In [34]:

t.colnames
Out[34]:
['U', 'V', 'W', 'Real', 'Imag', 'Weight', 'Freq']
In [35]:

# Select only entries with long baselines in U and W, and Real amplitudes > 1 Jy:
good_rows = np.logical_and(t['U'] > 100, t['V'] > 100, t['Real'] > 1)
In [42]:

# Make a new table from this selection, and see how many rows were in 
# new table vs. the original one:
new_table = t[good_rows]
new_table['U'].shape

Out[42]:
(120,)
In [41]:

t['U'].shape
Out[41]:
(4500,)

There might be even better ways to do it (and there is almost certainly a cleaner way to get the number of rows in a table than calling 'shape' on one column), but this should give you the basic idea of the sort of selection that it seem like you're looking for. 

There's also a table-joining function:

http://docs.astropy.org/en/latest/api/astropy.table.operations.join.html#astropy.table.operations.join

For more complex operations, others who are more familiar with both astropy and SQL can chime in, but thought I'd offer what (little) I know on the selection part. 

Hope this helps,

Eric



On Feb 7, 2014, at 11:22 PM, Brian Kloppenborg wrote:

> Greetings,
> 
> I have unfortunately fallen quite far behind on astropy development and 
> my knowledge of the finer features of many of Python's libraries is 
> limited because I do most of my work in C/C++.
> 
> I have a use case in which I have a small (< 100k row) amount of static 
> data which I need to both visualize and be able to easily manipulate 
> (grouping, inclusion/exclusion based upon some criteria, application of 
> some simple mathematics without altering the original data). I've used 
> SQL extensively in the past and the queries which I would run are a 
> perfect use case; however, visualizing the data with matplotlib (or 
> similar) might be more efficiently accomplished from an astropy.table 
> object. Is it possible to run an SQL-like query against an astropy.table 
> object? If not with SQL-like syntax, how else could this be done?
> 
> For example, say I have a table with the following columns (not the use 
> case, but an easy demo):
> JD, mag, mag_error
> 
> I could find all data points with an SNR > 100 within a date range by:
>     SELECT * from table WHERE (JD > value AND JD < value AND 
> mag/mag_err > 100);
> quite quickly. Conversely, implementing something similar using numpy 
> operations appears to be quite painful (e.g. 
> http://stackoverflow.com/questions/7169240/moving-large-sql-query-to-numpy) 
> unless this is already built-in to astropy.table objects.
> 
> Kind regards,
> Brian Kloppenborg
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
> http://mail.scipy.org/mailman/listinfo/astropy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/astropy/attachments/20140208/c1aaa11b/attachment.html>


More information about the AstroPy mailing list