[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database
Alan Gauld
alan.gauld at yahoo.co.uk
Sun Jul 3 08:30:51 EDT 2022
On 03/07/2022 03:49, Manprit Singh wrote:
> con.create_aggregate("stddev", 1, StdDev)
> cur.execute("select stddev(X1), stddev(X2) from table1")
I just wanted to say thanks for posting this. I have never used,
nor seen anyone else use, the ability to create a user defined aggregate
function in SQLite - usually I just extract the data into python
and use python to do the aggregation. But your question made me
read up on how that all worked so it has taught me something new.
(It also makes me appreciate how the Pyhon API is much easier
to use than the raw C API to SQLite!)
> My question is, as you can see i have used list inside the class StdDev, which
> I think is an inefficient way to do this kind of problem because there may be
> a large number of values in a column and it can take a huge amount of memory.
> Can this problem be solved with the use of iterators ? What would be the best
> approach to do it ?
If I'm working with so much data that this would be a problem I'd
use the database itself to store the intermediate data. That would
be much slower but much less memory dependant. But as others have
said, with aggregate functions you don't usually need to store
data from all rows you just store a few inermediate results
which you combine at the end.
If you are trying to use an in-memory function - like the
stddev function here - then you need to fit all the data in
memory anyway so the function will simply not work if you can't
store the data in RAM. In that case you need to find(or write)
another function that doesn't use memory for storage or
uses less storage.
It is also worth pointing out that most industrial strength
SQL databases come with a far richer set of aggregate functions
than SQLite. So if you do have to work with large volumes of data
you should probably switch to someting like Oracle, DB2, SQLServer(*),
etc and just use the functions built into the server. If they
don't have such a function they also have amuch simpler way
of defining stored procedures. As ever, choose the appropriate
tool for the job.
(*)These are just the ones I know, I assume MySql, Postgres etc
have similarly broad libraries.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
More information about the Tutor
mailing list