[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database

Manprit Singh manpritsinghece at gmail.com
Sun Jul 3 09:01:20 EDT 2022


Sir,
I am just going through all the functionalities available in sqlite3 module
, just to see if I can use sqlite3 as a good data analysis tool or not .

Upto this point I have figured out that and sqlite data base file can be an
excellent replacement for data stored in files .

You can preserve data in a structured form, email to someone who need it
etc etc .

But for good data analysis ....I found pandas is superior . I use pandas
for data analysis and visualization .


Btw ....this is true . You should use right tool for your task .



Regards
Manprit Singh

On Sun, 3 Jul, 2022, 18:02 Alan Gauld via Tutor, <tutor at python.org> wrote:

> On 03/07/2022 03:49, Manprit Singh wrote:
>
> > con.create_aggregate("stddev", 1, StdDev)
> > cur.execute("select stddev(X1), stddev(X2) from table1")
>
> I just wanted to say thanks for posting this. I have never used,
> nor seen anyone else use, the ability to create a user defined aggregate
> function in SQLite - usually I just extract the data into python
> and use python to do the aggregation. But your question made me
> read up on how that all worked so it has taught me something new.
> (It also makes me appreciate how the Pyhon API is much easier
> to use than the raw C API to SQLite!)
>
> > My question is, as you can see i have used list inside the class StdDev,
> which
> > I think is an inefficient way to do this kind of problem because there
> may be
> > a large number of values in a column and it can take a huge amount of
> memory.
> > Can this problem be solved with the use of iterators ? What would be the
> best
> > approach to do it ?
>
> If I'm working with so much data that this would be a problem I'd
> use the database itself to store the intermediate data. That would
> be much slower but much less memory dependant. But as others have
> said, with aggregate functions you don't usually need to store
> data from all rows you just store a few inermediate results
> which you combine at the end.
>
> If you are trying to use an in-memory function - like the
> stddev function here - then you need to fit all the data in
> memory anyway so the function will simply not work if you can't
> store the data in RAM. In that case you need to find(or write)
> another function that doesn't use memory for storage or
> uses less storage.
>
> It is also worth pointing out that most industrial strength
> SQL databases come with a far richer set of aggregate functions
> than SQLite. So if you do have to work with large volumes of data
> you should probably switch to someting like Oracle, DB2, SQLServer(*),
> etc and just use the functions built into the server. If they
> don't have such a function they also have amuch simpler way
> of defining stored procedures. As ever, choose the appropriate
> tool for the job.
>
> (*)These are just the ones I know, I assume MySql, Postgres etc
> have similarly broad libraries.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
>
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>


More information about the Tutor mailing list