[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database

Sun Jul 3 21:44:11 EDT 2022

Dear Sir,

In Pandas, handling an sql query is so simple as given below:
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table test(i, j)")
ls = [(2, 4), (3, 5), (4, 2), (7, 9)]
cur.executemany("insert into test(i, j) values (?, ?)", ls)

pd.read_sql("select i, j from test", con).std(ddof=0)

will give the desired result:

i    1.870829
j    2.549510
dtype: float64

On Mon, Jul 4, 2022 at 2:13 AM Dennis Lee Bieber <wlfraed at ix.netcom.com>
wrote:

> On Sun, 3 Jul 2022 22:29:41 +0530, Manprit Singh
> <manpritsinghece at gmail.com> declaimed the following:
>
> >Now as sum is an aggregate function, same way population standard
> >deviation is also an aggregate function. We should be able to make a
> >user defined function
> >
>
>         It is also superfluous: SQLite3 already has count(), sum() and even
> avg() built-in (though it lacks many of the bigger statistical computations
> -- variance, std. dev, covariance, correlation, linear regression -- that
> many of the bigger client/server RDBMs support).
>
> >
> >for getting mean.I am not getting the mechanism to subtract the mean
> >from each value of the column in the same step method or by any other
> >way
> >
>         Note that the definition for creating aggregates includes
> something for
> number of arguments. Figure out how to specify multiple arguments and you
> might be able to have SQLite3 provide "current item" and "mean" (avg) to
> the step() method. I'm not going to take the time to experiment (for the
> most part, I'd consider it simpler to just grab the entire dataset from the
> database, and run the number crunching in Python, rather than the overhead
> of having SQLite3 invoke a Python "callback" method for each item, just to
> be able to have the SQLite3 return a single computed value.
>
>
> >Btw I would like to write a one liner to calculate population std
> >deviation  of a list:
> >lst = [2, 5, 7, 9, 10]
> >mean = sum(lst)/len(lst)
> >std_dev = (sum((ele-mean)**2 for ele in lst)/len(lst))**0.5
> >print(std_dev)
>
>         Literally, except for the imports, that is just...
>
> print(statistics.pstdev(lst))
>
> >>> import math as m
> >>> import statistics as s
> >>> lst = [2, 5, 7, 9, 10]
> >>> print(s.pstdev(lst))
> 2.870540018881465
> >>>
>
>         Going up a level in complexity (IE -- not using the imported
> pstdev())
>
> >>> print("Population Std. Dev.: %s" % m.sqrt( s.mean( (ele - s.mean(lst))
> ** 2 for ele in lst)))
> Population Std. Dev.: 2.870540018881465
> >>>
>
>         This has the problem  that it invokes mean(lst) for each element,
> so
> may be slower for large data sets (that problem will also exist if you
> manage a multi-argument step() for SQLite3).
>
>         Anytime you have
>
>                 sum(equation-with-elements-of-data) / len(data)
>
> you can replace it with just
>
>                 mean(equation...)
>
>
> --
>         Wulfraed                 Dennis Lee Bieber         AF6VN
>         wlfraed at ix.netcom.com
> http://wlfraed.microdiversity.freeddns.org/
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>