[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database
Manprit Singh
manpritsinghece at gmail.com
Sun Jul 3 21:44:11 EDT 2022
Dear Sir,
In Pandas, handling an sql query is so simple as given below:
import sqlite3
import pandas as pd
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table test(i, j)")
ls = [(2, 4), (3, 5), (4, 2), (7, 9)]
cur.executemany("insert into test(i, j) values (?, ?)", ls)
pd.read_sql("select i, j from test", con).std(ddof=0)
will give the desired result:
i 1.870829
j 2.549510
dtype: float64
On Mon, Jul 4, 2022 at 2:13 AM Dennis Lee Bieber <wlfraed at ix.netcom.com>
wrote:
> On Sun, 3 Jul 2022 22:29:41 +0530, Manprit Singh
> <manpritsinghece at gmail.com> declaimed the following:
>
> >Now as sum is an aggregate function, same way population standard
> >deviation is also an aggregate function. We should be able to make a
> >user defined function
> >
>
> It is also superfluous: SQLite3 already has count(), sum() and even
> avg() built-in (though it lacks many of the bigger statistical computations
> -- variance, std. dev, covariance, correlation, linear regression -- that
> many of the bigger client/server RDBMs support).
>
> >
> >for getting mean.I am not getting the mechanism to subtract the mean
> >from each value of the column in the same step method or by any other
> >way
> >
> Note that the definition for creating aggregates includes
> something for
> number of arguments. Figure out how to specify multiple arguments and you
> might be able to have SQLite3 provide "current item" and "mean" (avg) to
> the step() method. I'm not going to take the time to experiment (for the
> most part, I'd consider it simpler to just grab the entire dataset from the
> database, and run the number crunching in Python, rather than the overhead
> of having SQLite3 invoke a Python "callback" method for each item, just to
> be able to have the SQLite3 return a single computed value.
>
>
> >Btw I would like to write a one liner to calculate population std
> >deviation of a list:
> >lst = [2, 5, 7, 9, 10]
> >mean = sum(lst)/len(lst)
> >std_dev = (sum((ele-mean)**2 for ele in lst)/len(lst))**0.5
> >print(std_dev)
>
> Literally, except for the imports, that is just...
>
> print(statistics.pstdev(lst))
>
> >>> import math as m
> >>> import statistics as s
> >>> lst = [2, 5, 7, 9, 10]
> >>> print(s.pstdev(lst))
> 2.870540018881465
> >>>
>
> Going up a level in complexity (IE -- not using the imported
> pstdev())
>
> >>> print("Population Std. Dev.: %s" % m.sqrt( s.mean( (ele - s.mean(lst))
> ** 2 for ele in lst)))
> Population Std. Dev.: 2.870540018881465
> >>>
>
> This has the problem that it invokes mean(lst) for each element,
> so
> may be slower for large data sets (that problem will also exist if you
> manage a multi-argument step() for SQLite3).
>
> Anytime you have
>
> sum(equation-with-elements-of-data) / len(data)
>
> you can replace it with just
>
> mean(equation...)
>
>
> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlfraed at ix.netcom.com
> http://wlfraed.microdiversity.freeddns.org/
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
More information about the Tutor
mailing list