[Tutor] toy program to find standard deviation of 2 columns of a sqlite3 database

Peter Otten __peter__ at web.de
Tue Jul 5 03:20:41 EDT 2022


On 04/07/2022 19:14, Manprit Singh wrote:
> Dear Sir,
>
> Finally I came up with a solution which seems more good to me, rather than
> using the previous approach. In this solution I have used shortcut method
> for calculating the standard deviation.
>
> import sqlite3
>
> class StdDev:
>
>      def __init__(self):
>          self.cnt = 0
>          self.sumx = 0
>          self.sumsqrx = 0
>
>      def step(self, x):
>          self.cnt += 1
>          self.sumx += x
>          self.sumsqrx += x**2
>
>      def finalize(self):
>          return ((self.sumsqrx - self.sumx**2/self.cnt)/self.cnt)**0.5
>
> conn = sqlite3.connect(":memory:")
> cur = conn.cursor()
> cur.execute("create table table1(X1 int, X2 int)")
> ls = [(2, 5),
>        (3, 7),
>        (4, 2),
>        (5, 1),
>        (8, 6)]
> cur.executemany("insert into table1 values(?, ?)", ls)
> conn.commit()
>
> conn.create_aggregate("stdev", 1, StdDev)
> std_dev, = cur.execute("select stdev(X1), stdev(X2) from table1")
> print(std_dev)
> cur.close()
> conn.close()
>
>
> gives  output
>
> (2.0591260281974, 2.315167380558045)
>
> That's all.  This is what I was looking for .So what will be the best
> solution to this problem ? This one or the previous one posted by me ?

As always -- it depends. I believe the numerical error for the above
algorithm tends to be much higher than for the one used in the
statistics module. I'd have to google for the details, though, and I am
lazy enough to leave that up to you.

> The whole credit goes to Dennis lee bieber & avi.e.gross at gmail.com

I think I mentioned it first ;)


More information about the Tutor mailing list