Request for feedback on API design
Arnaud Delobelle
arnodel at gmail.com
Mon Dec 13 15:19:27 EST 2010
Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
> I am soliciting feedback regarding the API of my statistics module:
>
> http://code.google.com/p/pycalcstats/
>
>
> Specifically the following couple of issues:
>
> (1) Multivariate statistics such as covariance have two obvious APIs:
>
> A pass the X and Y values as two separate iterable arguments, e.g.:
> cov([1, 2, 3], [4, 5, 6])
>
> B pass the X and Y values as a single iterable of tuples, e.g.:
> cov([(1, 4), (2, 5), (3, 6)])
>
> I currently support both APIs. Do people prefer one, or the other, or
> both? If there is a clear preference for one over the other, I may drop
> support for the other.
>
I don't have an informed opinion on this.
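For reference, the two calling conventions can live behind one function.
The sketch below is only an illustration: this cov is a hypothetical
population-covariance helper written for this post, not the actual
pycalcstats implementation, and the optional second argument is an
assumption about how the two forms might be told apart.

```python
def cov(xdata, ydata=None):
    """Population covariance (hypothetical sketch, not pycalcstats' code).

    API A: cov(xs, ys)  with two iterables of equal length.
    API B: cov(pairs)   with one iterable of (x, y) tuples.
    """
    if ydata is None:
        # API B: a single iterable of (x, y) pairs.
        pairs = [(x, y) for x, y in xdata]
    else:
        # API A: two separate iterables; refuse mismatched lengths.
        xs, ys = list(xdata), list(ydata)
        if len(xs) != len(ys):
            raise ValueError("x and y must have the same length")
        pairs = list(zip(xs, ys))
    n = len(pairs)
    if n == 0:
        raise ValueError("cov requires at least one data point")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / n


# Both calling conventions give the same result.
a = cov([1, 2, 3], [4, 5, 6])
b = cov([(1, 4), (2, 5), (3, 6)])
```

One cost of supporting both is that the meaning of the first argument
depends on whether a second one is given.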
> (2) Statistics text books often give formulae in terms of sums and
> differences such as
>
> Sxx = n*Σ(x**2) - (Σx)**2
Interestingly, your Sxx is closely related to the variance: if x is a
list of n numbers then, taking var to be the population variance,

    Sxx == (n**2)*var(x)

More generally, if x and y have the same length n, then Sxy (*) is
related to the population covariance in the same way:

    Sxy == (n**2)*cov(x, y)
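The two identities can be checked numerically. This is a minimal sketch:
pvar and pcov are hypothetical helper names for the population variance
and covariance (dividing by n, not n - 1), Sxy follows the definition
n*Σ(xy) - (Σx)(Σy), and Fraction is used so the comparison is exact.

```python
from fractions import Fraction


def pvar(x):
    # Population variance: mean squared deviation from the mean.
    n = len(x)
    mean = Fraction(sum(x), n)
    return sum((xi - mean) ** 2 for xi in x) / n


def pcov(x, y):
    # Population covariance of two equal-length sequences.
    n = len(x)
    mx, my = Fraction(sum(x), n), Fraction(sum(y), n)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n


def Sxx(x):
    # Sxx = n*Σ(x**2) - (Σx)**2
    n = len(x)
    return n * sum(xi ** 2 for xi in x) - sum(x) ** 2


def Sxy(x, y):
    # Sxy = n*Σ(xy) - (Σx)(Σy)
    n = len(x)
    return n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)


x = [1, 2, 3, 4]
y = [2, 4, 5, 9]
n = len(x)
assert Sxx(x) == n ** 2 * pvar(x)
assert Sxy(x, y) == n ** 2 * pcov(x, y)
```

With the sample variance (dividing by n - 1) the factor would be
n*(n - 1) instead of n**2.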
So if you have a variance and covariance function, it would be redundant
to include Sxx and Sxy. Another argument against including Sxx & co is
that their definition is not universally agreed upon. For example, I
have seen
Sxx = Σ(x**2) - (Σx)**2/n
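To make the disagreement concrete, here is a small sketch comparing the
two definitions on the same data. The function names sxx_scaled and
sxx_sum_of_squares are made up for this illustration; the two results
differ by exactly a factor of n.

```python
from fractions import Fraction


def sxx_scaled(x):
    # Definition from the original post: n*Σ(x**2) - (Σx)**2
    n = len(x)
    return n * sum(xi ** 2 for xi in x) - sum(x) ** 2


def sxx_sum_of_squares(x):
    # Alternative definition: Σ(x**2) - (Σx)**2/n
    # (this is the sum of squared deviations from the mean)
    n = len(x)
    return sum(xi ** 2 for xi in x) - Fraction(sum(x) ** 2, n)


data = [1, 2, 3, 4]
# The first definition is n times the second.
assert sxx_scaled(data) == len(data) * sxx_sum_of_squares(data)
```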
HTH
--
Arnaud
(*) Here I take Sxy to be n*Σ(xy) - (Σx)(Σy), generalising from your
definition of Sxx.