# [CentralOH] OT: Statistician needed

Eric Floehr eric at intellovations.com
Mon May 16 18:55:36 EDT 2016

Neil,

I'm not a statistician at all, so I don't know what I should even be asking.

Basically, I have an RMSE computed from a set of forecasts. These forecasts
are from representative locations. I want to know what range the RMSE would
fall in (at some confidence level) if I had measured the error at *all*
locations.

So I take a set of forecasts from 50 locations within the U.S. (say, each
state capital, assuming that's representative) and compute the RMSE from
those forecasts. But there are many more than 50 locations, so those 50 are
only a sample of all the forecast locations in the U.S., and their RMSE is
only a sample RMSE, which is likely *close* to the actual RMSE I would get
if I had taken *all* the locations.
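One way to put a range on that sample RMSE, offered here as an assumption rather than a method from this thread, is a bootstrap that resamples whole locations with replacement and recomputes the pooled RMSE each time. The input layout (one array of errors per location), the function name `bootstrap_rmse_ci`, and the demo data are all hypothetical; the demo numbers are randomly generated, not real forecasts.

```python
import numpy as np


def bootstrap_rmse_ci(errors_by_location, n_boot=5000, level=0.95, seed=0):
    """Percentile-bootstrap CI for the pooled RMSE, resampling whole locations.

    errors_by_location: list of 1-D arrays, one array of forecast errors per
    sampled location (a hypothetical input layout).
    Returns (point_estimate, lower, upper).
    """
    rng = np.random.default_rng(seed)
    n_loc = len(errors_by_location)
    # Per-location sum of squared errors and forecast counts, so each
    # bootstrap replicate only needs to add up precomputed values.
    sse = np.array([np.sum(np.square(e)) for e in errors_by_location])
    counts = np.array([len(e) for e in errors_by_location])
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_loc, size=n_loc)  # draw locations with replacement
        stats[b] = np.sqrt(sse[idx].sum() / counts[idx].sum())
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    point = np.sqrt(sse.sum() / counts.sum())
    return float(point), float(lo), float(hi)


# Synthetic demo only: 50 made-up locations, 365 forecasts each, with a
# location-specific error spread between 2.5 and 4.0 degrees.
rng = np.random.default_rng(42)
demo = [rng.normal(0.0, rng.uniform(2.5, 4.0), size=365) for _ in range(50)]
point, lo, hi = bootstrap_rmse_ci(demo)
print(f"RMSE {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Resampling locations rather than individual forecasts is deliberate: errors at one location on different days tend to be correlated, so treating every forecast as an independent draw would likely make the interval too narrow.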

So when comparing two forecast providers, each provider's RMSE is only an
estimate, not the *true* error. If I want to rank-order two providers, I
want some level of confidence that the difference in RMSE between them is
statistically significant.
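A sketch of the provider comparison, assuming both providers are scored on the same forecasts at the same locations: resample the locations once per replicate, recompute both RMSEs on that shared resample, and look at the spread of the difference. This paired bootstrap is a swapped-in technique, not something proposed in the thread; `paired_bootstrap_rmse_diff` and the synthetic data are hypothetical.

```python
import numpy as np


def paired_bootstrap_rmse_diff(errs_a, errs_b, n_boot=5000, level=0.95, seed=0):
    """CI for RMSE(A) - RMSE(B), resampling locations jointly for both providers.

    errs_a, errs_b: lists of 1-D arrays, aligned so that errs_a[i] and
    errs_b[i] hold the two providers' errors for the same forecasts at
    location i. Returns (observed_difference, lower, upper).
    """
    rng = np.random.default_rng(seed)
    sse_a = np.array([np.sum(np.square(e)) for e in errs_a])
    sse_b = np.array([np.sum(np.square(e)) for e in errs_b])
    counts = np.array([len(e) for e in errs_a])
    n_loc = len(counts)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_loc, size=n_loc)  # same locations for A and B
        n = counts[idx].sum()
        diffs[b] = np.sqrt(sse_a[idx].sum() / n) - np.sqrt(sse_b[idx].sum() / n)
    alpha = 1.0 - level
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    total = counts.sum()
    observed = np.sqrt(sse_a.sum() / total) - np.sqrt(sse_b.sum() / total)
    return float(observed), float(lo), float(hi)


# Synthetic demo only: both providers share a common error component
# (they miss the same weather) plus provider-specific noise.
rng = np.random.default_rng(1)
shared = [rng.normal(0.0, 3.0, size=365) for _ in range(50)]
errs_a = [s + rng.normal(0.0, 0.8, size=s.size) for s in shared]
errs_b = [s + rng.normal(0.0, 1.6, size=s.size) for s in shared]
obs, lo, hi = paired_bootstrap_rmse_diff(errs_a, errs_b)
print(f"RMSE(A) - RMSE(B) = {obs:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
if hi < 0 or lo > 0:
    print("Interval excludes 0: difference is significant at this level")
else:
    print("Interval includes 0: cannot rank the providers at this level")
```

Pairing matters because both providers tend to miss on the same hard days, so their errors are positively correlated; resampling them together keeps that correlation, which usually narrows the interval on the difference compared with treating the two RMSEs as independent.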

Does that make sense?
Eric

On Mon, May 16, 2016 at 5:35 PM, Neil Ludban <nludban at columbus.rr.com>
wrote:

> I'm not making any connections here...
>
> (a) Starting with 50 representative locations instead of all of them.
>
> (b) Wanting to estimate parameters as if you had used them all.
>
> (c) For example, compare the error estimate of one of them by itself
> with another one of them by itself.
>
>
> On Mon, 16 May 2016 14:59:44 -0400
> Eric Floehr <eric at intellovations.com> wrote:
> > Hey all,
> >
> > I'm in need of some help with statistics, and if anyone has any thoughts
> > on this, or knows someone who could do this, I would appreciate it
> > greatly.
> >
> > I have a set of errors that are normally distributed around zero (they
> > are temperature forecast errors). You can assume that the sample of
> > forecasts is representative of the entire population (for example,
> > taking 50 strategic locations around the U.S. to represent all U.S.
> > locations).
> >
> > I then calculate the mean absolute error (MAE) and the RMSE, both of
> > which are positive.
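For reference, the basic quantities can be computed as below. The sign convention (forecast minus observed) and the example errors are assumptions for illustration, not data from the thread.

```python
import math


def error_summary(errors):
    """Mean error (bias), MAE and RMSE for a list of forecast errors.

    Each error is assumed to be forecast minus observed.
    """
    n = len(errors)
    mean_error = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, mae, rmse


# Hypothetical errors in degrees
errors = [2.0, -1.0, 0.5, -3.0, 1.5]
bias, mae, rmse = error_summary(errors)
# bias == 0.0, mae == 1.6, rmse == sqrt(3.3), about 1.82
```

RMSE is never smaller than MAE for the same set of errors; the gap grows when a few large misses dominate.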
> >
> > What I would like is a confidence interval for the population MAE and
> > RMSE, given the sample MAE or RMSE and its related statistics (sample
> > size, mean error, MAE, RMSE, standard deviation, etc.).
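If the errors really are independent draws from a normal distribution with mean exactly zero, the population RMSE equals the standard deviation sigma, and there is a standard exact interval for it based on the chi-square distribution. The sketch below assumes SciPy is available; `rmse_ci_chi2` and the example n = 50 are hypothetical, and the 99% default only mirrors the 0.005/0.995 bounds used in the message.

```python
import math

from scipy.stats import chi2


def rmse_ci_chi2(rmse, n, level=0.99):
    """CI for the population RMSE, assuming n independent N(0, sigma^2) errors.

    Under that assumption n * rmse**2 / sigma**2 follows a chi-square
    distribution with n degrees of freedom (n rather than n - 1 because the
    mean is assumed to be exactly 0, not estimated from the data).
    """
    alpha = 1.0 - level
    ss = n * rmse ** 2  # sum of squared errors recovered from the RMSE
    lower = math.sqrt(ss / chi2.ppf(1 - alpha / 2, n))
    upper = math.sqrt(ss / chi2.ppf(alpha / 2, n))
    return lower, upper


lo, hi = rmse_ci_chi2(3.18, n=50)  # n = 50 is illustrative
```

Unlike a NORMINV interval, this one is not symmetric around the sample RMSE. Forecast errors at nearby locations or on the same day are usually correlated, so the effective sample size is smaller than n and this interval will likely be too narrow. It also covers RMSE only; MAE has a different sampling distribution.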
> >
> > For example, let's say that one provider's RMSE is 3.18 (A) and
> > another's is 3.5 (B). I'd like to know with some confidence whether or
> > not there is a difference between the providers (i.e., whether provider
> > A really has lower error than B).
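Under the same assumptions (zero-mean normal errors), plus the stronger assumption that the two providers' errors are independent of each other, the ratio of their mean squared errors follows an F distribution when the providers are equally accurate. A sketch of that F-test with the numbers from the example; the sample sizes of 50 are illustrative and `rmse_ratio_test` is a hypothetical name.

```python
from scipy.stats import f


def rmse_ratio_test(rmse_a, n_a, rmse_b, n_b):
    """Two-sided F-test of H0: both providers have the same population RMSE.

    Assumes each provider's errors are independent N(0, sigma^2) draws and
    that the two samples are independent of each other.
    Returns (ratio_of_mean_squared_errors, two_sided_p_value).
    """
    ratio = (rmse_b / rmse_a) ** 2  # MSE_B / MSE_A
    # Under H0 the ratio follows F(n_b, n_a); double the smaller tail.
    p_upper = f.sf(ratio, n_b, n_a)
    p_lower = f.cdf(ratio, n_b, n_a)
    return ratio, min(1.0, 2 * min(p_upper, p_lower))


ratio, p = rmse_ratio_test(3.18, 50, 3.5, 50)  # n = 50 is illustrative
print(f"MSE ratio {ratio:.3f}, two-sided p = {p:.3f}")
```

Two providers forecasting the same cities on the same days are almost certainly not independent, so this test ignores the pairing and is best treated as a rough check.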
> >
> > Currently, I'm doing it with the inverse normal function (NORMINV) in
> > Excel:
> >
> > Lower bound: NORMINV(0.005,RMSE,STDDEV_RMSE/SQRT(NUMBER_OF_SAMPLES))
> >
> > Upper bound: NORMINV(0.995,RMSE,STDDEV_RMSE/SQRT(NUMBER_OF_SAMPLES))
> >
> > as in section 9.18 of:
> >
> >
> > But I'm not at all convinced that I'm doing that right, or that it
> > applies in this situation.
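For anyone reproducing this outside Excel, a direct Python translation of the NORMINV bounds above might look like the following. Excel's NORMINV(p, mean, sd) is the inverse of the normal cumulative distribution, which `statistics.NormalDist(mean, sd).inv_cdf(p)` also computes (Python 3.8+). The message does not say how STDDEV_RMSE is computed, so it is simply an input here; `excel_style_bounds` is a hypothetical name and the 1.0 in the usage line is made up.

```python
import math
from statistics import NormalDist


def excel_style_bounds(rmse, stddev_rmse, n_samples, level=0.99):
    """Reproduce NORMINV(0.005, ...) and NORMINV(0.995, ...) from the spreadsheet.

    Centers a normal distribution on the RMSE with standard deviation
    stddev_rmse / sqrt(n_samples) and returns its lower and upper quantiles.
    """
    se = stddev_rmse / math.sqrt(n_samples)
    dist = NormalDist(mu=rmse, sigma=se)
    alpha = 1.0 - level
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)


lo, hi = excel_style_bounds(3.18, 1.0, 50)  # stddev_rmse = 1.0 is made up
```

This only reproduces the spreadsheet; it does not settle whether the approach is appropriate. The interval is always symmetric about the RMSE, while the sampling distribution of an RMSE from a modest sample is somewhat right-skewed.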
> >
> > Thanks so much!
> > Eric
> _______________________________________________
> CentralOH mailing list
> CentralOH at python.org
> https://mail.python.org/mailman/listinfo/centraloh
>