[CentralOH] OT: Statistician needed

Mon May 16 22:36:22 EDT 2016

On Mon, 16 May 2016 18:55:36 -0400
Eric Floehr <eric at intellovations.com> wrote:
> Neil,
> 
> I'm not a statistician at all so I don't know what I should even be asking.
> 
> Basically, I have some RMSE from a set of forecasts. These forecasts are
> from representative locations. I want to know what the range of RMSE would
> be (within some confidence factor) if I had taken the error at *all* the
> locations.
> 
> So when I take a set of forecasts from 50 locations (say each state
> capital, assuming that's representative) within the U.S. and get the RMSE
> from those forecasts and locations. But there are lots more than 50
> locations, so that set of 50 locations is only a sample of all the
> forecasts for the U.S. So that set of 50 is only a sample RMSE, which is
> likely *close* to the actual RMSE if I had taken *all* the locations.
> 
> So when comparing two forecast providers, each with an RMSE, I'm only
> estimating each of those RMSEs for each provider. So that estimate isn't
> the *true* error, and so if I have two providers, and I want to rank order
> them, I want to have some level of confidence that the difference in RMSE
> between them is statistically significant.
> 
> Does that make sense?
> Eric
> 

I'll assume you are calculating statistics for the same time period (e.g.,
the last 90 days) for each location independently, and collectively for
the 50 representative locations.  I would argue that all you can get
out of this is order-of-magnitude statistics, because one requirement
goes unmet: all the input values (errors, in this case) must be
independent.  In reality, everybody shares the same data, which is input
to a small number of simulation programs whose outputs are then fudged by
a moderate number of meteorologists.

What you could easily do is ask a different question: what percent of
a location's predictions came within a certain number of degrees of
the actual value?  If location A gets 75% within +/- 2 degrees, and B
gets only 40%, there's a significant difference.  I would start with the
standard deviation of the errors across the 50 locations as the initial
tolerance.
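
A minimal Python sketch of that count, in case it helps.  The function
name and the error values are made up for illustration (they are not real
forecast data), and I'm assuming each error is forecast minus observed, in
degrees:

```python
def percent_within(errors, tol):
    """Percent of errors whose absolute value is at most tol."""
    if not errors:
        raise ValueError("errors must not be empty")
    hits = sum(1 for e in errors if abs(e) <= tol)
    return 100.0 * hits / len(errors)


# Made-up errors (forecast - observed, in degrees) for two locations.
errors_a = [0.5, -1.0, 1.5, -2.5, 0.0, 1.0, -0.5, 3.0]
errors_b = [3.0, -4.0, 1.0, 2.5, -0.5, 5.0, -3.5, 1.5, -2.0, 6.0]

print(percent_within(errors_a, 2.0))  # 75.0
print(percent_within(errors_b, 2.0))  # 40.0
```

This counts directly from the raw errors, so it makes no assumption about
how the errors are distributed.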

Assuming the errors are normally distributed, with mean and stddev
calculated using the errors as the raw data, the percent of predictions
between -tol and +tol is (using the spreadsheet function NORMDIST):

(NORMDIST(+tol, mean, stddev, TRUE)
 - NORMDIST(-tol, mean, stddev, TRUE)) * 100%
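
The same NORMDIST expression can be sketched in Python with the standard
library's statistics.NormalDist (Python 3.8 or later).  The function name
and the sample errors are hypothetical, and the result is only as good as
the assumption that the errors are roughly normal:

```python
from statistics import NormalDist, mean, stdev


def percent_within_normal(mu, sigma, tol):
    """Percent of a Normal(mu, sigma) distribution between -tol and +tol."""
    dist = NormalDist(mu, sigma)
    # cdf(x) plays the role of NORMDIST(x, mean, stddev, TRUE)
    return (dist.cdf(tol) - dist.cdf(-tol)) * 100.0


# Sanity check: about 68.27% of a standard normal lies within one stddev.
print(round(percent_within_normal(0.0, 1.0, 1.0), 2))  # 68.27

# Made-up errors (forecast - observed, in degrees) for one location.
errors = [0.5, -1.0, 1.5, -2.5, 0.0, 1.0, -0.5, 3.0]
print(percent_within_normal(mean(errors), stdev(errors), 2.0))
```

Note that stdev() here is the sample standard deviation; use pstdev() if
you want the population version instead.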