Smoothing a discrete set of data

Paul Moore gustav at morpheus.demon.co.uk
Tue Sep 10 15:35:43 EDT 2002


"Terry Reedy" <tjreedy at udel.edu> writes:

> "Paul Moore" <gustav at morpheus.demon.co.uk> wrote in message
> news:3csmdso5.fsf at morpheus.demon.co.uk...
> > I have a set of data, basically a histogram.
> 
> Actually, the two examples you give are *not* histograms.  A histogram
> is a discrete frequency graph: how many numbers fit in each bin.  Note
> that binning numbers like this then tosses away any order and other
> covariate info.  Unimodal histograms are usually smoothed by moment
> matching: mean, standard deviation, and possibly more.

Thanks for the clarification. As became increasingly obvious to me as
I read the various replies, I am not adequately clear in my own mind
about what I'm trying to do. Part of this is a result of not thinking
in the right terms (for some reason, I didn't see the question in
terms of stats, even though I was thinking of things like averaging
and regression...), and part is simply a matter of oversimplifying and
losing the key details of what I'm trying to do.

Given that this is basically statistics, it's ironic that I did maths
as my university degree, and while I didn't do a lot of stats in my
course, it was an area I had had a strong "amateur" interest in prior
to doing the degree course.
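
For anyone following along, the moment-matching idea quoted above can
be sketched roughly as follows. This is only an illustration in plain
Python, assuming equal-width bins described by their centres and a
normal curve as the smooth shape. The function names and the sample
counts are hypothetical, not taken from any library.

```python
from math import exp, pi, sqrt

def histogram_moments(centres, counts):
    """Mean and standard deviation of binned data, treating every
    value in a bin as if it sat at the bin centre (an approximation)."""
    n = sum(counts)
    mean = sum(c * k for c, k in zip(centres, counts)) / n
    var = sum(k * (c - mean) ** 2 for c, k in zip(centres, counts)) / n
    return mean, sqrt(var)

def smoothed_counts(centres, counts, width):
    """Counts a normal curve with the same mean and sd would give per bin
    (density at the bin centre * bin width * total count)."""
    mean, sd = histogram_moments(centres, counts)
    n = sum(counts)
    return [n * width * exp(-0.5 * ((c - mean) / sd) ** 2) / (sd * sqrt(2 * pi))
            for c in centres]

# Hypothetical histogram: bin centres and how many values fell in each bin
centres = [1, 2, 3, 4, 5]
counts = [2, 7, 12, 6, 3]
for c, s in zip(centres, smoothed_counts(centres, counts, width=1)):
    print(c, round(s, 1))
```

Matching only the first two moments assumes the data really are
roughly bell-shaped; a skewed histogram would need a different curve
or higher moments.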

> > sampled IO rates on a machine
> > - I want to look for trends (is IO higher overnight or during the
> > day,
> 
> If you take measurements (samples) every hour, for instance, you have
> a time series.  There are many books on this subject alone.

Interesting - I'd heard of time series, but had never got far enough
into the subject to get a proper feel for what they covered...

> > etc) or fuel consumption for my car (do I use less fuel since I had
> > the service).
> 
> This is a standard question with standard methods.  If you ignore
> order and other covariates and group measurements as before and after,
> a t-test or signed rank test would be appropriate.

When you put it like that, it's so obvious!! I think I was too fixated
on making the picture look like the point I was trying to make (too
much time working with business graphics, where the graph is there to
make a point, rather than to explain the facts :-().
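
A minimal sketch of the before/after comparison, using only the
standard library. It computes the t statistic and degrees of freedom
for Welch's unequal-variance variant of the two-sample t-test, which
is one common choice and not necessarily the exact test meant above.
The function name and the fuel figures are hypothetical, made up
purely for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(before, after):
    """Return (t, df) for Welch's two-sample t-test.

    t is the difference in means divided by its standard error;
    df is the Welch-Satterthwaite approximation."""
    n1, n2 = len(before), len(after)
    v1, v2 = variance(before), variance(after)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                     # squared standard error
    t = (mean(before) - mean(after)) / sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical fuel consumption per tankful, before and after the service
before = [8.9, 9.1, 9.4, 8.8, 9.2, 9.0]
after = [8.5, 8.7, 8.4, 8.8, 8.6, 8.3]
t, df = welch_t(before, after)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t still has to be compared against a t distribution
with df degrees of freedom (from tables or a stats package) to get a
p-value; that step is left out here.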

> > Normally, what I do with things like this is draw a graph, and try
> > to spot the trends "by eyeball". But it feels to me like I should
> > be able to write something which smooths the data out. I just
> > don't see how
> 
> As a statistician, I am a fan of eyeballing raw data (with appropriate
> caveats about testing what you think you see) in addition to numerical
> analysis.  However, each statistical procedure is aimed at answering a
> question (and many are based on some assumption about the data).  So
> you need to better formulate what you want to know.

You're right. I think that applying some of the suggestions other
posters made (low-pass filtering the graph, for example) would help to
display some of the grosser trends better (I think that was another
aspect of what I was looking for, but I see now that it *is* a
separate aspect). Once I have that "feel", I need to get more
analytical about specifics.
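
As a rough idea of what the low-pass filtering step could look like, a
centred moving average is about the simplest thing that qualifies.
This is a minimal sketch, assuming evenly spaced samples; the function
name, the window size and the sample readings are hypothetical.

```python
def moving_average(values, window=5):
    """Centred moving average of a list of numbers.

    window should be odd; near the ends the window is truncated, so
    the first and last few points are averaged over fewer samples."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical hourly IO readings with some noise
readings = [12, 15, 11, 30, 14, 13, 16, 40, 38, 41, 39, 15]
print(moving_average(readings, window=3))
```

A wider window smooths harder but also blurs short-lived peaks, so
the window size has to be picked with the question in mind.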

Thanks for your suggestions, and for all the other posters' comments
as well. This has reawakened an old interest for me, and I look
forward to dusting off some of my old books and learning more...

Paul.


