Smoothing a discrete set of data
Paul Moore
gustav at morpheus.demon.co.uk
Tue Sep 10 21:35:43 CEST 2002
"Terry Reedy" <tjreedy at udel.edu> writes:
> "Paul Moore" <gustav at morpheus.demon.co.uk> wrote in message
> news:3csmdso5.fsf at morpheus.demon.co.uk...
> > I have a set of data, basically a histogram.
>
> Actually, the two examples you give are *not* histograms. A histogram
> is a discrete frequency graph: how many numbers fit in each bin. Note
> that binning numbers like this then tosses away any order and other
> covariate info. Unimodal histograms are usually smoothed by moment
> matching: mean, standard deviation, and possibly more.
Thanks for the clarification. As became increasingly obvious to me as
I read the various replies, I am not adequately clear in my own mind
about what I'm trying to do. Part of this is a result of not thinking
in the right terms (for some reason, I didn't see the question in
terms of stats, even though I was thinking of things like averaging
and regression...), and part is simply a matter of oversimplifying and
losing the key details of what I'm trying to do.
Given that this is basically statistics, it's ironic that I did maths
as my university degree, and while I didn't do a lot of stats in my
course, it was an area I had had a strong "amateur" interest in prior
to doing the degree course.
> > sampled IO rates on a machine
> > - I want to look for trends (is IO higher overnight or during the
> > day,
>
> If you take measurements (samples) every hour, for instance, you have
> a time series. There are many books on this subject alone.
Interesting - I'd heard of time series, but had never got far enough
into the subject to get a proper feel for what they covered...
> > etc) or fuel consumption for my car (do I use less fuel since I had
> > the service).
>
> This is a standard question with standard methods. If you ignore
> order and other covariates and group measurements as before and after,
> a t-test or signed rank test would be appropriate.
When you put it like that, it's so obvious!! I think I was too fixated
on making the picture look like the point I was trying to make (too
much time working with business graphics, where the graph is there to
make a point, rather than to explain the facts :-()
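For the fuel consumption question, a minimal sketch of the before/after
comparison might look like the following. It assumes Welch's
unequal-variance form of the two-sample t-test (the suggestion above
only says "a t-test"), uses nothing beyond the standard library, and
the function name welch_t and the sample figures are invented purely
for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(before, after):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    Hypothetical helper: each sample needs at least two values with
    some spread, otherwise the variance is undefined or zero.
    """
    n1, n2 = len(before), len(after)
    # Squared standard error of each sample mean.
    se1 = variance(before) / n1
    se2 = variance(after) / n2
    t = (mean(before) - mean(after)) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Made-up fuel consumption figures (litres per 100 km), not real data.
before = [8.4, 8.9, 8.6, 9.1, 8.7, 8.8]
after = [8.1, 8.3, 8.0, 8.5, 8.2, 8.4]

t, df = welch_t(before, after)
print("t = %.2f with about %.1f degrees of freedom" % (t, df))
```

The t value would then be compared against a t table (or a statistics
package) at the chosen significance level. Note that this ignores the
order of the measurements, exactly as described above.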
> > Normally, what I do with things like this is draw a graph, and try
> > to spot the trends "by eyeball". But it feels to me like I should
> > be able to write something which smooths the data out. I just
> > don't see how
>
> As a statistician, I am a fan of eyeballing raw data (with appropriate
> caveats about testing what you think you see) in addition to numerical
> analysis. However, each statistical procedure is aimed at answering a
> question (and many are based on some assumption about the data). So
> you need to better formulate what you want to know.
You're right. I think that applying some of the suggestions other
posters made (low-pass filtering the graph, for example) would help to
display some of the grosser trends better (I think that was another
aspect of what I was looking for, but I see now that it *is* a
separate aspect). Once I have that "feel", I need to get more
analytical about specifics.
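As a rough illustration of the low-pass filtering idea, here is a
minimal sketch using a centred moving average, which is only one crude
kind of low-pass filter and not necessarily what the other posters had
in mind. The function name moving_average, the window size and the
sample IO rates are all assumptions made up for the example.

```python
def moving_average(samples, window=3):
    """Smooth a sequence with a centred moving average.

    The window shrinks near the ends, so the result has the same
    length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        # Neighbours within `half` positions either side of sample i.
        chunk = samples[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up hourly IO rates, not real measurements.
io_rates = [120, 135, 90, 300, 110, 125, 140, 95]
print(moving_average(io_rates, window=3))
```

A wider window smooths more aggressively, at the cost of flattening
short-lived spikes that may themselves be of interest.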
Thanks for your suggestions, and for all the other posters' comments
as well. This has reawakened an old interest for me, and I look
forward to dusting off some of my old books and learning more...
Paul.