Data mining/pattern recognition software in Python?
Jon Clements
joncle at googlemail.com
Fri Mar 23 23:10:28 EDT 2012
On Friday, 23 March 2012 16:43:40 UTC, Grzegorz Staniak wrote:
> Hello,
>
> I've been asked by a colleague for help in a small educational
> project, which would involve the recognition of patterns in a live
> feed of data points (readings from a measuring appliance), and then
> a more general search for patterns on archival data. The language
> of preference is Python, since the lab uses software written in
> Python already. I can see there are packages like Open CV,
> scikit-learn, Orange that could perhaps be of use for the mining
> phase -- and even if they are slanted towards image pattern
> recognition, I think I'd be able to find an appropriate package
> for the timeseries analyses. But I'm wondering about the "live"
> phase -- what approach would you suggest? I wouldn't want to
> force an open door, perhaps there are already packages/modules that
> could be used to read data in a loop i.e. every 10 seconds,
> maintain a buffer of 15 readings and ring a bell when the data
> in the buffer form a specific pattern (a spike, a trough, whatever)?
>
> I'll be grateful for a push in the right direction. Thanks,
>
> GS
> --
> Grzegorz Staniak <gstaniak _at_ gmail [dot] com>
It might also be worth checking out pandas[1] and scikits.statsmodels[2].
In terms of reading data in a loop, I would probably go for a producer-consumer model (possibly using a Queue[3]). Have the producer constantly try to get another reading and put it on the queue; that effectively notifies the consumer, which can then determine whether it has enough data to calculate a peak/trough. This article is also a fairly good read[4].
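As a rough illustration of that producer-consumer idea, here is a minimal sketch using the standard library only. It is written for Python 3, where the module is called queue rather than Queue. The names is_spike, producer, consumer and run, the sentinel object, and the "three times the average" spike rule are all made up for the example; only the 15-reading buffer comes from the question.

```python
import queue
import threading
import time
from collections import deque

BUFFER_SIZE = 15     # size of the sliding window, taken from the question
SENTINEL = object()  # hypothetical marker telling the consumer to stop


def is_spike(window, factor=3.0):
    """Assumed rule: the newest reading exceeds `factor` times the
    mean of the earlier readings in the window."""
    *earlier, latest = window
    baseline = sum(earlier) / len(earlier)
    return latest > factor * baseline


def producer(readings, q, interval=0.0):
    """Put each reading on the queue, pausing `interval` seconds between them."""
    for value in readings:
        q.put(value)
        time.sleep(interval)
    q.put(SENTINEL)  # signal that no more readings will arrive


def consumer(q, alerts):
    """Keep the last BUFFER_SIZE readings and record any detected spike."""
    window = deque(maxlen=BUFFER_SIZE)  # old readings fall off automatically
    while True:
        value = q.get()  # blocks until the producer supplies a reading
        if value is SENTINEL:
            break
        window.append(value)
        if len(window) == BUFFER_SIZE and is_spike(window):
            alerts.append(value)  # this is where you would "ring a bell"


def run(readings, interval=0.0):
    """Wire producer and consumer together and return the recorded alerts."""
    q = queue.Queue()
    alerts = []
    workers = [
        threading.Thread(target=producer, args=(readings, q, interval)),
        threading.Thread(target=consumer, args=(q, alerts)),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return alerts


if __name__ == "__main__":
    # Simulated feed: a flat signal with a single jump at the 15th reading.
    print(run([1.0] * 14 + [10.0] + [1.0] * 5))
```

In a live setup the list of readings would be replaced by something that polls the appliance, with interval set to 10 seconds, and is_spike swapped for whatever pattern test suits the data.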
That's some pointers anyway,
hth,
Jon.
[1] http://pandas.pydata.org/
[2] http://statsmodels.sourceforge.net/
[3] http://docs.python.org/library/queue.html
[4] http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/