Python libraries for log mining and event abstraction? (possibly OT)
felciano at gmail.com
Wed Jun 25 00:39:14 CEST 2008
I am trying to do some event abstraction to mine a set of HTTP logs.
We have a pretty clean stateless architecture with user IDs that
allows us to understand what is retrieved on each session, and should
allow us to detect the higher-order user activity from the logs.
Ideally I'd love a python toolkit that has abstracted this out into a
basic set of API calls or even a query language.
An simple example is: find all instances of a search request, followed
by a 2+ search requests with additional words in the search string,
and group these into a higher-order "Iterative Search Refinement"
event (i.e. the user got too many search results to start with, and is
adding additional words to narrow down the results). So what I need is
the ability to select temporally-related events out of the event
stream (e.g. find searches by the same user within 10 second of each
other), further filter based on additional criteria across these event
(e.g. select only search events where there are additional search
criteria relative to the previous search), and a way to annotate, roll-
up or otherwise group matching patterns into a higher-level event.
Some of these patterns may require non-trivial criteria / logic not
supported by COTS log analytics, which is why I'm trying a toolkit
approach that allows customization.
I've been hunting around Google and the usual open source sites for
something like this and haven't found anything (in python or
otherwise). This is surprising to me, as I would think many people
would benefit from something like this, so maybe I'm just describing
the problem wrong or using the wrong keywords. I'm posting this to
this group because it feels somewhat AI-ish (temporal event
abstraction, etc) and that therefore pythonistas may have experience
with (there seems to be a reasonably high correlation there). Further,
if I can't find anything I'm going to have to build it myself, and it
will be in python, so any pointers on elegant design patterns for how
to do this using pythonic functional programming would be appreciated.
Barring anything else I will start from itertools and work from
That said, I'm hoping to use an existing library rather than re-invent
the wheel. Any suggestions on where to look for something like this?
More information about the Python-list