parsing a file for analysis

Rita rmorgan466 at gmail.com
Sat Feb 26 00:45:20 EST 2011


I have a large text file (4 GB) which I am parsing.

I am reading the file to collect stats on certain items.

My approach has been simple:

for row in open(filename):       # stream the file line by line; avoids loading 4 GB at once
    if "INFO" in row:
        fields = row.split()     # whitespace-separated fields
        user = fields[0]
        host = fields[1]
        timestamp = fields[2]
        ...

I was wondering if there is a framework or a better algorithm to read such
a large file and collect its stats according to content. Also, are there any
libraries, data structures or functions which could be helpful? I was told
about the 'collections' module.  Here are some of the stats I am trying to get:

* Number of unique users
* Breakdown of each user's visits by time, from t0 to t1
* Which user came from which host
* What time had the most users?

(There are about 15 different things I want to query)
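For instance, here is a rough sketch of what I imagine doing with collections
(Counter and defaultdict), assuming the fields are whitespace-separated with
user, host and time first, as in my snippet above, and with "access.log" as a
placeholder filename:

from collections import Counter, defaultdict

visits_by_user = Counter()        # user -> number of INFO lines for that user
hosts_by_user = defaultdict(set)  # user -> set of hosts the user came from
users_by_time = defaultdict(set)  # time -> set of users seen at that time

with open("access.log") as fh:    # placeholder filename
    for row in fh:
        if "INFO" not in row:
            continue
        fields = row.split()
        user, host, timestamp = fields[0], fields[1], fields[2]
        visits_by_user[user] += 1
        hosts_by_user[user].add(host)
        users_by_time[timestamp].add(user)

print(len(visits_by_user))                       # number of unique users
busiest = max(users_by_time, key=lambda t: len(users_by_time[t]))
print(busiest, len(users_by_time[busiest]))      # time with the most distinct users

The idea being that each stat is just another Counter or dict updated in the
same single pass over the file.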

I understand most of these are redundant, but it would be nice to have a
framework or even an object-oriented way of doing this instead of loading it
into a database.
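
By object-oriented I was picturing something like this small stats-collector
class -- again only a sketch, with the class and method names invented:

from collections import Counter, defaultdict

class LogStats(object):
    """Accumulate stats from INFO lines in one pass (sketch only)."""

    def __init__(self):
        self.visits_by_user = Counter()
        self.hosts_by_user = defaultdict(set)

    def feed(self, row):
        if "INFO" not in row:
            return
        fields = row.split()
        user, host = fields[0], fields[1]
        self.visits_by_user[user] += 1
        self.hosts_by_user[user].add(host)

    def unique_users(self):
        return len(self.visits_by_user)

stats = LogStats()
with open("access.log") as fh:    # placeholder filename again
    for row in fh:
        stats.feed(row)
print(stats.unique_users())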


Any thoughts or ideas?




--
Get your facts first, then you can distort them as you please.