parsing a file for analysis
Andrea Crotti
andrea.crotti.0 at gmail.com
Sat Feb 26 10:29:54 EST 2011
On 26 Feb 2011, at 06:45, Rita wrote:
> I have a large text file (4GB) which I am parsing.
>
> I am reading the file to collect stats on certain items.
>
> My approach has been simple,
>
> for row in open(file):
>     if "INFO" in row:
>         line = row.split()
>         user = line[0]
>         host = line[1]
>         __time = line[2]
>         ...
>
> I was wondering if there is a framework or a better algorithm to read such a large file and collect its stats according to content. Also, are there any libraries, data structures or functions which can be helpful? I was told about the container types in 'collections'. Here are some stats I am trying to get:
>
> * Number of unique users
> * Break down each user's visit according to time, t0 to t1
> * Which user came from which host
> * Which time had the most users
>
> (There are about 15 different things I want to query)
>
> I understand most of these are redundant, but it would be nice to have a framework or even an object-oriented way of doing this instead of loading it into a database.
>
>
> Any thoughts or ideas?
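For the in-memory route the question asks about, a minimal sketch with the standard collections module might look like the following. It assumes the field order from the quoted loop (user, host, time as the first three whitespace-separated fields) and treats the raw time field as the bucket; the function names collect_stats and busiest_time are made up for illustration.

```python
from collections import Counter, defaultdict


def collect_stats(rows):
    """Collect per-user and per-time stats in a single pass over the rows.

    Assumes each INFO row starts with: user host time (hypothetical layout,
    taken from the loop in the question).
    """
    hosts_by_user = defaultdict(set)    # user -> set of hosts seen
    times_by_user = defaultdict(list)   # user -> time fields, in file order
    users_by_time = defaultdict(set)    # time -> set of users seen

    for row in rows:
        if "INFO" not in row:
            continue
        fields = row.split()
        if len(fields) < 3:
            continue  # skip rows that do not have all three fields
        user, host, when = fields[0], fields[1], fields[2]
        hosts_by_user[user].add(host)
        times_by_user[user].append(when)
        users_by_time[when].add(user)

    return hosts_by_user, times_by_user, users_by_time


def busiest_time(users_by_time):
    """Return (time, number_of_distinct_users) for the busiest time, or None."""
    counts = Counter({t: len(users) for t, users in users_by_time.items()})
    return counts.most_common(1)[0] if counts else None
```

Since a file object yields one line at a time, passing open(path) as rows keeps only the aggregates in memory, not the 4GB of text. The number of unique users is then len(hosts_by_user). If the time field is a full timestamp, it would probably need truncating (to the hour, say) before being used as a key.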
I'm not an expert, but it might be worth pushing the data into a database after all; you can then tune the DBMS and write
smart queries to get all the statistics you want from it.
The initial load might take a while (splitting with a regexp may be faster), but it is done only once, and after that you work with DB tools.
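As a rough sketch of that idea, assuming SQLite (via the standard sqlite3 module) is an acceptable DBMS and that the first three fields are user, host and time as in the quoted loop, the load and two of the queries could look like this. The table name visits, its column names, and the function names are all made up for illustration.

```python
import sqlite3


def load_log(conn, rows, batch_size=10000):
    """Insert the INFO rows into a hypothetical 'visits' table, in batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits "
        "(username TEXT, hostname TEXT, ts TEXT)"
    )
    batch = []
    for row in rows:
        if "INFO" not in row:
            continue
        fields = row.split()
        if len(fields) < 3:
            continue  # skip rows that do not have all three fields
        batch.append((fields[0], fields[1], fields[2]))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", batch)
            batch = []
    if batch:
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", batch)
    conn.commit()


def unique_users(conn):
    """Number of distinct users in the table."""
    return conn.execute(
        "SELECT COUNT(DISTINCT username) FROM visits"
    ).fetchone()[0]


def busiest_time(conn):
    """(time, distinct user count) for the busiest time, or None if empty."""
    return conn.execute(
        "SELECT ts, COUNT(DISTINCT username) AS n FROM visits "
        "GROUP BY ts ORDER BY n DESC LIMIT 1"
    ).fetchone()
```

With conn = sqlite3.connect("stats.db") and load_log(conn, open(path)), the file is read once and the remaining questions become further SELECT statements. The file name stats.db is only a placeholder.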