[Tutor] use gzip with large files
Kent Johnson
kent37 at tds.net
Tue Jul 19 18:51:36 CEST 2005
Adam Bark wrote:
> If you use something like this:
>
> for line in file.readlines():
This will read the whole file into a list of lines; the OP doesn't want to read the whole file at once.
> then line is a string to the next newline and it automatically detects
> the EOF and the same with file.readline() but that will give you one
> character at a time.
?? file.readline() reads by lines, not by characters.
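To make the point concrete, here is a small sketch showing that readline() returns a whole line per call (newline included). A blank line in the middle of the file comes back as "\n", and only end of file gives the empty string. io.StringIO stands in for a real file so the example runs without anything on disk; the sample text is made up.

```python
import io

# In-memory stand-in for a file: a line, a blank line, and a last line
# with no trailing newline (contents are hypothetical).
f = io.StringIO("alpha\n\nbeta")

# Call readline() four times; the fourth call is past the end of the data.
calls = [f.readline() for _ in range(4)]
print(calls)  # ['alpha\n', '\n', 'beta', '']
```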
A file is also an iterator (by lines), so the simplest way to iterate over the lines in a file is just:
for line in f:
    # process line
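A minimal sketch of the difference between the two approaches: readlines() builds the complete list before returning, while iterating over the file object hands back one line per step. io.StringIO is used here as an assumed stand-in for a large file.

```python
import io

# In-memory stand-in for a (possibly huge) text file.
f = io.StringIO("first\nsecond\nthird\n")

# readlines() reads everything and returns a fully built list.
all_lines = f.readlines()
print(type(all_lines), len(all_lines))  # <class 'list'> 3

# Rewind and iterate instead: each next() yields a single line.
f.seek(0)
it = iter(f)
first = next(it)
print(repr(first))  # 'first\n'
```

With a real multi-gigabyte file, the list from readlines() would have to fit in memory, whereas the loop only ever holds the current line.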
Specifically, frank should be able to use
import gzip
file = gzip.GzipFile(logfile)
for line in file:
    # process line
though 'file' is a bad choice of name, as it is also a built-in.
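For readers on a current Python, a minimal sketch of the same streaming loop, assuming Python 3.3 or later, where gzip.open() accepts the text mode "rt" (and where `file` itself is no longer a built-in, though avoiding such names is still good practice). The log file name, its contents, and the "ERROR" filter are hypothetical, created only so the example is self-contained.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical gzipped log, written here so the sketch runs on its own.
    logfile = os.path.join(tmpdir, "example.log.gz")
    with gzip.open(logfile, "wt") as out:
        out.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")

    # "rt" opens the archive in text mode; iterating decompresses
    # incrementally, so memory use does not grow with the file size.
    error_count = 0
    with gzip.open(logfile, "rt") as log:  # 'log' avoids the name 'file'
        for line in log:
            if line.startswith("ERROR"):  # hypothetical per-line filter
                error_count += 1

print(error_count)  # 2
```

The `with` statement also closes the file even if processing raises an exception.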
To check for EOF when using readline() just look for an empty string:
while True:
    line = f.readline()
    if not line:
        break
but 'for line in f' is preferable.
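The same end-of-file test can also be written with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. One caveat, assuming Python 3: gzip.GzipFile opens in binary mode by default, so readline() returns bytes and signals end of file with b"" rather than "". The sketch below compresses a few made-up lines in memory so it needs no file on disk.

```python
import gzip
import io

# Compress some hypothetical lines in memory.
compressed = gzip.compress(b"one\ntwo\nthree\n")

# GzipFile is binary here, so the EOF sentinel must be b"", not "".
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as f:
    lines = [line.rstrip(b"\n") for line in iter(f.readline, b"")]

print(lines)  # [b'one', b'two', b'three']
```

Passing "" as the sentinel here would never match the bytes result, and the loop would not stop at end of file.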
Kent
>
> On 7/19/05, frank h. <frank.hoffsummer at gmail.com> wrote:
>
> hello all
> I am trying to write a script in Python that parses a gzipped logfile
>
> the unzipped logfiles can be very large (>2GB)
>
> basically the statements
>
> file = gzip.GzipFile(logfile)
> data = file.read()
>
> for line in data.splitlines():
>     ....
>
>
> would do what I want, but this is not feasible because the gzip files
> are so huge.
>
> So I do file.readline() in a for loop, but have no idea how long to
> continue, because I don't know how many lines the files contain. How do
> I check for end of file when using readline()?
> Should I simply put it in a while loop and enclose it with try: except:?
>
> what would be the best (fastest) approach to deal with such large gzip
> files in Python?
>
> thanks
> _______________________________________________
> Tutor maillist - Tutor at python.org <mailto:Tutor at python.org>
> http://mail.python.org/mailman/listinfo/tutor