[Tutor] manipulating a file

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Feb 7 08:37:45 CET 2005



On Mon, 7 Feb 2005, Reed L. O'Brien wrote:

> I want to read the httpd-access.log and remove any oversized log records
>
> I quickly tossed this script together.  I manually mv-ed log to log.bak
> and touched a new logfile.
>
> running the following with print i uncommented does print each line to
> stdout.  but it doesn't write to the appropriate file...


Hello!

Let's take a look at the program again:

###
import os
srcfile = open('/var/log/httpd-access.log.bak', 'r')
dstfile = open('/var/log/httpd-access.log', 'w')
while 1:
     lines = srcfile.readlines()
     if not lines: break
     for i in lines:
         if len(i) < 2086:
             dstfile.write(i)
srcfile.close()
dstfile.close()
###

> a) what am I missing?
> b) is there a less expensive way to do it?

Hmmm... I don't see anything offhand that prevents httpd-access.log from
containing the lines you expect.  Do you get any error messages, like
permission problems, when you run the program?

Can you show us how you are running the program, and how you are checking
that the resulting file is empty?


As for question (b), on efficiency and expense: yes, there is a cheaper
way.  The program at the moment reads all lines into memory at once with
readlines(), and this is expensive if the file is large.  Let's fix this.


In recent versions of Python (2.2 and later), we can change file-handling
code from:

###
lines = somefile.readlines()
for line in lines:
    ...
###

to this:

###
for line in somefile:
    ...
###

That is, we don't need to extract a list of 'lines' out of a file.
Python allows us to loop directly across a file object.  We can find more
details about this in the documentation on "Iterators" (PEP 234):

    http://www.python.org/peps/pep-0234.html

Iterators are a good thing to know, since Python's iterators are deeply
rooted in the language design.  (Even if they were retroactively
embedded.  *grin*)
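
Here is a small sketch of what the 'for' loop is doing behind the scenes.
It is only an illustration, written for Python 3: io.StringIO stands in
for a real log file, and the sample lines are made up.

```python
import io

# io.StringIO stands in for a real log file so the example is self-contained.
somefile = io.StringIO("first line\nsecond line\n")

# iter() asks the object for an iterator; a file object is its own iterator.
it = iter(somefile)

# next() hands back one line at a time; no list of all lines is built.
first = next(it)
second = next(it)

# When the lines run out, next() raises StopIteration, which is how a
# for loop knows when to stop.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(repr(first), repr(second), exhausted)
```

A 'for line in somefile:' loop makes these same calls for us.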


A few more comments: the while loop appears unnecessary.  The first call
to readlines() already reads every line out of the file, so on the second
pass through the loop it returns an empty list and we break out.  (I am
assuming that nothing is writing to the backup file at the time.)  If the
body of a while loop does its real work only once, we don't need a loop.
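
We can check that claim about readlines() with a quick experiment.  This
is just a sketch for Python 3, using io.StringIO with two made-up lines
in place of the real backup file:

```python
import io

# Stand-in for the backup log; the contents are invented for the demo.
srcfile = io.StringIO("line one\nline two\n")

# The first call consumes the whole file and returns every line.
first_pass = srcfile.readlines()

# The second call starts at end-of-file, so there is nothing left to return.
second_pass = srcfile.readlines()

print(len(first_pass), len(second_pass))
```

The second call gives back an empty list, which is what makes
'if not lines: break' fire on the second pass.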

This simplifies the code down to:

###
srcfile = open('/var/log/httpd-access.log.bak', 'r')
dstfile = open('/var/log/httpd-access.log', 'w')
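# Loop directly over the file instead of building a list of all lines.
for line in srcfile:
    # Keep only the records shorter than 2086 characters.
    if len(line) < 2086:
        dstfile.write(line)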
srcfile.close()
dstfile.close()
###
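
For readers on a newer Python, here is one possible variation.  It is a
sketch, not the original poster's program: the function name
copy_short_lines and its max_len parameter are made up for illustration,
and it relies on the 'with' statement, which was added to Python after
this message was written.  The demo runs on throwaway temporary files
rather than the real logs.

```python
import os
import tempfile


def copy_short_lines(src_path, dst_path, max_len=2086):
    """Copy lines shorter than max_len characters from src_path to dst_path.

    Returns the number of lines written.  (Hypothetical helper.)
    """
    kept = 0
    # 'with' closes both files for us, even if an error occurs midway.
    with open(src_path, 'r') as srcfile, open(dst_path, 'w') as dstfile:
        for line in srcfile:
            if len(line) < max_len:
                dstfile.write(line)
                kept += 1
    return kept


# Demo on temporary files; the file names and records are invented.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'access.log.bak')
dst = os.path.join(tmpdir, 'access.log')
with open(src, 'w') as f:
    f.write('short record\n')
    f.write('x' * 3000 + '\n')      # oversized record, should be dropped
    f.write('another short record\n')

kept = copy_short_lines(src, dst)
print(kept)
```

Because the files are closed when the 'with' block ends, the written data
is flushed to disk before anything else looks at the destination file.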


I don't see anything else here that causes the file writing to fail.  If
you can tell us more about how you're checking the program's output, that
may give us some more clues.
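
One way to check what the program actually wrote is to ask Python
directly.  This is a rough sketch for Python 3; the helper name summarize
is made up, and the demo uses a temporary file with invented contents
rather than the real log.

```python
import os
import tempfile


def summarize(path):
    """Return (size_in_bytes, line_count, longest_line_length) for path.

    Hypothetical helper for inspecting a log file after filtering.
    """
    size = os.path.getsize(path)
    count = 0
    longest = 0
    with open(path, 'r') as f:
        for line in f:
            count += 1
            longest = max(longest, len(line))
    return size, count, longest


# Demo on a temporary file; written in binary mode so the size is exact.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.log')
with open(demo_path, 'wb') as f:
    f.write(b'ab\ncdef\n')

result = summarize(demo_path)
print(result)
```

Running something like this on both the backup and the new log should
show whether the new file is really empty, and whether any line of 2086
characters or more slipped through.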

Best of wishes to you!


