[Tutor] deleting one line in multiple files

bhaaluu bhaaluu at gmail.com
Fri Sep 14 13:40:00 CEST 2007


On 9/13/07, wormwood_3 <wormwood_3 at yahoo.com> wrote:

> I think the problem is that the original script you borrowed looks at the file passed to input, and iterates over the lines in that file, removing them if they match your pattern. What you actually want to be doing is iterating over the lines of your list file, and for each line (which represents a file), you want to open *that* file, do the check for your pattern, and delete appropriately.
>
> Hope I am not completely off:-)

This is exactly what I'd like to do. =)
After manually opening about 25 files individually and deleting the line
that I wanted to delete, and seeing about 175 files left to finish, I thought
to myself, 'I'm learning Python! Python is supposed to be really good at
this kind of stuff.' So, since there isn't a rush deadline for this project,
I figured I could play around and see what kinds of solutions I could find.

The 'fileinput' snippet is one solution, but I'd rather be able to pass it a
list of filenames to work on, rather than have to manually change the
filename in the snippet each time. The files are numbered, in order,
from 0001 to 0175, (filename0001.html to filename0175.html).
One thought was to be able to change the filename number incrementally,
assign it to a variable, and run it through a for loop? Isn't it amazing
how a Newbie approaches a problem? =)
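
A rough sketch of that for-loop idea, written for Python 3 (where print is a function). The helper names numbered_filenames and delete_matching_lines are made up for illustration, and the filename0001.html pattern is simply taken from the description above. Files that do not exist are skipped.

```python
import fileinput
import os


def numbered_filenames(first=1, last=175, template="filename%04d.html"):
    """Build the list filename0001.html ... filename0175.html."""
    return [template % number for number in range(first, last + 1)]


def delete_matching_lines(filenames, pattern="<script type"):
    """Rewrite each existing file in place, dropping lines that contain pattern."""
    existing = [name for name in filenames if os.path.isfile(name)]
    if not existing:
        return  # fileinput reads standard input when given an empty list
    for line in fileinput.input(existing, inplace=True):
        if pattern not in line:
            # inside an inplace loop, print() writes to the file being edited
            print(line, end="")


if __name__ == "__main__":
    delete_matching_lines(numbered_filenames())
```

Trying this on a copy of the directory first seems wise, since the files are rewritten in place.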

I'm also looking into 'sed' for doing this. I've used 'sed' in the past for
deleting a specific line from files, as well as doing simple search and
replace in a file. I just figured that if it can be done in 'sed', it can be
done in Python much more easily and maybe even more elegantly (although at
this point in my Python career, elegance isn't a top priority).

Happy Programming!
--
bhaaluu at gmail dot com


>
> If I am right so far, you want to do something like:
>
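> import fileinput
>
> # read the filenames (one per line) from the list file
> filenames = [name.strip() for name in open("file.list") if name.strip()]
> # fileinput accepts a list of filenames and rewrites each file in place
> for line in fileinput.input(filenames, inplace=1):
>     line = line.strip()
>     if '<script type' not in line:
>         print(line)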
>
> BUT, fileinput was made (if I understand the documentation) to avoid having to do this. This is where the sys.argv[1:] values come in. The example on this page (look under "Processing Each Line of One or More Files: The fileinput Module") helped clarify it to me: http://www.oreilly.com/catalog/lpython/chapter/ch09.html. If you do:
>
> % python myscript.py "<script type" `ls`
> This should pass in all the items in the folder you run this in (be sure it only contains the files you want to edit!), looking for "<script type". Continuing with the O'Reilly example:
>
> import fileinput, sys
> # take the first argument out of sys.argv and assign it to searchterm
> searchterm, sys.argv[1:] = sys.argv[1], sys.argv[2:]
> for line in fileinput.input():
>    num_matches = line.count(searchterm)   # str.count replaces the old string.count
>    if num_matches:                     # a nonzero count means there was a match
>        print("found '%s' %d times in %s on line %d." % (searchterm, num_matches,
>            fileinput.filename(), fileinput.filelineno()))
>
> To test this, I put the above code block in "mygrep.py", then made a file "test.txt" in the same folder, with some trash lines, and 1 line with the string you said you want to match on. Then I did:
>
> sam at B74kb0x:~$ python mygrep.py "<script type" test.txt
> found '<script type' 1 times in test.txt on line 3.
>
> So you could use the above block and change the print line so that it edits the file the way you want, perhaps keeping a print to confirm that it did what you expect.
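
One way that suggestion might look, as a sketch for Python 3 rather than a tested solution. The function name delete_lines and the script name mydelete.py are invented here. Because inplace=True redirects standard output into the file being rewritten, the confirmation message goes to standard error instead.

```python
import fileinput
import sys


def delete_lines(searchterm, filenames):
    """Remove lines containing searchterm from each file; return {filename: count}."""
    deleted = {}
    if not filenames:
        return deleted  # fileinput reads standard input when given an empty list
    for line in fileinput.input(filenames, inplace=True):
        if searchterm in line:
            name = fileinput.filename()
            deleted[name] = deleted.get(name, 0) + 1
            # stdout points at the file being rewritten, so report on stderr
            sys.stderr.write("deleted line %d of %s\n"
                             % (fileinput.filelineno(), name))
        else:
            print(line, end="")
    return deleted


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        delete_lines(sys.argv[1], sys.argv[2:])
    else:
        sys.stderr.write("usage: python mydelete.py SEARCHTERM FILE [FILE ...]\n")
```

It would be run in the same way as mygrep.py, with the search term first and the files to edit after it.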
>
> Hope this helps!
> -Sam
>
> _____________________________________
> I have a directory of files, and I've created a file list
> of the files I want to work on:
>
> $ ls > file.list
>
> Each file in file.list needs to have a line removed,
> leaving the rest of the file intact.
>
> I found this snippet on the Net, and it works fine for one file:
>
> # the lines with '<script type' are deleted.
> import fileinput
>
> for line in fileinput.input("file0001.html", inplace=1):
>     line = line.strip()
>     if '<script type' not in line:
>         print(line)
>
> The docs say:
> This iterates over the lines of all files listed in sys.argv[1:]...
> I'm not sure how to implement the argv stuff.
>
> However, the documentation also states:
> To specify an alternative list of filenames,
> pass it as the first argument to input().
> A single file name is also allowed.
>
> So, when I replace file0001.html with file.list (the alternative list
> of filenames), nothing happens.
>
> # the lines with '<script type' are deleted.
> import fileinput
>
> for line in fileinput.input("file.list", inplace=1):
>     line = line.strip()
>     if '<script type' not in line:
>         print(line)
>
> file.list has one filename on each line, ending with a newline.
> file0001.html
> file0002.html
> :::
> :::
> file0175.html
>
> Have I interpreted the documentation wrong?
> The goal is to delete the line that has '<script type' in it.
> I can supply more information if needed.
> TIA.
> --
> bhaaluu at gmail dot com
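
On the question in the quoted message: a single string passed to fileinput.input() is treated as the name of the one file to process, so "file.list" itself gets filtered rather than the files named inside it, and since none of its lines contain '<script type', nothing appears to change. Below is a minimal sketch that avoids fileinput altogether, assuming Python 3 and one filename per line in the list file. The function names remove_lines and process_list are invented for illustration.

```python
def remove_lines(path, pattern="<script type"):
    """Rewrite path without the lines containing pattern; return how many were dropped."""
    with open(path) as source:
        lines = source.readlines()
    kept = [line for line in lines if pattern not in line]
    if len(kept) != len(lines):
        # only touch the file when something actually has to be removed
        with open(path, "w") as target:
            target.writelines(kept)
    return len(lines) - len(kept)


def process_list(list_path="file.list", pattern="<script type"):
    """Apply remove_lines to every filename listed (one per line) in list_path."""
    with open(list_path) as listing:
        filenames = [name.strip() for name in listing if name.strip()]
    return {name: remove_lines(name, pattern) for name in filenames}
```

Unlike the snippets that call line.strip(), this version leaves the indentation of the remaining lines untouched.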