[Tutor] deleting one line in multiple files

Michael Langford mlangford.cs03 at gtalumni.org
Fri Sep 14 14:26:04 CEST 2007


Sed is a little easier than Python for this project, but Python is more
flexible.

Sed would be

sed -e "s/^.*<script line.*$//"     file000.lst

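That would leave a blank line where your script line was (to delete the
line outright, use sed -e "/<script line/d" instead). To run this command
on multiple files, you'd use xargs:

ls filename*.html | xargs -I fn sh -c 'sed -e "s/^.*<script line.*$//" "fn" > "fn.modified"'

This will take each file that starts with filename, blank out the line that
has <script line in it, and write the result to fn.modified. The sh -c
wrapper is needed because xargs does not interpret the > redirection itself.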

A second application of xargs will rename all the .modified files
back to the original names. List only the .html files here, since
filename* would now also match the new .modified files:

ls filename*.html | xargs -I fn mv fn.modified fn

So your total sed solution is:

ls filename*.html | xargs -I fn sh -c 'sed -e "s/^.*<script line.*$//" "fn" > "fn.modified"'
ls filename*.html | xargs -I fn mv fn.modified fn
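
For comparison, the same two steps (write a .modified copy without the
matching line, then rename it over the original) could be sketched in
Python roughly as follows. This is only a sketch for Python 3: the function
names strip_matching_lines and strip_all are made up for illustration, the
filename*.html pattern and the '<script type' marker are taken from the
question quoted below, and it assumes the files can be read with the
default text encoding. Unlike the sed substitution, it drops the line
entirely rather than leaving a blank one.

```python
import glob
import os


def strip_matching_lines(path, marker="<script type"):
    """Copy path to path + '.modified' without lines containing marker,
    then rename the copy over the original (hypothetical helper)."""
    modified = path + ".modified"
    with open(path) as src, open(modified, "w") as dst:
        for line in src:
            if marker not in line:
                dst.write(line)
    # Only reached if the copy was written without an error.
    os.replace(modified, path)


def strip_all(pattern="filename*.html", marker="<script type"):
    """Apply strip_matching_lines to every file matching a glob pattern
    and return the list of files that were processed."""
    paths = sorted(glob.glob(pattern))
    for path in paths:
        strip_matching_lines(path, marker)
    return paths
```

Calling strip_all() in the directory that holds the files would process
every filename*.html there; it is worth trying on a copy of the directory
first.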

As far as Python goes, import sys and read the filenames from sys.argv;
together with xargs, that will get you to the same place with a more
flexible solution.
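
A minimal sketch of that idea, combining sys.argv with the fileinput
module discussed in the quoted thread below. It is written for Python 3
(the snippets in the thread use the Python 2 print statement), and the
function name delete_lines and the script name in the usage comment are
placeholders of mine, not anything from the thread.

```python
import fileinput
import sys


def delete_lines(filenames, marker="<script type"):
    """Rewrite each file in place, dropping every line that contains
    marker. Returns the number of lines removed."""
    if not filenames:
        # An empty list would make fileinput fall back to standard input.
        return 0
    removed = 0
    with fileinput.input(files=filenames, inplace=True) as stream:
        for line in stream:
            if marker in line:
                removed += 1
            else:
                # With inplace=True, standard output is redirected into
                # the file currently being processed.
                sys.stdout.write(line)
    return removed


# Hypothetical usage: python delete_lines.py "<script type" filename*.html
if __name__ == "__main__" and len(sys.argv) > 2:
    count = delete_lines(sys.argv[2:], sys.argv[1])
    sys.stderr.write("removed %d line(s)\n" % count)
```

The shell expands filename*.html into the individual names, so the script
sees them one per entry in sys.argv and no list file is needed.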

      --Michael
P.S. For the Python one, the list you want is sys.argv (after import sys).


-- 
Michael Langford
Phone: 404-386-0495
Consulting: http://www.TierOneDesign.com/
Entertaining: http://www.ThisIsYourCruiseDirectorSpeaking.com

On 9/14/07, bhaaluu <bhaaluu at gmail.com> wrote:
>
> On 9/13/07, wormwood_3 <wormwood_3 at yahoo.com> wrote:
>
> > I think the problem is that the original script you borrowed looks at
> > the file passed to input, and iterates over the lines in that file,
> > removing them if they match your pattern. What you actually want to be
> > doing is iterating over the lines of your list file, and for each line
> > (which represents a file), you want to open *that* file, do the check
> > for your pattern, and delete appropriately.
> >
> > Hope I am not completely off:-)
>
> This is exactly what I'd like to do. =)
> After manually opening about 25 files individually and deleting the line
> that I wanted to delete, and seeing about 175 files left to finish, I
> thought to myself, 'I'm learning Python! Python is supposed to be really
> good at this kind of stuff.' So, since there isn't a rush deadline for
> this project, I figured I could play around and see what kinds of
> solutions I could find.
>
> The 'fileinput' snippet is one solution, but I'd rather be able to pass
> it a list of filenames to work on, rather than have to manually change
> the filename in the snippet each time. The files are numbered, in order,
> from 0001 to 0175 (filename0001.html to filename0175.html).
> One thought was to be able to change the filename number incrementally,
> assign it to a variable, and run it through a for loop? Isn't it amazing
> how a Newbie approaches a problem? =)
>
> I'm also looking into 'sed' for doing this.  I've used 'sed' in the past
> for deleting a specific line from files, as well as doing simple search
> and replace in a file. I just figured that if it can be done in 'sed,' it
> can be done in Python much easier and maybe even more elegantly (although
> at this point in my Python career, elegance isn't a top priority).
>
> Happy Programming!
> --
> bhaaluu at gmail dot com
>
>
> >
> > If I am right so far, you want to do something like:
> >
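> > import fileinput
> >
> > # Read the filenames from the list file; only the listed files are
> > # edited in place, not the list itself.
> > filenames = [name.strip() for name in open("filelist.list") if name.strip()]
> > for line in fileinput.input(filenames, inplace=1):
> >     line = line.rstrip()
> >     if '<script type' not in line:
> >         print(line)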
> >
> > BUT, fileinput was made (if I understand the documentation) to avoid
> > having to do this. This is where the sys.argv[1:] values come in. The
> > example on this page (look under "Processing Each Line of One or More
> > Files: The fileinput Module") helped clarify it to me:
> > http://www.oreilly.com/catalog/lpython/chapter/ch09.html. If you do:
> >
> > % python myscript.py "<script type" `ls`
> >
> > This should pass in all the items in the folder you run this in (be sure
> > it only contains the files you want to edit!), looking for "<script type".
> > Continuing with the O'Reilly example:
> >
> > import fileinput, sys, string
> > # take the first argument out of sys.argv and assign it to searchterm
> > searchterm, sys.argv[1:] = sys.argv[1], sys.argv[2:]
> > for line in fileinput.input():
> >    num_matches = string.count(line, searchterm)
> >    if num_matches:    # a nonzero count means there was a match
> >        print "found '%s' %d times in %s on line %d." % (searchterm,
> >            num_matches, fileinput.filename(), fileinput.filelineno())
> >
> > To test this, I put the above code block in "mygrep.py", then made a
> > file "test.txt" in the same folder, with some trash lines, and 1 line
> > with the string you said you want to match on. Then I did:
> >
> > sam at B74kb0x:~$ python mygrep.py "<script type" test.txt
> > found '<script type' 1 times in test.txt on line 3.
> >
> > So you could use the above block, and edit the print line to also edit
> > the file as you want, maybe leaving the print to confirm it did what you
> > expect.
> >
> > Hope this helps!
> > -Sam
> >
> > _____________________________________
> > I have a directory of files, and I've created a file list
> > of the files I want to work on:
> >
> > $ ls > file.list
> >
> > Each file in file.list needs to have a line removed,
> > leaving the rest of the file intact.
> >
> > I found this snippet on the Net, and it works fine for one file:
> >
> > # the lines with '<script type' are deleted.
> > import fileinput
> >
> > for line in fileinput.input("file0001.html", inplace=1):
> >     line = line.strip()
> >     if not '<script type'in line:
> >         print line
> >
> > The docs say:
> > This iterates over the lines of all files listed in sys.argv[1:]...
> > I'm not sure how to implement the argv stuff.
> >
> > However, the documentation also states:
> > To specify an alternative list of filenames,
> > pass it as the first argument to input().
> > A single file name is also allowed.
> >
> > So, when I replace file0001.html with file.list (the alternative list
> > of filenames), nothing happens.
> >
> > # the lines with '<script type' are deleted.
> > import fileinput
> >
> > for line in fileinput.input("file.list", inplace=1):
> >     line = line.strip()
> >     if not '<script type'in line:
> >         print line
> >
> > file.list has one filename on each line, ending with a newline.
> > file0001.html
> > file0002.html
> > :::
> > :::
> > file0175.html
> >
> > Have I interpreted the documentation wrong?
> > The goal is to delete the line that has '<script type' in it.
> > I can supply more information if needed.
> > TIA.
> > --
> > bhaaluu at gmail dot com
> > _______________________________________________
> > Tutor maillist  -  Tutor at python.org
> > http://mail.python.org/mailman/listinfo/tutor