HTML "sanitizer" in Python

Scott Stirling SSTirlin at holnam.com
Thu Apr 29 12:20:27 EDT 1999


On opening files in Windows: I was hoping there was a way to give Python the full file path.  Everything I have seen so far only shows how to open a file that is in the directory I am running Python from.

I don't have sed on my MS Windows PC at work.  This was part of the initial explanation--I am working for a company where we have DOS, Windows and Office 97.  No sed, no Unix.  This is a Y2K Project too, so we are on a budget with little leeway for new ideas that weren't included in the original statement of work and project plan.

Scott
>>> Stephan Houben <stephan at pcrm.win.tue.nl> 04/29 9:49 AM >>>
"Scott Stirling" <SSTirlin at holnam.com> writes:


> 1) What is the Python syntax for opening a file in MS Windows?  I was following Guido's tutorial yesterday, but I could not figure out how to open a file in Windows.

??? I don't think it's different on windows than on linux.
Just do:

f = open("my_file.html", "rt")

(OK, there *is* a difference, I guess; on Windows you want text mode,
 which is the default and which the "t" in "rt" just spells out.
 In binary mode, "rb", the carriage returns show up in your data.)
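The same open() call also accepts a full path, not just a name in the current directory. A minimal sketch, assuming a current Python 3; the read_text helper and both example paths are made up for illustration:

```python
def read_text(full_path):
    """Return the contents of the file at full_path (absolute or relative)."""
    with open(full_path, "r") as f:
        return f.read()

# Hypothetical Windows path. The r"..." raw string keeps the backslashes
# literal, so "\p" or "\i" is not treated as an escape sequence.
EXAMPLE_PATH = r"C:\projects\y2k\pages\index.htm"

# Forward slashes are also accepted by open() on Windows.
SAME_PATH = "C:/projects/y2k/pages/index.htm"
```

Calling read_text(EXAMPLE_PATH) would then read that file, provided it exists.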

> 2) How do I find a string of text in the open file and delete it iteratively?

Check out the "string" module.
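In current Python the functions of the "string" module are available as methods on the string itself. A small sketch of removing every occurrence of some text, assuming Python 3; strip_text and the sample markup are illustrative only:

```python
def strip_text(text, unwanted):
    """Return text with every occurrence of unwanted removed."""
    return text.replace(unwanted, "")

page = "<p>Hello <font>world</font></p>"
# Remove several unwanted strings one after another.
for tag in ("<font>", "</font>"):
    page = strip_text(page, tag)
print(page)
```

str.replace() already handles all occurrences in one call, so no explicit search loop is needed for a fixed string.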

> 3) How do I save the file in Windows after I have edited it with the Python program?  How do I close it?

Well you open a second file, for writing this time:
  f2 = open("output.html", "wt")

Then you write to it to your heart's content:
  f2.write("blahblahblah")

Then you close it:
  f2.close()

But all this is in the Python docs, so perhaps you should try to read them.
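Put together, the open / write / close steps above might look like the sketch below. It assumes Python 3, where a "with" block closes the file for you; copy_without and its parameter names are hypothetical:

```python
def copy_without(src_path, dst_path, unwanted):
    """Copy src_path to dst_path, dropping every occurrence of unwanted."""
    # Read the whole input file in text mode.
    with open(src_path, "r") as src:
        text = src.read()
    # Write the edited text to a second file; leaving the "with" block
    # closes it, which is what f2.close() does by hand.
    with open(dst_path, "w") as dst:
        dst.write(text.replace(unwanted, ""))
```

For example, copy_without("my_file.html", "output.html", "blahblahblah") would leave the original file untouched and write the cleaned copy next to it.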

> 4) If someone helps me out, I think I should be able to use this info. and the tutorial and the Lutz book to loop the process and make the program run until all *.htm files in a folder have been handled once.

Well, if I understand correctly, the *only* thing you're trying to do
is to remove some specific strings from a bunch of files. Now if I
were you, I wouldn't even bother to use Python on something that
simple; I would just use sed. With sed, you could do:

  sed 's/string_to_be_eliminated//g' my_file.html > output.html

Presto, that's it.  I think that there is a version of GNU sed for
Windows somewhere out there; do yourself a favour and get it.
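Where sed is not available, roughly the same job can be done in Python for every *.htm file in a folder. This is only a sketch under assumptions: Python 3, files readable in the default text encoding, and a hypothetical clean_folder helper that rewrites files in place (so keep a backup of the folder first):

```python
import glob
import os

def clean_folder(folder, unwanted, pattern="*.htm"):
    """Remove unwanted from each matching file in folder.

    Files are rewritten in place; returns the paths that were changed.
    """
    changed = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, "r") as f:
            text = f.read()
        cleaned = text.replace(unwanted, "")
        # Only rewrite files that actually contained the string.
        if cleaned != text:
            with open(path, "w") as f:
                f.write(cleaned)
            changed.append(path)
    return changed
```

Each file is visited exactly once, and the returned list shows which ones were touched.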

Greetings,

Stephan


________________________________________________________________
Scott M. Stirling
Visit the HOLNAM Year 2000 Web Site: http://web/y2k
Keane - Holnam Year 2000 Project
Office:  734/529-2411 ext. 2327 fax: 734/529-5066 email: sstirlin at holnam.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




