HTML "sanitizer" in Python
Scott Stirling
SSTirlin at holnam.com
Thu Apr 29 12:20:27 EDT 1999
On opening files in Windows: I was hoping there was a way to give Python the full path to a file. Everything I have seen so far only tells me how to open a file that is in the same directory I am running Python from.
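For what it's worth, `open()` accepts a full path just as readily as a bare file name. One catch on Windows is that a backslash inside an ordinary Python string starts an escape sequence, so `"C:\temp\new.htm"` would actually contain a tab and a newline. The sketch below, in current Python 3, shows three safe spellings; the folder `C:\My Documents\site` and the helper `read_file` are hypothetical, for illustration only.

```python
import ntpath  # the module os.path refers to on Windows; importable on any OS

# A hypothetical file, spelled in ways that Windows accepts:
raw_path = r"C:\My Documents\site\index.htm"        # raw string keeps each backslash
escaped_path = "C:\\My Documents\\site\\index.htm"  # each backslash doubled
slash_path = "C:/My Documents/site/index.htm"       # Windows also accepts forward slashes

# Building the path from its parts avoids typing separators at all.
joined_path = ntpath.join("C:\\", "My Documents", "site", "index.htm")


def read_file(full_path):
    """Open a file by its full path and return its contents as text."""
    with open(full_path, "r") as f:
        return f.read()
```

On a Windows machine, `read_file(raw_path)` would read the file wherever it lives, regardless of the directory Python was started from.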
I don't have sed on my MS Windows PC at work. This was part of the initial explanation--I am working for a company where we have DOS, Windows and Office 97. No sed, no Unix. This is a Y2K Project too, so we are on a budget with little leeway for new ideas that weren't included in the original statement of work and project plan.
Scott
>>> Stephan Houben <stephan at pcrm.win.tue.nl> 04/29 9:49 AM >>>
"Scott Stirling" <SSTirlin at holnam.com> writes:
> 1) What is the Python syntax for opening a file in MS Windows? I was following Guido's tutorial yesterday, but I could not figure out how to open a file in Windows.
??? I don't think it's any different on Windows than on Linux.
Just do:
f = open("my_file.html", "rt")
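(OK, there *is* a difference, I guess: Windows text files end each line with a carriage return plus a line feed. In text mode ("r", or equivalently "rt") each pair is read as a plain "\n";
only in binary mode ("rb") do the carriage returns show up in your data.)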
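In today's Python 3 the distinction is easy to see, and the text-mode newline translation ("universal newlines") now happens on every platform, not only on Windows. A small sketch, using a throwaway file `dos.txt` in a temporary folder as an illustration:

```python
import os
import tempfile

# Write a file whose lines end in DOS-style CR/LF pairs.
path = os.path.join(tempfile.mkdtemp(), "dos.txt")
with open(path, "wb") as f:
    f.write(b"line one\r\nline two\r\n")

# Text mode ("r" is the same as "rt"): each CR/LF pair is read as "\n".
with open(path, "rt") as f:
    text = f.read()

# Binary mode ("rb"): the bytes come back untouched, carriage returns included.
with open(path, "rb") as f:
    raw = f.read()

print(repr(text))  # 'line one\nline two\n'
print(repr(raw))   # b'line one\r\nline two\r\n'
```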
> 2) How do I find a string of text in the open file and delete it iteratively?
Check out the "string" module, in particular string.find() and string.replace().
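In Python 3 the old `string.replace(s, old, new)` function is gone and the same job is done by the `str.replace` method. The sketch below shows both the one-call version and an explicit find-and-delete loop; `remove_all` and `remove_all_iteratively` are hypothetical helper names, and both return the same result.

```python
def remove_all(text, unwanted):
    """Return text with every occurrence of unwanted deleted."""
    return text.replace(unwanted, "")


def remove_all_iteratively(text, unwanted):
    """Same result, deleting one occurrence at a time with str.find."""
    if not unwanted:
        return text  # nothing to delete; also avoids looping forever on ""
    start = 0
    while True:
        pos = text.find(unwanted, start)
        if pos == -1:  # no more matches
            return text
        text = text[:pos] + text[pos + len(unwanted):]
        # Resume where the deleted text was; like str.replace, this does not
        # rescan text before pos for matches created by the deletion.
        start = pos
```

For example, `remove_all("a<b>b</b>c<b>", "<b>")` gives `"ab</b>c"`.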
> 3) How do I save the file in Windows after I have edited it with the Python program? How do I close it?
Well you open a second file, for writing this time:
f2 = open("output.html", "wt")
Then you write to it to your heart's content:
f2.write("blahblahblah")
Then you close it:
f2.close()
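Modern Python adds the `with` statement, which closes the file for you when the block ends, even if the write fails partway. A minimal sketch; `save_text` is a hypothetical helper name:

```python
def save_text(path, text):
    """Write text to path, replacing any existing contents.

    The with-block closes the file automatically, even if write() raises,
    so no explicit f2.close() is needed.
    """
    with open(path, "w") as f2:  # "w" opens in text mode, the same as "wt"
        f2.write(text)
```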
But all this is in the Python docs, so perhaps you should try to read them.
> 4) If someone helps me out, I think I should be able to use this info. and the tutorial and the Lutz book to loop the process and make the program run until all *.htm files in a folder have been handled once.
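Looping over every *.htm file in a folder can be sketched with the standard `glob` module. This is one possible approach under stated assumptions, not the only one: `clean_folder` is a hypothetical name, files are rewritten in place (keep a backup), and the `latin-1` encoding is an assumption chosen because it decodes any byte, so files survive the round trip unchanged apart from the deletions. A non-ASCII `unwanted` string would only match if the files really are Latin-1.

```python
import glob
import os


def clean_folder(folder, unwanted, pattern="*.htm", encoding="latin-1"):
    """Delete every occurrence of `unwanted` from each file in `folder`
    matching `pattern`, rewriting the files in place.

    Returns the paths of the files that actually changed.
    """
    changed = []
    # glob.escape stops characters like "[" in the folder name acting as wildcards.
    for path in sorted(glob.glob(os.path.join(glob.escape(folder), pattern))):
        # newline="" keeps CR/LF pairs exactly as they are on disk.
        with open(path, "r", encoding=encoding, newline="") as f:
            text = f.read()
        cleaned = text.replace(unwanted, "")
        if cleaned != text:  # only rewrite files that contained the string
            with open(path, "w", encoding=encoding, newline="") as f:
                f.write(cleaned)
            changed.append(path)
    return changed
```

The return value makes it easy to report which files were touched.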
Well, if I understand correctly, the *only* thing you're trying to do
is to remove some specific strings from a bunch of files. Now if I
were you, I wouldn't even bother to use Python on something that
simple; I would just use sed. With sed, you could do:
sed 's/string_to_be_eliminated//g' my_file.html > output.html
Presto, that's it. I think there is a version of GNU sed for
Windows somewhere out there; do yourself a favour and get it.
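Since sed was not available on that machine, here is a rough Python analogue of sed's `s/pattern/replacement/g` command, as a sketch only: `sed_substitute` is a hypothetical name, and Python's `re` syntax is close to, but not identical to, sed's. Wrapping a literal string in `re.escape` makes characters such as `.` match themselves.

```python
import re


def sed_substitute(pattern, replacement, in_path, out_path):
    r"""Rough analogue of: sed 's/pattern/replacement/g' in_path > out_path.

    pattern is a Python regular expression; its syntax differs from sed's
    basic regexes in places (e.g. groups are ( ) rather than \( \)).
    """
    regex = re.compile(pattern)
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:  # like sed, work one line at a time
            dst.write(regex.sub(replacement, line))  # sub() replaces every match
```

For example, `sed_substitute(re.escape("string_to_be_eliminated"), "", "my_file.html", "output.html")` mirrors the sed command above.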
Greetings,
Stephan
--
http://www.python.org/mailman/listinfo/python-list
________________________________________________________________
Scott M. Stirling
Visit the HOLNAM Year 2000 Web Site: http://web/y2k
Keane - Holnam Year 2000 Project
Office: 734/529-2411 ext. 2327 fax: 734/529-5066 email: sstirlin at holnam.com