replacing words in HTML file

Daniel Fetchinson fetchinson at googlemail.com
Thu Apr 29 05:38:53 EDT 2010


> | > Any idea how I can replace words in an HTML file? Meaning only the
> | > content gets replaced, while the HTML tags, JavaScript, & CSS
> | > remain untouched.
> |
> | I'm not sure what you tried and what you haven't but as a first trial
> | you might want to
> |
> | <untested>
> |
> | # Naive: replaces every occurrence, including inside tags and scripts.
> | with open('index.html') as src:
> |     text = src.read()
> | with open('new.html', 'w') as dst:
> |     dst.write(text.replace('replace-this', 'with-that'))
> |
> | </untested>
>
> If 'replace-this' occurs inside the javascript etc or happens to be an
> HTML tag name, it will get mangled. The OP didn't want that.

Correct, that is why I started with "I'm not sure what you tried and
what you haven't but as a first trial you might". For instance, if the
OP wants to replace words that he knows do not occur in the JavaScript
and/or CSS, and that also do not occur in HTML tag names or attribute
names/values, then the above approach would work, in which case
BeautifulSoup is gigantic overkill. The OP needs to specify more
clearly what he wants before really useful advice can be given.

Cheers,
Daniel


> The only way to get this right is to parse the file, then walk the doc
> tree editing only the text parts.
>
> The BeautifulSoup module (3rd party, but a single .py file and trivial to
> fetch and use, though it has some dependencies) does a good job of this,
> coping even with typical not-quite-right HTML. It gives you a parse
> tree you can easily walk, and you can modify it in place and write it
> straight back out.
>
> Cheers,
> --
> Cameron Simpson <cs at zip.com.au> DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> The Web site you seek
> cannot be located but
> endless others exist
> - Haiku Error Messages
> http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html
>
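
For reference, the parse-then-edit-only-the-text idea quoted above
might look roughly like the sketch below. Note that it swaps in the
standard library's html.parser for BeautifulSoup, purely to avoid a
third-party install; TextReplacer and replace_text are names made up
for this illustration. It assumes a plain substring replace is what is
wanted, and it has not been exercised against messy real-world HTML.

```python
from html.parser import HTMLParser


class TextReplacer(HTMLParser):
    """Re-emit HTML, replacing text only outside <script> and <style>."""

    SKIP = {"script", "style"}  # elements whose content is left alone

    def __init__(self, old, new):
        # convert_charrefs=False delivers entities such as &amp; as
        # separate events, so they can be passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        # Emit the tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</%s>" % tag)  # tag name arrives lowercased

    def handle_data(self, data):
        # Only text outside script/style is touched.
        if not self.skip_depth:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_text(html, old, new):
    """Return html with old replaced by new in text content only."""
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Something like replace_text(open('index.html').read(), 'replace-this',
'with-that') would then give the new page text. One limitation of the
sketch: a word that is split across text chunks (for instance by an
entity in the middle of it) will not be matched.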


-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown


