Need direction on mass find/replacement in HTML files
Tim Chase
python.list at tim.thechases.com
Fri Apr 30 16:55:49 EDT 2010
On 04/30/2010 02:54 PM, KevinUT wrote:
> I want to globally change the following:<a href="http://
> www.mysite.org/?page=contacts"><font color="#269BD5">
>
> into:<a href="pages/contacts.htm"><font color="#269BD5">
Normally I'd just do this with sed on a *nix-like OS:
find . -iname '*.html' -exec sed -i.BAK
's at href="http://www.mysite.org/?page=\([^"]*\)@href="pages/\1.htm at g'
{} \;
This finds all the HTML files (*.html) under the current
directory ('.') calling sed on each one. Sed then does the
substitution you describe, changing
href="http://www.mysite.org/?page=<whatever>
into
href="pages/<whatever>.htm
moving the original file to a .BAK file (you can omit the
"-i.BAK" parameter if you don't want this backup behavior;
alternatively assuming you don't have any pre-existing .BAK
files, after you've vetted the results, you can then use
find . -name '*.BAK' -exec rm {} \;
to delete them all) and then overwrites the original with the
modified results.
Yes, one could hack up something in Python, perhaps adding some
real HTML-parsing brains to it, but for the most part, that
one-liner should do what you need. Unless you're stuck on Win32
with no Cygwin-like toolkit
-tkc
More information about the Python-list
mailing list