Using Regular Expressions to change .htm to .php in files

Ryan Ginstrom software at
Mon Aug 27 03:00:42 CEST 2007

> On Behalf Of Mark
> This line should be:
> sed "s/\.htm$/.php/g" < $each > /tmp/$$
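
For reference, here is a rough Python equivalent of the quoted sed substitution. It is a sketch only, and the helper name `sed_htm_to_php` is made up for illustration. Because `$` anchors the match to the end of a line, the substitution only rewrites `.htm` when it is the last thing on that line. A link such as `<a href="about.htm">` is left alone, which is the kind of case a link-aware approach avoids:

```python
import re

# Same pattern as the quoted sed command: a literal ".htm" at the end of a line.
HTM_AT_EOL = re.compile(r"\.htm$", re.MULTILINE)


def sed_htm_to_php(text):
    # Replace ".htm" with ".php" only where it ends a line, as sed's "$" does.
    return HTM_AT_EOL.sub(".php", text)


print(sed_htm_to_php("index.htm\nabout.htm\n"))          # both lines rewritten
print(sed_htm_to_php('<a href="about.htm">About</a>'))  # unchanged: .htm is mid-line
```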

I think a more robust way to go about this would be:

(1) Use os.walk to walk through the directory

(2) Use Beautiful Soup to extract the internal links from each file

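from bs4 import BeautifulSoup  # BeautifulSoup 4; BeautifulSoup 3 used: from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(doc, "html.parser")  # doc: the HTML text of one file
links = soup("a")  # shorthand for soup.find_all("a")
# Assumed filter: hrefs that don't start with "http" count as internal links
internal_links = [link["href"]
                  for link in links
                  if link.has_attr("href") and not link["href"].startswith("http")]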

(4) Do straight string replacements on those links (no regex needed)

(5) Save a copy of each HTML file as *.html.bak before changing it
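
The steps above can be strung together roughly as follows. This is a minimal sketch, not a tested tool: to keep it free of third-party dependencies it swaps Beautiful Soup for the standard library's `html.parser.HTMLParser`, and the names `LinkCollector`, `is_internal_htm`, `convert_file` and `convert_tree` are hypothetical. It also assumes that files are UTF-8, that a link is "internal" when it has no scheme and no leading `//`, that only hrefs ending exactly in `.htm` should change, and that attributes are written literally as `href="..."` or `href='...'`.

```python
import os
import shutil
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Step 2: collect every <a href="..."> value in one document."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names and strips the quotes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)


def is_internal_htm(href):
    # Assumption: "internal" means no scheme and no leading "//"
    external = href.startswith(("http:", "https:", "ftp:", "mailto:", "//"))
    return not external and href.endswith(".htm")


def convert_file(path):
    """Rewrite internal *.htm links in one file; return how many were changed."""
    # Assumption: files are UTF-8 encoded
    with open(path, encoding="utf-8") as f:
        original = f.read()
    collector = LinkCollector()
    collector.feed(original)
    collector.close()

    doc = original
    changed = 0
    for old in sorted(h for h in collector.hrefs if is_internal_htm(h)):
        new = old[: -len(".htm")] + ".php"
        before = doc
        # Step 4: plain string replacement, no regex. Assumption: attributes
        # are written exactly as href="..." or href='...'
        for q in ('"', "'"):
            doc = doc.replace("href=" + q + old + q, "href=" + q + new + q)
        if doc != before:
            changed += 1

    if changed:
        shutil.copy2(path, path + ".bak")  # Step 5: back up before writing
        with open(path, "w", encoding="utf-8") as f:
            f.write(doc)
    return changed


def convert_tree(root, extensions=(".htm", ".html")):
    """Step 1: walk the directory tree and convert every matching file."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                total += convert_file(os.path.join(dirpath, name))
    return total
```

Replacing only the exact attribute text leaves the rest of each file byte-for-byte as it was, which re-serializing a parsed tree would not guarantee.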

Ryan Ginstrom
