Replacing and inserting strings within .txt files with the use of regex

John S jstrickler at
Sun Aug 8 04:42:31 CEST 2010

On Aug 7, 8:20 pm, Νίκος < at> wrote:
> Hello dear Pythoneers,
> I have over 500 .php web pages in various subfolders under the 'data'
> folder that I have to rename to .html, ditch the '<?' and '?>'
> tags from within, and also insert a very first line of <!-- id -->,
> where id must be a unique identification number for every page, for
> counter tracking purposes. Only pure HTML code must be left.
> Before I found out about Python I used PHP, and now I am switching to a
> templates + Python solution, so I have to change each and every page.
> I don't know how to handle such a big data replacement problem and
> cannot play with fire, because those 500 pages are my clients' pages and
> the data in those files just cannot be messed up.
> Can you provide me with a script, please, that can perform such a
> page content replacement automatically?
> Thanks a million!

If the 500 web pages are PHP only in the sense that there is only one
pair of <? ?> tags in each file, surrounding the entire content, then
what you ask for is doable.

import os
from os.path import join, splitext

id = 1  # id number
# os.walk yields (directory, subdirectory names, file names)
for currdir, dirs, files in os.walk('data'):
    for f in files:
        if f.endswith('.php'):
            source_file_name = join(currdir, f)  # path to the PHP file
            source_file = open(source_file_name)
            source_contents = source_file.read()  # read contents of PHP file
            source_file.close()

            # replace tags
            source_contents = source_contents.replace('<?', '')
            source_contents = source_contents.replace('?>', '')

            # add ID as the very first line
            source_contents = ('<!-- %d -->\n' % id) + source_contents
            id += 1

            # create new file with .html extension
            dest_file_name = splitext(source_file_name)[0] + '.html'
            dest_file = open(dest_file_name, 'w')
            dest_file.write(source_contents)  # write contents
            dest_file.close()

Note: error checking left out for clarity.
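
If you want a little of that error checking back, here is a rough sketch of the tag-stripping step as a function that refuses to touch any page that does not have exactly one pair of tags. The name php_to_html and the choice to also accept '<?php' as the opening tag are my own assumptions, not something from your description, so adjust to taste.

```python
import re

# Opening tag: '<?php' or the short form '<?'; closing tag: '?>'.
OPEN_TAG = re.compile(r'<\?(?:php\b)?')
CLOSE_TAG = re.compile(r'\?>')

def php_to_html(text, page_id):
    """Strip the single <? ... ?> pair and prepend an ID comment line.

    Raises ValueError unless the text contains exactly one opening and
    one closing tag, so pages with embedded logic are skipped rather
    than silently mangled.
    """
    if len(OPEN_TAG.findall(text)) != 1 or len(CLOSE_TAG.findall(text)) != 1:
        raise ValueError('expected exactly one <? ?> pair')
    text = OPEN_TAG.sub('', text, count=1)
    text = CLOSE_TAG.sub('', text, count=1)
    return '<!-- %d -->\n%s' % (page_id, text)
```

Any file that raises ValueError is one you will want to look at by hand.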

On the other hand, if your 500 web pages contain embedded PHP
variables or logic, you have a big job ahead. Django templates and PHP
are two different languages for embedding data and logic in web pages.
Converting a project from PHP to Django involves more than renaming
the template files and deleting "<?" and friends.

For example, here is a snippet of PHP which checks which browser is
viewing the page:

if (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== FALSE) {
    echo 'You are using Internet Explorer.<br />';
}

In Django, you would typically put this logic in a Django *view*
(which btw is not what is called a 'view' in MVC terms), which is the
code that prepares data for the template. The logic would not live
with the HTML. The template uses "template variables" that the view
has associated with a Python variable or function. You might create a
template variable (created via a Context object) named 'browser' that
contains a value that identifies the browser.

Thus, your Django template (HTML file) might look like this:

{% if browser == 'IE' %}You are using Internet Explorer{% endif %}
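
The Python side of that split might look roughly like the sketch below. The helper name browser_context and the 'IE'/'other' values are made up for illustration; in a real Django view the dictionary it returns would be handed to the template as its context, with request.META supplying the headers.

```python
def browser_context(meta):
    """Build the template context from a dict of request headers.

    `meta` stands in for Django's request.META (an assumption made so
    this sketch runs without Django installed).
    """
    user_agent = meta.get('HTTP_USER_AGENT', '')
    # Same test as the PHP snippet: look for 'MSIE' in the user agent.
    browser = 'IE' if 'MSIE' in user_agent else 'other'
    return {'browser': browser}

print(browser_context({'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0)'}))
# prints {'browser': 'IE'}
```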

PHP tends to combine the presentation with the business logic, or in
MVC terms, combines the view with the controller. Django separates
them out, which many people find to be a better way. The person who
writes the HTML doesn't have to speak Python, but only needs to know the
names of template variables and a little bit of template logic. In PHP,
the HTML code and all the business logic live in the same files. Even
here, it would probably make sense to calculate the browser ID in the
header of the HTML file, then access it via a variable in the body.

If you have 500 static web pages that are part of the same
application, but that do not contain any logic, your application might
need to be redesigned.

Also, you are doing your changes on a COPY of the application on a
non-public server, aren't you? If not, then you really are playing with
fire.
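
One way to get such a copy, sketched with the standard library. The directory names and the make_working_copy helper are placeholders of mine; the point is only that the conversion script should be run against the copy, never the originals.

```python
import shutil

def make_working_copy(src='data', dst='data_copy'):
    """Copy the whole tree so the conversion never touches the originals.

    copytree raises an error if dst already exists, so an earlier
    copy is never silently overwritten.
    """
    shutil.copytree(src, dst)
    return dst
```

After that, point os.walk at the copy instead of at 'data'.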

