Examples of re module

Bernhard Reiter bernhard at alpha1.csd.uwm.edu
Fri Jun 4 06:54:28 CEST 1999

On Fri, 4 Jun 1999 01:00:10 GMT, air at apex.net <air at apex.net> wrote:
>I am looking for some examples of the re module, mainly I need to be able to do what I used to do with awk, grab part of a file from Point a to point b, in awk you could do it like this
>or something similar to this, any help would be appreciated.

Hmm, here is my zero-dot-second shot at doing a typical awk task
with Python. It might be a crude example, because I just played with
some elements, and the main() part is a bit lengthy.
Methods from the string and re modules are used as needed.
For some purposes the string methods seem to be sufficient.
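
To illustrate the split between the two modules, here is a minimal sketch. It is not part of the script further down, and the helper name find_channel is made up for this example. A plain prefix test is enough to spot a logfile header line, while pulling the channel name out of it is a job for re:

```python
import re

# Same pattern idea as in the script: "(channel #name)" inside a header line.
CHANNEL = re.compile(r"\(channel (#[^)]+)\)")

def find_channel(line):
    """Return the channel name from a '###' header line, or None."""
    # A string method is sufficient for the cheap prefix test.
    if not line.startswith("###"):
        return None
    # The regular expression is only needed to capture the name itself.
    match = CHANNEL.search(line)
    if match is None:
        return None
    return match.group(1)

header = "### Opening logfile (channel #jzz), [Mon Feb 16 09:55:44 1999]"
channel = find_channel(header)
```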

I wanted to filter the lines my tirc here spits out. They look like
this (each ^C stands for a real Control-C, which I replaced for this posting):

### Opening logfile (channel #jzz), [Mon Feb 16 09:55:44 1999]
[Mon Feb 16 09:57:12 1999]<ROW^C7,0;> da ist er ja 
[Mon Feb 16 09:57:18 1999]#jzz> Wer? ;-)
[Mon Feb 16 09:57:32 1999]<ROW^C7,0;> Cliff (bernhard at alpha1.csd.uwm.edu)
[Mon Feb 16 09:57:47 1999]#jzz> Ja! 17:00 war doch abgemacht, oder?
[Mon Feb 16 09:59:12 1999]* Cliff bemerkt, dass wir etwa 2 Sekunden "ping Zeit" haben.

After filtering, they look like:

### Opening logfile (channel #jzz), [Mon Feb 15 09:55:44 1999]
[09:57:12] <JOW> da ist er ja 
[09:57:18] <Cliff> Wer? ;-)
[09:57:32] <JOW> Cliff (bernhard at alpha1.csd.uwm.edu)
[09:57:47] <Cliff> Ja! 17:00 war doch abgemacht, oder?
[09:59:12]* Cliff bemerkt, dass wir etwa 2 Sekunden "ping Zeit" haben.
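
The two visible changes, shortening the timestamp and replacing the channel prompt with a nick, can be sketched roughly like this. It is only an illustration under assumptions: the timestamp layout is read off the sample lines, the function names are made up here, and the handling of the ^C control strings is left out.

```python
import re

# Assumed layout, taken from the sample lines: [Mon Feb 16 09:57:12 1999]
TIMESTAMP = re.compile(r"^\[\w{3} \w{3} [ \d]\d (\d\d:\d\d:\d\d) \d{4}\]")

def shorten_timestamp(line):
    """Keep only HH:MM:SS in a leading timestamp; other lines pass through."""
    return TIMESTAMP.sub(r"[\1]", line, count=1)

def label_own_lines(line, channel="#jzz", nick="Cliff"):
    """Turn the ']#channel>' prompt into '] <nick>' (first occurrence only)."""
    return line.replace("]" + channel + ">", "] <" + nick + ">", 1)

def clean(line):
    """Apply both steps to one log line."""
    return label_own_lines(shorten_timestamp(line))
```

Because the pattern is anchored at the start of the line, the "###" header lines keep their full date.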

Usually you would want to overwrite the body of process().
I tried to mimic a few AWK variables; not all of them are used.
They could be commented out for speed if unneeded.
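
As a rough sketch of how such AWK variables map onto the fileinput module (the generator name awk_variables is made up for this example, and the fields are split on whitespace only):

```python
import fileinput

def awk_variables(files):
    """Yield (FILENAME, NR, FNR, NF, line) for every input line."""
    with fileinput.input(files=files) as stream:
        for line in stream:
            fields = line.split()        # AWK's $1..$NF, split on whitespace
            yield (stream.filename(),    # FILENAME: file currently being read
                   stream.lineno(),      # NR: cumulative line number
                   stream.filelineno(),  # FNR: line number within that file
                   len(fields),          # NF: number of fields
                   line)
```

Calling it with a list of filenames gives one tuple per line, so a caller only unpacks the values it actually needs.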

You can use a global variable to record a state, like
when you have encountered your beginning line.
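
For the "point a to point b" task from the question, the state idea could look like the sketch below. It keeps the flag in a local variable of a generator instead of a global, and the name extract_range is made up for this example. It is meant to behave like AWK's /start/,/end/ range pattern:

```python
import re

def extract_range(lines, start_pattern, end_pattern):
    """Yield each line from a start match through the next end match."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    inside = False                     # the recorded state
    for line in lines:
        if not inside and start_re.search(line):
            inside = True              # found the beginning line
        if inside:
            yield line
            if end_re.search(line):
                inside = False         # end line is included, then stop
```

Any iterable of lines works as input, for example an open file or a fileinput object.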

I inserted a few comments to show you where the awk BEGIN and END
actions have to take place. Untested code ahead:

#! /bin/env python
"""Format and clean tirc normal logfile for xxx IRC session logs.  v%(version)s

 %(progname)s [inputfilename(s)]  >outputfilename

The default is to use <Cliff> as user.
The time entries on each line will be cut to include only the time
and not the day or year, which can easily be seen from the beginning
and end of a consecutive session.

Of course this depends on the time format tirc uses.
"""
# initial 2.6.1999 Bernhard Reiter
# 3.6.1999 demo for comp.lang.python

__version__="0.0"   # placeholder, the real version string is not given here

import fileinput
import sys
import string
import re


def process(line,fileinputobject,write):
    """Process one inputline and spit out what is wanted."""

    # share some things between subsequent calls
    global channel
    global nickre


    # imitate gawk a bit   ;-)

    if line[0:3]=="###":
        # opening or closing of logfile

        # grab channelname
        matchobj=re.search(r'\(channel (#[^)]+)\)',line)
        if matchobj:
            channel=matchobj.group(1)
            sys.stderr.write("Found channel: " + channel + "\n" )
            # prepare nick replacement procedure
            nickre=re.compile(re.escape("]" +channel+">"))
    if line[0:1]=="[":
        # normal line

        # replace funny control string
        # distance between time and normal text
        if line[25:27]=="]<":
            line=line[:26]+" "+line[26:]

        # cut date and year out

    # spit out the (possibly modified) line
    write(line)


def main():
    """Check script arguments and do the work."""

    if len(sys.argv)==1:
        sys.stderr.write(__doc__ %
            {"progname":sys.argv[0], "version":__version__})

        if sys.platform=="win32":
            import msvcrt
            sys.stderr.write("\nPress any key to start reading from stdin.\n")
            msvcrt.getch()
            sys.stderr.write("Now reading from stdin.\n")

    #AWK's BEGINs {}

    #prepare for AWK middle part in process()
    # fileinput.input() reads the files named in sys.argv[1:], or stdin
    fileinputobject=fileinput.input()

    for line in fileinputobject:
        process(line,fileinputobject,sys.stdout.write)

    #AWK's ENDs {}

if __name__=="__main__":
    main()
