[Tutor] Making Regular Expressions readable

Stephen Nelson-Smith sanelson at gmail.com
Mon Mar 8 17:12:35 CET 2010


I've written this today:

#!/usr/bin/env python
import re

pattern = r'(?P<ForwardedFor>^(-|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(,
(?P<RemoteLogname>(\S*)) (?P<RemoteUser>(\S*))
(?P<Status>(\S*)) (?P<Size>(\S*))

regex = re.compile(pattern)

lines = 0
no_cookies = 0

for line in open('/home/stephen/scratch/feb-100.txt'):
  lines += 1
  line = line.strip()
  match = regex.match(line)

  if match:
    data = match.groupdict()
    if data['SiteIntelligenceCookie'] == '':
      no_cookies += 1
  else:
    print "Couldn't match ", line

print "I analysed %s lines." % (lines,)
print "There were %s lines with missing Site Intelligence cookies." % (no_cookies,)

It works fine, but it looks pretty unreadable and unmaintainable to
anyone who hasn't spent all day writing regular expressions.

I remember reading about verbose regular expressions.  Would these help?

How could I make the above more maintainable?
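
For illustration, here is a minimal sketch of what a verbose pattern might look like, using the re.VERBOSE flag so that whitespace in the pattern is ignored and each group can carry its own comment. The log format below is an assumption (a cut-down line with only five fields), and the sample line is made up; it is not the full pattern from the script above, only a demonstration of the layout technique.

```python
import re

# Hypothetical, simplified log format:
#   "<ip-or-dash> <logname> <user> <status> <size>"
# The real pattern has more fields; this only shows the verbose layout.
pattern = r'''
    ^(?P<ForwardedFor>                      # client address, or "-" if absent
        -
      | [0-9]{1,3}(?:\.[0-9]{1,3}){3}       # dotted-quad IPv4 (octets not range-checked)
    )
    \s (?P<RemoteLogname>\S*)               # identd logname
    \s (?P<RemoteUser>\S*)                  # authenticated user
    \s (?P<Status>\S*)                      # HTTP status code
    \s (?P<Size>\S*)                        # response size in bytes
    $
'''

# re.VERBOSE makes the engine skip unescaped whitespace and "#" comments,
# so literal spaces between fields have to be written as \s (or "\ ").
regex = re.compile(pattern, re.VERBOSE)

sample = '192.168.0.1 - stephen 200 5120'   # made-up line for illustration
match = regex.match(sample)
data = match.groupdict() if match else {}
```

With one group per line, adding or reordering a field becomes a one-line change, and match.groupdict() keeps working as before because the group names are untouched.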


Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
