[Tutor] trouble with re

Ertl, John john.ertl at fnmoc.navy.mil
Mon May 8 19:40:29 CEST 2006


I have a file with more than 10,000 lines, and each line holds a
comma-delimited string.

Here is a sample of the file:

DFRE,ship name,1234567
FGDE,ship 2,
,sdfsf

The ",sdfsf" line is bad data.


Some of the lines are malformed. I want to find every line that ends in
neither a comma nor seven digits and do some work on it. I can search for
lines that end in seven digits, but I cannot match "seven digits or a
trailing comma" in a single search.

Any ideas?


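import re

# Find lines that end in a comma or 7 digits.  The spaces around "|" in
# the earlier pattern '\d{7}$ | [,]$' were matched as literal characters,
# so neither alternative could match the data.  "$" also matches just
# before the trailing newline, so each line can be tested as it is read.
p = re.compile(r'\d{7}$|,$')

with open("shipData.txt", 'r') as oldFile, open("newFile.txt", 'w') as newFile:
    for line in oldFile:
        if p.search(line):
            newFile.write(line)
        else:
            # Flag lines that end in neither a comma nor seven digits.
            newFile.write("*BAD DATA " + line)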
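To see why the pattern '\d{7}$ | [,]$' never matched, here is a minimal comparison, assuming Python 3; the variable names are illustrative. The two compiled patterns are the same string, and the only difference is the re.VERBOSE flag, which makes the regex engine ignore unescaped whitespace in the pattern.

```python
import re

line = "DFRE,ship name,1234567\n"

# As written in the post, the spaces around "|" are ordinary characters, so
# the pattern asks for "seven digits, end of line, then a space" OR
# "a space followed by a comma at the end".  Neither fits the sample data.
original = re.compile(r'\d{7}$ | [,]$')

# With re.VERBOSE, unescaped whitespace in the pattern is ignored (and "#"
# would start a comment), so the same text now means r"\d{7}$|[,]$".
verbose = re.compile(r'\d{7}$ | [,]$', re.VERBOSE)

print(original.search(line))             # None
print(verbose.search(line) is not None)  # True
```

Removing the spaces, or keeping them and adding re.VERBOSE, both give the intended "seven digits or a comma at the end" behavior; re.VERBOSE is mainly useful when a longer pattern benefits from layout and comments.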

