[Tutor] Help with regular expression

Steven D'Aprano steve at pearwood.info
Sun Apr 15 16:24:17 CEST 2012


syed zaidi wrote:
> Dear Steve,Tutor doesn't allow attachment of huge files. I am attaching
> the files I am taking as input, code and the output CSV file. I hope then
> you would be able to help me. DOT keg files open in file viewer, you can
> also view them in python. The CSV file is the desired output file.


There is no need to send four files when one will do, and no need to send a 
file thousands of lines long when a dozen or so lines should be 
sufficient.

It would also help if you told us what the fields in the file should be 
called. You are probably familiar with them, but we aren't.

Since I don't know what the fields are called, I'm going to just make up some 
names.

def parse_d_line(line):
     # Expects a line like this:
     # D    SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]
     a, b = line.split('\t')  # split on tab character
     c, d = a.split(';')
     letter, sbg_code, other_code = c.split()
     compound1 = d.strip()
     words = b.split()
     k_code = words[0]
     ec = words[-1]
     compound2 = " ".join(words[1:-1])
     return (letter, sbg_code, other_code, compound1, k_code, compound2, ec)
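
As a quick check, here is a sketch that runs the function on the sample line from its own comment. The field names are still my made-up ones, and the "xxx xxx" text is placeholder data, not real file content.

```python
def parse_d_line(line):
    # Same logic as above, repeated so this snippet runs on its own.
    a, b = line.split('\t')  # split on tab character
    c, d = a.split(';')
    letter, sbg_code, other_code = c.split()
    compound1 = d.strip()
    words = b.split()
    k_code = words[0]
    ec = words[-1]
    compound2 = " ".join(words[1:-1])
    return (letter, sbg_code, other_code, compound1, k_code, compound2, ec)


# Placeholder line in the shape the function expects.
sample = "D    SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]\n"
result = parse_d_line(sample)
print(result)
# ('D', 'SBG_0147', 'aceE', 'xxx xxx', 'K00163', 'xxx xxx', '[EC:1.2.4.1]')
```

A line that has no tab, or that does not contain exactly one semicolon before the tab, raises ValueError at the tuple unpacking. If your real file has such lines, you will need to decide how to handle them.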


kegfile = open('something.keg')
# skip lines until a bare exclamation mark
for line in kegfile:
     if line.strip() == '!':
         break

# analyse D lines only, skipping all others
for line in kegfile:
     if line.startswith('D'):
         print(parse_d_line(line))
     elif line.strip() == '!':
         break  # stop processing

kegfile.close()
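
Since you want CSV output, the same loop can feed the csv module instead of print. This is only a sketch under assumptions: the column names are the made-up ones from above, d_records is a helper name I invented, and the sample text stands in for your real keg file (io.StringIO is used so the snippet runs without any files).

```python
import csv
import io


def parse_d_line(line):
    # Same parsing as above, condensed so this snippet is self-contained.
    a, b = line.split('\t')
    c, d = a.split(';')
    letter, sbg_code, other_code = c.split()
    words = b.split()
    return (letter, sbg_code, other_code, d.strip(),
            words[0], " ".join(words[1:-1]), words[-1])


def d_records(lines):
    """Yield parsed D lines found between the first and second bare '!'."""
    lines = iter(lines)
    for line in lines:  # skip lines until a bare exclamation mark
        if line.strip() == '!':
            break
    for line in lines:  # analyse D lines only, skipping all others
        if line.startswith('D'):
            yield parse_d_line(line)
        elif line.strip() == '!':
            break  # stop processing


# Invented stand-in for the keg file; replace with open('something.keg').
sample = io.StringIO(
    "header line\n"
    "!\n"
    "A some other kind of line\n"
    "D    SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]\n"
    "!\n"
    "D    ignored after; the end\tK00000 marker [none]\n"
)

out = io.StringIO()  # replace with open('output.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerow(["letter", "sbg_code", "other_code", "compound1",
                 "k_code", "compound2", "ec"])
writer.writerows(d_records(sample))
print(out.getvalue())
```

Letting csv.writer do the quoting means a comma inside a compound name will not corrupt the output columns.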


You will notice I don't use regular expressions in this.

     Some people, when confronted with a problem, think "I know,
     I'll use regular expressions." Now they have two problems.
     -- Jamie Zawinski




-- 
Steven


