[Tutor] Help with regular expression
Steven D'Aprano
steve at pearwood.info
Sun Apr 15 16:24:17 CEST 2012
syed zaidi wrote:
> Dear Steve,Tutor doesn't allow attachment of huge files. I am attaching
> the files I am taking as input, code and the output CSV file. I hope then
> you would be able to help me. DOT keg files open in file viewer, you can
> also view them in python. The CSV file is the desired output file.
There is no need to send four files when one will do. There is also no need to
send a file thousands of lines long when a dozen or so lines would be
sufficient.
It would also help if you told us what the fields in the file should be
called. You are probably familiar with them, but we aren't.
Since I don't know what the fields are called, I'm going to just make up some
names.
def parse_d_line(line):
    # Expects a line like this:
    # D SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]
    a, b = line.split('\t')  # split on tab character
    c, d = a.split(';')
    letter, sbg_code, other_code = c.split()
    compound1 = d.strip()
    words = b.split()
    k_code = words[0]
    ec = words[-1]
    compound2 = " ".join(words[1:-1])
    return (letter, sbg_code, other_code, compound1, k_code, compound2, ec)
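
To make the splitting concrete, here is a small self-contained sketch that runs the function on the placeholder line from its own comment. The sample text is only that placeholder, not real data from your file, and the field names are still the ones I made up.

```python
def parse_d_line(line):
    # Same logic as the function above, repeated so this snippet runs on its own.
    a, b = line.split('\t')   # left half and right half of the line
    c, d = a.split(';')       # codes before the semicolon, description after
    letter, sbg_code, other_code = c.split()
    words = b.split()
    return (letter, sbg_code, other_code, d.strip(),
            words[0], " ".join(words[1:-1]), words[-1])

# Placeholder line taken from the comment in the function; not real data.
sample = "D SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]"
fields = parse_d_line(sample)
print(fields)
```

If a real line has no tab, or has more than one semicolon in its left half, the tuple unpacking raises ValueError, so lines of that kind would need separate handling.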
kegfile = open('something.keg')
# skip lines until a bare exclamation mark
for line in kegfile:
    if line.strip() == '!':
        break
# analyse D lines only, skipping all others
for line in kegfile:
    if line.startswith('D'):
        print(parse_d_line(line))
    elif line.strip() == '!':
        break  # stop processing
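
Since you said a CSV file is the desired output, the same loop can feed the csv module instead of print. This is only a sketch under assumptions: the column names are my made-up field names, keg_to_csv is a hypothetical helper, and the input lines are invented stand-ins rather than lines from your file.

```python
import csv
import io

# Made-up column names; replace them with the real field names.
HEADER = ("letter", "sbg_code", "other_code", "compound1",
          "k_code", "compound2", "ec")

def parse_d_line(line):
    # Same splitting logic as above, repeated so this snippet runs on its own.
    a, b = line.split('\t')
    c, d = a.split(';')
    letter, sbg_code, other_code = c.split()
    words = b.split()
    return (letter, sbg_code, other_code, d.strip(),
            words[0], " ".join(words[1:-1]), words[-1])

def keg_to_csv(lines, out):
    # Hypothetical helper: write one CSV row per D line found
    # between the first and second bare '!' lines.
    writer = csv.writer(out)
    writer.writerow(HEADER)
    lines = iter(lines)
    for line in lines:
        if line.strip() == '!':
            break
    for line in lines:
        if line.startswith('D'):
            writer.writerow(parse_d_line(line))
        elif line.strip() == '!':
            break  # stop processing

# Invented stand-in for the contents of a keg file.
sample_lines = [
    "some header text\n",
    "!\n",
    "A xxx\n",
    "D SBG_0147 aceE; xxx xxx\tK00163 xxx xxx [EC:1.2.4.1]\n",
    "!\n",
    "trailing text that is never read\n",
]

out = io.StringIO()
keg_to_csv(sample_lines, out)
rows = out.getvalue().splitlines()
print(out.getvalue())
```

To write a real file in Python 3, pass an open file instead of the StringIO object, for example open('output.csv', 'w', newline=''), where the file name is just a placeholder.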
You will notice I don't use regular expressions in this.
Some people, when confronted with a problem, think "I know,
I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski
--
Steven