[Tutor] List processing
Kent Johnson
kent37 at tds.net
Thu Jun 2 04:47:22 CEST 2005
cgw501 at york.ac.uk wrote:
> Hi,
>
> I have a load of files I need to process. Each line of a file looks
> something like this:
>
> eYAL001C1 Spar 81 3419 4518 4519 2 1
>
> So basically it's a table, separated with tabs. What I need to do is make a
> new file where all the entries in the table are those where the values in
> columns 1 and 5 were present as a pair more than once in the original file.
>
> I really have very little idea how to achieve this. So far I read in the
> file to a list, where each item in the list is a list of the entries on a
> line.
I would do this with two passes over the data. The first pass would accumulate lines and count pairs
of (col1, col5); the second pass would output the lines whose count is > 1. Something like this
(untested):
lines = []
counts = {}

# Build a list of split lines and count the (col1, col5) pairs
for line in open('input.txt'):
    line = line.split()  # break line on whitespace (which includes the tabs)
    key = (line[1], line[5])  # or (line[0], line[4]) depending on what you mean by col 1
    counts[key] = counts.get(key, 0) + 1  # count the key pair
    lines.append(line)

# Output the lines whose pairs appear more than once
f = open('output.txt', 'w')
for line in lines:
    if counts[(line[1], line[5])] > 1:
        f.write('\t'.join(line))
        f.write('\n')
f.close()
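
If you want to try the two-pass idea without touching any files, here is a minimal sketch of the same approach as a function working on rows that are already split. The function name filter_repeated_pairs and its col_a/col_b parameters are my own invention for illustration, and the defaults (0 and 4) assume "columns 1 and 5" are counted from one; change them if you count from zero. Only the first sample row comes from your message; the other two are made up so that one pair repeats.

```python
from collections import Counter


def filter_repeated_pairs(rows, col_a=0, col_b=4):
    """Return the rows whose (col_a, col_b) pair occurs more than once.

    col_a and col_b are zero-based indexes (hypothetical defaults).
    """
    rows = list(rows)
    # First pass: count each (col_a, col_b) pair.
    counts = Counter((row[col_a], row[col_b]) for row in rows)
    # Second pass: keep only rows whose pair was seen at least twice.
    return [row for row in rows if counts[(row[col_a], row[col_b])] > 1]


# Sample data: the first row is from the question, the rest are invented.
sample = [
    "eYAL001C1\tSpar\t81\t3419\t4518\t4519\t2\t1",
    "eYAL001C1\tSmik\t79\t3300\t4518\t4520\t2\t1",
    "eYAL002W1\tSpar\t85\t1200\t2200\t2201\t1\t1",
]
rows = [line.split("\t") for line in sample]
kept = filter_repeated_pairs(rows)
for row in kept:
    print("\t".join(row))
```

With this sample the first two rows share the pair ('eYAL001C1', '4518'), so they are kept and the third row is dropped. Counter does the same job as the counts.get(key, 0) + 1 idiom above; it is just a little shorter.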
Kent