[Tutor] List processing
Terry Carroll
carroll at tjc.com
Thu Jun 2 00:33:47 CEST 2005
On 1 Jun 2005 cgw501 at york.ac.uk wrote:
> eYAL001C1 Spar 81 3419 4518 4519 2 1
>
> So basically its a table, separated with tabs. What I need to do is make
> a new file where all the entries in the table are those where the values
> in columns 1 and 5 were present as a pair more than once in the original
> file.
This is half-baked, but I toss it out in case anyone can build on it.
Create a dictionary, keyed on column 1. Read each line and split it into
its columns. Each top-level entry is itself a dictionary, keyed on
column 5, whose values are lists of lists; each inner list holds
columns 2, 3, 4 and 6. When a dupe is found, append an additional
inner list.
So, upon processing this line, you have a dictionary d:
{'eYAL001C1': {'4518': [['Spar', '81', '3419', '4519']]}}
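For what it's worth, here is a minimal sketch of that first step for the one quoted line, written for Python 3. The helper name parse_line is my own invention, and I am assuming the fields really are tab-separated and that anything past column 6 can be ignored.

```python
# The quoted sample row, rebuilt with tabs between the fields.
line = "eYAL001C1\tSpar\t81\t3419\t4518\t4519\t2\t1\n"

def parse_line(line):
    """Split a tab-separated row into (col1, col5, [col2, col3, col4, col6]).

    Hypothetical helper: columns after the sixth are dropped.
    """
    cols = line.rstrip("\n").split("\t")
    col1, col2, col3, col4, col5, col6 = cols[:6]
    return col1, col5, [col2, col3, col4, col6]

col1, col5, rest = parse_line(line)

# Outer key is column 1, inner key is column 5, value is a list of lists.
d = {col1: {col5: [rest]}}
print(d)  # {'eYAL001C1': {'4518': [['Spar', '81', '3419', '4519']]}}
```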
As you process each new line, one of three things is true:
1) Col 1 is used as a key, but col 5 is not used as an inner key;
2) Col 1 is used as a key, and col 5 is used as an inner key;
3) Col 1 is not used as a key.
So, for each new line:
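    if col1 in d:
        if col5 in d[col1]:
            # case 2: this col1/col5 pair was seen before
            d[col1][col5].append([col2, col3, col4, col6])
        else:
            # case 1: col1 is known, but col5 is new for it
            d[col1][col5] = [[col2, col3, col4, col6]]
    else:
        # case 3: first time col1 appears
        d[col1] = {col5: [[col2, col3, col4, col6]]}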
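If it helps, the same three cases can be collapsed with dict.setdefault, which returns the existing value for a key or stores and returns a default when the key is missing. The sketch below is Python 3 and only one possible way to do it; build_index is a name I made up, the rows are invented test data rather than anything from the real file, and skipping short rows is my assumption about how to treat blank lines.

```python
def build_index(lines):
    """Build {col1: {col5: [[col2, col3, col4, col6], ...]}} from tab-separated lines."""
    d = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # assumption: blank or short rows are simply skipped
        col1, col2, col3, col4, col5, col6 = cols[:6]
        # setdefault covers all three cases: it creates the outer dict entry
        # and the inner list only when they are missing, then we append.
        d.setdefault(col1, {}).setdefault(col5, []).append([col2, col3, col4, col6])
    return d

# Invented rows for illustration only.
rows = [
    "geneA\tSpar\t81\t3419\t4518\t4519\n",
    "geneA\tSmik\t77\t3400\t4518\t4600\n",
    "geneA\tSbay\t60\t3300\t9999\t4700\n",
    "geneB\tSpar\t50\t100\t4518\t300\n",
]
d = build_index(rows)
```

Since an open file can be iterated line by line, passing the file object itself to build_index should work the same way as passing the list.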
The end result is that you'll have all your data from the file in the form
of a dictionary indexed by column 1. Each entry in the top-level
dictionary is a second-level dictionary indexed by column 5. Each entry
in that second-level dictionary is a list of lists, and each list in that
list of lists is columns 2, 3, 4 and 6.
If the list of lists has a length of 1, then the col1/col5 combo only
appears once in the input file. But if it has a length > 1, it occurred
more than once, and satisfies your condition of "columns 1 and 5 were
present as a pair more than once".
So to get at these:
    for key1 in d:
        for key2 in d[key1]:
            if len(d[key1][key2]) > 1:
                for row in d[key1][key2]:
                    print(key1, row[0], row[1], row[2], key2, row[3])
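Since you want a new tab-separated file rather than printed output, the same loop could rebuild each row in its original column order. This is only a sketch in Python 3: repeated_rows is a hypothetical name, and the dictionary is invented data in the shape described above.

```python
def repeated_rows(d):
    """Yield tab-joined rows whose col1/col5 pair occurred more than once."""
    for key1, inner in d.items():
        for key2, entries in inner.items():
            if len(entries) > 1:
                for col2, col3, col4, col6 in entries:
                    # Put the keys back where columns 1 and 5 belong.
                    yield "\t".join([key1, col2, col3, col4, key2, col6])

# Invented data for illustration only.
d = {
    "geneA": {
        "4518": [["Spar", "81", "3419", "4519"], ["Smik", "77", "3400", "4600"]],
        "9999": [["Sbay", "60", "3300", "4700"]],
    },
    "geneB": {"4518": [["Spar", "50", "100", "300"]]},
}

out = list(repeated_rows(d))
for row in out:
    print(row)
```

Only the two geneA/4518 rows come out, because that is the only pair with more than one inner list. To write a file instead, each row plus a newline could go to the write method of a file opened for writing.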
I haven't tested this approach (or syntax) but I think the approach is
basically sound.