[Tutor] Faster procedure to filter two lists . Please help
Max Noel
maxnoel_fr at yahoo.fr
Sat Jan 15 01:04:16 CET 2005
On Jan 14, 2005, at 23:28, kumar s wrote:
> >>> for i in range(len(what)):
>     ele = split(what[i],'\t')
>     cor1 = ele[0]
>     for k in range(len(my_report)):
>         cols = split(my_report[k],'\t')
>         cor = cols[0]
>         if cor1 == cor:
>             print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
>
>
> 164:623 6649 TCATGGCTGACAACCCATCTTGGGA
> 484:11 6687 ATTATCATCACATGCAGCTTCACGC
> 490:339 6759 GAATGGGGCCGCCAGAACACAGACA
> 247:57 6880 AGTCCTCGTGGAACTACAACTTCAT
> 113:623 6901 TCATGGGTGTTCGGCATGACCCCAA
Okay, so the idea is that the first column of each row is a key, and you
want to display only the rows whose key also appears as the first column
of a row in my_report, right?
As Danny said, you should use dictionaries for this, with a structure
along the lines of:
what = {
    '164:623': '6649 TCATGGCTGACAACCCATCTTGGGA',
    '484:11': '6687 ATTATCATCACATGCAGCTTCACGC',
    '490:339': '6759 GAATGGGGCCGCCAGAACACAGACA',
    # (etc.)
}
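If your data currently lives in a list of tab-separated lines, a dictionary like that could be built roughly as follows. This is only a sketch: the helper name index_by_key is made up for illustration, the sample rows are copied from the quoted data, and it assumes every line contains at least one tab and that keys are unique (a repeated key would overwrite the earlier row).

```python
def index_by_key(lines):
    """Map the first tab-separated column of each line to the rest of the line."""
    index = {}
    for line in lines:
        # Split only once, so the remaining columns stay together as the value.
        key, rest = line.rstrip('\n').split('\t', 1)
        index[key] = rest
    return index


# Sample rows taken from the quoted data, assumed to be tab-separated.
what = [
    '164:623\t6649\tTCATGGCTGACAACCCATCTTGGGA',
    '484:11\t6687\tATTATCATCACATGCAGCTTCACGC',
    '490:339\t6759\tGAATGGGGCCGCCAGAACACAGACA',
]
what_by_key = index_by_key(what)
print(what_by_key['484:11'])
```

Looking up a key in what_by_key then takes roughly the same time no matter how many rows there are, which is where the speedup comes from.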
Lacking that, as Danny said, nested loops are a huge time sink. Also,
you should avoid using C-style for loops -- Python-style for loops
(equivalent to Perl's foreach) are much more elegant (and probably
faster) in that case. Here's how I would do it with your data
structures (warning, untested code, test before use):
# First, create a list where each element is one of the keys in my_report.
# Also, strings have a split() method, which by default splits on any
# whitespace (tabs included).
headers = [element.split()[0] for element in my_report]
for element in what:
    # Okay, the nested loop is still (more or less) there, but it occurs
    # within an 'in' operator, and is therefore executed in C -- much faster.
    if element.split()[0] in headers:
        print element
Also, it's shorter -- 4 lines, comments aside. Nevertheless, as Danny
suggested, an approach using dictionaries would blow this away,
speed-wise.
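For comparison, here is a rough sketch of what the dictionary approach might look like when it reproduces the output of your original nested loop. The function name join_on_key is hypothetical, and the second column of the sample `what` lines is invented, since the original post doesn't show it. It assumes each line of `what` has at least two tab-separated columns and each line of my_report at least three, as your print statement implies.

```python
def join_on_key(what, my_report):
    """Return joined rows for keys present in both lists.

    Builds a dict from my_report once, then does one lookup per line of
    what, instead of rescanning my_report for every line.
    """
    report_by_key = {}
    for line in my_report:
        cols = line.rstrip('\n').split('\t')
        # Keep every report row for a key, in case keys repeat.
        report_by_key.setdefault(cols[0], []).append(cols)

    joined = []
    for line in what:
        ele = line.rstrip('\n').split('\t')
        # Keys missing from my_report yield an empty list, so nothing is added.
        for cols in report_by_key.get(ele[0], []):
            joined.append('\t'.join([ele[0], ele[1], cols[1], cols[2]]))
    return joined


# Made-up sample input: the second column of `what` is invented here.
what = ['164:623\tprobe_a', '999:1\tprobe_b', '490:339\tprobe_c']
my_report = [
    '164:623\t6649\tTCATGGCTGACAACCCATCTTGGGA',
    '484:11\t6687\tATTATCATCACATGCAGCTTCACGC',
    '490:339\t6759\tGAATGGGGCCGCCAGAACACAGACA',
]
for row in join_on_key(what, my_report):
    print(row)
```

With N lines in `what` and M lines in my_report, the nested loop does about N*M comparisons, while this does M dictionary inserts plus N lookups.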
Hope that helps,
-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
"Look at you hacker... A pathetic creature of meat and bone, panting
and sweating as you run through my corridors... How can you challenge a
perfect, immortal machine?"