[Tutor] Faster procedure to filter two lists . Please help

Sat Jan 15 04:12:03 CET 2005

Hi Danny:
 Thank you for your suggestion. I tried creating a
dictionary of 'what' list and searched keys with
has_key method and it is pretty fast. 

Thanks again. following is the piece of code.

K

>>> cors = []
>>> intr = []
>>> for i in range(len(what)):
	ele = split(what[i],'\t')
	cors.append(ele[0])
	intr.append(ele[1])

>>> what_dict = dict(zip(cors,intr))

>>> for i in range(len(my_report)):
	cols = split(my_report[i],'\t')
	cor = cols[0]
	if what_dict.has_key(cor):
		intr = what_dict[cor]

my_final_report.append(cols[0]+'\t'+intr+'\t'+cols[1]+'\t'+cols[2])

--- Danny Yoo <dyoo at hkn.eecs.berkeley.edu> wrote:

> 
> 
> On Fri, 14 Jan 2005, kumar s wrote:
> 
> > >>>for i in range(len(what)):
> > 	ele = split(what[i],'\t')
> > 	cor1 = ele[0]
> > 	for k in range(len(my_report)):
> > 		cols = split(my_report[k],'\t')
> > 		cor = cols[0]
> > 		if cor1 == cor:
> > 			print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
> 
> 
> 
> Hi Kumar,
> 
> 
> Ok, this calls for the use of an "associative map"
> or "dictionary".
> 
> 
> The main time sink is the loop here:
> 
> > 	for k in range(len(my_report)):
> > 		cols = split(my_report[k],'\t')
> > 		cor = cols[0]
> > 		if cor1 == cor:
> > 			print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
> 
> Conceptually, my_report can be considered a list of
> key/value pairs.  For
> each element in 'my_report', the "key" is the first
> column (cols[0]), and
> the "value" is the rest of the columns (cols[1:]).
> 
> 
> The loop above can, in a pessimistic world, require
> a search across the
> whole of 'my_report'.  This can take time that is
> proportional to the
> length of 'my_report'.  You mentioned earlier that
> each list might be of
> length 249502, so we're looking into a process whose
> overall cost is
> gigantic.
> 
> 
> [Notes on calculating runtime cost: when the
> structure of the code looks
> like:
> 
>     for element1 in list1:
>         for element2 in list2:
>             some_operation_that_costs_K_time()
> 
> then the overall cost of running this loop will be
> 
>     K * len(list1) * len(list2)
> ]
> 
> 
> We can do much better than this if we use a
> "dictionary" data structure. A
> "dictionary" can reduce the time it takes to do a
> lookup search down from
> a linear-time operation to an atomic-time one.  Do
> you know about
> dictionaries yet?  You can take a look at:
> 
>     http://www.ibiblio.org/obp/thinkCSpy/chap10.htm
> 
> which will give an overview of a dictionary.  It
> doesn't explain why
> dictionary lookup is fast, but we can talk about
> that later if you want.
> 
> 
> Please feel free to ask any questions about
> dictionaries and their use.
> Learning how to use a dictionary data structure is a
> skill that pays back
> extraordinarily well.
> 
> 
> Good luck!
> 
> 

__________________________________ 
Do you Yahoo!? 
Meet the all-new My Yahoo! - Try it today! 
http://my.yahoo.com