[Tutor] Faster procedure to filter two lists . Please help
kumar s
ps_python at yahoo.com
Sat Jan 15 04:12:03 CET 2005
Hi Danny:
Thank you for your suggestion. I tried creating a
dictionary of 'what' list and searched keys with
has_key method and it is pretty fast.
Thanks again. following is the piece of code.
K
>>> cors = []
>>> intr = []
>>> for i in range(len(what)):
ele = split(what[i],'\t')
cors.append(ele[0])
intr.append(ele[1])
>>> what_dict = dict(zip(cors,intr))
>>> for i in range(len(my_report)):
cols = split(my_report[i],'\t')
cor = cols[0]
if what_dict.has_key(cor):
intr = what_dict[cor]
my_final_report.append(cols[0]+'\t'+intr+'\t'+cols[1]+'\t'+cols[2])
--- Danny Yoo <dyoo at hkn.eecs.berkeley.edu> wrote:
>
>
> On Fri, 14 Jan 2005, kumar s wrote:
>
> > >>>for i in range(len(what)):
> > ele = split(what[i],'\t')
> > cor1 = ele[0]
> > for k in range(len(my_report)):
> > cols = split(my_report[k],'\t')
> > cor = cols[0]
> > if cor1 == cor:
> > print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
>
>
>
> Hi Kumar,
>
>
> Ok, this calls for the use of an "associative map"
> or "dictionary".
>
>
> The main time sink is the loop here:
>
> > for k in range(len(my_report)):
> > cols = split(my_report[k],'\t')
> > cor = cols[0]
> > if cor1 == cor:
> > print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
>
> Conceptually, my_report can be considered a list of
> key/value pairs. For
> each element in 'my_report', the "key" is the first
> column (cols[0]), and
> the "value" is the rest of the columns (cols[1:]).
>
>
> The loop above can, in a pessimistic world, require
> a search across the
> whole of 'my_report'. This can take time that is
> proportional to the
> length of 'my_report'. You mentioned earlier that
> each list might be of
> length 249502, so we're looking into a process whose
> overall cost is
> gigantic.
>
>
> [Notes on calculating runtime cost: when the
> structure of the code looks
> like:
>
> for element1 in list1:
> for element2 in list2:
> some_operation_that_costs_K_time()
>
> then the overall cost of running this loop will be
>
> K * len(list1) * len(list2)
> ]
>
>
> We can do much better than this if we use a
> "dictionary" data structure. A
> "dictionary" can reduce the time it takes to do a
> lookup search down from
> a linear-time operation to an atomic-time one. Do
> you know about
> dictionaries yet? You can take a look at:
>
> http://www.ibiblio.org/obp/thinkCSpy/chap10.htm
>
> which will give an overview of a dictionary. It
> doesn't explain why
> dictionary lookup is fast, but we can talk about
> that later if you want.
>
>
> Please feel free to ask any questions about
> dictionaries and their use.
> Learning how to use a dictionary data structure is a
> skill that pays back
> extraordinarily well.
>
>
> Good luck!
>
>
__________________________________
Do you Yahoo!?
Meet the all-new My Yahoo! - Try it today!
http://my.yahoo.com
More information about the Tutor
mailing list