improved search algorithm

grahamdick77 at gmail.com
Mon May 4 14:15:13 EDT 2009


Hi

I have an Excel file that is read into Python (8,000 rows):

import random
from csv import reader, writer

incsv = reader(open(MY_FILE), dialect='excel')
keys = incsv.next()   # header row

There are mixed datatypes.

The last column contains a cumulative frequency, running in ascending order
from 0.0000 to 1.0000 over the 8,000 rows.

In a loop of 100,000 iterations I want to draw a new random number each
time and find the first row whose cumulative frequency is larger than it.

Here's my current (pseudo)code:

rows = list(incsv)   # the csv reader is a one-shot iterator, so materialise it first

for _ in range(100000):

    myRand = random.random()
    for line in rows:
        if float(line[-1]) > myRand:
            # coerce each field to int, then float, falling back to the raw string
            resline = []
            for item in line:
                try:
                    i = int(item)
                except ValueError:
                    try:
                        i = float(item)
                    except ValueError:
                        i = item
                resline.append(i)
            # Here we construct a dict of pair values:
            # {'ID': 18, ...}
            res = dict(zip(keys, resline))
            break

    # do some stuff with res




I'm scanning over every line of the csv to decide which row to select,
100k times, and this is just not very efficient.

How can I improve this code? The expensive part is the linear scan:

for line in rows:
    if float(line[-1]) > myRand:
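
Something along these lines might work, I suppose (just a rough sketch, assuming
the whole file fits in memory and the last column really is sorted ascending):
pull the cumulative frequencies into a list of floats once, then use bisect for
an O(log n) lookup per draw instead of an O(n) scan. The int/float coercion of
the chosen row is left out for brevity.

import random
from bisect import bisect_right
from csv import reader

incsv = reader(open(MY_FILE), dialect='excel')   # MY_FILE is the same placeholder as above
keys = incsv.next()
rows = list(incsv)

# Build the sorted cumulative-frequency column once, up front.
cumfreq = [float(line[-1]) for line in rows]

for _ in range(100000):
    myRand = random.random()
    # Index of the first cumulative frequency strictly greater than myRand.
    # random.random() is always < 1.0, so the index stays in range.
    idx = bisect_right(cumfreq, myRand)
    res = dict(zip(keys, rows[idx]))
    # do some stuff with res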

I can use numpy etc., whatever works.
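
If numpy is an option, I imagine all 100,000 lookups could even be done in one
vectorised call (again only a sketch, reusing the materialised rows and keys
from above):

import numpy as np

# 'rows' is the list of csv rows built earlier; the last column is the sorted CDF.
cum = np.array([float(line[-1]) for line in rows])

# Draw all 100,000 random numbers at once, then for each draw find the index
# of the first cumulative frequency strictly greater than it.
draws = np.random.random(100000)
indices = np.searchsorted(cum, draws, side='right')

# indices[k] is the row selected on iteration k, e.g. dict(zip(keys, rows[indices[k]]))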


