clarification

Sat Aug 18 11:37:42 EDT 2007

On 17/08/07, Beema shafreen <beema.shafreen at gmail.com> wrote:
> hi everybody,
> i have a file with data separated by tab
> mydata:
> fhl1    fkh2
> dfp1    chk1
> mal3    alp14
> mal3    moe1
> mal3    spi1
> mal3    bub1
> mal3    bub3
> mal3    mph1
> mal3    mad3
> hob1    nak1
> hob1    wsp1
> hob1    rad3
> cdr2    cdc13
> cdr2    cdc2
> shows these two are separated by tab represented as columns
> i have to check the common data between these two coloumn1 an coloumn2
> my code:
> data = []
> data1 = []
> result = []
> fh = open('sheet1','r')
> for line in fh.readlines():
>         splitted = line.strip().split('\t')
>         data.append(splitted[0])
>         data1.append(splitted[1])
>         for k in data:
>                 if k in data1:
>                         result.append(k)
>                         print result
> fh.close()
>
> can you tell me problem with my script and what should is do for this

For a start, your "for k in data" loop is nested inside the
"for line in fh" loop, so it re-scans everything read so far on
*every* line. That is slow, and it appends the same match to result
again and again. The following is probably what you intended:

> for line in fh.readlines():
>         do stuff
> for k in data:
>         do stuff
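
Spelled out, that two-loop structure might look like the sketch below.
It is only an illustration: sample_lines and common_values are names
made up here, the list stands in for 'sheet1' so the example runs on
its own, and the last row is invented so that there is something in
common (in the data as posted, no value appears in both columns).
The "k not in result" check is an addition as well: column 1 repeats
values such as mal3, so without it a match would still be appended
once per occurrence.

```python
# Stand-in for the file 'sheet1'; the last row is invented for the demo.
sample_lines = [
    "fhl1\tfkh2\n",
    "mal3\tbub1\n",
    "hob1\tnak1\n",
    "bub1\tmal3\n",
]

def common_values(lines):
    """Return values from column 1 that also occur in column 2."""
    data = []
    data1 = []
    # First loop: only collect the two columns.
    for line in lines:
        splitted = line.strip().split('\t')
        data.append(splitted[0])
        data1.append(splitted[1])
    # Second loop: runs once, after every line has been read.
    result = []
    for k in data:
        if k in data1 and k not in result:
            result.append(k)
    return result

print(common_values(sample_lines))
```

For the real file you could pass the open file object instead of the
list, since iterating over a file yields its lines.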

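.split() with no arguments splits on any whitespace (spaces, tabs AND
newlines) and drops the trailing newline too, so the .strip() is not
needed and you just need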

> splitted = line.split()

e.g.

>>> ln = 'fhl1\tfkh2\r\n'
>>> ln.split()
['fhl1', 'fkh2']

I think I would have done something like this (not tested)

Input = open('sheet1').read().split()
data = set(Input[::2])
data1 = set(Input[1::2])
result = data.intersection(data1)

or even this (if you don't need data and data1 later in the code)

Input = open('sheet1').read().split()
result = set(Input[::2]).intersection(Input[1::2])
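
One caveat with the slicing trick: it relies on every line holding
exactly two fields. A line with a missing field, or a name containing
a space, shifts the even/odd positions for everything after it. A
line-by-line sketch that avoids that assumption is below; the function
name common_columns and the sample list (including its one-field row)
are invented for illustration.

```python
def common_columns(lines):
    """Return the set of values present in both column 1 and column 2."""
    col1 = set()
    col2 = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            # Skip lines that do not have exactly two fields,
            # rather than letting them misalign the columns.
            continue
        col1.add(fields[0])
        col2.add(fields[1])
    return col1 & col2

# Invented sample; "dfp1\n" is a deliberately malformed row.
sample_lines = ["fhl1\tfkh2\n", "dfp1\n", "mal3\tbub1\n", "bub1\tmal3\n"]
print(sorted(common_columns(sample_lines)))
```

On the real data this would be called as common_columns(open('sheet1')).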

HTH :)