clarification
Tim Williams
listserver at tdw.net
Sat Aug 18 11:37:42 EDT 2007
On 17/08/07, Beema shafreen <beema.shafreen at gmail.com> wrote:
> hi everybody,
> i have a file with data separated by tab
> mydata:
> fhl1 fkh2
> dfp1 chk1
> mal3 alp14
> mal3 moe1
> mal3 spi1
> mal3 bub1
> mal3 bub3
> mal3 mph1
> mal3 mad3
> hob1 nak1
> hob1 wsp1
> hob1 rad3
> cdr2 cdc13
> cdr2 cdc2
> the two values on each line are separated by a tab, giving two columns
> i have to find the data common to column 1 and column 2
> my code:
> data = []
> data1 = []
> result = []
> fh = open('sheet1','r')
> for line in fh.readlines():
>     splitted = line.strip().split('\t')
>     data.append(splitted[0])
>     data1.append(splitted[1])
>     for k in data:
>         if k in data1:
>             result.append(k)
> print result
> fh.close()
>
> can you tell me the problem with my script and what i should do about it
For a start, you are looping over data *every time* you read a line
from fh, which will be slow and will give you duplicates in the
result. The following is probably what you intended to do:
> for line in fh.readlines():
>     do stuff
> for k in data:
>     do stuff
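Spelled out in full, that restructuring might look like the sketch below. It is not your script as such: the list called lines is a hypothetical stand-in for the contents of 'sheet1', with made-up rows, so that it runs without the file.

```python
# Hypothetical stand-in for the lines of 'sheet1'; these rows are made up.
lines = ["a1\tb1\n", "b1\tc1\n", "c1\ta1\n", "d1\te1\n"]

data = []
data1 = []
result = []

# First loop: only collect the two columns.
for line in lines:
    splitted = line.strip().split('\t')
    data.append(splitted[0])
    data1.append(splitted[1])

# Second loop: runs once, after both columns are complete.
for k in data:
    if k in data1:
        result.append(k)

print(result)  # ['a1', 'b1', 'c1']
```

With the second loop nested inside the first, the same rows would append some of these names several times, which is the duplicate problem mentioned above.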
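.split() with no arguments splits on any whitespace (spaces, newlines
AND tabs) and ignores leading/trailing whitespace, so you don't need
the .strip() or the '\t'. You just need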
> splitted = line.split()
eg
>>> ln = 'fhl1\tfkh2\r\n'
>>> ln.split()
['fhl1', 'fkh2']
I think I would have done something like this (not tested)
Input = open('sheet1').read().split()
data = set(Input[::2])
data1 = set(Input[1::2])
result = data.intersection(data1)
or even this (if you don't need data and data1 later in the code)
Input = open('sheet1').read().split()
result = set(Input[::2]).intersection(set(Input[1::2]))
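One caveat with the slicing: Input[::2] and Input[1::2] only line up with the two columns if every line has exactly two fields. A minimal sketch that builds the sets line by line instead, assuming rows without exactly two fields should simply be skipped (the function name common_values and the sample rows are hypothetical, not from your file):

```python
def common_values(lines):
    """Return the values that occur in both column 1 and column 2."""
    col1 = set()
    col2 = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            # Assumption: blank or malformed rows are ignored.
            continue
        col1.add(fields[0])
        col2.add(fields[1])
    return col1 & col2


# A few rows from the posted data, a blank line, and one made-up row
# ('bub1 mal3') so that there is something in common.
rows = ["mal3\tbub1\n", "mal3\tbub3\n", "hob1\tnak1\n", "\n", "bub1\tmal3\n"]
print(sorted(common_values(rows)))  # ['bub1', 'mal3']
```

With a real file, common_values(open('sheet1')) should work the same way, since a file object iterates over its lines. Note that in the sample you posted no name appears in both columns, so an empty result there is expected.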
HTH :)