[Tutor] need to get unique elements out of a 2.5Gb file
Srinivas Iyyer
srini_iyyer_bio at yahoo.com
Thu Feb 2 06:50:44 CET 2006
Hi Group,
I have a file which is 2.5 GB. A sample of its contents:
TRIM54 NM_187841.1 GO:0004984
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0003674
TRIM54 NM_187841.1 GO:0004985
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0001653
TRIM54 NM_187841.1 GO:0004984
There are many duplicate lines. I wanted to get rid
of the duplicates, so I chose to parse the file to get the unique elements:
from sets import Set  # missing in the original; `set` is built in from Python 2.4 on

f1 = open('mfile', 'r')
da = f1.read().split('\n')
dat = da[:-1]  # drop the empty string after the trailing newline
f2 = open('res', 'w')
dset = Set(dat)
for i in dset:
    f2.write(i)
    f2.write('\n')
f2.close()
Problem: Python says it cannot handle such a large
file (f1.read() tries to pull all 2.5 GB into memory at once).
Any ideas? Please help me.
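[Editor's note: one possible direction, sketched here rather than taken from the thread. Instead of read(), iterate over the file line by line and track which lines have been seen in a set; memory then grows with the number of *unique* lines rather than the total file size. The function name and file names are illustrative, and this assumes a modern Python with the built-in `set` and the `with` statement.]

```python
def write_unique_lines(src_path, dst_path):
    """Copy src to dst, keeping only the first occurrence of each line.

    Streams the input one line at a time, so only the set of unique
    lines (not the whole file) is held in memory.
    """
    seen = set()
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            if line not in seen:
                seen.add(line)
                dst.write(line)
```

If even the unique lines are too many to hold in memory, an external approach (e.g. sorting the file on disk first, then removing adjacent duplicates) would be the next step.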
cheers
srini