[Tutor] need to get unique elements out of a 2.5Gb file

Srinivas Iyyer srini_iyyer_bio at yahoo.com
Thu Feb 2 06:50:44 CET 2006


Hi Group,

I have a file which is 2.5 GB:
 
TRIM54  NM_187841.1     GO:0004984
TRIM54  NM_187841.1     GO:0001584
TRIM54  NM_187841.1     GO:0003674
TRIM54  NM_187841.1     GO:0004985
TRIM54  NM_187841.1     GO:0001584
TRIM54  NM_187841.1     GO:0001653
TRIM54  NM_187841.1     GO:0004984

There are many duplicate lines.  I wanted to get rid
of the duplicates.

I chose to parse the file to get the unique elements.

from sets import Set   # the built-in set() also works on Python 2.4+

f1 = open('mfile', 'r')
da = f1.read().split('\n')
dat = da[:-1]          # drop the empty string left after the final newline
f1.close()
f2 = open('res', 'w')
dset = Set(dat)        # collapse duplicate lines
for i in dset:
    f2.write(i)
    f2.write('\n')
f2.close()

Problem: Python says it cannot handle such a large
file.
Any ideas?  Please help me.
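For reference, a memory-lighter variant of the same idea reads the file line by line instead of calling read() on the whole thing, so only the set of distinct lines (not the full 2.5 GB) has to fit in memory. This is a sketch, not tested on a file that size; the function name `unique_lines` is made up, and the paths stand in for 'mfile' and 'res' above:

```python
def unique_lines(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first
    occurrence of each line."""
    seen = set()
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:               # iterate lazily, line by line
            line = line.rstrip('\n')
            if line not in seen:       # only distinct lines stay in memory
                seen.add(line)
                dst.write(line + '\n')
```

If even the distinct lines don't fit in memory, an external `sort -u` on the file is another option.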

cheers
srini


