[Tutor] large file

Alan Gauld alan.gauld at btinternet.com
Mon Jun 14 02:19:21 CEST 2010

"Hs Hs" <ilhs_hs at yahoo.com> wrote

> I have a very large file 15Gb.

> Every two lines are part of one readgroup.
> I want to add two variables to every line.

> HWUSI-EAS1211_0001:1:1:977:20764#0   RG:Z:2301
> HWUSI-EAS1211_0001:1:1:977:20764#0    RG:Z:2302
> ...
> Since I cannot read the entire file, I wanted to cat the file

What makes you think you cannot read the entire file?

> something like this:
> cat myfile  | python myscript.py > myfile.sam

How does that help over Python reading the file line by line?
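
A Python file object is its own line iterator, so a plain for loop only ever
holds one line in memory, whatever the size of the file. Here is a minimal
sketch in Python 3 (the original post uses Python 2's raw_input); the function
name copy_lines is made up for illustration:

```python
import io


def copy_lines(source, sink):
    """Copy source to sink one line at a time and return the line count."""
    count = 0
    for line in source:  # the file object yields one line per iteration
        sink.write(line)
        count += 1
    return count


# Small in-memory demonstration; io.StringIO stands in for a real file.
demo_in = io.StringIO("first\nsecond\nthird\n")
demo_out = io.StringIO()
print(copy_lines(demo_in, demo_out))  # prints 3
```

With real data the same function should work with open("myfile") and
open("myfile.sam", "w"), or with sys.stdin and sys.stdout if you still
prefer the cat pipeline.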

> I do not know how to execute my logic after I read the line, 
> although I tried:

> while True:
>        second = raw_input()
>        x =  second.split('\t')

Why are you splitting the line? You only need to append
data to the end of the line...
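
Appending needs nothing more than stripping the newline, adding the new
field and putting the newline back. A minimal sketch in Python 3, assuming
the new value should be tab-separated like the existing fields; append_field
and the "XX:Z:demo" value are hypothetical, since the post does not say what
the two variables are:

```python
def append_field(line, value, sep="\t"):
    """Return line with value added as a new trailing field."""
    body = line.rstrip("\r\n")  # drop the line ending, keep everything else
    return body + sep + value + "\n"


# Sample line taken from the post; the appended value is a placeholder.
sample = "HWUSI-EAS1211_0001:1:1:977:20764#0\tRG:Z:2301\n"
print(repr(append_field(sample, "XX:Z:demo")))
```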

> Could someone help me here with what I want to do?

In pseudo code:

open input and output files
read the first 14 lines from input
oddLine = True
while True:
     read line from input
     if oddLine:
          append oddData
     else:
          append evenData
     write line to output file
     oddLine = not oddLine

You probably want a try/except in there to catch the end of file.

This is not very different from the menu example in the file
handling topic of my tutorial...
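
The pseudocode above could be sketched in Python 3 roughly as follows. Some
of it is assumption: the name tag_lines is invented, the skipped header lines
are copied to the output unchanged, and the appended values are placeholders
because the post does not say what the two variables are. A for loop over the
file stops by itself at end of file, so this version needs no try/except; a
while True loop around raw_input() would have to catch EOFError instead.

```python
import io


def tag_lines(infile, outfile, odd_data, even_data, header_lines=14):
    """Append odd_data and even_data to alternating lines after the header."""
    odd_line = True
    for number, line in enumerate(infile):
        if number < header_lines:
            outfile.write(line)  # assumption: header lines pass through as-is
            continue
        suffix = odd_data if odd_line else even_data
        outfile.write(line.rstrip("\n") + "\t" + suffix + "\n")
        odd_line = not odd_line


# In-memory demonstration with a one-line header and placeholder values.
demo_in = io.StringIO("header\nread1\tRG:Z:2301\nread1\tRG:Z:2302\n")
demo_out = io.StringIO()
tag_lines(demo_in, demo_out, "ODD", "EVEN", header_lines=1)
print(demo_out.getvalue(), end="")
```

For the real file, pass open("myfile") and open("myfile.sam", "w") in place
of the StringIO objects, ideally inside a with statement so that both files
are closed when the loop finishes.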


Alan Gauld
Author of the Learn to Program web site
