Splitting a file from specific column content

Sun Jan 22 09:45:08 EST 2012

In article 
<e1f0636a-195c-4fbb-931a-4d619d5f0d18 at g27g2000yqa.googlegroups.com>,
 Yigit Turgut <y.turgut at gmail.com> wrote:

> Hi all,
> 
> I have a text file approximately 20 MB in size that contains about one
> million lines. I was doing some processing on the data but then the
> data rate increased and it now takes a very long time to process. I import
> using numpy.loadtxt; here is a fragment of the data:
> 
> 0.000006 	 -0.0004
> 0.000071 	 0.0028
> 0.000079 	 0.0044
> 0.000086 	 0.0104
> .
> .
> .
> 
> The first column is the timestamp in seconds and the second column is the
> data. The file contains 8 seconds of measurement, and I would like to be
> able to split the file into 3 parts separated at specific time
> locations. For example, I want to divide the file into 3 parts, the first
> part containing 3 seconds of data, the second containing 2 seconds of data
> and the third containing 3 seconds.

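I would do this with standard Unix tools. Every line starts with the
whole-second digit of its timestamp, so matching on the first character
is enough: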

grep '^[012]' input.txt > first-three-seconds.txt
grep '^[34]' input.txt > next-two-seconds.txt
grep '^[567]' input.txt > next-three-seconds.txt

Sure, it makes three passes over the data, but for 20 MB of data, you 
could have the whole job done in less time than it took me to type this.
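
If the three passes ever did become a concern, the same split could 
probably be done in one pass with awk. This is only a sketch, assuming 
whitespace-separated columns with the timestamp (in seconds, below 8) in 
the first column; the function name split_by_time is made up for 
illustration, and the output file names are the ones used above.

```shell
# Hypothetical helper: split a two-column file in a single pass.
# Usage: split_by_time INPUT
# Assumes column 1 is a non-negative timestamp in seconds.
split_by_time() {
    awk '
        # Skip blank lines and anything that does not start with a digit.
        $1 !~ /^[0-9]/ { next }
        # Numeric comparisons on the timestamp, checked in ascending order.
        $1 < 3 { print > "first-three-seconds.txt"; next }
        $1 < 5 { print > "next-two-seconds.txt";    next }
        $1 < 8 { print > "next-three-seconds.txt" }
    ' "$1"
}
```

Calling "split_by_time input.txt" writes the three files into the current 
directory. Unlike the grep version, awk only creates an output file once a 
line has matched it, and lines at 8 seconds or later are dropped, just as 
they are by the grep patterns.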

As a sanity check, I would run "wc -l" on each of the files and confirm 
that they add up to the original line count.
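
That check could be scripted along these lines. Again just a sketch: the 
function name check_split is hypothetical, and it assumes the original 
file comes first, followed by the parts.

```shell
# Hypothetical helper: compare the original line count with the parts.
# Usage: check_split ORIGINAL PART...
check_split() {
    original=$1
    shift
    # $(( ... )) strips the padding some wc implementations emit.
    total=$(( $(wc -l < "$original") ))
    parts=$(( $(cat "$@" | wc -l) ))
    if [ "$total" -eq "$parts" ]; then
        echo "OK: $parts lines"
    else
        echo "MISMATCH: original has $total lines, parts have $parts" >&2
        return 1
    fi
}
```

For the files above that would be:

check_split input.txt first-three-seconds.txt next-two-seconds.txt next-three-seconds.txt

A mismatch would most likely mean some lines start with a character that 
none of the patterns cover, for example a timestamp of 8 seconds or more, 
or a blank line.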