Organize large DNA txt files
thomasvangurp at gmail.com
Fri Mar 20 10:45:24 EDT 2009
Dear fellow programmers,
I'm using Python scripts to organize some rather large datasets
describing DNA variation. Information is read, processed and written
to a file in sequential order, like this:
1+
1-
2+
2-
etc. The files that I created contain positional information
(nucleotide position) and some other info, like this:
file 1+:
--------------------------------------------
1 73 0 1 0 0
1 76 1 0 0 0
1 77 0 1 0 0
--------------------------------------------
file 1-:
--------------------------------------------
1 74 0 0 6 0
1 78 0 0 4 0
1 89 0 0 0 2
--------------------------------------------
Now the trick is that I want this:
File 1+ AND File 1-
--------------------------------------------
1 73 0 1 0 0
1 74 0 0 6 0
1 76 1 0 0 0
1 77 0 1 0 0
1 78 0 0 4 0
1 89 0 0 0 2
-------------------------------------------
So the information should be sorted by position. Right now I've
written some very complicated scripts that read a number of lines from
file 1- and 1+ and then combine the output. The problem is of course
that the running number in file 1- can be lower than the one in 1+,
resulting in an incorrect order. Since both files are too large to load
into a dictionary at once (both are 100 MB+), I need some sort of
alternative that can quickly sort everything without crashing my PC.
Your thoughts are appreciated.
Kind regards,
Thomas
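
For illustration, here is a minimal sketch of one way the merge described
above could be done without loading either file into memory. It is not
taken from the original scripts. It assumes that each input file is
already sorted by position on its own (as the samples above are), that
the first two whitespace-separated columns are integers, and that
Python 3.5+ is available for the key argument of heapq.merge. The names
position_key and merge_sorted_files are made up for this example.

```python
import heapq


def position_key(line):
    # Assumption: column 1 and column 2 (the nucleotide position) are integers.
    fields = line.split()
    return int(fields[0]), int(fields[1])


def non_blank_lines(handle):
    # Skip empty lines so position_key never sees a line without fields.
    return (line for line in handle if line.strip())


def merge_sorted_files(path_plus, path_minus, path_out):
    """Merge two position-sorted files into one position-sorted file.

    heapq.merge reads lazily, so only one pending line per input file
    is held in memory at any time.
    """
    with open(path_plus) as plus, open(path_minus) as minus, \
            open(path_out, "w") as out:
        merged = heapq.merge(non_blank_lines(plus),
                             non_blank_lines(minus),
                             key=position_key)
        for line in merged:
            # Make sure every output line ends with a newline.
            out.write(line if line.endswith("\n") else line + "\n")
```

A call such as merge_sorted_files("1+.txt", "1-.txt", "1_merged.txt")
(file names hypothetical) would then write the combined file. If the
inputs are not individually sorted, this sketch does not apply as is and
each file would have to be sorted first.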