remove last 76 letters from string
MRAB
python at mrabarnett.plus.com
Wed Aug 5 20:32:22 EDT 2009
PeroMHC wrote:
> Hi All, So here is the problem... I have a FASTA file (used for DNA
> analyses) that looks like this:
>
> ...
> >gnl|SRA|SRR019045.10.1 SL-XAY_956090708:2:1:0:1028.1 length=152
> NCTTTTTTTATTTTTTGTATAAATGAAGTTTCACTATATCGGACGAGCGGTTCAGCAGTCATTCCGAGAC
> CGATATAGTGAAACTTCATTTCTACAAAAANTACCAAACGTCGCTCGGCAGAGCGTCGTGTTGGGCAAGA
> GAGTAGCACTCG
> >gnl|SRA|SRR019045.11.1 SL-XAY_956090708:2:1:0:1151.1 length=152
> NGGTNTGGNNNNCNCCNTNCTNCNNCNTCANCCTCCNGTCNCANNCCNCNTNNNNNCNNNNNCNNTNCTT
> CTNCNNTCTCCATTCCTTCTTNATAGCCTGCTCCANCGCACGTTGAACCTTCTGCACCACGAACGCACTC
> ACACCACTCATC
> >gnl|SRA|SRR019045.12.1 SL-XAY_956090708:2:1:0:1197.1 length=152
> NGTCGGGTCTTCGCTATCACTGGACTGCTCCCATCAGCTATAGGTCCTCCCCGCCACACCCCATGCCCAC
> CGCCTATCCACGTCTGTCACAACCTCATACATCAGACAGTCACACTTACCAACATATCCAAGCACCTCAA
> GCAACACATCAT
> ...
>
> This snippet represents 3 individual DNA sequences. Each sequence is
> identified by the line starting with >
> The complete file has about 10 million individual sequences.
>
> A simple enough problem, I want to read in this data, and cut out the
> last 76 letters (nucleotides) from each individual sequence and send
> them to a new txt file with a similar format.
>
> Any help on how to do this would be appreciated.
> Thanks!
If the input file is large then you can reduce the amount of memory
needed by reading the input file a line at a time by iterating over the
file object:
    input_file = open(input_path)
    for line in input_file:
        ...
    input_file.close()
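As a minimal sketch of the same loop, a 'with' block can be used so that the file is closed even if an error occurs partway through. The function name count_header_lines is hypothetical and only there to give the loop something to do; it counts the lines that start with '>' without ever holding the whole file in memory:

```python
def count_header_lines(input_path):
    """Count FASTA header lines (those starting with '>'), one line at a time."""
    count = 0
    # The file object is an iterator over lines, so only one line
    # is held in memory at any moment.
    with open(input_path) as input_file:
        for line in input_file:
            if line.startswith(">"):
                count += 1
    return count
```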
Each line will end with '\n', so use the 'rstrip' method to remove it,
and then slice the last 76 characters:
    last_part = line.rstrip()[-76:]
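Note that the slice above works on a single line. In the sample data each sequence is wrapped over several lines, so the last 76 letters of a sequence can span more than one line. A rough sketch of one way to handle that, assuming every record starts with a '>' header line: collect the lines of one record, join them, and slice the joined string. The names last_n_per_record and trim_fasta are hypothetical, and writing each trimmed sequence on a single line is only a guess at what "a similar format" means:

```python
def last_n_per_record(lines, n=76):
    """Yield (header, tail) pairs; tail is the last n letters of each record."""
    header = None
    parts = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)[-n:]
            header = line
            parts = []
        elif line and header is not None:
            parts.append(line)
    # Emit the final record, which has no following header to close it.
    if header is not None:
        yield header, "".join(parts)[-n:]


def trim_fasta(input_path, output_path, n=76):
    """Write each header followed by the last n letters of its sequence."""
    with open(input_path) as input_file, open(output_path, "w") as output_file:
        for header, tail in last_n_per_record(input_file, n):
            output_file.write(header + "\n" + tail + "\n")
```

Only one record is held in memory at a time, so this should still cope with a file of millions of sequences. The header lines are copied unchanged, which means a field such as length=152 would no longer match the trimmed sequence.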