remove last 76 letters from string
python at mrabarnett.plus.com
Thu Aug 6 02:32:22 CEST 2009
> Hi All, So here is the problem... I have a FASTA file (used for DNA
> analyses) that looks like this:
>> gnl|SRA|SRR019045.10.1 SL-XAY_956090708:2:1:0:1028.1 length=152
>> gnl|SRA|SRR019045.11.1 SL-XAY_956090708:2:1:0:1151.1 length=152
>> gnl|SRA|SRR019045.12.1 SL-XAY_956090708:2:1:0:1197.1 length=152
> This snippet represents 3 individual DNA sequences. Each sequence is
> identified by the line starting with >
> The complete file has about 10 million individual sequences.
> A simple enough problem, I want to read in this data, and cut out the
> last 76 letters (nucleotides) from each individual sequence and send
> them to a new txt file with a similar format.
> Any help on how to do this would be appreciated.
If the input file is large, you can reduce the amount of memory needed
by reading it a line at a time, which you do by iterating over the file:

    input_file = open(input_path)
    for line in input_file:
        ...
Each line will end with '\n', so use the 'rstrip' method to remove it,
and then slice the last 76 characters:
    last_part = line.rstrip()[-76:]
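
Putting those two pieces together, a minimal sketch might look like the
following. It assumes that every sequence sits on a single line directly
after its '>' header line (which the length=152 in the headers suggests,
but the post does not confirm), and that header lines should be copied
through unchanged. The function name 'trim_fasta' and the 'keep'
parameter are made up for illustration.

```python
def trim_fasta(input_path, output_path, keep=76):
    """Copy header lines as-is; keep only the last `keep` characters
    of every other (sequence) line.

    Assumption: each sequence occupies exactly one line.
    """
    with open(input_path) as input_file, open(output_path, "w") as output_file:
        # Iterating over the file reads one line at a time, so memory
        # use stays small even for millions of sequences.
        for line in input_file:
            line = line.rstrip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Header line: write it out unchanged.
                output_file.write(line + "\n")
            else:
                # Sequence line: keep only the last `keep` characters.
                output_file.write(line[-keep:] + "\n")
```

If a sequence is wrapped over several lines, this would trim each line
separately, so the lines would need to be joined per record first.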