[Tutor] character counting

spir denis.spir at gmail.com
Sun Mar 23 13:27:03 CET 2014


On 03/23/2014 07:28 AM, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT…
> >title 2
> AAATTTAAACCCGGGG…
> ATATATATA…
>>
> I wrote the following to count the As, Cs, Gs and Ts for each title:
> import sys
>
> file = open('file.fna')
>
> data=file.readlines()
>
> for line in data:
>
>      line = line.rstrip()
>
>      if line.startswith('>') :
>
>          print line
>
>      if not line.startswith('>') :
>
>          seq = line.rstrip()
>
>          counters={}
>
>          for char in seq:
>
>              counters[char] = counters.get(char,0) + 1
>
>          Ks = counters.keys()
>
>          Ks.sort()
>
>          for k in Ks:
>
>              print sum(counters.itervalues())
>
>
>
>
>
> I want to get the following output:
>
> >title
> 234
> >title 1
> 3453
> ….
> but what I get is:
> >title 1
> 60
> 60
> 60
> 60
> It seems it does the counting for each line and prints it out.
>
> Can you help me please
> Thanks

(Your code does not work at all as posted. You probably did not copy and paste
a running program.)

You are not taking into account the fact that there is a predefined, small set
of bases, which are the keys of the 'counters' dict. Using it would simplify
your code: see the lines marked "***" below. Example (adapted to Python 3, and
reading a string directly instead of a file):

data = """\
>title 1
AAATTTGGGCCCATA
TTAACAAGTTAAAT
>title 2
AAATTTAAACCCGGGG
ATATATATA
"""

for line in data.split("\n"):
    line = line.strip()
    if line == "":      # for last line, maybe others
        continue
    if line.startswith('>'):
        print(line)
        continue

    counters = {"A":0, "C":0, "G":0, "T":0}    # ***
    for base in line:
        counters[base] += 1
    bases = ["A","C","G","T"]                  # ***
    for base in bases:
        print(counters[base], end=" ")
    print()
==>

>title 1
5 3 3 4
7 1 1 5
>title 2
6 3 4 3
5 0 0 4

Is this what you want?
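
If what you want is instead a single total per title, as the expected output in
your message suggests, the counters can be kept across lines and reset only at
each '>' line. A minimal sketch, assuming the same four-letter alphabet (any
other character, such as N or a lowercase letter, would raise a KeyError); the
function name count_per_title is only illustrative:

```python
data = """\
>title 1
AAATTTGGGCCCATA
TTAACAAGTTAAAT
>title 2
AAATTTAAACCCGGGG
ATATATATA
"""

def count_per_title(text):
    """Return a list of (title, counters) pairs, one per '>' line."""
    results = []
    counters = None
    for line in text.split("\n"):
        line = line.strip()
        if line == "":
            continue
        if line.startswith(">"):
            # new title: start a fresh set of counters for it
            counters = {"A": 0, "C": 0, "G": 0, "T": 0}
            results.append((line, counters))
            continue
        if counters is None:
            continue    # sequence data before any title: skipped here
        for base in line:
            counters[base] += 1
    return results

for title, counters in count_per_title(data):
    print(title)
    print(sum(counters.values()))   # total number of bases under this title
```

With the sample data in this block, it prints 29 under ">title 1" and 25 under
">title 2"; the per-base counts remain available in each counters dict.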

denis

