Thank you, 
just a reminder of my data:
It consists of multiple DNA sequences. For each one I need to count the bases (characters), calculate the percentage of C+G, and calculate the entropy.
Before each sequence there is a header or identifier (let's say ID),
so it is like
>ID 1…etc
AAGGTAACCATATATACCGGG….etc (up to or even more than 3000 characters)
>ID 2…etc
… etc
I need the output to be like this:
> ID…1.. etc
sequence length = a value
C & G content: a value
Entropy = a value
> ID…2.. etc
sequence length = a value
C & G content: a value
Entropy = a value

I wrote a program close to what Denis suggested; however, it works only if I have one sequence (one header and one sequence). I cannot modify it to work if I have several sequences (like above). I also get an incorrect value for entropy (H).

seq = ''
while True:
    try:
        line = raw_input()
        index = line.find('>')
        if index > -1:
            print line
        else:
            line = line.rstrip()
            line = line.upper()
            seq = seq + line
    except EOFError:    # no more input
        break
print ' Sequence length : ', len(seq)
counters = {}
for char in seq:
    char = char.strip()
    if counters.has_key(char):
        counters[char] += 1
    else:
        counters[char] = 1
c_g = 100*(counters['C']+counters['G'])/len(seq)
print ' The C & G content: ' '%.1f'%  c_g, '%'
import math
all = len(seq)
Pa = (counters['A'])/all
Pc = counters['C']/all
Pg = counters['G']/all
Pt = counters['T']/all

H =-1*(Pa*math.log(Pa,2) + Pc*math.log(Pc,2) + Pg*math.log(Pg,2) + Pt*math.log(Pt,2))

print ' H = ' , H

I do not know why Pa, Pc, Pg and Pt give me a value of 0, although when I type counters['A'], counters['C'], counters['T'], counters['G'] or all I get values > 0.

So please, how can I fix these calculations, and how do I modify this program to read each sequence, print the results, then read the second one and print its results, and so on?

Many thanks for your help and support.
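
A minimal sketch of one way to handle both problems, written for Python 3 and not taken from the program above. In Python 2, `/` between two integers truncates, so counters['A']/all is 0 whenever the count is smaller than the length; Python 3's `/` is true division (in Python 2, converting one operand with float() has the same effect). The names `read_fasta` and `stats` and the sample data are made up for illustration.

```python
import math
from collections import Counter

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, ''.join(chunks)

def stats(seq):
    """Return (length, C+G percentage, Shannon entropy in bits)."""
    n = len(seq)
    if n == 0:
        return 0, 0.0, 0.0
    counts = Counter(seq)
    gc = 100.0 * (counts['C'] + counts['G']) / n
    # Counter only holds characters that occur, so log2(0) never comes up.
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return n, gc, entropy

# Made-up sample input, only for illustration.
sample = """\
>ID 1
AACC
GGTT
>ID 2
AAAA
""".splitlines()

for header, seq in read_fasta(sample):
    length, gc, entropy = stats(seq)
    print(header)
    print('sequence length =', length)
    print('C & G content: %.1f %%' % gc)
    print('Entropy =', entropy)
```

Collecting each record before analysing it means a sequence that spans several lines is counted once as a whole, rather than once per line.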

On 03/23/2014 07:28 AM, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
>> title 1
>> title 2
> …
> I wrote the following to count the As, Cs, Gs and Ts for each title:
> import sys
> file = open('file.fna')
> data=file.readlines()
> for line in data:
>      line = line.rstrip()
>      if line.startswith('>') :
>          print line
>      if not line.startswith('>') :
>          seq = line.rstrip()
>          counters={}
>          for char in seq:
>              counters[char] = counters.get(char,0) + 1
>          Ks = counters.keys()
>          Ks.sort()
>          for k in Ks:
>              print sum(counters.itervalues())
> I want to get the following output:
>> title
> 234
>> title 1
> 3453
> …
> but what i get
>> title 1
> 60
> 60
> 60
> 60
> …
> It seems it does the counting for each line and prints it out.
> Can you help me please
> Thanks

(Your code does not work at all, as is. Probably you did not just copy and
paste a running program.)

You are not taking into account the fact that there is a small, predefined
set of bases, which are the keys of the 'counters' dict. This would simplify
your code: see the lines below marked "***". Example (adapted to Python 3, and
to read a string directly, instead of a file):

data = """\
>title 1
>title 2
"""

for line in data.split("\n"):
     line = line.strip()
     if line == "":      # for last line, maybe others
         continue
     if line.startswith('>'):
         print(line)     # title line: echo it and move on
         continue

     counters = {"A":0, "C":0, "G":0, "T":0}    # ***
     for base in line:
         counters[base] += 1
     bases = ["A","C","G","T"]            # ***
     for base in bases:
         print(counters[base], end=" ")
     print()             # end the row of counts

>title 1
5 3 3 4
7 1 1 5
>title 2
6 3 4 3
5 0 0 4

Is this what you want?



On 23/03/14 06:28, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
>  >title 1
>  >title 2
> …

> I want to get the following output:
>  >title
> 234
>  >title 1
> 3453
> …

Your example data and example output don't match - at least
not in any way I can see.

Can you provide sample input and output from that sample?
That will help us understand exactly what you want.

It might be useful to break the code into functions so that
you have one to read the lines and if appropriate call a
second that analyzes a line returning the counts. Then
a third function can print the results in the format
you want. An optional fourth function could assign
analysis results to the dictionary but that's probably

You could even ignore the first one and just make
it your main driver code, but the second and third would
be helpful in testing and make the main code easier
to read.
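
A rough sketch of that split, in Python 3. The function names `analyze`, `format_counts` and `process` and the sample data are invented for illustration; here the formatting function returns a string rather than printing, which is one way to make it easy to test.

```python
def analyze(line):
    """Return a dict of character counts for one sequence line."""
    counts = {}
    for char in line:
        counts[char] = counts.get(char, 0) + 1
    return counts

def format_counts(counts):
    """Return the counts as text, sorted by base, e.g. 'C=1 G=2'."""
    return ' '.join('%s=%d' % (base, counts[base]) for base in sorted(counts))

def process(lines):
    """Driver: echo title lines, print counts for every other line."""
    for line in lines:
        line = line.rstrip()
        if line.startswith('>'):
            print(line)
        elif line:
            print(format_counts(analyze(line)))

# Made-up data, for illustration only.
process(['>title 1', 'AACGT', '>title 2', 'GGC'])
```

With the analysis separated out, `analyze` can be checked on a short string at the interactive prompt without touching any file.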

On 23Mar2014 17:28, Mustafa Musameh <jmmy71 at yahoo.com> wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> >title 2
> I wrote the following to count the As, Cs, Gs and Ts for each title:
> import sys
> file = open('file.fna')
> data=file.readlines()
> for line in data:
>     line = line.rstrip()
>     if line.startswith('>') :
>         print line
>     if not line.startswith('>') :

You could just say "else" here instead of "if not".

>         seq = line.rstrip()
>         counters={}
>         for char in seq:
>             counters[char] = counters.get(char,0) + 1
>         Ks = counters.keys()
>         Ks.sort()
>         for k in Ks:
>             print sum(counters.itervalues())

This prints the same sum as many times as there are keys.
Notice that your print statement has no mention of "k"?

You either want just the "print" with no loop over Ks or you want
the loop, with some expression inside which changes depending on
the value of "k". Your call, of course, depending on your desired

I am reading Practical Programming - An Introduction to Computer Science 
Using Python 3.  They give this example:


>>> abs(-3)
3
>>> -3 .__abs__()
3

When I try it in idle or a terminal I get different results.

Python 3.3.5 (default, Mar 12 2014, 02:09:17)
[GCC 4.6.3] on linux

>>> abs(-3)
3
>>> -3 .__abs__()
-3

If I use a variable it works.

>>> x = -3
>>> x.__abs__()
3

I am curious as to what is happening.  Is the book wrong?  I checked
its errata and nothing is mentioned.

Regards, Jim


Jim Byrnes <jf_byrnes at comcast.net> writes:

> I am reading Practical Programming - An Introduction to Computer
> Science Using Python 3.  They give this example:
> >>> abs(-3)
> 3
> >>> -3 .__abs__()
> 3

That's a poor example, in my opinion. It's not good for an introductory
text to show calling dunder methods like that on an integer literal.

Perhaps you could communicate with the maintainer of that material, to
point out the problem with their example. Hopefully they will remove the
example from an introductory text.

> Python 3.3.5 (default, Mar 12 2014, 02:09:17)
> [GCC 4.6.3] on linux
> >>> abs(-3)
> 3
> >>> -3 .__abs__()
> -3

Yes, I get the same result as you.

The reason is that the expression is being evaluated as::

    -( (3) .__abs__() )

That is:

* Create the integer object 3

* Get the result of that object's ‘__abs__’ method

* Negate (invert the sign) of the value

(If you really care, see the end of this message[0] for a demonstration
that this is exactly what happens.)
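
Another way to see the same grouping, as a small sketch using the standard `ast` module (this is not part of the book's example): the parse tree has the unary minus as the outermost node and the method call on the literal 3 as its operand.

```python
import ast

tree = ast.parse("-3 .__abs__()", mode="eval")
outer = tree.body

# The outermost node is the unary minus ...
print(type(outer).__name__, type(outer.op).__name__)
# ... and its operand is the method call on the literal 3.
print(type(outer.operand).__name__)

print(-3 .__abs__())     # minus is applied last
print((-3).__abs__())    # parentheses make -3 the object the method is called on
```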

Presumably the expression should be different, as shown in your next

> If I use a variable it works.
> >>> x = -3
> >>> x.__abs__()
> 3

Yes, this is a better way to do it. But why are we calling the ‘__abs__’
function directly at all? That is, after all, the point of the ‘abs’
built-in function: to call the correct method on the object::

    >>> x = -3
    >>> abs(x)
    3
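
As an illustration of "calling the correct method on the object", here is a sketch with a made-up class (`Offset` is hypothetical, not from the book): the class defines ‘__abs__’, and callers simply use the built-in.

```python
class Offset:
    """Hypothetical example class: a signed distance."""

    def __init__(self, amount):
        self.amount = amount

    def __abs__(self):
        # Invoked by the built-in abs(); callers need not call it directly.
        return Offset(abs(self.amount))

d = abs(Offset(-3))
print(d.amount)
```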

> I am curious as to what is happening.  Is the book wrong?  I checked
> its errata and nothing is mentioned.

I think that this is both an erratum, and a demonstration that the
example is a bad idea for the book entirely. Can you contact the
maintainer of that work to let them know?


    >>> import dis
    >>> dis.dis("-3 .__abs__()")
      1           0 LOAD_CONST               0 (3)
                  3 LOAD_ATTR                0 (__abs__)
                  6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
                  9 UNARY_NEGATIVE
                 10 RETURN_VALUE

Jim Byrnes <jf_byrnes at comcast.net> Wrote in message:
> I am reading Practical Programming - An Introduction to Computer Science 
> Using Python 3.  They give this example:
>  >>> abs(-3)
> 3
>  >>> -3 .__abs__()
> 3

Ben is right,  dunder methods don't belong in introductory texts.
And they seldom should be called directly;  they're there for
folks who are defining new classes that want to mimic behavior of

But he didn't show you the simplest fix:

(-3).__abs__()




Dave Angel <davea at davea.name> writes:

> Ben is right,  dunder methods don't belong in introductory texts.

> But he didn't show you the simplest fix:
> (-3).__abs__()

I disagree; the simplest fix is not to call that method directly, and
just use::


