[Tutor] Tutor Digest, Vol 121, Issue 56

Jumana yousef jmmy71 at yahoo.com
Mon Mar 24 09:43:02 CET 2014


Thank you,
just a reminder of my data:
it consists of multiple DNA sequences; for each one I need to count the bases (characters), calculate the percentage of C+G, and calculate the entropy.
Before each sequence there is a header or identifier (let's say ID),
so it is like
>ID 1…etc
AAGGTAACCATATATACCGGG….etc (up to or even more than 3000 characters)
>ID 2…etc
AAATTTTTAAATTTTTTAAAATATATATACGCGCGCATGCCCCGGGGG….. etc
… etc
I need the output to be like this:
> ID…1.. etc
sequence length = a value
C & G content: a value
Entropy = a value
> ID…2.. etc
sequence length = a value
C & G content: a value
Entropy = a value
….etc


I wrote a program close to what Denis suggested. However, it works only if I have one sequence (one header and one sequence); I cannot modify it to work with several sequences (like above). I also get an incorrect value for entropy (H).

#!/usr/bin/python
seq = ''
while True:
    try:
        line = raw_input()
        index = line.find('>')
        if index > -1:
            print line
        else:
            line = line.rstrip()
            line = line.upper()
            seq = seq + line
    except:
        break
print ' Sequence length : ', len(seq)
counters = {}
for char in seq:
    char = char.strip()
    if counters.has_key(char):
        counters[char] += 1
    else:
        counters[char] = 1
c_g = 100*(counters['C']+counters['G'])/len(seq)
print ' The C & G content: ' '%.1f'%  c_g, '%'
import math
all = len(seq)
Pa = (counters['A'])/all
Pc = counters['C']/all
Pg = counters['G']/all
Pt = counters['T']/all

H =-1*(Pa*math.log(Pa,2) + Pc*math.log(Pc,2) + Pg*math.log(Pg,2) + Pt*math.log(Pt,2))

print ' H = ' , H

I do not know why Pa, Pc, Pg and Pt give me a value of 0, although when I type counters['A'], counters['C'], counters['T'], counters['G'] or all I get values > 0.
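The zeros are most likely a division artifact rather than a counting problem. The script above uses `raw_input` and the `print` statement, so it runs under Python 2, where `/` between two integers discards the fractional part. Any count smaller than the total length therefore divides to 0. A minimal sketch, written for Python 3, where `//` reproduces the old behaviour; the counts are made-up values for illustration only:

```python
# Hypothetical counts for a 20-base sequence (illustrative values only).
count_a = 5
total = 20

# Python 2's int / int behaves like Python 3's floor division.
truncated = count_a // total          # what the original script computes

# Forcing float arithmetic gives the intended proportion.
proportion = float(count_a) / total   # works in both Python 2 and Python 3

print(truncated, proportion)
```

Under Python 2, putting `from __future__ import division` at the top of the script, or converting one operand with `float()`, should give the expected fractions.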

So please, how can I fix these calculations, and how do I modify this program to read each sequence, print its results, then read the next one and print its results, and so on?

Many thanks for your help and support.
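One possible way to handle several sequences is to collect lines until the next header appears and only then compute the statistics. The sketch below is written for Python 3 and is not taken from any reply in this thread; the function names `read_fasta`, `gc_percent` and `entropy`, and the sample data, are hypothetical. It assumes every record has at least one base, and it computes entropy only over characters that actually occur, so a missing base does not trigger `log(0)`.

```python
import math
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-style lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq):
    """Percentage of C and G characters in seq (assumes seq is non-empty)."""
    return 100.0 * (seq.count("C") + seq.count("G")) / len(seq)


def entropy(seq):
    """Shannon entropy in bits over the characters that occur in seq."""
    counts = Counter(seq)
    total = float(len(seq))
    return -sum((n / total) * math.log(n / total, 2) for n in counts.values())


# Made-up sample data in the same shape as the description above.
sample = """\
>ID 1
AAGGTAACC
ATATATACCGGG
>ID 2
ACGT
"""

for header, seq in read_fasta(sample.splitlines()):
    print(header)
    print("sequence length =", len(seq))
    print("C & G content: %.1f %%" % gc_percent(seq))
    print("Entropy = %.3f" % entropy(seq))
```

To read from standard input instead of the sample string, `read_fasta(sys.stdin)` (after `import sys`) should work the same way, since the function only needs an iterable of lines.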


On Monday, 24 March 2014 5:09 PM, "tutor-request at python.org" <tutor-request at python.org> wrote:
 
Send Tutor mailing list submissions to
    tutor at python.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://mail.python.org/mailman/listinfo/tutor
or, via email, send a message with subject or body 'help' to
    tutor-request at python.org

You can reach the person managing the list at
    tutor-owner at python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Tutor digest..."


Today's Topics:

   1. Re: character counting (spir)
   2. Re: character counting (Alan Gauld)
   3. Re: character counting (Cameron Simpson)
   4. __abs__()  not acting as expected (Jim Byrnes)
   5. Expressions, literals,    operator precedence (was: __abs__()
      not acting as expected) (Ben Finney)
   6. Re: __abs__()  not acting as expected (Dave Angel)
   7. Re: __abs__()  not acting as expected (Ben Finney)


----------------------------------------------------------------------

Message: 1
Date: Sun, 23 Mar 2014 13:27:03 +0100
From: spir <denis.spir at gmail.com>
To: tutor at python.org
Subject: Re: [Tutor] character counting
Message-ID: <532ED317.7090003 at gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 03/23/2014 07:28 AM, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT…
> >title 2
> AAATTTAAACCCGGGG…
> ATATATATA…
> …
>
> I wrote the following to count the As, Cs, Gs and Ts for each title:
> import sys
>
> file = open('file.fna')
>
> data=file.readlines()
>
> for line in data:
>
>      line = line.rstrip()
>
>      if line.startswith('>') :
>
>          print line
>
>      if not line.startswith('>') :
>
>          seq = line.rstrip()
>
>          counters={}
>
>          for char in seq:
>
>              counters[char] = counters.get(char,0) + 1
>
>          Ks = counters.keys()
>
>          Ks.sort()
>
>          for k in Ks:
>
>              print sum(counters.itervalues())
>
>
>
>
>
> I want to get the following output:
>
> >title
> 234
> >title 1
> 3453
> ….
> but what I get is
> >title 1
> 60
> 60
> 60
> 60
> …
> it seems it does the counting for each line and prints it out.
>
> Can you help me please
> Thanks

(Your code does not work at all, as is. You probably did not just copy-paste a
running program.)

You are not taking into account the fact that there is a small, predefined
set of bases, which are the keys of the 'counters' dict. This would simplify
your code: see the lines below marked "***". Example (adapted to Python 3, and
reading a string directly instead of a file):

data = """\
>title 1
AAATTTGGGCCCATA
TTAACAAGTTAAAT
>title 2
AAATTTAAACCCGGGG
ATATATATA
"""

for line in data.split("\n"):
     line = line.strip()
     if line == "":      # for last line, maybe others
         continue
     if line.startswith('>'):
         print(line)
         continue

     counters = {"A":0, "C":0, "G":0, "T":0}    # ***
     for base in line:
         counters[base] += 1
     bases = ["A","C","G","T"]            # ***
     for base in bases:
         print(counters[base], end=" ")
     print()
==>

>title 1
5 3 3 4
7 1 1 5
>title 2
6 3 4 3
5 0 0 4

Is this what you want?

denis


------------------------------

Message: 2
Date: Sun, 23 Mar 2014 16:23:19 +0000
From: Alan Gauld <alan.gauld at btinternet.com>
To: tutor at python.org
Subject: Re: [Tutor] character counting
Message-ID: <lgn1p9$i0o$1 at ger.gmane.org>
Content-Type: text/plain; charset=windows-1252; format=flowed

On 23/03/14 06:28, Mustafa Musameh wrote:
> Hi;
> I have a file that looks like this:
>  >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT…
>  >title 2
> AAATTTAAACCCGGGG…
> ATATATATA…
> …
>


> I want to get the following output:
>
>  >title
> 234
>  >title 1
> 3453
> ….

Your example data and example output don't match - at least
not in any way I can see.

Can you provide sample input and output from that sample?
That will help us understand exactly what you want.

It might be useful to break the code into functions so that
you have one to read the lines and if appropriate call a
second that analyzes a line returning the counts. Then
a third function can print the results in the format
you want. An optional fourth function could assign the
analysis results to the dictionary but that's probably
overkill.

You could even ignore the first one and just make
it your main driver code, but the second and third would
be helpful in testing and make the main code easier
to read.
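The split described above might look roughly like the following Python 3 sketch. The names `analyze`, `report` and `main` are hypothetical, and the fixed `BASES` ordering is an assumption borrowed from the example in Message 1; the optional fourth function is left out.

```python
from collections import Counter

BASES = "ACGT"  # assumed alphabet and output order


def analyze(line):
    """Return a dict of counts for each base in one sequence line."""
    counts = Counter(line)
    return {base: counts[base] for base in BASES}


def report(counts):
    """Format the counts in fixed A, C, G, T order."""
    return " ".join(str(counts[base]) for base in BASES)


def main(lines):
    """Driver: echo title lines and print counts for sequence lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            print(line)
        else:
            print(report(analyze(line)))


main([">title 1", "AAATTTGGGCCCATA", "TTAACAAGTTAAAT"])
```

Keeping `analyze` and `report` free of any file handling means each can be tried at the interactive prompt with a short string, which is the testing benefit mentioned above.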


HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos



------------------------------

Message: 3
Date: Mon, 24 Mar 2014 08:51:12 +1100
From: Cameron Simpson <cs at zip.com.au>
To: tutor at python.org
Subject: Re: [Tutor] character counting
Message-ID: <20140323215112.GA21052 at cskk.homeip.net>
Content-Type: text/plain; charset=us-ascii

On 23Mar2014 17:28, Mustafa Musameh <jmmy71 at yahoo.com> wrote:
> Hi;
> I have a file that looks like this:
> >title 1
> AAATTTGGGCCCATA...
> TTAACAAGTTAAAT
> >title 2
> AAATTTAAACCCGGGG
> ATATATATA
> 
> 
> I wrote the following to count the As, Cs, Gs and Ts for each title:
> 
> import sys
> 
> file = open('file.fna')
> 
> data=file.readlines()
> for line in data:
>     line = line.rstrip()
>     if line.startswith('>') :
>         print line
>     if not line.startswith('>') :

You could just say "else" here instead of "if not".

>         seq = line.rstrip()
>         counters={}
>         for char in seq:
>             counters[char] = counters.get(char,0) + 1
>         Ks = counters.keys()
>         Ks.sort()
>         for k in Ks:
>             print sum(counters.itervalues())

This prints the same sum as many times as there are keys.
Notice that your print statement has no mention of "k"?

You either want just the "print" with no loop over Ks or you want
the loop, with some expression inside which changes depending on
the value of "k". Your call, of course, depending on your desired
result.
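The two alternatives can be sketched side by side. This is Python 3, and the `counters` values are made up to stand in for what the quoted loop would build for one line:

```python
# Hypothetical counts, as the quoted loop might build them for one line.
counters = {"A": 5, "C": 3, "G": 3, "T": 4}

# Option 1: one total per line, with no loop over the keys.
total = sum(counters.values())
print(total)

# Option 2: keep the loop, but print something that depends on k.
for k in sorted(counters):
    print(k, counters[k])
```

In Python 3, `sorted(counters)` replaces the `Ks = counters.keys(); Ks.sort()` pair from the quoted Python 2 code, since `keys()` no longer returns a list.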

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

"Don't you know the speed limit is 55 miles per hour???"
"Yeah, but I wasn't going to be out that long."
        - Steven Wright


------------------------------

Message: 4
Date: Sun, 23 Mar 2014 23:08:35 -0500
From: Jim Byrnes <jf_byrnes at comcast.net>
To: tutor at python.org
Subject: [Tutor] __abs__()  not acting as expected
Message-ID: <lgob3l$9bl$1 at ger.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I am reading Practical Programming - An Introduction to Computer Science 
Using Python 3.  They give this example:

>>> abs(-3)
3

>>> -3 .__abs__()
3

When I try it in idle or a terminal I get different results.

Python 3.3.5 (default, Mar 12 2014, 02:09:17)
[GCC 4.6.3] on linux

>>> abs(-3)
3

>>> -3 .__abs__()
-3

If I use a variable it works.

>>> x = -3
>>> x.__abs__()
3

I am curious as to what is happening.  Is the book wrong?  I checked 
its errata and nothing is mentioned.

Regards, Jim





------------------------------

Message: 5
Date: Mon, 24 Mar 2014 15:36:45 +1100
From: Ben Finney <ben+python at benfinney.id.au>
To: tutor at python.org
Subject: [Tutor] Expressions, literals,    operator precedence (was:
    __abs__()  not acting as expected)
Message-ID: <85r45sqkki.fsf at benfinney.id.au>
Content-Type: text/plain; charset=utf-8

Jim Byrnes <jf_byrnes at comcast.net> writes:

> I am reading Practical Programming - An Introduction to Computer
> Science Using Python 3.  They give this example:
>
> >>> abs(-3)
> 3
>
> >>> -3 .__abs__()
> 3

That's a poor example, in my opinion. It's not good for an introductory
text to show calling dunder methods like that on an integer literal.

Perhaps you could communicate with the maintainer of that material, to
point out the problem with their example. Hopefully they will remove the
example from an introductory text.

> Python 3.3.5 (default, Mar 12 2014, 02:09:17)
> [GCC 4.6.3] on linux
>
> >>> abs(-3)
> 3
>
> >>> -3 .__abs__()
> -3

Yes, I get the same result as you.

The reason is that the expression is being evaluated as::

    -( (3) .__abs__() )

That is:

* Create the integer object 3

* Get the result of that object's ‘__abs__’ method

* Negate (invert the sign of) the value

(If you really care, see the end of this message[0] for a demonstration
that this is exactly what happens.)
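Another way to see the same grouping, besides the bytecode listing, is to look at how the parser nests the expression. The sketch below uses the standard `ast` module; it is an illustration added alongside the explanation above and assumes a reasonably recent Python 3.

```python
import ast

# Parse the expression without evaluating it.
tree = ast.parse("-3 .__abs__()", mode="eval")
outer = tree.body

# The outermost node is the unary minus; the method call sits inside it.
print(type(outer).__name__, type(outer.op).__name__)
print(type(outer.operand).__name__)

# Evaluating both spellings shows the practical difference.
print(-3 .__abs__())     # minus is applied after the call
print((-3).__abs__())    # call is made on the negative integer
```

The attribute lookup and call bind more tightly than unary minus, which is why the parentheses change the result.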


Presumably the expression should be different, as shown in your next
example::

> If I use a variable it works.
>
> >>> x = -3
> >>> x.__abs__()
> 3

Yes, this is a better way to do it. But why are we calling the ‘__abs__’
function directly at all? That is, after all, the point of the ‘abs’
built-in function: to call the correct method on the object::

    >>> x = -3
    >>> abs(x)
    3

> I am curious as to what is happening.  Is the book wrong?  I checked
> its errata and nothing is mentioned.

I think that this is both an erratum, and a demonstration that the
example is a bad idea for the book entirely. Can you contact the
maintainer of that work to let them know?


[0]::

    >>> import dis
    >>> dis.dis("-3 .__abs__()")
      1           0 LOAD_CONST               0 (3) 
                  3 LOAD_ATTR                0 (__abs__) 
                  6 CALL_FUNCTION            0 (0 positional, 0 keyword pair) 
                  9 UNARY_NEGATIVE      
                 10 RETURN_VALUE        

-- 
\       “When I get new information, I change my position. What, sir, |
  `\             do you do with new information?” -John Maynard Keynes |
_o__)                                                                  |
Ben Finney



------------------------------

Message: 6
Date: Mon, 24 Mar 2014 01:43:53 -0400 (EDT)
From: Dave Angel <davea at davea.name>
To: tutor at python.org
Subject: Re: [Tutor] __abs__()  not acting as expected
Message-ID: <lgogd8$krc$1 at ger.gmane.org>

Jim Byrnes <jf_byrnes at comcast.net> Wrote in message:
> I am reading Practical Programming - An Introduction to Computer Science 
> Using Python 3.  They give this example:
> 
>  >>> abs(-3)
> 3
> 
>  >>> -3 .__abs__()
> 3
> 

Ben is right,  dunder methods don't belong in introductory texts.
And they seldom should be called directly;  they're there for
folks who are defining new classes that want to mimic behavior of
builtins. 

But he didn't show you the simplest fix:

(-3).__abs__()

-- 
DaveA



------------------------------

Message: 7
Date: Mon, 24 Mar 2014 17:06:09 +1100
From: Ben Finney <ben+python at benfinney.id.au>
To: tutor at python.org
Subject: Re: [Tutor] __abs__()  not acting as expected
Message-ID: <85mwggqgfi.fsf at benfinney.id.au>
Content-Type: text/plain; charset=utf-8

Dave Angel <davea at davea.name> writes:

> Ben is right,  dunder methods don't belong in introductory texts.
[…]

> But he didn't show you the simplest fix:
>
> (-3).__abs__()

I disagree; the simplest fix is not to call that method directly, and
just use::

    abs(-3)

-- 
\          “Computer perspective on Moore's Law: Human effort becomes |
  `\           twice as expensive roughly every two years.” -anonymous |
_o__)                                                                  |
Ben Finney



------------------------------


End of Tutor Digest, Vol 121, Issue 56
**************************************