[Tutor] please help

Sat Mar 22 14:56:28 CET 2014

On 03/21/2014 10:39 PM, Cameron Simpson wrote:
> On 21Mar2014 20:31, Mustafa Musameh <jmmy71 at yahoo.com> wrote:
>> Please help. I have been searching the internet to understand how to write a simple program/script with Python, and I have not managed to do anything.
>> I have a file that looks like this
>>> ID 1
>> agtcgtacgt…
>>> ID 2
>> attttaaaaggggcccttcc
>> .
>> .
>> .
>> In other words, it contains several IDs, each of which has a sequence of 'acgt' letters.
>> I need to write a script in Python where the output will be, for example, like this
>>> ID 1
>> a = 10%, c = 40%, g = 40%, t = 10%
>>> ID 2
>> a = 15%, c = 35%, g = 35%, t = 15%
>> .
>> .
>> .
>> (I mean the first line is the ID and the second line is the frequency of each letter.)
>> How can I tell Python to print the first line as it is, count the characters starting from the second line until the beginning of the next '>', and so on?
>
> You want a loop that reads lines in pairs. Example:
>
>    while True:
>      line1 = fp.readline()
>      if not line1:
>        break   # readline() returns '' at end of file
>      print line1,
>      line2 = fp.readline()
>      ... process the line and report ...
>
> Then to process the line, iterate over it. Because a line is a
> string, and a string is a sequence of characters, you can write:
>
>    for c in line2:
>      ... collect statistics about c ...
>    ... print report ...
>
> I would collect the statistics using a dictionary to keep count of
> the characters. See the dict.setdefault method; it should be helpful.
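
For reference, the dict.setdefault counting suggested above might look roughly like this. It is only a sketch: the function name count_bases and the sample sequence are made up for illustration, and it is written for Python 3 (print as a function).

```python
def count_bases(sequence):
    """Count how often each character occurs in a sequence string."""
    counts = {}
    for c in sequence.strip():  # strip() drops the trailing newline
        # setdefault returns the current count, storing 0 first if c is new
        counts[c] = counts.setdefault(c, 0) + 1
    return counts


# Hypothetical sample line, not taken from the original file
print(count_bases("acgtacgtaa\n"))
```

Dividing each count by the length of the stripped sequence then gives the frequencies.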

I think it would be easier to do that in 2 loops:
* first read the file line by line, building a list of pairs (id, base-sequence)
   (and on the fly check the input is correct, if needed)
* then traverse the sequences of bases to compute counts & percentages, and write them out
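
Something along these lines, as a rough sketch rather than a finished program. It assumes that ID lines start with '>', that a sequence may run over several lines until the next '>', and that only the letters a, c, g, t are valid. The names read_records, percentages and report, and the sample data, are invented for the example (Python 3).

```python
def read_records(lines):
    """First loop: build a list of (id, base-sequence) pairs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line, ""))  # new ID, sequence still empty
            continue
        if not records:
            raise ValueError("sequence data before the first ID: %r" % line)
        bases = line.lower()
        if set(bases) - set("acgt"):  # on-the-fly input check
            raise ValueError("unexpected characters in: %r" % line)
        ident, seq = records[-1]
        records[-1] = (ident, seq + bases)
    return records


def percentages(seq):
    """Percentage of each base in seq (all zeros for an empty sequence)."""
    total = len(seq)
    return {b: (100.0 * seq.count(b) / total if total else 0.0)
            for b in "acgt"}


def report(records):
    """Second loop: print each ID followed by its base percentages."""
    for ident, seq in records:
        pct = percentages(seq)
        print(ident)
        print(", ".join("%s = %.0f%%" % (b, pct[b]) for b in "acgt"))


# Made-up sample input; with a real file, pass the open file object instead
sample = [">ID 1\n", "aacc\n", "ggtt\n", ">ID 2\n", "aaat\n"]
report(read_records(sample))
```

Keeping the reading and the counting apart means the format checks live in one place, and the second loop only ever sees clean (id, sequence) pairs.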

d