[Tutor] learning curve

Kent Johnson kent37 at tds.net
Mon Jan 29 16:56:02 CET 2007


Daniel Klose wrote:
> Hi all,
> 
> All I would like to do is take a file and count the number of times a
> letter occurs in it.  It so happens that these letters are amino acids.
> There are also some other checks in the script but these are not a
> concern just yet.
> 
> What I would like to do is create a dictionary of arrays.
> In perl (my current scripting language of choice) I would simply put:
>  ${$dictionary{$key}}[$element] += 1
> I have no idea how to create this kind of structure in python.

I don't speak much Perl, but it looks like you have a dict whose values 
are lists. That is not quite what your code below builds, which is a 
dict whose values are integers.
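
If a dict of lists really is what you want, here is a minimal sketch of one way to get it. The names counts, NUM_SLOTS and bump() are made up for illustration, and I am assuming every list has a fixed, known length. Unlike Perl, Python will not grow a list for you when you index past its end.

```python
NUM_SLOTS = 3  # hypothetical: how many positions each list tracks

counts = {}

def bump(key, element):
    # setdefault() returns the list already stored under key, or stores
    # a new list of zeros there and returns that.
    counts.setdefault(key, [0] * NUM_SLOTS)[element] += 1

bump('A', 0)
bump('A', 2)
bump('G', 1)
```

After these calls counts['A'] is [1, 0, 1] and counts['G'] is [0, 1, 0].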
> 
> Also I have a while loop.  If this were perl, rather than using the i =
> 0 while(i < len(x)):
> I would do : for (my $i = 0; $i < @array; $i++) {}.  I have found the
> range function but I am not sure how to use it properly.

You could use
   for i in range(len(strArray)):
but that is not idiomatic; it is better to iterate over strArray directly.
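
To make the difference concrete, here is a small sketch with made-up sample data. It shows the index-based loop, direct iteration, and enumerate() for the cases where you really do need the index as well as the item.

```python
strArray = list("HEC")  # hypothetical sample data

# Index-based loop: works, but i is only used to fetch the item.
by_index = []
for i in range(len(strArray)):
    by_index.append(strArray[i])

# Direct iteration: the loop variable is the item itself.
direct = []
for c in strArray:
    direct.append(c)

# enumerate() yields (index, item) pairs when both are needed.
pairs = list(enumerate(strArray))
```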

> What I would like to do is create an index that allows me to access the
> same element in two arrays (lists) of identical size.

You can use the zip() function to process two lists in parallel:
   for x, y in zip(xlist, ylist):
      # x is an element from xlist
      # y is the corresponding element from ylist
> 
> I have pasted in my current code below, I would be very grateful if you
> could help me trim up this code.
> #!/usr/bin/python
> 
> import sys, os
> 
> structDir = '/home/danny/dataset/structure/'
> seqDir   = '/home/danny/dataset/sequence/'
> 
> target = sys.argv[1]
> 
> seqFile = seqDir      + target
> strFile = structDir   + target

os.path.join() would be more idiomatic here, though what you have works.
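
A short sketch of what that looks like. The target value is a made-up file name standing in for sys.argv[1], and I have left the trailing slash off seqDir on purpose to show that os.path.join() inserts the separator when it is missing.

```python
import os

structDir = '/home/danny/dataset/structure/'
seqDir = '/home/danny/dataset/sequence'  # no trailing slash, on purpose

target = 'example.txt'  # hypothetical; the script reads this from sys.argv[1]

# join() adds a separator only where one is needed.
seqFile = os.path.join(seqDir, target)
strFile = os.path.join(structDir, target)
```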
> 
> seqDictionary = {}
> 
> if (os.path.isfile(seqFile) and os.path.isfile(strFile)):
>    
>     structureHandle = open(strFile)
>     structureString = structureHandle.readline()
>    
>     sequenceHandle  = open(seqFile)
>     sequenceString = sequenceHandle.readline()
>    
>     strArray = list(structureString)
>     seqArray = list(sequenceString)

You don't have to convert to lists; strings are already sequences.
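
For instance, len(), indexing, slicing and iteration all work on a string as it is. The sample line below is made up. Note that readline() keeps the trailing newline, so you may want to strip it before counting; otherwise '\n' is counted as a letter too.

```python
sequenceString = "MKVL\n"  # hypothetical line, as readline() would return it

# Remove the trailing newline so it is not treated as a residue.
stripped = sequenceString.rstrip('\n')

length = len(stripped)     # number of characters
first = stripped[0]        # indexing
last_two = stripped[-2:]   # slicing
chars = [c for c in stripped]  # iteration, one character at a time
```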
> 
>     if len(strArray) == len(seqArray):
>         print "Length match\n"
>        
>         i=0
>         while(i < len(strArray)):
>             if seqDictionary.has_key(seqArray[i]):
>                 seqDictionary[seqArray[i]] += 1
>             else:
>                 seqDictionary[seqArray[i]] = 1
>                
>             i += 1

The idiomatic way to iterate over sequenceString is just
   for c in sequenceString:

You don't seem to be using strArray except to get the length. Maybe this 
is where you need zip()? For example you could say
   for structChr, seqChr in zip(structureString, sequenceString):
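
Here is a sketch of that idea, counting how often each residue appears with each structure character. The sample strings and the name pairCounts are made up, and I am only guessing that this pairing is what you are after. Keep in mind that zip() stops at the end of the shorter input, so your length check is still worth doing.

```python
structureString = "HHHC"  # hypothetical sample data
sequenceString = "MKMV"

pairCounts = {}
for structChr, seqChr in zip(structureString, sequenceString):
    key = (seqChr, structChr)  # tuples can be used as dict keys
    if key in pairCounts:
        pairCounts[key] += 1
    else:
        pairCounts[key] = 1
```

With this data pairCounts[('M', 'H')] is 2, and the other two pairs each have a count of 1.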

An alternative to your conditional with has_key() is to use dict.get() 
with a default value:
   seqDictionary[c] = seqDictionary.get(c, 0) + 1

so the whole loop becomes just
   for c in sequenceString:
      seqDictionary[c] = seqDictionary.get(c, 0) + 1
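
Wrapped up as a function, the same loop might look like the sketch below. The function name count_letters and the sample sequence are made up for illustration.

```python
def count_letters(text):
    """Return a dict mapping each character in text to its count."""
    counts = {}
    for c in text:
        # get() returns 0 for a character not seen before.
        counts[c] = counts.get(c, 0) + 1
    return counts

counts = count_letters("MKVMM")  # hypothetical sample sequence

# Print the letters in sorted order, one per line.
for letter in sorted(counts):
    print("%s %d" % (letter, counts[letter]))
```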

In Python 2.5 and later you can use collections.defaultdict to create a 
dict whose missing keys default to 0:
   from collections import defaultdict
   seqDictionary = defaultdict(int)

then in the loop you can say
     seqDictionary[c] += 1
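
Put together as a small sketch, with a made-up sample sequence:

```python
from collections import defaultdict

sequenceString = "MKVMM"  # hypothetical sample sequence

# int() returns 0, so a missing key starts at 0 on first access.
seqDictionary = defaultdict(int)
for c in sequenceString:
    seqDictionary[c] += 1

# Use "in" to test for a key without creating it.
has_x = 'X' in seqDictionary
```

One thing to be aware of: merely reading a missing key, as in seqDictionary['X'], adds that key with a value of 0, so test with "in" if you do not want that.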

Kent


