[Tutor] learning curve
Kent Johnson
kent37 at tds.net
Mon Jan 29 16:56:02 CET 2007
Daniel Klose wrote:
> Hi all,
>
> All I would like to do is take a file and count the number of times a
> letter occurs in it. It so happens that these letters are amino acids.
> There are also some other checks in the script but these are not a
> concern just yet.
>
> What I would like to do is create a dictionary of arrays.
> In perl (my current scripting language of choice) I would simply put:
> ${$dictionary{$key}}[$element] += 1
> I have no idea how to create this kind of structure in python.
I don't speak Perl much, but it looks like you have a dict whose values
are lists. That is not quite the same as what you have below, which is a
dict whose values are integers.
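A minimal sketch of a dict whose values are lists, roughly mirroring the Perl line above. The names `counts`, `bump`, and `N_SLOTS` are made up for illustration, and the fixed list length is an assumption, since the original post does not say how long the arrays are.

```python
N_SLOTS = 3  # hypothetical list length, chosen only for this sketch

counts = {}

def bump(key, element):
    # setdefault() returns the list already stored under key, or stores
    # and returns a fresh list of zeros if the key is new.
    counts.setdefault(key, [0] * N_SLOTS)[element] += 1

bump('A', 0)
bump('A', 2)
bump('G', 1)
print(counts)
```

Unlike Perl, Python does not grow a list when you assign past its end, so the list has to be created at its full size (or extended explicitly) before an element is incremented.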
>
> Also I have a while loop. If this were perl, rather than using the i =
> 0 while(i < len(x)):
> I would do : for (my $i = 0; $i < @array; $i++) {}. I have found the
> range function but I am not sure how to use it properly.
You could use
    for i in range(len(strArray)):
but this is not idiomatic; it is better to iterate over strArray directly.
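For comparison, here is a small sketch of the index-based loop next to direct iteration, plus enumerate() for the cases where the index really is needed. The sample data and the result lists are invented for the example.

```python
strArray = list("HHEC")  # made-up sample data

# Index-based loop: works, but the index is only used to fetch the item.
by_index = []
for i in range(len(strArray)):
    by_index.append(strArray[i])

# Direct iteration: the loop variable is the item itself.
direct = []
for ch in strArray:
    direct.append(ch)

# enumerate() yields (index, item) pairs when both are wanted.
pairs = []
for i, ch in enumerate(strArray):
    pairs.append((i, ch))

print(by_index == direct)
print(pairs)
```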
> What I would like to do is create an index that allows me to access the
> same element in two arrays (lists) of identical size.
You can use the zip() function to process two lists in parallel:
    for x, y in zip(xlist, ylist):
        # x is an element from xlist
        # y is the corresponding element from ylist
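A runnable sketch of the zip() loop, with invented contents for xlist and ylist. It also shows that zip() stops at the end of the shorter input, which is worth knowing if the two lists are only assumed to be the same size.

```python
xlist = ['H', 'E', 'C']  # made-up sample data
ylist = ['A', 'G', 'L']  # made-up sample data

paired = []
for x, y in zip(xlist, ylist):
    # x and y come from the same position in their lists
    paired.append(x + y)

print(paired)

# With inputs of unequal length, the extra items are silently dropped.
uneven = list(zip('ab', 'xyz'))
print(uneven)
```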
>
> I have pasted in my current code below, I would be very grateful if you
> could help me trim up this code.
> #!/usr/bin/python
>
> import sys, os
>
> structDir = '/home/danny/dataset/structure/'
> seqDir = '/home/danny/dataset/sequence/'
>
> target = sys.argv[1]
>
> seqFile = seqDir + target
> strFile = structDir + target
os.path.join() would be more idiomatic here, though what you have works.
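A short sketch of the os.path.join() version. The file name in `target` is a placeholder standing in for sys.argv[1], and `noSlashDir` is a hypothetical variant added only to show that join() supplies the separator when the directory lacks a trailing slash.

```python
import os

seqDir = '/home/danny/dataset/sequence/'
target = 'example.txt'  # placeholder for sys.argv[1]

# join() does not double the separator when seqDir already ends in one.
seqFile = os.path.join(seqDir, target)
print(seqFile)

# Hypothetical directory without the trailing slash: join() adds a separator.
noSlashDir = '/home/danny/dataset/sequence'
noSlashFile = os.path.join(noSlashDir, target)
print(noSlashFile)
```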
>
> seqDictionary = {}
>
> if (os.path.isfile(seqFile) and os.path.isfile(strFile)):
>
>     structureHandle = open(strFile)
>     structureString = structureHandle.readline()
>
>     sequenceHandle = open(seqFile)
>     sequenceString = sequenceHandle.readline()
>
>     strArray = list(structureString)
>     seqArray = list(sequenceString)
You don't have to convert to lists; strings are already sequences.
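A brief sketch of what "strings are already sequences" means in practice: len(), indexing, and iteration all work on the string itself. The sample string is invented.

```python
sequenceString = "MKVL"  # made-up sample data

length = len(sequenceString)   # number of characters
first = sequenceString[0]      # indexing works without list()

chars = []
for c in sequenceString:       # iteration yields one character at a time
    chars.append(c)

print(length)
print(first)
print(chars)
```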
>
>     if len(strArray) == len(seqArray):
>         print "Length match\n"
>
>     i=0
>     while(i < len(strArray)):
>         if seqDictionary.has_key(seqArray[i]):
>             seqDictionary[seqArray[i]] += 1
>         else:
>             seqDictionary[seqArray[i]] = 1
>
>         i += 1
The idiomatic way to iterate over sequenceString is just
    for c in sequenceString:
You don't seem to be using strArray except to get the length. Maybe this
is where you need zip()? For example you could say
    for structChr, seqChr in zip(structureString, sequenceString):
An alternative to your conditional with has_key() is to use dict.get()
with a default value:
    seqDictionary[c] = seqDictionary.get(c, 0) + 1
so the whole loop becomes just
    for c in sequenceString:
        seqDictionary[c] = seqDictionary.get(c, 0) + 1
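Put together, the counting loop might look like the sketch below. The sample line is invented. One thing to watch: readline() keeps the trailing newline, so the sketch strips it first; otherwise '\n' would be counted as a letter. The rstrip() call is an addition, not part of the original script.

```python
sequenceString = "MKVLKK\n"  # made-up line, as readline() would return it

seqDictionary = {}
for c in sequenceString.rstrip('\n'):
    # get() returns 0 for a letter that has not been seen yet
    seqDictionary[c] = seqDictionary.get(c, 0) + 1

print(seqDictionary)
```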
In Python 2.5 and later you can use defaultdict to create a dict with a
default value of 0:
    from collections import defaultdict
    seqDictionary = defaultdict(int)
then in the loop you can say
    seqDictionary[c] += 1
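The defaultdict version as a complete sketch, again with an invented sample string. Note that merely looking up a missing key with seqDictionary[key] inserts it with the default value, so membership tests are better done with `in`.

```python
from collections import defaultdict

sequenceString = "MKVLKK"  # made-up sample data

# int() returns 0, so every new key starts counting from zero.
seqDictionary = defaultdict(int)
for c in sequenceString:
    seqDictionary[c] += 1

print(dict(seqDictionary))
print('Z' in seqDictionary)  # 'in' does not create the key
```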
Kent