[Tutor] Finding all locations of a sequence
nuin at genedrift.org
Thu Jun 14 21:46:51 CEST 2007
You can use two approaches:
1- String method find
This returns an int: the lowest position at which your search string
occurs in the string (sequence) you are searching. From the documentation:
find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found, such
that sub is contained in the range [start, end]. Optional arguments
start and end are interpreted as in slice notation. Return -1 if sub
is not found.
position = sequence1.find('chicken')
If the search string is not found in your sequence, find returns -1.
Put this call in a while loop, passing the position just after the
previous hit as the start argument, and you can find all occurrences
of your search string.
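
The while loop described above could be sketched roughly as follows. The
helper name find_all and the sample text are made up for illustration; only
str.find itself comes from the standard library. Stepping the start position
by one after each hit also picks up overlapping occurrences.

```python
def find_all(sequence, sub):
    """Return the start index of every occurrence of sub in sequence."""
    positions = []
    start = 0
    while True:
        position = sequence.find(sub, start)
        if position == -1:
            # find returns -1 once there are no more occurrences
            break
        positions.append(position)
        # resume one character past this hit so overlapping hits are kept
        start = position + 1
    return positions


# Illustrative data, not from the original question.
sequence1 = "chicken soup, chicken pie, poultry"
positions = find_all(sequence1, "chicken")
print(positions)
```

If overlapping hits are not wanted, resuming at position + len(sub) instead
would skip past each match entirely.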
2- Regular expressions
You have to import the re module and use the finditer method. The
finditer method returns an iterator over the matches of your regular
expression. First you compile a regular expression
mysearch = re.compile('chicken')
and then use the finditer method:
iterator = mysearch.finditer(sequence1)
Then you loop over the iterator to go through all the matches:
for match in iterator:
    print(match.span())
Each span is a pair of positions (start, end) for one regular expression
match in the sequence; end is the index just past the last matched character.
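
Since the question asks for one label (chicken) to match several words
(chicken and poultry), one possible sketch joins the words into a single
pattern with "|" so the large string is scanned only once per label. The
synonyms dictionary, the find_spans helper and the sample text are
assumptions made up for this example; re.compile, re.escape and finditer are
the real re module calls.

```python
import re

# Hypothetical mapping from a label to every word that should count as a hit.
synonyms = {"chicken": ["chicken", "poultry"]}


def find_spans(sequence, words):
    """Return (start, end) spans for any of the given words in sequence."""
    # re.escape keeps characters such as "." or "+" in a word literal
    pattern = re.compile("|".join(re.escape(word) for word in words))
    return [match.span() for match in pattern.finditer(sequence)]


# Illustrative data, not from the original question.
sequence1 = "chicken soup, poultry stock, chicken pie"
locations = {}
for label, words in synonyms.items():
    locations[label] = find_spans(sequence1, words)
print(locations)
```

Note that finditer only reports non-overlapping matches, and when one word is
a prefix of another the alternative listed first wins, so the order of the
words may matter for some data.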
To know more about regex check this page
> Ok, please bear with me, I'm very new to programming and python. And
> my question is rather...convoluted.
> I have a bunch of sequences (about 4100 or so), and want to know where
> they are in a very, very large string of letters. But wait, there's
> more. Some of these sequences match to more than one 'word' (for
> example...to be consistent with later, chicken could match to poultry
> or chicken).
> example of what I want to do (on a much smaller scale):
> Say I have chicken and I want to know where it occurs in a string of
> words, but I want it to match to both chicken and poultry and have the
> output of:
> chicken (locations of chicken and poultry in the string)
> or something like that.
> The string I'm dealing with is really large, so whatever will get
> through it the fastest is ideal for me.
> I hope this all makes sense...
> If it's possible to have pseudocode that would be helpful.