[Tutor] Finding all locations of a sequence

Paulo Nuin nuin at genedrift.org
Thu Jun 14 21:46:51 CEST 2007


Hi Lauren

You can use two approaches:

1- String method find

This returns an int: the lowest index at which your search string occurs
in the string (sequence) you are searching. From the documentation:

find(sub[, start[, end]])
Return the lowest index in the string where substring sub is found, such
that sub is contained in the range [start, end]. Optional arguments
start and end are interpreted as in slice notation. Return -1 if sub
is not found.
Example:

position = sequence1.find('chicken')

If the search string is not found in your sequence, find returns -1.
To find all occurrences, call find in a while loop, passing the position
just after the previous match as the start argument, until it returns -1.
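A minimal sketch of that while loop, assuming a plain Python string; find_all and the sample sequence are hypothetical names chosen for illustration. Because each new search starts one character after the previous hit, overlapping occurrences are also reported.

```python
def find_all(sequence, sub):
    """Return every start index where sub occurs in sequence (overlaps included)."""
    positions = []
    start = sequence.find(sub)  # first occurrence, or -1 if there is none
    while start != -1:
        positions.append(start)
        # resume the search one character after the last hit
        start = sequence.find(sub, start + 1)
    return positions


sequence1 = "chicken soup, chicken wings, roast chicken"
print(find_all(sequence1, "chicken"))  # [0, 14, 35]
```

For example, find_all("AAAA", "AA") gives [0, 1, 2], since the matches overlap.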

2- Use regular expressions
Import the re module and use the finditer method, which returns an
iterator over the matches of your regular expression. First, compile
a regular expression:

mysearch = re.compile('chicken')

and then use the finditer method:

iterator = mysearch.finditer(sequence1)

Then loop over the iterator to print all the matches:

for match in iterator:
    print(match.span())

Each span is a (start, end) tuple giving the position of one match in
the sequence; end is one past the last matched character, so
sequence1[start:end] is the matched text.
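Put together, the steps above look roughly like this; the sample sequence1 is an assumption for illustration. Note that finditer reports non-overlapping matches only, unlike the find loop that restarts one character after each hit.

```python
import re

sequence1 = "chicken soup, chicken wings, roast chicken"

# compile the pattern once so it can be reused for many searches
mysearch = re.compile("chicken")

# finditer yields one match object per non-overlapping match
for match in mysearch.finditer(sequence1):
    # span() returns (start, end); end is one past the last matched character
    print(match.span())
```

This prints (0, 7), (14, 21) and (35, 42), one tuple per line.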
To learn more about regular expressions, check this page:

http://www.amk.ca/python/howto/regex/
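For Lauren's chicken/poultry case, one option is a regex alternation ("|") so that any of several words counts as a hit for one query. This is a sketch under assumptions: the synonyms dictionary and the locate helper are hypothetical, and it scans the text once per query, which may be slow with thousands of queries.

```python
import re

# Hypothetical mapping: each query word and the words that count as a hit for it
synonyms = {
    "chicken": ["chicken", "poultry"],
}


def locate(text, synonyms):
    """Map each query to a list of (start, end) spans of any of its synonyms."""
    results = {}
    for query, words in synonyms.items():
        # longest words first, so a word that is a prefix of another cannot win early;
        # re.escape keeps any special characters in the words literal
        alternatives = sorted(words, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(w) for w in alternatives))
        results[query] = [m.span() for m in pattern.finditer(text)]
    return results


text = "chicken soup, poultry farm, roast chicken"
print(locate(text, synonyms))  # {'chicken': [(0, 7), (14, 21), (34, 41)]}
```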

HTH

Paulo


Lauren wrote:
> Ok, please bear with me, I'm very new to programming and python. And
> my question is rather...convoluted.
>
> I have a bunch of sequences (about 4100 or so), and want to know where
> they are in a very, very large string of letters. But wait, there's
> more. Some of these sequences match to more than one 'word' (for
> example...to be consistent with later, chicken could match to poultry
> or chicken).
>
> example of what I want to do (on a much smaller scale):
>
> Say I have chicken and I want to know where it occurs in a string of
> words, but I want it to match to both chicken and poultry and have the
> output of:
>
> chicken  (locations of chicken and poultry in the string)
>
> or something like that.
>
> The string I'm dealing with is really large, so whatever will get
> through it the fastest is ideal for me.
>
> I hope this all makes sense...
>
> If it's possible to have pseudocode that would be helpful.
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>   


