[Tutor] Finding text in a long string

Alfred Milgrom fredm@smartypantsco.com
Tue Jun 17 00:22:26 2003


At 10:51 PM 16/06/03 -0400, Timothy M. Brauch wrote:
>I am just having troubles with something I know is easy.
>
>I've got a small script written that pulls the data from a webpage.  What I
>am trying to do is to see if "image1.gif" is in the html or not.  The
>problem I am having is that it might be labelled as
>http://domain.com/image1.gif or it might be ../images/image1.gif or
>something else (it can change with each page) and it is going to be in an
>html tag.  I could do it if "image1.gif" was plain text.  I'm having trouble
>however in this case.
>
>I tried some things like:
>
>data = urllib.urlopen(ADDRESS).read()
>if "image1.gif" in data:
>     print "Success"
>else: print "Failure"
>
>That gives me the error
>Traceback (most recent call last):
>   File "<pyshell#23>", line 1, in ?
>     if "offline.gif" in data:
>TypeError: 'in <string>' requires character as left operand
>which tells me it can only check if a certain character is in a string or
>not.
<snip>

I'm sure that there are better ways of doing what you want than what I am 
recommending, but here goes anyway:

1. Use the string method startswith(). Unfortunately, this means you 
have to test every position in the string:

for i in range(len(data)):
    if data[i:].startswith("image1.gif"):
        print "Success"
        break  # stop at the first match
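
A variant of approach 1, offered as a sketch rather than a tested recipe: startswith() accepts an optional start offset, so the slice data[i:] (which copies the rest of the string on every pass) can be avoided. The function name find_with_startswith and the sample HTML are made up for illustration.

```python
def find_with_startswith(data, target):
    """Return the index of the first occurrence of target in data, or -1."""
    for i in range(len(data)):
        # startswith(prefix, start) tests at offset i without slicing
        if data.startswith(target, i):
            return i
    return -1

# Hypothetical sample page, not the poster's real data
sample = '<img src="../images/image1.gif" alt="logo">'
position = find_with_startswith(sample, "image1.gif")
if position != -1:
    print("Success")
else:
    print("Failure")
```

Returning the index instead of printing inside the loop also means the result is reported once, however many times the name appears.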

2. Same approach but reduce the work required by checking first if the 
first letter matches:

for i in range(len(data)):
    if data[i] == 'i' and data[i:].startswith("image1.gif"):
        print "Success"
        break  # stop at the first match
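
For comparison, the built-in find() string method does this position-by-position scan itself. It is a different technique from the startswith() loops above: it returns the index of the first match, or -1 if there is none. A minimal sketch, with a hypothetical helper name contains_text and a made-up sample page:

```python
def contains_text(data, target):
    """Return True if target occurs anywhere in data."""
    # str.find returns the lowest index of target, or -1 if absent
    return data.find(target) != -1

# Hypothetical sample page, not the poster's real data
sample = '<a href="http://domain.com/image1.gif">picture</a>'
if contains_text(sample, "image1.gif"):
    print("Success")
else:
    print("Failure")
```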

3. As David Broadwell suggested, you can divide data further. If 
"image1.gif" always appears as part of a longer URL, as in your two 
examples, it will have '/' immediately before it. You can therefore split 
data on '/' (a bare src="image1.gif" with no path in front would be missed):

newdata = data.split('/')
for phrase in newdata:
    if phrase.startswith("image1.gif"):
        print "Success"
        break  # stop at the first match
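
The split approach can be wrapped in a function so that it reports once and can be tried against different pages. This is only a sketch: the name has_image_after_slash and the three sample strings are invented for illustration. The last sample shows the limitation of relying on '/': a reference with no path in front of the file name is not found.

```python
def has_image_after_slash(data, target):
    """Return True if some '/'-separated piece of data starts with target."""
    for piece in data.split('/'):
        # The piece may continue with '">' or other HTML, hence startswith
        if piece.startswith(target):
            return True
    return False

# Hypothetical samples, not the poster's real pages
absolute = '<img src="http://domain.com/image1.gif">'
relative = '<img src="../images/image1.gif">'
bare = '<img src="image1.gif">'

for page in (absolute, relative, bare):
    if has_image_after_slash(page, "image1.gif"):
        print("Success")
    else:
        print("Failure")
```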

Hope these ideas help or trigger some better solution,
Alfred Milgrom