[Tutor] Finding text in a long string
Alfred Milgrom
fredm@smartypantsco.com
Tue Jun 17 00:22:26 2003
At 10:51 PM 16/06/03 -0400, Timothy M. Brauch wrote:
>I am just having troubles with something I know is easy.
>
>I've got a small script written that pulls the data from a webpage. What I
>am trying to do is to see if "image1.gif" is in the html or not. The
>problem I am having is that it might be labelled as
>http://domain.com/image1.gif or it might be ../images/image1.gif or
>something else (it can change with each page) and it is going to be in an
>html tag. I could do it if "image1.gif" was plain text. I'm having trouble
>however in this case.
>
>I tried some things like:
>
>data = urllib.urlopen(ADDRESS).read()
>if "image1.gif" in data:
> print "Success"
>else: print "Failure"
>
>That gives me the error
>Traceback (most recent call last):
> File "<pyshell#23>", line 1, in ?
> if "offline.gif" in data:
>TypeError: 'in <string>' requires character as left operand
>which tells me it can only check if a certain character is in a string or
>not.
<snip>
I'm sure that there are better ways of doing what you want than what I am
recommending, but here goes anyway:
1. Use the startswith() method for strings. Unfortunately this means you
have to test this at every position in the string.
    for i in range(len(data)):
        if data[i:].startswith("image1.gif"):
            print "Success"
            break    # stop at the first match
2. Same approach but reduce the work required by checking first if the
first letter matches:
    for i in range(len(data)):
        if data[i] == 'i' and data[i:].startswith("image1.gif"):
            print "Success"
            break    # stop at the first match
3. As David Broadwell suggested, you can divide data further. Since
"image1.gif" will always be part of a URL, it will always have '/' before
it. You can therefore split data using '/':
    newdata = data.split('/')
    for phrase in newdata:
        if phrase.startswith("image1.gif"):
            print "Success"
            break    # stop at the first match
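
As a rough sketch only, the three ideas above can be wrapped in small
functions that return True or False, which makes it easy to print "Failure"
when nothing is found. The function names and the sample pages are made up
for illustration and do not come from the original question; the code
assumes the search text is not empty and is written for a current Python,
where print is called with parentheses.

```python
def found_by_scan(data, target):
    """Approach 1: test startswith() at every position in data."""
    for i in range(len(data)):
        if data[i:].startswith(target):
            return True
    return False


def found_by_first_char(data, target):
    """Approach 2: only call startswith() where the first character matches."""
    for i in range(len(data)):
        if data[i] == target[0] and data[i:].startswith(target):
            return True
    return False


def found_by_split(data, target):
    """Approach 3: split on '/' and test the start of each piece."""
    for phrase in data.split('/'):
        if phrase.startswith(target):
            return True
    return False


# Hypothetical sample pages (assumptions, not real data).
pages = [
    '<img src="http://domain.com/image1.gif">',  # absolute URL
    '<img src="../images/image1.gif">',          # relative path
    '<img src="image1.gif">',                    # no '/' before the name
    '<img src="../images/other.gif">',           # a different image
]

for page in pages:
    if found_by_scan(page, "image1.gif"):
        print("Success")
    else:
        print("Failure")
```

One thing the sample pages bring out: approach 3 relies on a '/' appearing
directly before the file name, so found_by_split() would miss the third
page, where the tag is just src="image1.gif". Approaches 1 and 2 do not
have that limitation.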
Hope these ideas help or trigger some better solution,
Alfred Milgrom