[SPAM] VERY simple string comparison issue

MRAB google at mrabarnett.plus.com
Wed Dec 24 20:10:51 CET 2008


Brad Causey wrote:
> Python Version: Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC 
> v.1310 32 bit (Intel)] on win32
> 
> List,
> 
> I am trying to do some basic log parsing, and well, I am absolutely 
> floored at this seemingly simple problem. I am by no means a novice in 
> Python, yet this is really stumping me. I have extracted the 
> pertinent code snippets and modified them to function as a standalone 
> script. Basically I am reading a log file (in this case, testlog.log) 
> for entries and comparing them to entries in a safe list (in this case, 
> safelist.lst). I have spent numerous hours doing this several ways and 
> this is the simplest way I can come up with:
> 
> <code>
> import string
> 
> safelistfh = file('safelist.lst', 'r')
> safelist = safelistfh.readlines()
> 
> logfh = file('testlog.log', 'r')
> loglines = logfh.readlines()
> 
> def safecheck(line):
>     for entry in safelist:
>         print 'I am searching for\n'
>         print entry
>         print '\n'
>         print 'to exist in\n'
>         print line
>         comp = line.find(entry)
>         if comp <> -1:
>             out = 'Failed'
>         else:
>             out = 'Passed'
>     return out
> 
Unless I've misunderstood what you're doing, wouldn't it be better as:

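def safecheck(line):
    for entry in safelist:
        print 'I am searching for\n'
        print entry
        print '\n'
        print 'to exist in\n'
        print line
        if entry in line:
            # a match means the URL is on the safe list, which the
            # calling loop reports as 'Failed' (internal site)
            return 'Failed'
    return 'Passed'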

> for log in loglines:
>     finalentry = safecheck(log)
>     if finalentry == 'Failed':
>         print 'This is an internal site'
>     else:
>         print 'This is an external site'
> </code>
> 
Actually, I think it would be better to use True and False instead of 
'Passed' and 'Failed'.
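
A minimal sketch of what that might look like, written in Python 3 syntax
(the code in this thread is Python 2.5). The name is_safe and the in-memory
safelist are assumptions for illustration only, and the entries are taken to
be free of trailing newlines already.

```python
def is_safe(line, safelist):
    """Return True if any safe-list entry occurs somewhere in the log line."""
    return any(entry in line for entry in safelist)


# Hypothetical in-memory data, using the contents quoted in the post.
safelist = ['http://www.mysite.com']
log = 'http://www.mysite.com/images/homepage/xmlslideshow-personal.swf'

if is_safe(log, safelist):
    print('This is an internal site')
else:
    print('This is an external site')
```

Returning a boolean lets the caller write "if is_safe(...)" directly instead
of comparing against a magic string.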

> The contents of the two files are as follows:
> 
> <safelist.lst>
> http://www.mysite.com
> </safelist.lst>
> 
> <testlog.log>
> http://www.mysite.com/images/homepage/xmlslideshow-personal.swf
> </testlog.log>
> 
> It seems that no matter what I do, I can't get this to fail the 
> "if comp <> -1:" check. (My goal is for the check to fail so that I know 
> this is just a URL to a safe [internal] site.)
> My assumption is that the HTTP:// is somehow affecting the searching 
> capabilities of the string.find function. But I can't seem to locate any 
> documentation online that outlines restrictions when using special 
> characters.
> 
> Any thoughts?
> 
You'll still need to strip off the '\n' that readlines() leaves at the end 
of each entry; the 'http://' isn't the problem.
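
A small sketch of the effect, in Python 3 syntax. Here io.StringIO stands in
for the two files and the variable names are only illustrative. With the
newline still attached, the safe-list entry cannot match inside the longer
log line; once stripped, it does.

```python
import io

# Stand-ins for safelist.lst and testlog.log, using the contents from the post.
safelist_file = io.StringIO('http://www.mysite.com\n')
log_file = io.StringIO(
    'http://www.mysite.com/images/homepage/xmlslideshow-personal.swf\n')

raw_entries = safelist_file.readlines()
log_line = log_file.readlines()[0]

# The entry still ends in '\n', so it is not a substring of the log line.
print(repr(raw_entries[0]))
print(raw_entries[0] in log_line)

# Strip surrounding whitespace before comparing. Blank lines are skipped
# because an empty string is a substring of every line.
entries = [e.strip() for e in raw_entries if e.strip()]
print(entries[0] in log_line)
```

The same stripping could be applied once, right after reading safelist.lst,
so that safecheck() never sees the newline.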


