<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Looking at the regex you are using to match an IP address, I think you
need to limit the length of each of the four octets you are searching
for, as each one is between 1 and 3 digits long.<br>
<br>
For example, this has worked for me:<br>
r = re.match(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", line)<br>
<br>
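To make that concrete, here is a minimal sketch (not run against your real log) that uses the same pattern and then checks that every octet is no larger than 255, since \d{1,3} on its own also accepts something like 999.1.1.1. The names IP_PATTERN and find_ip are my own, and the sample line is a shortened copy of the one in the comment in your script:<br>
<pre>
```python
import re

# Each octet is limited to 1-3 digits; \b stops the match from starting
# or ending in the middle of a longer run of digits.
IP_PATTERN = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def find_ip(line):
    """Return the first dotted quad in line whose octets are all 0-255, else None."""
    for candidate in IP_PATTERN.findall(line):
        # The regex only limits the digit count, so check the values here.
        if all(int(octet) in range(256) for octet in candidate.split(".")):
            return candidate
    return None


# Shortened copy of the sample log line from the quoted script.
sample = '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304'

print(find_ip(sample))
print(find_ip("999.1.1.1 is not a valid address"))
```
</pre>
The first call should print 193.6.135.21 and the second should print None.<br>
<br>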
I am no expert on regex (it scares me!). I got the example above
from:<br>
<a href="http://www.regular-expressions.info/examples.html">http://www.regular-expressions.info/examples.html</a><br>
<br>
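On the ^ question in your message below: as far as I can tell, ^ only matches at the very start of the string unless you pass re.MULTILINE, and because f.read() hands the whole file over as one string, the anchored pattern can match at most once. Your script also prints each address once in each of its two loops, which might be why a single match looks like two results, but that is a guess on my part. A small sketch of the difference, where the second log line and its address are invented for illustration:<br>
<pre>
```python
import re

# Two shortened lines in the same layout as the sample in the quoted script.
# The second line (address and request path) is made up for illustration.
log_text = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304\n'
    '10.0.0.7 - - [11/Jun/2011:13:59:12 +0000] "GET /index.html HTTP/1.1" 200 512\n'
)

pattern = r'^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*"GET(.*)HTTP'

# Default behaviour: ^ is tested only at the start of the whole string.
without_flag = re.findall(pattern, log_text)

# With re.MULTILINE, ^ is also tested right after every newline.
with_flag = re.findall(pattern, log_text, re.MULTILINE)

print(len(without_flag))
print(len(with_flag))
print([ip for ip, request in with_flag])
```
</pre>
With these two lines the counts come out as 1 and 2. Reading the file line by line and calling re.match on each line would be another way to get the same effect.<br>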
<br>
Hope my semi-coherent ramblings have been of some help.<br>
<br>
Regards<br>
<br>
Peter<br>
<br>
On 19/06/11 12:25, Gerhardus Geldenhuis wrote:
<blockquote
cite="mid:BANLkTim+ua0TB2JH0BL8G9-M1gNOtU+W9Q@mail.gmail.com"
type="cite">Hi
<div>I am trying to write a small program that will scan my
access.conf file and update iptables to block anyone looking for
stuff that they are not supposed to.</div>
<div><br>
</div>
<div>The code:</div>
<div>
<div>#!/usr/bin/python</div>
<div>import sys</div>
<div>import re</div>
<div><br>
</div>
<div>def extractoffendingip(filename):</div>
<div>&nbsp;&nbsp;f = open(filename,'r')</div>
<div>&nbsp;&nbsp;filecontents = f.read()</div>
<div>#193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET
/admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"</div>
<div>&nbsp;&nbsp;tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)</div>
<div>&nbsp;&nbsp;iplist = []</div>
<div>&nbsp;&nbsp;for items in tuples:</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;(ip, getstring) = items</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;print ip,getstring</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;#print item</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;if ip not in iplist:</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;iplist.append(ip)</div>
<div>&nbsp;&nbsp;for item in iplist:</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;print item</div>
<div>&nbsp;&nbsp;#ipmatch = re.search(r'', filecontents)</div>
<div><br>
</div>
<div>def main():</div>
<div>&nbsp;&nbsp;extractoffendingip('access_log.1')</div>
<div><br>
</div>
<div>if __name__ == '__main__':</div>
<div>&nbsp;&nbsp;main()</div>
</div>
<div><br>
</div>
<div>logfile=<a moz-do-not-send="true"
href="http://pastebin.com/F3RXDYBW">http://pastebin.com/F3RXDYBW</a></div>
<div><br>
</div>
<div><br>
</div>
<div>I could probably have used ranges to be more correct about
finding IPs, but I thought that Apache should take care of that.
I am assuming a level of integrity in the log file with regard
to data...</div>
<div><br>
</div>
<div>The first problem I ran into was that I added a ^ to my
search string:</div>
<div>re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP',
filecontents)</div>
<div><br>
</div>
<div>but that finds only two results, far fewer than I am
expecting. I am a little confused: at first I thought it might be
because the string I am searching is now effectively a single line,
given the way the file is loaded, so the ^ should find only one
instance, but instead it finds two?</div>
<div><br>
</div>
<div>Removing the ^ works much better: I now get mostly correct
results, but I also get a number of IPs with an empty GET string,
although I thought there should be only one in the log file. I
would really appreciate any pointers as to what is going on
here.</div>
<div><br>
</div>
<div>Regards</div>
<div><br>
-- <br>
Gerhardus Geldenhuis<br>
</div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
LinkedIn Profile: <a class="moz-txt-link-freetext" href="http://linkedin.com/in/pmjlavelle">http://linkedin.com/in/pmjlavelle</a>
Twitter: <a class="moz-txt-link-freetext" href="http://twitter.com/pmjlavelle">http://twitter.com/pmjlavelle</a></pre>
</body>
</html>