[Tutor] regex woes in finding an ip and GET string
Peter Lavelle
lists at solderintheveins.co.uk
Sun Jun 19 13:36:06 CEST 2011
Looking at the regex you have to match an IP address, I think you would
need to put a range limit on each of the four octets you are searching
for (as each one would be between 1 and 3 digits long).
For example, this has worked for me:
    r = re.match(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", line)
I am no expert on regex (it scares me!). I got the example above from:
http://www.regular-expressions.info/examples.html
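As a rough sketch (not tested against the full log), here is the 1-to-3-digit
octet pattern used line by line on the sample log line from the quoted
message below. The names LOG_LINE, extract_requests and sample are made up
for illustration, and the non-greedy GET capture is an assumption about what
the path should contain. Note that \d{1,3} still accepts values such as 999,
so this limits the length of each octet, not its numeric range.

```python
import re

# Each octet is limited to 1-3 digits; the GET capture is non-greedy so it
# stops at the first " HTTP" that follows the request path.
LOG_LINE = re.compile(
    r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b.*?"GET (.*?) HTTP'
)


def extract_requests(lines):
    """Return (ip, path) pairs for lines that start with an IP and contain a GET."""
    results = []
    for line in lines:
        # re.match only matches at the start of the string it is given,
        # so no ^ anchor is needed when working one line at a time.
        m = LOG_LINE.match(line)
        if m:
            results.append((m.group(1), m.group(2)))
    return results


# Sample line copied from the comment in the quoted code, plus one line
# that should be ignored.
sample = [
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"',
    'not a log line',
]

for ip, path in extract_requests(sample):
    print(ip + ' ' + path)
```

Because re.match only matches at the start of each line, the ^ anchor is not
needed here. With re.findall over the whole file contents, ^ matches only at
the very start of the string unless the re.MULTILINE flag is passed.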
Hope my semi-coherent ramblings have been of some help
Regards
Peter
On 19/06/11 12:25, Gerhardus Geldenhuis wrote:
> Hi
> I am trying to write a small program that will scan my access.conf
> file and update iptables to block anyone looking for stuff that they
> are not supposed to.
>
> The code:
> #!/usr/bin/python
> import sys
> import re
>
> def extractoffendingip(filename):
>     f = open(filename,'r')
>     filecontents = f.read()
>     #193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
>     tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
>     iplist = []
>     for items in tuples:
>         (ip, getstring) = items
>         print ip,getstring
>         #print item
>         if ip not in iplist:
>             iplist.append(ip)
>     for item in iplist:
>         print item
>     #ipmatch = re.search(r'', filecontents)
>
> def main():
>     extractoffendingip('access_log.1')
>
> if __name__ == '__main__':
>     main()
>
> logfile=http://pastebin.com/F3RXDYBW
>
>
> I could probably have used ranges to be more correct about finding
> IPs but I thought that apache should take care of that. I am assuming
> a level of integrity in the log file with regards to data...
>
> The first problem I ran into was that I added a ^ to my search string:
> re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP', filecontents)
>
> but that finds only two results, a lot fewer than I am expecting. I am
> a little bit confused: at first I thought that it might be because the
> string I am searching is now only one line because of the method of
> loading, so the ^ should only find one instance, but instead it finds two?
>
> So removing the ^ works much better and I now get mostly correct
> results, but I also get a number of IPs with an empty GET string,
> although I thought there should be only one in the log file. I would
> really appreciate any pointers as to what is going on here.
>
> Regards
>
> --
> Gerhardus Geldenhuis
>
>
--
LinkedIn Profile: http://linkedin.com/in/pmjlavelle
Twitter: http://twitter.com/pmjlavelle