[Tutor] regex woes in finding an ip and GET string
Peter Otten
__peter__ at web.de
Mon Jun 20 09:58:13 CEST 2011
Gerhardus Geldenhuis wrote:
> I am trying to write a small program that will scan my access.conf file
> and update iptables to block anyone looking for stuff that they are not
> supposed to.
>
> The code:
> #!/usr/bin/python
> import sys
> import re
>
> def extractoffendingip(filename):
>     f = open(filename,'r')
>     filecontents = f.read()
>     #193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET
>     #/admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0
>     #(compatible; MSIE 6.0; Windows 98)"
>     tuples = re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP',
>                         filecontents)
If you want to process the whole file at once, you have to use the
re.MULTILINE flag so that ^ matches at the start of every line instead of
only at the start of the whole text:
tuples = re.compile(r'...', re.MULTILINE).findall(filecontents)
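A minimal sketch of the difference, using your pattern on a two-line string. The first line is shortened from your example; the second one is made up for illustration:

```python
import re

# Two access-log lines in one string, as f.read() would return them.
# The second line is invented for this example.
text = (
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304\n'
    '10.0.0.7 - - [11/Jun/2011:13:59:02 +0000] '
    '"GET /index.html HTTP/1.1" 200 512\n'
)

pattern = r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP'

# Without the flag, ^ matches only at the very start of the text.
without_flag = re.findall(pattern, text)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.compile(pattern, re.MULTILINE).findall(text)

print(len(without_flag))  # only the first line is found
print(len(with_flag))     # both lines are found
```

Note that the second group keeps the spaces around the path, so you may want to call strip() on it.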
But I think it's better to process the file one line at a time.
> iplist = []
[snip]
> if ip not in iplist:
>     iplist.append(ip)
So you want every unique ip to appear only once in iplist. Python offers an
efficient data structure for that, the set. With these changes your function
becomes something like (untested)
def extractoffendingips(filename):
    # .match() only tries at the start of each line, so no re.MULTILINE needed
    match = re.compile(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP').match
    ipset = set()
    with open(filename) as f:
        for line in f:
            m = match(line)
            if m is not None:
                ip, getstring = m.groups()
                ipset.add(ip)
    for item in ipset:
        print(item)
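If you want to try the idea without a log file, here is a small sketch that accepts any iterable of lines and returns the set instead of printing it. The name collect_ips and the sample lines are made up for illustration; only the address and the first path come from your example:

```python
import re

# Same pattern as in the function above, matched one line at a time.
match = re.compile(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP').match

def collect_ips(lines):
    """Return the set of ips found in lines that look like GET requests."""
    ipset = set()
    for line in lines:
        m = match(line)
        if m is not None:
            ip, getstring = m.groups()
            ipset.add(ip)  # adding the same ip again leaves the set unchanged
    return ipset

# Invented sample data: two requests from one address plus a junk line.
sample = [
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] '
    '"GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304\n',
    '193.6.135.21 - - [11/Jun/2011:13:58:05 +0000] '
    '"GET /other/setup.php HTTP/1.1" 404 304\n',
    'this line does not match\n',
]

ips = collect_ips(sample)
print(sorted(ips))
```

Since an open file is also an iterable of lines, collect_ips(f) should work inside a with open(filename) as f: block as well.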