<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body bgcolor="#ffffff" text="#000000">

    Looking at the regex you use to match an IP address, I think you
    need to limit the number of digits in each of the four octets you
    are searching for, since each one is between 1 and 3 digits long.<br>

    <br>

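    For example, r =
    re.match(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", line) has worked
    for me. Bear in mind that {1,3} only limits how many digits each
    octet has, so it will still accept something like 999.1.1.1.<br>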
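    <br>
    Here is a minimal sketch of that check applied to one log line,
    assuming the Apache log format shown in the quoted message below.
    The names IP_RE and first_ip are made up for illustration.
    re.match only tries to match at the start of the string, which is
    where the client IP sits in each line of that format.<br>
<pre>
```python
import re

# Four groups of 1 to 3 digits separated by dots. The \b at each end
# keeps the match from starting or stopping inside a longer digit run.
IP_RE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def first_ip(line):
    """Return the IP-like token at the start of line, or None."""
    m = IP_RE.match(line)  # match() only tries position 0
    return m.group(0) if m else None


# Sample line copied from the quoted message.
sample = ('193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET '
          '/admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"')

print(first_ip(sample))        # 193.6.135.21
print(first_ip("no ip here"))  # None
print(first_ip("1234.5.6.7"))  # None: four digits in the first octet
```
</pre>
    Calling first_ip on each line of the file in a loop would give one
    IP per request line, and None for lines that do not start with an
    address.<br>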

    <br>

    I am no expert on regex (it scares me!), but I got the example above
    from:<br>

    <a href="http://www.regular-expressions.info/examples.html">http://www.regular-expressions.info/examples.html</a><br>
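    <br>
    On the ^ question in the quoted message: by default ^ matches only at
    the very start of the string. Because f.read() returns the whole file
    as one string, a findall with a leading ^ can return at most one
    match. Passing the re.MULTILINE flag makes ^ also match just after
    each newline. The original program prints each (ip, getstring) tuple
    and then each unique IP, so a single match would appear as two lines
    of output. That may be where the "two results" come from. Here is a
    small sketch with two made-up log lines:<br>
<pre>
```python
import re

# Two made-up log lines in the same shape as the real access log.
text = ('10.0.0.1 - - "GET /a HTTP/1.1" 404\n'
        '10.0.0.2 - - "GET /b HTTP/1.1" 404\n')

# The pattern from the original program, leading ^ included.
pattern = r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP'

# Without re.MULTILINE, ^ only matches at position 0 of the whole
# string, so findall stops after the first line.
print(re.findall(pattern, text))
# [('10.0.0.1', ' /a ')]

# With re.MULTILINE, ^ also matches just after every newline.
print(re.findall(pattern, text, re.MULTILINE))
# [('10.0.0.1', ' /a '), ('10.0.0.2', ' /b ')]
```
</pre>
    In the original program that would mean calling
    re.findall(pattern, filecontents, re.MULTILINE), or reading the file
    one line at a time instead.<br>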

    <br>

    <br>
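    On the empty GET strings: without seeing the log it is hard to say
    which lines produce them. One way to make the extraction stricter is
    to work line by line and require a non-empty path between "GET and
    HTTP/. The sketch below assumes the log format from the quoted
    message. The pattern, the function name extract_offending_ips and
    the second and third sample lines are all hypothetical. Lines that do
    not match, such as POST requests, are skipped.<br>
<pre>
```python
import re

# Hypothetical stricter pattern: an IP at the start of the line, then a
# request of the form "GET path HTTP/...". \S+ requires a non-empty path
# with no spaces, and the lazy .*? stops at the first "GET on the line.
REQUEST_RE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}) .*?"GET (\S+) HTTP/')


def extract_offending_ips(lines):
    """Return (ips, requests): unique IPs in first-seen order, plus
    every (ip, path) pair found. Non-matching lines are skipped."""
    ips = []
    seen = set()
    requests = []
    for line in lines:
        m = REQUEST_RE.match(line)
        if m is None:
            continue  # not a GET request in the expected format
        ip, path = m.groups()
        requests.append((ip, path))
        if ip not in seen:
            seen.add(ip)
            ips.append(ip)
    return ips, requests


log = [
    # Copied from the quoted message.
    '193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"',
    # Made-up lines for illustration.
    '193.6.135.21 - - [11/Jun/2011:13:58:05 +0000] "GET /phpmyadmin/ HTTP/1.1" 404 300 "-" "-"',
    '10.0.0.5 - - [11/Jun/2011:14:00:00 +0000] "POST /login HTTP/1.1" 200 12 "-" "-"',
]

ips, requests = extract_offending_ips(log)
print(ips)       # ['193.6.135.21']
print(requests)  # [('193.6.135.21', '/admin/pma/scripts/setup.php'), ('193.6.135.21', '/phpmyadmin/')]
```
</pre>
    With a real file this could be called as
    "with open('access_log.1') as f: ips, requests = extract_offending_ips(f)",
    since iterating over a file object yields one line at a time.<br>
    <br>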

    I hope my semi-coherent ramblings have been of some help.<br>

    <br>

    Regards<br>

    <br>

    Peter<br>

    <br>

    On 19/06/11 12:25, Gerhardus Geldenhuis wrote:

    <blockquote

      cite="mid:BANLkTim+ua0TB2JH0BL8G9-M1gNOtU+W9Q@mail.gmail.com"

      type="cite">Hi

      <div>I am trying to write a small program that will scan my Apache
        access log and update iptables to block anyone looking for
        things they are not supposed to access.</div>

      <div><br>

      </div>

      <div>The code:</div>

      <div>

        <div>#!/usr/bin/python</div>

        <div>import sys</div>

        <div>import re</div>

        <div><br>

        </div>

        <div>def extractoffendingip(filename):</div>

        <div>&nbsp; f = open(filename,'r')</div>

        <div>&nbsp; filecontents = f.read()</div>

        <div>#193.6.135.21 - - [11/Jun/2011:13:58:01 +0000] "GET

          /admin/pma/scripts/setup.php HTTP/1.1" 404 304 "-"

          "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"</div>

        <div>&nbsp; tuples =

          re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP',

          filecontents)</div>

        <div>&nbsp; iplist = []</div>

        <div>&nbsp; for items in tuples:</div>

        <div>&nbsp; &nbsp; (ip, getstring) = items</div>

        <div>&nbsp; &nbsp; print ip,getstring</div>

        <div>&nbsp; &nbsp; #print item</div>

        <div>&nbsp; &nbsp; if ip not in iplist:</div>

        <div>&nbsp; &nbsp; &nbsp; iplist.append(ip)</div>

        <div>&nbsp; for item in iplist:</div>

        <div>&nbsp; &nbsp; print item</div>

        <div>&nbsp; #ipmatch = re.search(r'', filecontents)</div>

        <div><br>

        </div>

        <div>def main():</div>

        <div>&nbsp; extractoffendingip('access_log.1')</div>

        <div><br>

        </div>

        <div>if __name__ == '__main__':</div>

        <div>&nbsp; main()</div>

      </div>

      <div><br>

      </div>

      <div>Log file: <a moz-do-not-send="true"
          href="http://pastebin.com/F3RXDYBW">http://pastebin.com/F3RXDYBW</a></div>

      <div><br>

      </div>

      <div><br>

      </div>

      <div>I could probably have used ranges to be more precise about
        matching IPs, but I thought that Apache should take care of that.
        I am assuming a level of integrity in the data in the log
        file...</div>

      <div><br>

      </div>

      <div>The first problem I ran into was that I added a ^ to my

        search string:</div>

      <div>re.findall(r'^(\d+\.\d+\.\d+\.\d+).*\"GET(.*)HTTP',

        filecontents)</div>

      <div><br>

      </div>

      <div>but that finds only two results, a lot fewer than I am
        expecting. I am a little confused. At first I thought it might
        be because, with the way I load the file, the whole file is now
        one single string, so the ^ should find only one instance. But
        instead it finds two?</div>

      <div><br>

      </div>

      <div>Removing the ^ works much better and the results are now
        mostly correct, but I also get a number of IPs with an empty GET
        string, although I thought there should be only one such entry in
        the log file. I would really appreciate any pointers as to what is
        going on here.</div>

      <div><br>

      </div>

      <div>Regards</div>

      <div><br>

        -- <br>

        Gerhardus Geldenhuis<br>

      </div>

      <pre wrap="">


_______________________________________________

Tutor maillist  -  <a class="moz-txt-link-abbreviated" href="mailto:Tutor@python.org">Tutor@python.org</a>

To unsubscribe or change subscription options:

<a class="moz-txt-link-freetext" href="http://mail.python.org/mailman/listinfo/tutor">http://mail.python.org/mailman/listinfo/tutor</a>

</pre>

    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

LinkedIn Profile: <a class="moz-txt-link-freetext" href="http://linkedin.com/in/pmjlavelle">http://linkedin.com/in/pmjlavelle</a>

Twitter: <a class="moz-txt-link-freetext" href="http://twitter.com/pmjlavelle">http://twitter.com/pmjlavelle</a></pre>

  </body>

</html>