Extract string from log file

josephtys86 at googlemail.com josephtys86 at googlemail.com
Sat Aug 9 18:19:42 CEST 2008


On Aug 9, 11:22 pm, Edwin.Mad... at VerizonWireless.com wrote:
> From each line, separate out the URL and request parts. Split the request into key-value pairs, then use urllib to unquote each pair, as shown below:
>
> import urllib
> line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1"
> words = line.split()
> for word in words:
>     if word.find('?') >= 0:
>         req = word[word.find('?') + 1:]
>         kwds = req.split('&')
>         for kv in kwds:
>             print urllib.unquote(kv)
>
> stat=v
> c=F-Secure
> v=1.1 Build 14231
> s=av{Norton 360 (Symantec Corporation)+69;}sw{Norton 360 (Symantec Corporation)+69;}fw{Norton 360 (Symantec Corporation)+5;}v{Microsoft Windows XP+insecure;Microsoft Windows XP Professional+f;26027;26447;26003;22452;}
> r=0.9496
>
> good luck
> Edwin
>
> -----Original Message-----
> From: python-list-bounces+edwin.madari=verizonwireless.... at python.org
>
> [mailto:python-list-bounces+edwin.madari=verizonwireless.... at python.org]
> On Behalf Of josephty... at googlemail.com
> Sent: Saturday, August 09, 2008 10:48 AM
> To: python-l... at python.org
> Subject: Extract string from log file
>
> 203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43 "http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/card.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
>
> Does anyone know how I can extract certain strings from this log file
> using regular expressions in Python, or using XML? Could someone teach me?

Would you mind explaining further? Based on the source code that you
gave me, what will it output? Sorry, I am new to string extraction. I
do understand your Python code in general; the only thing I don't
understand is this part:


for word in words:
    if word.find('?') >= 0:
        req = word[word.find('?') + 1:]
        kwds = req.split('&')
        for kv in kwds:
            print urllib.unquote(kv)
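
The loop above can be traced step by step. What follows is only a sketch, written for Python 3, where `unquote` lives in `urllib.parse` (the quoted code uses Python 2's `urllib.unquote`). The request line is shortened and the `decoded` list is an illustrative addition, so neither is part of the original reply:

```python
from urllib.parse import unquote  # Python 3 home of Python 2's urllib.unquote

# Shortened request line, used here only for illustration.
line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231 HTTP/1.1"

# Split on whitespace:
# ['GET', '/stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231', 'HTTP/1.1']
words = line.split()

decoded = []  # illustrative: collect results instead of printing each one
for word in words:
    pos = word.find('?')       # index of '?', or -1 if the word has none
    if pos >= 0:
        req = word[pos + 1:]   # everything after the '?'
        kwds = req.split('&')  # ['stat=v', 'c=F-Secure', 'v=1.1%20Build%2014231']
        for kv in kwds:
            # unquote turns %XX escapes back into characters (%20 -> space)
            decoded.append(unquote(kv))

print(decoded)  # ['stat=v', 'c=F-Secure', 'v=1.1 Build 14231']
```

Only the middle word contains a `?`, so it is the only one that gets split and decoded.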

What does this code do?
Also, is this code automatic? What I mean is, can it extract the
strings every time a new log file is output by the server?
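
On the "automatic" question: the quoted snippet handles the single line it is given and does nothing on its own when a new file appears. One possible way to reuse it, sketched in Python 3, is to wrap the loop in a function and call it for every line of a file. The names `extract_params` and `extract_from_file` are hypothetical, and the sample line is a shortened version of the log entry quoted above:

```python
from urllib.parse import unquote

def extract_params(log_line):
    """Return the decoded key/value pairs of one log line as a dict."""
    params = {}
    for word in log_line.split():
        pos = word.find('?')
        if pos < 0:
            continue
        for kv in word[pos + 1:].split('&'):
            key, _, value = kv.partition('=')
            params[unquote(key)] = unquote(value)
        break  # stop after the first word containing '?' (the request)
    return params

def extract_from_file(path):
    """Yield one dict per line of the log file at `path`."""
    with open(path) as log:
        for line in log:
            yield extract_params(line)

# Shortened sample entry, for illustration only.
sample = ('203.114.10.66 - - [01/Aug/2008:05:41:21 +0300] '
          '"GET /stat.gif?stat=v&c=F-Secure&r=0.9496 HTTP/1.1" 200 43')
print(extract_params(sample))  # {'stat': 'v', 'c': 'F-Secure', 'r': '0.9496'}
```

Something still has to run the script whenever the server writes a new log file, for example a scheduled job. Like the quoted code, this sketch uses `unquote`, which leaves `+` characters untouched.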


