Extract string from log file
josephtys86 at googlemail.com
Sat Aug 9 12:19:42 EDT 2008
On Aug 9, 11:22 pm, Edwin.Mad... at VerizonWireless.com wrote:
> From each line, separate out the URL and request parts. Split the request into key-value pairs, then use urllib to unquote each key-value pair, as shown below.
> import urllib
> line = "GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton%20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton%20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows%20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f%3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1"
> words = line.split()
> for word in words:
>     if word.find('?') >= 0:
>         req = word[word.find('?') + 1:]
>         kwds = req.split('&')
>         for kv in kwds:
>             print urllib.unquote(kv)
> stat=v
> c=F-Secure
> v=1.1 Build 14231
> s=av{Norton 360 (Symantec Corporation)+69;}sw{Norton 360 (Symantec Corporation)+69;}fw{Norton 360 (Symantec Corporation)+5;}v{Microsoft Windows XP+insecure;Microsoft Windows XP Professional+f;26027;26447;26003;22452;}
> r=0.9496
> good luck
> Edwin
> -----Original Message-----
> From: python-list-bounces+edwin.madari=verizonwireless.... at python.org
> [mailto:python-list-bounces+edwin.madari=verizonwireless.... at python.org]
> On Behalf Of josephty... at googlemail.com
> Sent: Saturday, August 09, 2008 10:48 AM
> To: python-l... at python.org
> Subject: Extract string from log file
> - - [01/Aug/2008:05:41:21 +0300] "GET /stat.gif?
> stat=v&c=F-Secure&v=1.1%20Build%2014231&s=av%7BNorton
> %20360%20%28Symantec%20Corporation%29+69%3B%7Dsw%7BNorton
> %20360%20%28Symantec%20Corporation%29+69%3B%7Dfw%7BNorton
> %20360%20%28Symantec%20Corporation%29+5%3B%7Dv%7BMicrosoft%20Windows
> %20XP+insecure%3BMicrosoft%20Windows%20XP%20Professional+f
> %3B26027%3B26447%3B26003%3B22452%3B%7D&r=0.9496 HTTP/1.1" 200 43
> "http://dfstage1.f-secure.com/fshc/1.1/release/devbw/1.1.14231/
> card.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;
> SV1; .NET CLR 2.0.50727)"
> Does anyone know how I can extract certain strings from this log file
> using regular expressions in Python, or using XML? Could someone teach me?
Would you mind explaining further? Based on the source code that you gave
me, what will it output? Sorry, I am new to string extraction. I do
understand most of your Python code; the only thing I don't understand
is this part:
    for word in words:
        if word.find('?') >= 0:
            req = word[word.find('?') + 1:]
            kwds = req.split('&')
            for kv in kwds:
                print urllib.unquote(kv)
What does this code do?
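One way to see what the loop does is to trace it on a short input. The following is a minimal sketch, not part of the original reply: it assumes Python 3, where `urllib.unquote` is available as `urllib.parse.unquote`, and it uses a shortened, made-up request line in the same shape as the one quoted above. The list name `decoded` is introduced here only so the result can be inspected; the original code prints each item instead.

```python
from urllib.parse import unquote  # Python 3 location of urllib.unquote

# Shortened, hypothetical request line in the same shape as the log entry.
line = "GET /stat.gif?stat=v&v=1.1%20Build%2014231&r=0.9496 HTTP/1.1"

decoded = []
for word in line.split():             # "GET", "/stat.gif?stat=...", "HTTP/1.1"
    pos = word.find('?')              # -1 when the word contains no '?'
    if pos >= 0:
        req = word[pos + 1:]          # everything after the '?'
        for kv in req.split('&'):     # one "key=value" string per parameter
            decoded.append(unquote(kv))  # e.g. %20 becomes a space

print(decoded)  # ['stat=v', 'v=1.1 Build 14231', 'r=0.9496']
```

In words: only the word containing a `?` is kept, the part after the `?` is cut into `key=value` pieces at each `&`, and every piece has its `%XX` escapes decoded.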
Also, is this code automatic? What I mean is, can it extract the
string every time a new log file is written by the server?
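On the "automatic" part, the snippet as posted handles one hard-coded line, so it would have to be wrapped in something that reads a file. A rough sketch of that idea follows, again assuming Python 3. The names `REQUEST_RE`, `extract_params` and `extract_from_file` are hypothetical helpers introduced here, the regular expression is an assumption about how the request field looks, and each log entry is assumed to sit on a single line in the real file (it is only wrapped in the email above).

```python
import re
from urllib.parse import unquote

# Assumed shape of the quoted request field: "GET /path?query HTTP/1.1"
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

def extract_params(log_line):
    """Return the decoded query parameters of one log line as a dict."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return {}                      # no request field on this line
    _, sep, query = match.group(1).partition('?')
    if not sep:
        return {}                      # request has no query string
    params = {}
    for kv in query.split('&'):
        if not kv:
            continue                   # skip empty pieces such as a trailing '&'
        key, _, value = kv.partition('=')
        params[unquote(key)] = unquote(value)
    return params

def extract_from_file(path):
    """Yield one dict of parameters per line of the log file at `path`."""
    with open(path, encoding='utf-8', errors='replace') as log:
        for log_line in log:
            yield extract_params(log_line)

sample = ('- - [01/Aug/2008:05:41:21 +0300] '
          '"GET /stat.gif?stat=v&c=F-Secure&v=1.1%20Build%2014231&r=0.9496 '
          'HTTP/1.1" 200 43')
print(extract_params(sample))
```

Nothing here watches the server by itself: `extract_from_file` would still have to be called on each new log file, for example from a scheduled job.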