[Tutor] regex

Kent Johnson kent37 at tds.net
Tue Dec 27 13:28:10 CET 2005


Danny Yoo wrote:
>>Dec 18 10:04:45 dragon logger: TCPWRAP: SERVICE=sshd@::ffff:192.168.0.1
>>,TYPE=ALL_DENY,HOST_ADDRESS=::ffff:195.145.94.75,HOST_INFO=::ffff:
>>195.145.94.75,HOST_NAME=unknown,USER_NAME=unknown,OTHERINFO=
> 
> 
> Hi Will,
> 
> Observation: the output above looks comma delimited, at least the stuff
> after the 'TCPWRAP:' part.
> 
> 
>>self.twist_fail_re =
>>rc('SERVICE=\S*\sHOST_ADDRESS=\S*\sHOST_INFO=\S*\sHOST_NAME=\S*\sUSER_NAME=\S*\s')
> 
> 
> The line given as example doesn't appear to have whitespace in the places
> that the regular expression expects.  It does contain commas as delimiters
> between the key/value pairs encoded in the line.

Expanding on Danny's comment...

\S*\s matches zero or more non-whitespace characters followed by exactly 
one whitespace character. Your sample separates the fields with commas, 
not whitespace, so this doesn't match. It looks like you want to match a 
run of non-comma characters followed by a comma. For example, this will 
match the first field:

SERVICE=[^,]*,

Presumably you will want to pull out the value of the field, so enclose 
it in parentheses to make a group:

SERVICE=([^,]*),
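
Here is a minimal sketch of using that group from Python. It assumes the 
sample log line has been joined back onto a single line; the names line 
and service_re are just for illustration, not from your code:

```python
import re

# Sample log line from the original post, rejoined onto one line.
line = ("Dec 18 10:04:45 dragon logger: TCPWRAP: "
        "SERVICE=sshd@::ffff:192.168.0.1,TYPE=ALL_DENY,"
        "HOST_ADDRESS=::ffff:195.145.94.75,HOST_INFO=::ffff:195.145.94.75,"
        "HOST_NAME=unknown,USER_NAME=unknown,OTHERINFO=")

# [^,]* matches everything up to (but not including) the next comma;
# the parentheses capture it as group 1.
service_re = re.compile(r'SERVICE=([^,]*),')

match = service_re.search(line)
service = match.group(1) if match else None
print(service)  # sshd@::ffff:192.168.0.1
```

Note the use of search() rather than match(), since SERVICE= does not 
appear at the start of the line.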

Another thing I notice about your regex is it doesn't include all the 
fields in the sample, for example TYPE. If the fields are always the 
same you can just include them in your regex. If they vary you can try 
to make the regex skip them, use a different regex for each field, or 
try Danny's approach of using str.split() to break apart the data.
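
As a rough sketch of the last two options, assuming every record contains 
the 'TCPWRAP: ' marker and that values never contain commas (both are 
guesses based on the one sample line). The names fields and get_field 
are hypothetical helpers, not part of any library:

```python
import re

# Sample log line from the original post, rejoined onto one line.
line = ("Dec 18 10:04:45 dragon logger: TCPWRAP: "
        "SERVICE=sshd@::ffff:192.168.0.1,TYPE=ALL_DENY,"
        "HOST_ADDRESS=::ffff:195.145.94.75,HOST_INFO=::ffff:195.145.94.75,"
        "HOST_NAME=unknown,USER_NAME=unknown,OTHERINFO=")

# Approach 1: str.split(). Take the part after 'TCPWRAP: ' (this raises
# IndexError if the marker is missing), split it on commas, then split
# each pair on its first '='.
payload = line.split('TCPWRAP: ', 1)[1]
fields = {}
for pair in payload.split(','):
    key, sep, value = pair.partition('=')
    if sep:  # skip anything that is not of the form key=value
        fields[key] = value

print(fields['TYPE'])          # ALL_DENY
print(fields['HOST_ADDRESS'])  # ::ffff:195.145.94.75


# Approach 2: a small regex per field, so the order and presence of the
# other fields does not matter.
def get_field(name, text):
    """Return the value of name=... in text, or None if it is absent."""
    m = re.search(r'\b' + re.escape(name) + r'=([^,]*)', text)
    return m.group(1) if m else None

print(get_field('HOST_NAME', line))  # unknown
```

The split version gives you every field at once in a dictionary; the 
per-field regex is handier if you only care about one or two values.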

The Regex Demo program that comes with Python is handy for creating and 
testing regexes. Look in C:\Python24\Tools\Scripts\redemo.py or the 
equivalent.

Kent


