Using a function for regular expression substitution

naugiedoggie michael.a.powe at gmail.com
Sun Aug 29 10:22:33 EDT 2010


Hello,

I'm having a problem with using a function as the replacement in
re.sub().

Here is the function:

def normalize(s) :
    return urllib.quote(string.capwords(urllib.unquote(s.group('provider'))))

The purpose of this function is to proper-case the words contained in
a URL query string parameter value.  I'm massaging data in web log
files.
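
As a minimal sketch of the intended transformation, the three steps can be run on a single value outside of re.sub(). The helper name proper_case and the sample value are made up for illustration, and the import fallback is only there so the same lines run on Python 3, where quote and unquote live in urllib.parse:

```python
import string

try:
    from urllib import quote, unquote        # Python 2, as used in this post
except ImportError:
    from urllib.parse import quote, unquote  # Python 3 location of the same functions


def proper_case(value):
    """Decode a URL-encoded value, capitalize each word, and re-encode it."""
    decoded = unquote(value)          # undo the %XX escapes
    cased = string.capwords(decoded)  # capitalize each whitespace-separated word
    return quote(cased)               # escape the result again


sample = 'acme%20web%20search'  # hypothetical parameter value
result = proper_case(sample)    # 'Acme%20Web%20Search'
```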

In case it matters, the regex pattern looks like this:

provider_pattern = r'(?P<search>Search_Provider)=(?P<provider>[^&]+)'
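
To make clear what the pattern captures, here is a small sketch run against an invented log fragment (the line content is hypothetical, not taken from the real logs). It shows that the whole match spans the parameter name, the '=' and the value, while the named groups hold the two halves:

```python
import re

provider_pattern = r'(?P<search>Search_Provider)=(?P<provider>[^&]+)'

# Hypothetical log line fragment, invented for illustration.
line = 'GET /find?Search_Type=web&Search_Provider=acme%20search&page=2'

match = re.search(provider_pattern, line)
whole = match.group(0)           # 'Search_Provider=acme%20search'
name = match.group('search')     # 'Search_Provider'
value = match.group('provider')  # 'acme%20search'
```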

The call looks like this:

<code>
re.sub(provider_matcher,normalize,line)
</code>

Where line is the log line entry.

What I get back is first the entire line with the normalized value in
place but the parameter name missing; appended to that string is the
entire line again, with the query parameter name back in place and
set to the normalized string.

<code>
>>> fileReader = open(log,'r')
>>>
>>> lines = fileReader.readlines()
>>> for line in lines:
	if line.find('Search_Type') != -1 and line.find('Search_Provider') != -1 :
		re.sub(provider_matcher,normalize,line)
		print line,'\n'
</code>

The output of the print is like this:

<code>
'log-entry parameter=value&normalized-string&parameter=value\n
log-entry parameter=value&parameter=normalized-string&parameter=value'
</code>

The goal is to massage the specified entries in the log files and
write the entire log back into a new file.  The new file has to be
exactly the same as the old one, with the exception of the entries
I've altered with my function.
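
A rough sketch of that end result, under two assumptions about re.sub() that match its documented behavior: the string returned by the replacement function replaces the whole match (parameter name included), and re.sub() returns a new string rather than changing the line in place. The names normalize_param and rewrite, the compiled provider_matcher, and the sample lines are all hypothetical:

```python
import re
import string

try:
    from urllib import quote, unquote        # Python 2
except ImportError:
    from urllib.parse import quote, unquote  # Python 3

# Assumed here to be the compiled form of provider_pattern.
provider_matcher = re.compile(r'(?P<search>Search_Provider)=(?P<provider>[^&]+)')


def normalize_param(m):
    # The return value replaces the entire match, so the parameter name
    # and '=' are put back in front of the normalized value.
    value = quote(string.capwords(unquote(m.group('provider'))))
    return m.group('search') + '=' + value


def rewrite(lines):
    """Return a new list of lines with the matching entries normalized."""
    out = []
    for line in lines:
        if 'Search_Type' in line and 'Search_Provider' in line:
            # Keep the string returned by sub(); the original is unchanged.
            line = provider_matcher.sub(normalize_param, line)
        out.append(line)
    return out


# Invented sample lines standing in for the real log.
sample = [
    'GET /find?Search_Type=web&Search_Provider=acme%20search&page=2\n',
    'GET /index.html\n',
]
rewritten = rewrite(sample)
```

The rewritten list could then be passed to writelines() on a file opened for writing, which would leave every untouched line byte-for-byte the same as in the original log.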

No doubt I'm doing something trivially wrong, but I've tried to
reproduce the structure as defined in the documentation.

Thanks.

mp


