replace random matches of regexp

Thu Sep 8 16:43:15 EDT 2011

gry wrote:

> [Python 2.7]
> I have a body of text (~1MB) that I need to modify.   I need to look
> for matches of a regular expression and replace a random selection of
> those matches with a new string.  There may be several matches on any
> line, and a random selection of them should be replaced.  The
> probability of replacement should be adjustable.  Performance is not
> an issue.  E.g., if I have:
> 
> SELECT max(PUBLIC.TT.I) AS SEL_0 FROM (SCHM.T RIGHT OUTER JOIN
> PUBLIC.TT ON (SCHM.T.I IS NULL)) WHERE (NOT(NOT((power(PUBLIC.TT.F,
> PUBLIC.TT.F) = cast(ceil((         SELECT 22 AS SEL_0        FROM
> (PUBLIC.TT AS PUBLIC_TT_0 JOIN PUBLIC.TT AS PUBLIC_TT_1 ON (ceil(0.46)
> =sin(PUBLIC_TT_1.F)))        WHERE ((zeroifnull(PUBLIC_TT_0.I) =
> sqrt((0.02 + PUBLIC_TT_1.F))) OR
> 
> I might want to replace '(max|min|cos|sqrt|ceil)' with "public.\1", but
> only with probability 0.7.  I looked and looked for some computed
> thing in re's that I could stick an expression into, but could not find
> such (for good reasons, I know).
> Any ideas how to do this?  I would go for simple, even if it's wildly
> inefficient, though elegance is always admired...

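import random
import re

def make_sub(text, probability):
    # Returns a function for re.sub() to call once per match.
    def sub(match):
        # With the given probability, prefix the matched name with text...
        if random.random() < probability:
            return text + match.group(1)
        # ...otherwise put the matched name back unchanged.
        return match.group(1)
    return sub

# sample holds the text to modify, e.g. the SQL quoted above.
print re.compile("(max|min|cos|sqrt|ceil)").sub(make_sub(r"public.", .7), sample)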
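For testing it can help to make the random choice reproducible. A minimal sketch, assuming a hypothetical extra rng parameter (not part of the code above) that accepts anything with a random() method, such as a seeded random.Random. The print calls are written so they run under both 2.7 and 3:

```python
import random
import re

def make_sub(text, probability, rng=random):
    # Same idea as make_sub above, plus a hypothetical rng argument:
    # any object with a random() method, e.g. a seeded random.Random.
    def sub(match):
        if rng.random() < probability:
            return text + match.group(1)
        return match.group(1)
    return sub

pattern = re.compile(r"(max|min|cos|sqrt|ceil)")
sample = "max(a) min(b) cos(c) sqrt(d) ceil(e)"

# The same seed gives the same selection of replacements on every run.
first = pattern.sub(make_sub("public.", 0.7, random.Random(42)), sample)
second = pattern.sub(make_sub("public.", 0.7, random.Random(42)), sample)
print(first == second)  # True

# probability 1.0 replaces every match; 0.0 replaces none
print(pattern.sub(make_sub("public.", 1.0), sample))
print(pattern.sub(make_sub("public.", 0.0), sample))
```

Because random.random() returns a float in [0.0, 1.0), a probability of 1.0 always replaces and 0.0 never does, which makes the two extremes easy to check.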

or, to allow group references such as \1 in the replacement text:

def make_sub(text, probability):
    def sub(match):
        if random.random() < probability:
            # Expand group references like \1 in text using this match.
            def group_sub(m):
                return match.group(int(m.group(1)))
            return re.compile(r"[\\](\d+)").sub(group_sub, text)
        # Not selected: keep the whole match as it was.
        return match.group(0)
    return sub

print re.compile("(max|min|cos|sqrt|ceil)").sub(make_sub(r"public.\1", .7), sample)
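The standard library can likely do the group substitution itself: a match object's expand(template) method applies the same backslash processing that sub() uses for a string replacement, handling \1 as well as \g<1> and \g<name>. A minimal sketch swapping it in for the hand-written inner regex; the sample string is assembled from fragments of the SQL quoted above:

```python
import random
import re

def make_sub(template, probability):
    # template may contain group references such as \1 or \g<1>.
    def sub(match):
        if random.random() < probability:
            # expand() does the same backslash substitution that
            # re.sub() applies to a plain string replacement.
            return match.expand(template)
        # Not selected: keep the original matched text.
        return match.group(0)
    return sub

pattern = re.compile(r"(max|min|cos|sqrt|ceil)")
sample = "SELECT max(PUBLIC.TT.I) AS SEL_0 WHERE ceil(0.46) = sin(PUBLIC_TT_1.F)"
print(pattern.sub(make_sub(r"public.\1", 0.7), sample))
```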