trouble with regex with escaped metachars (URGENT please O:-)

Roel Mathys rm at rm.net
Thu Nov 20 11:13:19 EST 2003


Fernando Rodriguez wrote:
> Hi,
> 
> I have a file whose contents look like this:
> 
> Compression=bzip/9
> OutputBaseFilename=$<OutputFileName>
> OutputDir=$<OutputDir>
> LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt
> 
> The tokens $<...> must be substituted by some user-provided string.  The
> problem is that those user-provided strings might contain metacharacters, so I
> escape them. And that's where I get into trouble.
> 
> Here's the code I'm using:
> 
>         def substitute(name, value, cts):
>             """
>             Finds all the occurrences in cts of $<name>
>             and replaces them with value
>             """
>             
>             pat = re.compile("\$<" + name + ">", re.IGNORECASE)
> 
>             return pat.sub(value, cts)  # this line causes the error (see below)
> 
>         def escapeMetachars( s ):
>             """
>             All metacharacters in the user provided substitution must
>             be escaped
>             """
>             meta = r'\.^$+*?{[|()'
>             esc = ''
> 
>             for c in s:
>                 if c in meta:
>                     esc += '\\' + c
>                 else:
>                     esc += c
> 
>             return esc
> 
> cts = """Compression=bzip/9
> OutputBaseFilename=$<OutputFileName>
> OutputDir=$<OutputDir>
> LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt"""
> 
> name = 'OutputDir'
> value = "c:\\apps\\whatever\\"  # contains the backslash metachar
> 
> print substitute( escapeMetachars(name), value,  cts)
> 
> I get this error:
> Traceback (most recent call last):
>   File "<pyshell#38>", line 1, in -toplevel-
>     pat.sub(s,cts)
>   File "C:\ARCHIV~1\python23\Lib\sre.py", line 257, in _subx
>     template = _compile_repl(template, pattern)
>   File "C:\ARCHIV~1\python23\Lib\sre.py", line 244, in _compile_repl
>     raise error, v # invalid expression
> error: bogus escape (end of line)
> 
> What on earth is this? O:-)
> 
> PS: I can't use string.replace() for the substitution, because it must be
> case-insensitive: the user might enter OUTPUTDIR, and it should still work.

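It's the value of "value" that gives trouble: it ends with a "bogus" \
followed by an (invisible) end-of-line. sub() interprets backslash escapes
in the replacement string as well, so a lone trailing backslash is an
unfinished escape. This little patch should do the trick, by doubling the
backslashes so each one is taken literally:

import re

def substitute(name, value, cts):
     pat = re.compile(r"\$<" + name + ">", re.IGNORECASE)
     # double every backslash so sub() inserts it literally instead of
     # reading it as the start of an escape sequence
     return pat.sub(value.replace('\\', '\\\\'), cts)

Only the trailing backslash raises the error, but the others need the same
treatment: left alone, \a in the value would come out as a bell character.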
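
Another way to sidestep the escape handling altogether might be to pass a
function as the replacement, since sub() inserts whatever the function
returns without looking at the backslashes in it. A minimal sketch, assuming
re.escape() is acceptable in place of the hand-written escapeMetachars();
substitute_literal is a made-up name, not something from the original code:

```python
import re

def substitute_literal(name, value, cts):
    # hypothetical helper, not part of the original post
    # re.escape() protects any metacharacters in the token name
    pat = re.compile(r"\$<" + re.escape(name) + ">", re.IGNORECASE)
    # a callable replacement is inserted as-is: no backslash processing
    return pat.sub(lambda match: value, cts)

cts = "OutputDir=$<OutputDir>\nBackup=$<OUTPUTDIR>"
result = substitute_literal("OutputDir", "c:\\apps\\whatever\\", cts)
print(result)
```

With this version the name is passed in unescaped, because the function
escapes it itself.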

you could try this as well:

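def substitute2( name , value , cts ) :
     # plain string operations: pass the unescaped name here
     ucts = cts.upper()
     uname = name.upper()
     parts = ucts.split( '$<' + uname + '>' )
     if len( parts ) != 2 :
         # this version handles exactly one occurrence of the token
         raise ValueError( 'expected exactly one $<%s>' % name )
     head = cts[:len(parts[0])]
     tail = cts[len(cts) - len(parts[1]):]
     return head + value + tail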
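
The split-based version only copes with a single occurrence of the token.
If the same token can show up several times, a loop over str.find() on a
lowercased copy is one possible regex-free variant. This is only a sketch:
substitute_all is a made-up name, and it assumes plain ASCII text, so that
lower() does not change the length of the string:

```python
def substitute_all(name, value, cts):
    # hypothetical helper: replaces every $<name>, ignoring case, without re
    # assumption: ASCII text, so positions in lowered line up with cts
    token = "$<" + name.lower() + ">"
    lowered = cts.lower()
    pieces = []
    pos = 0
    while True:
        hit = lowered.find(token, pos)
        if hit == -1:
            break
        pieces.append(cts[pos:hit])  # original text before the token
        pieces.append(value)         # inserted verbatim, backslashes and all
        pos = hit + len(token)
    pieces.append(cts[pos:])         # whatever follows the last token
    return "".join(pieces)

sample = "OutputDir=$<OutputDir>\nBackup=$<OUTPUTDIR>"
replaced = substitute_all("outputdir", "c:\\apps\\whatever\\", sample)
print(replaced)
```

Text without the token comes back unchanged instead of raising an error,
which may or may not be what you want.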


bye,
rm




