trouble with regex with escaped metachars (URGENT please O:-)
Roel Mathys
rm at rm.net
Thu Nov 20 11:13:19 EST 2003
Fernando Rodriguez wrote:
> Hi,
>
> I have a filewhose contents looks like this:
>
> Compression=bzip/9
> OutputBaseFilename=$<OutputFileName>
> OutputDir=$<OutputDir>
> LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt
>
> The tokens $<...> must be susbtituted by some user-provided string. The
> problem is that those user-provided strings might contain metacharacters, so I
> escape them. And that's where I get into trouble.
>
> Here's the code I'm using:
>
> def substitute(name, value, cts):
> """
> Finds all the occs in cts of $<name>
> and replaces them with value
> """
>
> pat = re.compile("\$<" + name + ">", re.IGNORECASE)
>
> return pat.sub(val, cts) # this line causes the error (see below)
>
> def escapeMetachars( s ):
> """
> All metacharacters in the user provided substitution must
> be escaped
> """
> meta = r'\.^$+*?{[|()'
> esc = ''
>
> for c in s:
> if c in meta:
> esc += '\\' + c
> else:
> esc += c
>
> return esc
>
> cts = """Compression=bzip/9
> OutputBaseFilename=$<OutputFileName>
> OutputDir=$<OutputDir>
> LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt"""
>
> name = 'OutputDir'
> value = "c:\\apps\\whatever\\" # contains the backslash metachar
>
> print substitute( escapeMetachars(name), value, cts)
>
> I get this error:
> Traceback (most recent call last):
> File "<pyshell#38>", line 1, in -toplevel-
> pat.sub(s,cts)
> File "C:\ARCHIV~1\python23\Lib\sre.py", line 257, in _subx
> template = _compile_repl(template, pattern)
> File "C:\ARCHIV~1\python23\Lib\sre.py", line 244, in _compile_repl
> raise error, v # invalid expression
> error: bogus escape (end of line)
>
> What on earth is this? O:-)
>
> PS: I can't use string.replace() for the susbtitution,because it must be
> case-insensitive: the user might enter OUTPUTDIR, and it should still work.
it's the value of "value" that gives trouble (ending with a "bogus" \
followed by an (invisible) end-of-line.
This little patch will do the trick, and apparantly
def substitute(name, value, cts):
pat = re.compile("\$<" + name + ">", re.IGNORECASE)
if value[-1:] == '\\' :
value , suffix = value[:-1] , '\\'
else :
suffix = ''
return pat.sub(value[:-1], cts) + suffix
can't explain it though :-)
you could try this as well:
def substitute2( name , value , cts ) :
ucts = cts.upper()
uname = name.upper()
parts = ucts.split( r'$<' + uname + '>' )
if len( parts ) != 2 :
raise 'Something'
return value.join( [ cts[:len(parts[0])] , cts[-len(parts[1]):]])
bye,
rm
More information about the Python-list
mailing list