[Tutor] Substring substitution

Bernard Lebel 3dbernard at gmail.com
Thu Sep 8 22:15:02 CEST 2005


Hi Kent,

This is nice!

There is one thing though. When I run the oRe.sub() call, I get an error:


Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "\\Linuxserver\prod\XSI\WORKGROUP_4.0\Data\Scripts\pipeline\filesystem\bb_processshotdigits.py", line 63, in ?
    processPath( r'C:\temp\MT_03_03_03\allo.txt', False )
  File "\\Linuxserver\prod\XSI\WORKGROUP_4.0\Data\Scripts\pipeline\filesystem\bb_processshotdigits.py", line 45, in processPath
    else: sSubString = matchShot( sSubString )
  File "\\Linuxserver\prod\XSI\WORKGROUP_4.0\Data\Scripts\pipeline\filesystem\bb_processshotdigits.py", line 11, in matchShot
    sNewString = oRe.sub( r'\10\2', sSubString )
  File "D:\Python24\Lib\sre.py", line 260, in filter
    return sre_parse.expand_template(template, match)
  File "D:\Python24\Lib\sre_parse.py", line 781, in expand_template
    raise error, "invalid group reference"
sre_constants.error: invalid group reference



This is my new match function:



def matchShot( sSubString ):
	
	# Create regular expression object
	oRe = re.compile( "(\d\d_\d\d\_)(\d\d)" )
	
	oMatch = oRe.search( sSubString )
	if oMatch != None:
		sNewString = oRe.sub( r'\10\2', sSubString )
		return sNewString
	else:
		return sSubString



I have read the sub() documentation entry, but I have to confess that
it made things more confusing for me...
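
A likely reading of the traceback: in a replacement template, r'\10\2' is
parsed as "group 10, then group 2", and the pattern only defines two groups,
hence "invalid group reference". The re documentation describes the
\g<number> form for exactly this case. Below is a minimal sketch, assuming
the goal is to put a literal 0 in front of the last pair of digits; the names
SHOT_RE and pad_shot are hypothetical and not part of the script above.

```python
import re

# Same pattern as in matchShot() above, written as a raw string.
SHOT_RE = re.compile(r"(\d\d_\d\d_)(\d\d)")

def pad_shot(text):
    # \g<1> refers to group 1 unambiguously, so the literal "0" after it
    # cannot be read as part of the group number.
    return SHOT_RE.sub(r"\g<1>0\2", text)

# The original template asks for a group 10 that does not exist.
try:
    SHOT_RE.sub(r"\10\2", "mt_03_04_04_anim")
except re.error as exc:
    print("template rejected:", exc)

print(pad_shot("mt_03_04_04_anim"))  # mt_03_04_004_anim
```

Since sub() returns the string unchanged when nothing matches, the separate
search() test in matchShot() may not be needed with this form.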

Thanks again
Bernard



On 9/8/05, Kent Johnson <kent37 at tds.net> wrote:
> Bernard Lebel wrote:
> > Hello,
> >
> > I have a string, and I use a regular expression to search for a match in
> > it. When I find one, I would like to break down the string, using the
> > matched part of it, to be able to perform some formatting and to later
> > build a brand new string with the separate parts.
> >
> > The regular expression part works ok, but my problem is to extract the
> > matched pattern from the string. I'm not sure how to do that...
> >
> >
> > sString = 'mt_03_04_04_anim'
> >
> > # Create regular expression object
> > oRe = re.compile( "\d\d_\d\d\_\d\d" )
> >
> > # Break-up path
> > aString = sString.split( os.sep )
> >
> > # Iterate individual components
> > for i in range( 0, len( aString ) ):
> >
> >       sSubString = aString[i]
> >
> >       # Search with shot number of 2 digits
> >       oMatch = oRe.search( sSubString )
> >
> >       if oMatch != None:
> >               # Replace last sequence of two digits by 3 digits!!
> 
> Hi Bernard,
> 
> It sounds like you need to put some groups into your regex and use re.sub().
> 
> By putting groups in the regex you can refer to pieces of the match. For example
> 
>  >>> import re
>  >>> s  = 'mt_03_04_04_anim'
>  >>> oRe = re.compile( "(\d\d_\d\d\_)(\d\d)" )
>  >>> m = oRe.search(s)
>  >>> m.group(1)
> '03_04_'
>  >>> m.group(2)
> '04'
> 
> With re.sub(), you provide a replacement pattern that can refer to the groups from the match pattern. So inserting new characters between the groups is easy:
> 
>  >>> oRe.sub(r'\1XX\2', s)
> 'mt_03_04_XX04_anim'
> 
> This may be enough power to do what you want, I'm not sure from your description. But re.sub() has another trick up its sleeve - the replacement 'expression' can be a callable which is passed the match object and returns the string to replace it with. For example, if you wanted to find all the two digit numbers in a string and add one to them, you could do it like this:
> 
>  >>> def incMatch(m):
>  ...   s = m.group(0) # use the whole match
>  ...   return str(int(s)+1).zfill(2)
>  ...
>  >>> re.sub(r'\d\d', incMatch, '01_09_23')
> '02_10_24'
> 
> This capability can be used to do complicated replacements.
> 
> Kent
>
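
The callable form of re.sub() that Kent describes can also do the padding
directly, which avoids the replacement-template syntax altogether. This is
only a sketch, assuming the shot number is the second group and should be
widened to three digits; SHOT_RE and widen_shot are hypothetical names.

```python
import re

# Group 1 is the leading "dd_dd_" part, group 2 the two-digit shot number.
SHOT_RE = re.compile(r"(\d\d_\d\d_)(\d\d)")

def widen_shot(match):
    # Keep group 1 as it is and zero-pad group 2 to three digits.
    return match.group(1) + match.group(2).zfill(3)

print(SHOT_RE.sub(widen_shot, "mt_03_04_04_anim"))  # mt_03_04_004_anim
```

A function like this leaves room for more formatting later, for example
changing the shot number itself before it is written back.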

