[Tutor] Another regular expression question

Kent Johnson kent37 at tds.net
Wed Sep 14 00:18:51 CEST 2005


Bernard Lebel wrote:
> Hello, yet another regular expression question :-)
> 
> So I have this xml file that I'm trying to find a specific tag in. For
> this I'm using a regular expression. Right now, the tag I'm trying to
> find looks like this:
> 
> <sceneobject name="Camera_Root_bernard" type="CameraRoot">
> 
> So I'm using a regular expression to find:
> sceneobject
> type="CameraRoot"
> 
> 
> My code looks like this:
> 
> 
> import os, re
> 
> 
> def searchTag( sPattern, sFile ):
> 	
> 	"""
> 	Scans a xml file to try to find a line that matches search criterias.
> 	
> 	ARGUMENTS:
> 	sPattern (string): regular expression pattern string
> 	sFile (string): full file path to scan
> 	
> 	RETURN VALUE: text line (string) or None
> 	"""
> 	
> 	oRe = re.compile( sPattern )
> 	
> 	if os.path.exists( sFile ) == False: return None

There is no need to compare to False; you can just write
 	if not os.path.exists( sFile ): return None

> 	else:
> 		oFile = file( sFile, 'r' )
> 		
> 		for sLine in oFile.xreadlines(): # read text

  for sLine in oFile:
is more idiomatic. Iterating over the file object directly reads one line at a time, just as xreadlines() does, and xreadlines() is deprecated.

> 			oMatch = oRe.search( sLine ) # attempt a search
> 			if oMatch != None: # check if search returned success
> 				oFile.close()
> 				return sLine
> 		
> 		# Scan has yield no result, return None
> 		oFile.close()
> 		return None
> 
> 
> sLine = searchTag( r'(sceneobject)(type="CameraRoot")', sFile )
> 
> 
> The thing is that I suspect my regular expression pattern to be
> incorrect because I always get None, but am at a loss here. Any advice
> would be welcomed.

You need something in the regex to match the text between 'sceneobject' and 'type="CameraRoot"'. The regex you are using expects them to be adjacent. Try
sLine = searchTag( r'(sceneobject).*?(type="CameraRoot")', sFile )

Here .*? matches any characters between the two strings, but as few as possible (non-greedy).
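
To see the difference between the two patterns, here is a small sketch that runs both against the sample tag from your message. The variable names (line, adjacent, with_gap) are just illustrative, not part of your script.

```python
import re

# The sample tag from the original question.
line = '<sceneobject name="Camera_Root_bernard" type="CameraRoot">'

# Original pattern: requires 'type=...' to follow 'sceneobject' immediately.
adjacent = re.compile(r'(sceneobject)(type="CameraRoot")')

# Suggested pattern: allows any characters in between, as few as possible.
with_gap = re.compile(r'(sceneobject).*?(type="CameraRoot")')

adjacent_match = adjacent.search(line)
gap_match = with_gap.search(line)

print(adjacent_match)       # None, because of the text between the two parts
print(gap_match.groups())   # ('sceneobject', 'type="CameraRoot"')
```

Note that . does not match a newline by default, so this still only works when the whole tag sits on one line.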

It's also possible that the tag you are looking for spans multiple lines. Since searchTag() tests one line at a time, the regex would never match such a tag. In that case you should look at an XML parsing library.
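
As a rough sketch of the parser approach, here is one way it could look with xml.etree.ElementTree, which ships with current Python versions. The sample document, the <scene> root element and the helper name find_scene_objects are assumptions made up for illustration; I don't know what your real file looks like.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the second tag is deliberately split
# over two lines, which a line-by-line regex search would miss.
xml_text = """<scene>
  <sceneobject name="Light_Root" type="LightRoot"/>
  <sceneobject name="Camera_Root_bernard"
               type="CameraRoot"/>
</scene>"""

def find_scene_objects(root, type_name):
    """Return every <sceneobject> element whose type attribute equals type_name."""
    return [elem for elem in root.iter("sceneobject")
            if elem.get("type") == type_name]

root = ET.fromstring(xml_text)
cameras = find_scene_objects(root, "CameraRoot")
names = [elem.get("name") for elem in cameras]
print(names)   # ['Camera_Root_bernard']
```

To read from a file instead of a string, ET.parse(sFile).getroot() gives you the root element. The parser does not care how the tag is broken across lines or in what order the attributes appear.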

Kent


