[Tutor] Another regular expression question

Kent Johnson kent37 at tds.net
Wed Sep 14 00:18:51 CEST 2005

Bernard Lebel wrote:
> Hello, yet another regular expression question :-)
> So I have this xml file that I'm trying to find a specific tag in. For
> this I'm using a regular expression. Right now, the tag I'm trying to
> find looks like this:
> <sceneobject name="Camera_Root_bernard" type="CameraRoot">
> So I'm using a regular expression to find:
> sceneobject
> type="CameraRoot"
> My code looks like this:
> import os, re
> def searchTag( sPattern, sFile ):
> 	"""
> 	Scans an XML file to try to find a line that matches the search criteria.
> 	sPattern (string): regular expression pattern string
> 	sFile (string): full file path to scan
> 	RETURN VALUE: text line (string) or None
> 	"""
> 	oRe = re.compile( sPattern )
> 	if os.path.exists( sFile ) == False: return None

There's no need to compare to False; you can just say
 	if not os.path.exists( sFile ): return None

> 	else:
> 		oFile = file( sFile, 'r' )
> 		for sLine in oFile.xreadlines(): # read text

  for sLine in oFile:
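is more idiomatic. file.xreadlines() has been deprecated since Python 2.3; iterating over the file object directly does the same job, reading the file lazily one line at a time.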
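Putting both suggestions together, a sketch of the whole function might look like this. It is written for Python 3, where file() and xreadlines() no longer exist, so open() is used instead; the with statement is an addition here and closes the file even on an early return. The name and parameters are kept from the original so it can serve as a drop-in replacement.

```python
import os
import re

def searchTag(sPattern, sFile):
    """Return the first line of sFile that matches sPattern, or None."""
    oRe = re.compile(sPattern)
    if not os.path.exists(sFile):
        return None
    # 'with' closes the file automatically, including on the early return below.
    with open(sFile) as oFile:
        # Iterating over the file object yields one line at a time.
        for sLine in oFile:
            # A match object is truthy, so no comparison with None is needed.
            if oRe.search(sLine):
                return sLine
    # Scan has yielded no result.
    return None
```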

> 			oMatch = oRe.search( sLine ) # attempt a search
> 			if oMatch != None: # check if search returned success
> 				oFile.close()
> 				return sLine
> 		# Scan has yielded no result, return None
> 		oFile.close()
> 		return None
> sLine = searchTag( r'(sceneobject)(type="CameraRoot")', sFile )
> The thing is that I suspect my regular expression pattern to be
> incorrect because I always get None, but am at a loss here. Any advice
> would be welcomed.

You need something in the regex to match the text between 'sceneobject' and 'type="CameraRoot"' (in your tag, that is ' name="Camera_Root_bernard" '). The regex you are using requires them to be adjacent. Try
sLine = searchTag( r'(sceneobject).*?(type="CameraRoot")', sFile )

which means: match any characters between the two strings, but as few as possible (non-greedy).
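A quick way to see the difference is to run both patterns against the tag from the question. The second line used for the greedy comparison is a made-up example, not from the original file; greedy and non-greedy only give different results when 'type="CameraRoot"' appears more than once after 'sceneobject' on the same line.

```python
import re

tag = '<sceneobject name="Camera_Root_bernard" type="CameraRoot">'

# The original pattern requires the two pieces to be adjacent, so it finds nothing.
adjacent = re.search(r'(sceneobject)(type="CameraRoot")', tag)

# '.*?' consumes the text in between, taking as little of it as possible.
lazy = re.search(r'(sceneobject).*?(type="CameraRoot")', tag)

# Hypothetical line with two tags, to show greedy vs. non-greedy matching.
line = '<sceneobject type="CameraRoot"/><sceneobject type="CameraRoot"/>'
greedy = re.search(r'sceneobject.*type="CameraRoot"', line).group(0)
non_greedy = re.search(r'sceneobject.*?type="CameraRoot"', line).group(0)

print(adjacent)       # None
print(lazy.group(0))  # sceneobject name="Camera_Root_bernard" type="CameraRoot"
print(greedy)         # sceneobject type="CameraRoot"/><sceneobject type="CameraRoot"
print(non_greedy)     # sceneobject type="CameraRoot"
```

Note that '.' does not match a newline unless you pass re.DOTALL, so even a whole-file search with this pattern would not cross line breaks by default.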

It's also possible that the tag you are looking for spans multiple lines. Because searchTag() looks at one line at a time, the regex would never see the whole tag; in that case you should look at an XML parsing library instead.
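As one possible approach, here is a minimal sketch using xml.etree.ElementTree from the standard library. It was added in Python 2.5, so it is newer than this message, and the code below targets Python 3. The function name find_scene_object and its parameters are hypothetical. Because the parser works on the document structure rather than on lines, line breaks inside the tag don't matter, but the file must be well-formed XML.

```python
import xml.etree.ElementTree as ET

def find_scene_object(path, obj_type="CameraRoot"):
    """Return the first <sceneobject> element whose type attribute equals
    obj_type, or None if there is none.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    tree = ET.parse(path)  # raises ET.ParseError if the file is not well-formed
    # iter() walks every element with this tag in document order, root included.
    for elem in tree.iter("sceneobject"):
        if elem.get("type") == obj_type:
            return elem
    return None
```

The returned element gives you the attributes directly, for example elem.get("name"), so there is no need to pick the line apart afterwards.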

