[python-win32] regular expressions question

David.Cantrell@Gunter.AF.mil
Wed, 14 Aug 2002 12:37:22 -0500


Hi all,

Do you know a way to make a regular expression match across newlines, but
STOP when a certain pattern is reached?

In other words, given the following source:

	Some leading text here.

	Block 1
	Symbol:  text here
	Symbol:  text here
	Symbol:  text here
	Other stuff here
	End Block

	Block 2
	Symbol:  text here
	Symbol:  text here
	Symbol:  text here
	Other stuff here
	End Block

	Some trailing text here.

(I'm parsing VBScript files and extracting method comments, but the above is
simpler to deal with)

I have a regexp that retrieves a list of all Blocks, so given the above the
list looks like:

	methodlist = [ "Block 1", "Block 2" ]

If I use re.DOTALL:

	for item in methodlist:
		print item, "\n-----\n"
		print re.search( item + ".*End Block", s, re.DOTALL ).group()
		print "\n"

I get the following (of course):

	Block 1 
	-----
	Block 1
	Symbol:  text here
	Symbol:  text here
	Symbol:  text here
	Other stuff here
	End Block

	Block 2
	Symbol:  text here
	Symbol:  text here
	Symbol:  text here
	Other stuff here
	End Block

	Block 2 
	-----
	Block 2
	Symbol:  text here
	Symbol:  text here
	Symbol:  text here
	Other stuff here
	End Block
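
For what it's worth, the reason "Block 1" runs on through Block 2 is that
".*" is greedy and extends to the LAST "End Block" in the string. A minimal
sketch of the non-greedy variant (".*?") follows. The helper name block_text
and the re.escape call are illustrative assumptions, not part of the code
above; the sample text has the indentation removed for brevity.

```python
import re

# Sample text mirroring the example source, without the leading tabs.
s = """Some leading text here.

Block 1
Symbol:  text here
Symbol:  text here
Symbol:  text here
Other stuff here
End Block

Block 2
Symbol:  text here
Symbol:  text here
Symbol:  text here
Other stuff here
End Block

Some trailing text here.
"""


def block_text(name, text):
    """Return the text from `name` up to the first following "End Block"."""
    # re.escape protects against regex metacharacters in the block name.
    # ".*?" is non-greedy, so the match ends at the FIRST "End Block"
    # after the name instead of the last one in the string.
    match = re.search(re.escape(name) + r".*?End Block", text, re.DOTALL)
    return match.group() if match else None
```

Called as block_text("Block 1", s), this should return only the first
block, ending at its own "End Block" line.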

But I eventually want to build a list that looks like this:

	[	(	"Block 1",
			"Symbol:  text here\nSymbol:  text here\nSymbol:  text here"
		),
		(	"Block 2",
			"Symbol:  text here\nSymbol:  text here\nSymbol:  text here"
		)
	]

In order to do that, I need to know how to make the regexp engine STOP once
it gets past the last "Symbol: " line after each Block declaration.

(I know the regexp I gave goes from Block..End Block, but that's only
because I don't know how to "get all Symbol lines that come immediately
after a Block declaration")
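
One possible approach, sketched under assumptions: leave re.DOTALL off so
that "." stops at each newline, and use a repeated group that matches whole
"Symbol:" lines only. The repetition then ends at the first line that does
not start with "Symbol:". The header pattern "Block \d+" and the names
BLOCK_RE and extract_symbols are hypothetical stand-ins; a real VBScript
file would need its own header pattern.

```python
import re

# Sample text mirroring the example source, without the leading tabs.
SAMPLE = """Some leading text here.

Block 1
Symbol:  text here
Symbol:  text here
Symbol:  text here
Other stuff here
End Block

Block 2
Symbol:  text here
Symbol:  text here
Symbol:  text here
Other stuff here
End Block

Some trailing text here.
"""

# A header line followed by one or more consecutive "Symbol:" lines.
# Without re.DOTALL, "." does not match a newline, so each repetition of
# the inner group consumes exactly one line.
BLOCK_RE = re.compile(
    r"^[ \t]*(Block \d+)[ \t]*\n"   # header line, e.g. "Block 1" (assumed form)
    r"((?:[ \t]*Symbol:.*\n)+)",    # only the Symbol lines right after it
    re.MULTILINE,
)


def extract_symbols(text):
    """Return a list of (block name, joined Symbol lines) tuples."""
    result = []
    for name, body in BLOCK_RE.findall(text):
        # Strip indentation and drop the trailing newline of the last line.
        lines = [line.strip() for line in body.splitlines()]
        result.append((name, "\n".join(lines)))
    return result
```

extract_symbols(SAMPLE) should yield one tuple per block, with the
"Other stuff here" and "End Block" lines left out because they fall
outside the repeated Symbol group.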

Any help is much appreciated!!  :D

Thanks,
-dave