How to use mxTextTools

Mike Fletcher mfletch at
Thu Dec 14 18:01:03 CET 2000

Something you might find useful would be to look at the mcf.vrml.parser
module, which uses simpleparse (which just spits out mxTextTools tuples) to
process a file into an in-memory node graph.
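For anyone who hasn't looked inside one yet: a taglist is just a list of nested (tag, start, stop, sublist) tuples, where start and stop index into the original text. A minimal hand-built illustration follows. The text and the tag names are made up for this sketch (they are not from mcf.vrml), and nothing here calls mxTextTools itself.

```python
# Hypothetical input text and a hand-built taglist in the
# (tag, start, stop, sublist) shape; no mxTextTools call is involved.
text = "DEF Box"
taglist = [
    ("definition", 0, 7, [
        ("keyword", 0, 3, None),
        ("name", 4, 7, None),
    ]),
]

def show(text, taglist, depth=0):
    """Return one 'tag: matched text' line per entry, indented by nesting."""
    lines = []
    for tag, start, stop, sublist in taglist:
        lines.append("  " * depth + "%s: %r" % (tag, text[start:stop]))
        if sublist:
            # Child entries use the same tuple shape, so recurse.
            lines.extend(show(text, sublist, depth + 1))
    return lines

print("\n".join(show(text, taglist)))
```

So "working with a taglist" mostly comes down to slicing the text with start and stop and recursing into the sublist.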

See for the
mcf.vrml distribution.  Here's some code from there...

	def readNext( self):
		'''Read the next root-level construct'''
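		# assumption: the text being parsed is held in self.data
		# (the first argument is missing from the archived post)
		success, tags, next = TextTools.tag(
			self.data, ROOTITEMPARSER, self.position )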
##		print 'readnext', success
		if self.position >= self.datalength:
			print 'reached file end'
			return None
		if success:
			#print '  successful parse'
			self.position = next
			if self.parseOnly:
				return success
			map (self.rootItem_Item, tags )
			return success
		else:
			return None
	def rootItem (self, (type, start, stop, (item,))):
		''' Process a single root item '''
		self.rootItem_Item( item )
	def rootItem_Item( self, item ):
		result = self._dispatch(item)
		if result is not None:
##			print "non-null result"
##			print id( self.sceneGraphStack[-1] ), id(self.result
			self.sceneGraphStack[-1].children.append( result )
	def _getString (self, (tag, start, stop, sublist)):
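		''' Return the raw string for a given interval in the data '''
		# assumption: self.data holds the parsed text
		# (the name is missing from the archived post)
		return self.data[start:stop]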
	def _dispatch (self, (tag, left, right, sublist)):
		''' Dispatch to the appropriate processing function based on tag value '''
##		print "dispatch", tag
		try:
			function = getattr (self, tag)
		except AttributeError:
			raise AttributeError( '''Unknown parse tag "%s" found! Check the parser definition!'''%(tag))
		return function( (tag, left, right, sublist) )
	def Proto(self, (tag, start, stop, sublist)):
		''' Create a new prototype in the current sceneGraph '''
		# first entry is always GI
		GI = self._getString ( sublist [0])
##		print "PROTO",GI
		newNode =Prototype (GI)
##		print "\t",newNode
		setattr ( self.sceneGraphStack [-1].protoTypes, GI, newNode)
		self.prototypeStack.append( newNode )
		# process the rest of the entries with the given stack
		map ( self._dispatch, sublist [1:] )
		self.prototypeStack.pop( )
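	def fieldDecl(self,(tag, left, right, (exposure, datatype, name, field))):
		''' Create a new field declaration for the current prototype '''
		# assumption: the truncated signature ended with "field",
		# the only other name the body below uses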
		# get the definition in recognizable format
		exposure = self._getString (exposure) == "exposedField"
		datatype = self._getString (datatype)
		name = self._getString (name)
		# get the vrml value for the field
		self.fieldTypeStack.append( datatype )
		field = self._dispatch (field)
		self.fieldTypeStack.pop( )
		self.prototypeStack[-1].addField ((name, datatype, exposure), field)
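
The pattern in the methods above (look up a method named after the tag, hand it the tuple, recurse into the sublist) can be tried without mxTextTools or mcf.vrml installed. What follows is only a sketch under stated assumptions: the class name, the tag names and the hand-built taglist are all hypothetical, and it is written for Python 3, where tuple parameters in a def are no longer allowed, so each tuple is unpacked inside the method instead.

```python
class TagWalker:
    """Hypothetical walker: one method per tag name, found with getattr."""

    def __init__(self, text):
        self.text = text

    def get_string(self, entry):
        """Return the raw text covered by one taglist entry."""
        tag, start, stop, sublist = entry
        return self.text[start:stop]

    def dispatch(self, entry):
        """Call the method whose name matches the entry's tag."""
        tag = entry[0]
        try:
            handler = getattr(self, tag)
        except AttributeError:
            raise AttributeError('Unknown parse tag "%s" found!' % tag)
        return handler(entry)

    # One handler per tag; the tag names are invented for this sketch.
    def assignment(self, entry):
        tag, start, stop, (name, value) = entry
        # The name is used as raw text; the value is dispatched further.
        return (self.get_string(name), self.dispatch(value))

    def number(self, entry):
        return int(self.get_string(entry))


text = "x = 42"
# Hand-built stand-in for what a tagging pass might return.
taglist = [
    ("assignment", 0, 6, [
        ("name", 0, 1, None),
        ("number", 4, 6, None),
    ]),
]

walker = TagWalker(text)
results = [walker.dispatch(entry) for entry in taglist]
print(results)
```

The nice thing about dispatching on the tag name is that supporting a new production in the grammar only means adding one more method; the walking code itself stays the same.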


-----Original Message-----
From: Paul Moore [mailto:paul.moore at]
Sent: Thursday, December 14, 2000 8:36 AM
To: python-list at
Subject: How to use mxTextTools

I'm looking at mxTextTools to see if it would be suitable for some
types of text parsing work I am interested in (nothing concrete yet,
so I can't give specifics...)

The example in the documentation of tagging HTML looks fine - I
understand what's going on there, and as I understand it, this will
give me back a taglist, which is (effectively) the text stream with
portions tagged as I ask.

What I don't see (yet), and I can't find any good examples for, is
what to do with the resulting taglist. There seem to be no functions
for working with taglists, and the lists themselves seem like
relatively complex data structures, so is it right that I should be
manipulating them "by hand"?

More information, or better still, some complete examples, would be
very helpful. (All the examples in the distribution just use
print_tags() to display the tags, and don't do anything with them...)


