[Tutor] making a custom file parser?
Lie Ryan
lie.1296 at gmail.com
Sat Jan 7 22:51:58 CET 2012
On 01/08/2012 04:53 AM, Alex Hall wrote:
> Hello all,
> I have a file with xml-ish code in it, the definitions for units in a
> real-time strategy game. I say xml-ish because the tags are like xml,
> but no quotes are used and most tags do not have to end. Also,
> comments in this file are prefaced by an apostrophe, and there is no
> multi-line commenting syntax. For example:
>
> <unit>
> <number=1>
> <name=my unit>
> <canMove=True>
> <canCarry=unit2, unit3, unit4>
> 'this line is a comment
> </unit>
>
The format is closer to SGML than to XML, except that a tag can carry a
value directly (as in <name=my unit>). You will probably have a better
chance of transforming this into SGML than into XML.

Try this regular expression, which rewrites each <tag=value> as
<tag __attribute__="value">:

    s = re.sub('<([a-zA-Z]+)=([^>]+)>', r'<\1 __attribute__="\2">', s)

and then use an SGML parser to parse the result. I find Fredrik Lundh's
sgmlop the easiest to use for this; you can install it with easy_install
or pip.
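
To see what the substitution does on its own, here is a small sketch that
needs only the standard library. The helper name to_sgml is made up for
illustration; the pattern is the same one as above. Note that a value
containing a double quote would produce a broken attribute, which this
sketch does not try to handle.

```python
import re

# Same pattern as in the message: <tag=value> -> <tag __attribute__="value">
PATTERN = re.compile(r'<([a-zA-Z]+)=([^>]+)>')

def to_sgml(text):
    """Rewrite every <tag=value>; plain tags and comment lines are untouched.

    to_sgml is a hypothetical helper name, not part of any library.
    """
    return PATTERN.sub(r'<\1 __attribute__="\2">', text)

sample = "<unit>\n<number=1>\n<name=my unit>\n'this line is a comment\n</unit>"
print(to_sgml(sample))
```

Only the tags with an "=" are changed; <unit>, </unit> and the apostrophe
comment pass through as they are.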
import re
import sgmlop

class Unit(object):
    pass

class handler:
    def __init__(self):
        self.units = {}  # finished units, keyed by unit name

    def finish_starttag(self, tag, attrs):
        # the checks below use lower-case tag names ('canMove' -> 'canmove')
        attrs = dict(attrs)
        if tag == 'unit':
            self.current = Unit()
        elif tag == 'number':
            self.current.number = int(attrs['__attribute__'])
        elif tag == 'canmove':
            self.current.canmove = attrs['__attribute__'] == 'True'
        elif tag in ('name', 'cancarry'):
            setattr(self.current, tag, attrs['__attribute__'])
        else:
            print 'unknown tag', tag, attrs

    def finish_endtag(self, tag):
        if tag == 'unit':
            # store the completed unit under its name
            self.units[self.current.name] = self.current
            del self.current

    def handle_data(self, data):
        # everything that is not a tag ends up here, including comment lines
        if not data.isspace():
            print data.strip()

s = '''
<unit>
<number=1>
<name=my unit>
<canMove=True>
<canCarry=your unit, her unit, his unit>
'this line is a comment
</unit>
<unit>
<number=2>
<name=your unit>
<canMove=False>
<canCarry=her unit, his unit>
'this line is a comment
</unit>
<unit>
<number=3>
<name=her unit>
<canMove=True>
<canCarry=her unit>
'this line is a comment
</unit>
<unit>
<number=4>
<name=his unit>
<canMove=True>
<canCarry=his unit, her unit>
'this line is a comment
</unit>
'''
s = re.sub('<([a-zA-Z]+)=([^>]+)>', r'<\1 __attribute__="\2">', s)
parser = sgmlop.SGMLParser()
h = handler()
parser.register(h)
parser.parse(s)
print h.units
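
If installing sgmlop is not an option, the same idea can be sketched with
the standard library alone. This is not the sgmlop approach above, just a
line-by-line alternative under some assumptions: one tag per line, every
unit has a <name=...> tag, and comments start with an apostrophe. The
function name parse_units is hypothetical. Keys are lower-cased to match
the handler above, and canCarry is split into a list.

```python
import re

TAG = re.compile(r'<([a-zA-Z]+)=([^>]+)>')

def parse_units(text):
    """Return a dict mapping unit name -> dict of that unit's fields."""
    units = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("'"):
            continue  # skip blank lines and apostrophe comments
        if line == '<unit>':
            current = {}
        elif line == '</unit>':
            if current is not None:
                units[current['name']] = current  # assumes a name was given
            current = None
        else:
            m = TAG.match(line)
            if m is None or current is None:
                continue  # unrecognised line; a stricter parser could raise
            key, value = m.group(1).lower(), m.group(2).strip()
            if key == 'number':
                value = int(value)
            elif key == 'canmove':
                value = (value == 'True')
            elif key == 'cancarry':
                value = [v.strip() for v in value.split(',')]
            current[key] = value
    return units
```

Calling parse_units(s) on the original, untransformed text gives plain
dicts rather than Unit objects, which is often enough for a lookup table.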