[Tutor] making a custom file parser?

Alex Hall mehgcap at gmail.com
Sat Jan 7 20:22:54 CET 2012


I had planned to parse it myself, but am not sure how to go about it. I
assume regular expressions, but I couldn't even find the number of
units in the file by using:
unitReg=re.compile(r"\<unit\>(*)\</unit\>")
unitCount=unitReg.search(fileContents)
print "number of units: "+unitCount.len(groups())

I just get an exception that "None type object has no attribute
groups", meaning that the search was unsuccessful. What I was hoping
to do was to grab everything between each pair of opening and closing
unit tags, then read the units one at a time and parse further. There is a tag
inside a unit tag called AttackTable which also terminates, so I would
need to pull that out and work with it separately. I probably just
have misunderstood how regular expressions and groups work...
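
A possible explanation, offered as a sketch rather than a definitive diagnosis: in the pattern above, "(*)" gives the star nothing to repeat, "." does not match newlines unless told to, and search() only ever returns a single match (or None), so groups() would not give a count of units anyway. A non-greedy ".*?" with re.DOTALL and findall() is one way to get every block. The sample text and the names file_contents and unit_reg are illustrative assumptions, modelled on the example in the quoted message below.

```python
import re

# Illustrative sample in the format described in the quoted message (assumed).
file_contents = """<unit>
<number=1>
<name=my unit>
</unit>
<unit>
<number=2>
<name=other unit>
</unit>
"""

# re.DOTALL lets "." match newlines; ".*?" is non-greedy, so each match
# stops at the first closing tag instead of running on to the last one.
unit_reg = re.compile(r"<unit>(.*?)</unit>", re.DOTALL)

# findall() returns one string per <unit>...</unit> block.
units = unit_reg.findall(file_contents)
print("number of units: " + str(len(units)))
```

Each entry in units is the raw text of one block, which could then be searched again for an inner tag such as AttackTable.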


On 1/7/12, Chris Fuller <cfuller084 at thinkingplanet.net> wrote:
>
> If it's unambiguous as to which tags are closed and which are not, then it's
> pretty easy to preprocess the file into valid XML.  Scan for the naughty bits
> (single quotes) and insert escape characters, replace with something else,
> etc., then scan for the unterminated tags and throw in a "/" at the end.
>
> Anyhow, if there's no tree structure, or it's only one level deep, using
> ElementTree is probably overkill and just gives you lots of leaky
> abstractions to plug for little benefit.  Why not just scan the file
> directly?
>
> Cheers
>
> On Saturday 07 January 2012, Alex Hall wrote:
>> Hello all,
>> I have a file with xml-ish code in it, the definitions for units in a
>> real-time strategy game. I say xml-ish because the tags are like xml,
>> but no quotes are used and most tags do not have to end. Also,
>> comments in this file are prefaced by an apostrophe, and there is no
>> multi-line commenting syntax. For example:
>>
>> <unit>
>> <number=1>
>> <name=my unit>
>> <canMove=True>
>> <canCarry=unit2, unit3, unit4>
>> 'this line is a comment
>> </unit>
>>
>> The game is not mine, but I would like to put together a python
>> interface to more easily manage custom units for it. To do that, I
>> have to be able to parse these files, but elementtree does not seem to
>> like them very much. I imagine it is due to the lack of quotes, the
>> non-standard commenting method, and the lack of closing tags. I think
>> my only recourse here is to create my own parser and tell elementtree
>> to use that. The docs say this is possible, but they also seem to
>> indicate that the parser has to already exist in the elementtree
>> package and there is no mention of making one's own method for
>> parsing. Even if this were possible, though, I am not sure how to go
>> about it. I can of course strip comments, but that is as far as I have
>> gotten.
>>
>> Bottom line: can I create a method and tell elementtree to parse using
>> it, and what would such a function look like (generally) if I can?
>> Thanks!
>
>
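
For reference, the "scan the file directly" idea from the quoted reply could look roughly like the sketch below. It assumes one tag per line, comments on their own lines starting with an apostrophe, and no nesting inside a unit, so a terminated inner tag such as AttackTable would need extra handling. The function name parse_units and the sample text are hypothetical, taken from the example in the original message.

```python
def parse_units(text):
    """Return a list of dicts, one per <unit> block (flat tags only)."""
    units = []
    current = None  # the unit being filled in, or None when outside a block
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("'"):
            continue  # skip blank lines and apostrophe comments
        if line == "<unit>":
            current = {}
        elif line == "</unit>":
            if current is not None:
                units.append(current)
            current = None
        elif (current is not None and line.startswith("<")
              and line.endswith(">") and "=" in line):
            # "<name=my unit>" becomes key "name", value "my unit"
            key, value = line[1:-1].split("=", 1)
            current[key.strip()] = value.strip()
    return units


# Sample copied from the example unit in the original message.
sample = """<unit>
<number=1>
<name=my unit>
<canMove=True>
<canCarry=unit2, unit3, unit4>
'this line is a comment
</unit>
"""

units = parse_units(sample)
print(units[0]["name"])
```

All values come back as strings; converting "True" to a boolean or splitting the canCarry list on commas would be a separate step.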


-- 
Have a great day,
Alex (msg sent from GMail website)
mehgcap at gmail.com; http://www.facebook.com/mehgcap

