[Tutor] module to parse XMLish text?
Terry Carroll
carroll at tjc.com
Fri Jan 14 23:42:55 CET 2011
On Fri, 14 Jan 2011, Stefan Behnel wrote:
> Terry Carroll, 14.01.2011 03:55:
>> Does anyone know of a module that can parse out text with XML-like tags as
>> in the example below? I emphasize the "-like" in "XML-like". I don't think
>> I can parse this as XML (can I?).
>>
>> Sample text between the dashed lines::
>>
>> ---------------------------------
>> Blah, blah, blah
>> <AAA>
>> <BING ZEBRA>
>> <BANG ROOSTER>
>> <BOOM GARBONZO BEAN>
>> <BLIP>SOMETHING ELSE</BLIP>
>> <BASH>SOMETHING DIFFERENT</BASH>
>> </AAA>
>> ---------------------------------
>
> You can't parse this as XML because it's not XML. The three initial child
> tags are not properly closed.
Yeah, that's what I figured.
> If the format is really as you describe, i.e. one line per tag, regular
> expressions will work nicely.
Now there's an idea! I hadn't thought of using regexes, probably because
I'm terrible at all but the simplest ones.
As it happens, I'm only interested in four of the tags' contents, so I
could probably manage to write a series of regexes that even I could
maintain, one for each of the pieces of data I want to extract; if I try
to write a grand unified regex, I'm bound to shoot myself in the foot.
Thanks very much.
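A minimal sketch of the one-regex-per-field approach, assuming one tag per
line as in the sample. Which four tags matter isn't stated above, so the
choice of BING, BOOM, BLIP and BASH here is purely illustrative, as are the
names SAMPLE, PATTERNS and extract:

```python
import re

# The sample text from the original question (without the quote markers).
SAMPLE = """\
Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
"""

# One small pattern per field of interest (the four tags are an assumption).
# re.MULTILINE lets ^ and $ match at the start and end of each line.
PATTERNS = {
    # Unclosed tags: the value follows the tag name inside the brackets.
    "BING": re.compile(r"^<BING\s+(.*?)>\s*$", re.MULTILINE),
    "BOOM": re.compile(r"^<BOOM\s+(.*?)>\s*$", re.MULTILINE),
    # Properly closed tags: the value sits between open and close tags.
    "BLIP": re.compile(r"^<BLIP>(.*?)</BLIP>\s*$", re.MULTILINE),
    "BASH": re.compile(r"^<BASH>(.*?)</BASH>\s*$", re.MULTILINE),
}


def extract(text):
    """Return a dict mapping each tag name to its first value, or None."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    for name, value in extract(SAMPLE).items():
        print(name, "=", value)
```

Keeping each pattern separate means a change to one field's format only
touches one short regex, which is the maintainability point made above.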
On Fri, 14 Jan 2011, Karim wrote:
> from xml.etree.ElementTree import ElementTree
I don't think straight XML parsing will work on this, as it's not valid
XML; it just looks XML-like enough to cause confusion.
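A quick way to confirm this, sketched with the standard library's
xml.etree.ElementTree (the helper name is_well_formed is made up for the
example). The parser rejects the sample both with and without the leading
"Blah, blah, blah" line:

```python
import xml.etree.ElementTree as ET

# The sample text from the original question (without the quote markers).
SAMPLE = """\
Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
"""


def is_well_formed(text):
    """Return True if ElementTree can parse text as XML, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Text before the root element is already an error.
    print(is_well_formed(SAMPLE))
    # Dropping the first line does not help: <BING ZEBRA> is not a valid tag.
    print(is_well_formed(SAMPLE.split("\n", 1)[1]))
```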