[BangPypers] parsing xml
Dhananjay Nene
dhananjay.nene at gmail.com
Sun Jul 31 19:28:58 CEST 2011
On Thu, Jul 28, 2011 at 3:18 PM, Kenneth Gonsalves <lawgon at gmail.com> wrote:
> hi,
>
> here is a simplified version of an xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <gpx >
> <metadata>
> <author>
> <name>CloudMade</name>
> <email id="support" domain="cloudmade.com" />
> <link href="http://maps.cloudmade.com"></link>
> </author>
> <copyright author="CloudMade">
> <license>http://cloudmade.com/faq#license</license>
> </copyright>
> <time>2011-07-28T07:04:01</time>
> </metadata>
> <extensions>
> <distance>1489</distance>
> <time>344</time>
> <start>Sägerstraße</start>
> <end>Im Gisinger Feld</end>
> </extensions>
> </gpx>
>
> I want to get the value of the distance element - 1489. What is the
> simplest way of doing this?
>
re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)
would appear to be the most succinct and quite fast. Adjust for whitespace
as and if necessary.
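
For reference, here is a minimal, self-contained sketch of the regex approach. The sample string is a trimmed copy of the file from the question, and the helper name distance_by_regex is only illustrative. It assumes a single <distance> element containing nothing but digits, and returns None rather than raising if nothing matches.

```python
import re

# Trimmed copy of the sample document from the question (assumption: the
# real input has the same shape, with one <distance> element holding digits).
data = """<?xml version="1.0" encoding="UTF-8"?>
<gpx>
<extensions>
<distance>1489</distance>
<time>344</time>
</extensions>
</gpx>"""


def distance_by_regex(text):
    """Return the digits inside the first <distance> element, or None."""
    # Raw string so that \s and \d reach the regex engine unchanged.
    match = re.search(r"<distance>\s*(\d+)\s*</distance>", text)
    return match.group(1) if match else None


print(distance_by_regex(data))
```

Note that the result is a string; wrap it in int() if a number is needed.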
Yet I would probably use the minidom-based approach if I were sure the input
was likely to continue to be XML. Anand C's solution (elsewhere in the
thread) reflects the programmer's intent in a simpler, less obfuscated form
(both correctly working solutions will communicate the intent with exactly
the same precision - the precision required to make the program work).
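
A rough sketch of what a minidom-based lookup could look like follows. This is not Anand C's actual code from the thread, just one plausible shape for it; the helper name distance_by_minidom is made up for illustration, and the sample is a trimmed version of the file in the question without the XML declaration.

```python
from xml.dom import minidom

# Trimmed sample based on the file in the question (assumption: the real
# document is well-formed XML with at most one <distance> element).
data = """<gpx>
<metadata>
<time>2011-07-28T07:04:01</time>
</metadata>
<extensions>
<distance>1489</distance>
<time>344</time>
</extensions>
</gpx>"""


def distance_by_minidom(text):
    """Return the text of the first <distance> element, or None."""
    doc = minidom.parseString(text)
    nodes = doc.getElementsByTagName("distance")
    # Guard against a missing or empty element instead of raising.
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data.strip()


print(distance_by_minidom(data))
```

The code names the element it is looking for and lets the parser deal with the markup, which is the sense in which it states the intent more directly than a pattern does.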
As far as optimisation goes - I can see at least 3 options
a. the minidom performance is acceptable - no further optimisation required
b. minidom performance is not acceptable - try the regex one
c. Python library performance is not acceptable - switch to C
I can imagine people starting with a and then deciding to move along the
path a->b->c if and as necessary.
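
Whether to move from a to b is best decided by measuring. Below is a small sketch using the standard timeit module; the function names, the sample string and the repeat count are arbitrary choices for illustration, and the numbers will vary by machine and input size.

```python
import re
import timeit
from xml.dom import minidom

# Small stand-in document (assumption: real inputs are larger, so measure
# with representative data before drawing conclusions).
data = ("<gpx><extensions><distance>1489</distance>"
        "<time>344</time></extensions></gpx>")


def via_minidom(text):
    """Option a: parse the document and read the element's text."""
    doc = minidom.parseString(text)
    return doc.getElementsByTagName("distance")[0].firstChild.data


def via_regex(text):
    """Option b: pull the digits out with a pattern."""
    return re.search(r"<distance>\s*(\d+)\s*</distance>", text).group(1)


def compare(number=1000):
    """Return total seconds spent on `number` calls of each option."""
    return {
        "minidom": timeit.timeit(lambda: via_minidom(data), number=number),
        "regex": timeit.timeit(lambda: via_regex(data), number=number),
    }


timings = compare()
for name, seconds in sorted(timings.items()):
    print("%-8s %.4f s per 1000 calls" % (name, seconds))
```

If the minidom figure is already acceptable for the workload, there is no reason to go further down the path.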
I believe starting with b risks obfuscating code (imo regex is obfuscated
compared to xml nodes - YMMV)
I don't know of any Python programmers who are speed-maniacs. I am worried
anytime someone programs in something other than assembly/machine code and
uses that term. The rest of us are just trading off development speed
vs. runtime speed.
Dhananjay