[XML-SIG] Help Needed (Will pay if someone is interested)

Robert Rawlins - Think Blue robert.rawlins at thinkbluemedia.co.uk
Mon Jul 2 17:19:02 CEST 2007


Hi Stefan.

Thanks for getting back to me so quickly, I've been tearing my hair out on this one :-)

>> Why not read the file into a more suitable in-memory data structure and search from there?

I'd be more than happy to do something like this, I just have no idea how. What type of data structure do you think would be simple?

Thanks for the cElementTree example; the code already looks a lot cleaner than the minidom stuff I was working on. Jeez, that stuff was messy, lol.

Thanks. You'll have to excuse any naivety, as I'm relatively new to both XML and Python, and mixing the two is making my head spin :-D

Rob


-----Original Message-----
From: Stefan Behnel [mailto:stefan_ml at behnel.de] 
Sent: 02 July 2007 16:13
To: Robert Rawlins - Think Blue
Cc: xml-sig at python.org
Subject: Re: [XML-SIG] Help Needed (Will pay if someone is interested)



Robert Rawlins - Think Blue wrote:
> I’m looking for some help with XML parsing, I’ve been playing around
> with this over the past few days and the only solution I can come up
> with seems to be a little slow and also leaves what I think is a memory
> leak in my application, which causes all kinds of problems.
> 
>  
> 
> I have a very simple XML file whose elements I need to loop over to
> extract the attribute information, but the loop must be conditional,
> as the attributes must meet certain criteria.
> 
>  
> 
> My current solution is using minidom,

That's not the solution, that's the problem. Use cElementTree.


> which I’ve read isn’t one of the
> better parsers. If anyone knows of any that are better for the task I
> would love to hear it. The content is extracted regularly, so whatever we
> choose needs to be quick, and validation isn’t so important. Take a look
> at this brief example of the XML we’re dealing with:
> 
>  
> 
> <schedules name="Default event" location="this is the location of the
> event">
> 
>                 <event name="This is an event" location="At my house"
> type="1" start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
>                 <event name="And Another" location="At work" type="2"
> start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
>                 <event name="This is some more" location="At the cafe"
> type="1" start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
>                 <event name="And one last one" location="At my house"
> type="3" start="2007-01-01 00:00:00" end="2007-01-01 00:00:00" />
> 
> </schedules>
> 
>  
> 
> Now this file details events which are possibly going to occur over the
> next couple of weeks. Now what I need to do is have a function which is
> called ‘getCurrentEvent()’ which will return any events that should be
> occurring at this point in time, or now().

  from xml.etree import cElementTree as et # Python 2.5

  # untested
  search_date = "2007-02-03 00:00:00"
  for _, element in et.iterparse("event-file.xml"):
      if element.tag == "event":
          start = element.get("start")
          end   = element.get("end")
          # Timestamps are "YYYY-MM-DD HH:MM:SS", so plain string
          # comparison orders them chronologically.
          if start > search_date:
               continue  # event has not started yet
          if end != start and end < search_date:
               continue  # event has already ended
          print et.tostring(element)

or something like that. You'll love the performance.


> The ‘Type’ attribute details
> how often the event is likely to reoccur, 1 being daily, 2 being weekly
> and so on. If no elements are found which are occurring at this time and
> date then I would like it to return the default event which is defined
> in the attributes of the ‘schedules’ tag.

That's much harder, as it requires real date calculation in general. Are you
sure you want an XML tree as a database? Why not read the file into a more
suitable in-memory data structure and search from there?
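
As a rough sketch of what such a structure could look like: parse the file
once into a list of plain records with real datetime objects, then search
that list. This is written for Python 3 and is only an illustration under
assumptions. The names Event, load_schedule, is_active, get_current_events
and REPEAT_INTERVAL are made up for this example, the sample data uses
invented times, and since only types 1 (daily) and 2 (weekly) are described
in this thread, any other type is treated as a one-off event.

```python
import io
import xml.etree.ElementTree as ET
from collections import namedtuple
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumption: only type 1 (daily) and type 2 (weekly) are spelled out in the
# thread; every other type is treated as a non-repeating event here.
REPEAT_INTERVAL = {"1": timedelta(days=1), "2": timedelta(weeks=1)}

Event = namedtuple("Event", "name location type start end")


def load_schedule(source):
    """Parse the XML once; return (default_event, list_of_events)."""
    root = ET.parse(source).getroot()
    # The default event lives in the attributes of the <schedules> tag.
    default = Event(root.get("name"), root.get("location"), None, None, None)
    events = [
        Event(
            el.get("name"),
            el.get("location"),
            el.get("type"),
            datetime.strptime(el.get("start"), DATE_FORMAT),
            datetime.strptime(el.get("end"), DATE_FORMAT),
        )
        for el in root.iter("event")
    ]
    return default, events


def is_active(event, now):
    """True if `now` falls inside the event or one of its repetitions."""
    if now < event.start:
        return False
    interval = REPEAT_INTERVAL.get(event.type)
    if interval is None:
        return now <= event.end
    # How far `now` is into the current repetition cycle.
    offset = (now - event.start) % interval
    return offset <= event.end - event.start


def get_current_events(default, events, now=None):
    """Return the events active at `now`, or [default] if there are none."""
    now = now or datetime.now()
    return [e for e in events if is_active(e, now)] or [default]


# Hypothetical sample data in the same shape as the file quoted above.
SAMPLE = """<schedules name="Default event" location="Default location">
  <event name="Daily standup" location="At work" type="1"
         start="2007-01-01 09:00:00" end="2007-01-01 10:00:00" />
  <event name="Weekly review" location="At the cafe" type="2"
         start="2007-01-01 14:00:00" end="2007-01-01 15:00:00" />
</schedules>"""

default, events = load_schedule(io.StringIO(SAMPLE))
for event in get_current_events(default, events, datetime(2007, 1, 3, 9, 30)):
    print(event.name, "@", event.location)
```

The file is parsed only once, so repeated lookups never touch the XML
again, and the recurrence rule sits in one small function that can be
extended when further types are defined.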

Stefan


