extract occurrence of regular expression from elements of XML documents

Steve Holden steve at holdenweb.com
Mon Mar 15 13:29:51 EDT 2010


Martin Schmidt wrote:
> Hi,
> 
> I started using Python a few weeks ago, and until last week I
> had no knowledge of XML.
> Obviously my programming knowledge is pretty basic.
> Now I would like to use Python with about 2,000 XML
> documents (about 30 KB each) to search for a certain regular expression
> within specific elements of these documents.
> I would then like to record the number of occurrences of the regular
> expression within these elements.
> Moreover I would like to count the total number of words contained
> within these, and record the attribute of a higher level element that
> contains them.
> I was trying to figure out the best way to do this, but got
> overwhelmed by the available information (e.g. posts using different
> approaches based on DOM, SAX, XPath, ElementTree, expat).
> The outcome should be a file that lists the extracted attribute, the
> number of occurrences of the regular expression, and the total number of
> words.
> I did not find a post that addresses my problem.
> If someone could help me with this I would really appreciate it.
> 
I suspect you would get more specific help if you posted an example of
the XML and described the regex search you want to do in a little more
detail.
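
In the meantime, here is a minimal sketch of one possible approach using
the standard library's xml.etree.ElementTree and re modules. Since no XML
was posted, the element names ("section", "para"), the attribute ("id"),
the sample document, and the pattern are all assumptions made up for
illustration, and summarize() is a hypothetical helper, not an existing
API.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample document: the real XML structure is unknown, so the
# tag names, the "id" attribute, and the text are placeholders.
SAMPLE = """<doc>
  <section id="a1">
    <para>The cat sat on the mat.</para>
    <para>Another cat, another <b>mat</b>.</para>
  </section>
  <section id="b2">
    <para>No felines here.</para>
  </section>
</doc>"""


def summarize(xml_text, outer_tag, attr, inner_tag, pattern):
    """Return one (attribute value, regex matches, word count) per outer element."""
    regex = re.compile(pattern)
    root = ET.fromstring(xml_text)
    rows = []
    for outer in root.iter(outer_tag):
        matches = 0
        words = 0
        for inner in outer.iter(inner_tag):
            # itertext() also collects text inside nested child elements.
            text = "".join(inner.itertext())
            matches += sum(1 for _ in regex.finditer(text))
            # Words are counted naively as whitespace-separated tokens.
            words += len(text.split())
        rows.append((outer.get(attr), matches, words))
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE, "section", "id", "para", r"\bcat\b"):
        print(row)
```

For the real data you could read each file (for example with glob.glob
and ET.parse) and write the rows out with csv.writer. With files of about
30 KB each, parsing every document fully into memory this way should be
fine, so the streaming parsers (sax, expat) are probably not needed.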

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
See PyCon Talks from Atlanta 2010  http://pycon.blip.tv/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/



