extract occurrence of regular expression from elements of XML documents

Martin Schmidt martin.schmidt1 at gmail.com
Mon Mar 15 13:16:01 EDT 2010


Hi,

I started using Python a few weeks ago, and until last week I had
no knowledge of XML.
Obviously my programming knowledge is pretty basic.
Now I would like to use Python on ca. 2000 XML documents
(about 30 KB each) to search for a certain regular expression within specific
elements of these documents.
I would then like to record the number of occurrences of the regular
expression within these elements.
Moreover, I would like to count the total number of words contained within
these elements, and record an attribute of the higher-level element that
contains them.
I was trying to figure out the best way to do this, but got overwhelmed
by the available information (e.g. posts using different approaches based on
dom, sax, xpath, elementtree, expat).
The outcome should be a file that lists the extracted attribute, the number
of occurrences of the regular expression, and the total number of words.
I did not find a post that addresses my problem.
If someone could help me with this I would really appreciate it.

  Martin
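
One possible way to approach the task described above, sketched with
xml.etree.ElementTree, re and csv from the standard library. The post does not
name the elements, the attribute or the pattern, so PARENT_TAG, ATTR_NAME,
TARGET_TAG, PATTERN and the sample document are all hypothetical placeholders.
The sketch also assumes the higher-level elements are not nested inside each
other; if they are, inner text would be counted more than once.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical names: the original post does not specify any of these.
PARENT_TAG = "section"   # higher-level element that carries the attribute
ATTR_NAME = "id"         # attribute to record
TARGET_TAG = "para"      # element whose text is searched
PATTERN = re.compile(r"\bcolou?r\b", re.IGNORECASE)


def scan_document(source):
    """Return (attribute, match_count, word_count) rows for one XML document.

    `source` may be a file name or an open file object.
    """
    tree = ET.parse(source)
    rows = []
    for parent in tree.iter(PARENT_TAG):
        matches = 0
        words = 0
        for target in parent.iter(TARGET_TAG):
            # itertext() also collects text inside nested child elements.
            text = "".join(target.itertext())
            matches += len(PATTERN.findall(text))
            # Words are taken to be whitespace-separated tokens.
            words += len(text.split())
        rows.append((parent.get(ATTR_NAME), matches, words))
    return rows


def write_report(rows, out):
    """Write the rows as CSV to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["attribute", "matches", "words"])
    writer.writerows(rows)


# Made-up sample document, used only to exercise the functions.
SAMPLE = """<doc>
  <section id="a1">
    <para>The color red and the colour blue.</para>
    <para>No match <b>here</b>.</para>
  </section>
  <section id="b2">
    <para>Colorful is not a match, but Color is.</para>
  </section>
</doc>"""

rows = scan_document(io.StringIO(SAMPLE))
report = io.StringIO()
write_report(rows, report)
print(report.getvalue())
```

For the full set of roughly 2000 files, the same function could be called in a
loop over glob.glob("docs/*.xml") (the directory name is an assumption), with
the rows collected and written once at the end. At about 30 KB per document,
parsing each file fully into memory with ElementTree should be unproblematic,
so sax or expat are probably not needed here.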