[Tutor] Fwd: find second occurrence of string in line

Albert-Jan Roskam sjeik_appie at hotmail.com
Tue Sep 8 21:08:22 CEST 2015


> To: tutor at python.org
> From: __peter__ at web.de
> Date: Tue, 8 Sep 2015 19:40:03 +0200
> Subject: Re: [Tutor] Fwd: find second occurrence of string in line
> 
> richard kappler wrote:
> 
> >> Do you want to find just the second occurrence in the *file* or the second
> > occurrence within a given tag in the file (and there could be multiple such
> > tags)?
> > 
> > There are multiple objectdata lines in the file and I wish to find the
> > second occurrence of timestamp in each of those lines.
> > 
> >> Is objectdata within a specific tag? Usually when parsing XML it's the
> > tags you look for first since "lines" can be broken over multiple lines
> > and multiple tags can exist on one literal line.
> > 
> > objectdata is within a tag as is timestamp. Here's an example:
> > 
> > <objectdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> > xsi:noNamespaceSchemaLocation="Logging.xsd"
> > version="1.0"><devicename>0381UDI1</devicename><deviceid>32
> > </deviceid><timestamp>2015-06-18T14:28:06.570</timestamp>
> > <incr>53163</incr><tokenid>0381UDI12015-06-18T14:27:50379</tokenid>
> > <seqnb>1306</seqnb><general> oi="607360" on="379" ox="02503" oc="0"
> > is="49787" ie="50312" lftf="N"
> > lfts="7" errornb="0"
> > iostate="DC00"><timestamp>2015-06-18T14:27:50.811</timestamp><otl
> > unit="inch"><value>51.45</value></otl><tt
> > unit="ms"><value>0</value></tt><oga
> 
> [snip]
> 
> I'm inferring from the above that you do not really want the "second" 
> timestamp in the line -- there is no line left intact anyway ;) -- but rather 
> the one in the <general>...</general> part.
> 
> Here's a way to get these (all of them) with lxml:
> 
> import lxml.etree
> 
> tree = lxml.etree.parse("example.xml")
> print(tree.xpath("//objectdata/general/timestamp/text()"))

Nice. I do need to try lxml some time. Is the "text()" part XPath as well?
When I try it with ElementTree, I get:

* a SyntaxError:

root.find("//objectdata/general/timestamp/text()")
  File "<string>", line unknown
SyntaxError: cannot use absolute path on element

* a KeyError:

root.find("./general/timestamp/text()")
Traceback (most recent call last):

  File "<ipython-input-47-af8fc012f0f0>", line 1, in <module>
    root.find("./general/timestamp/text()")

  File "/usr/lib/python2.7/xml/etree/ElementPath.py", line 285, in find
    return iterfind(elem, path, namespaces).next()

  File "/usr/lib/python2.7/xml/etree/ElementPath.py", line 263, in iterfind
    selector.append(ops[token[0]](next, token))

KeyError: '()'
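
For comparison, a minimal sketch of the same lookup with the standard library only. It assumes that ElementTree supports just a subset of XPath (no text() function, and no absolute paths when searching from an element), which is what the two errors above suggest. The sample string is a shortened, hypothetical stand-in for one objectdata record.

```python
import xml.etree.ElementTree as et

# Hypothetical, trimmed-down record; the real one has many more elements.
sample = (
    "<objectdata>"
    "<timestamp>2015-06-18T14:28:06.570</timestamp>"
    "<general>"
    "<timestamp>2015-06-18T14:27:50.811</timestamp>"
    "</general>"
    "</objectdata>"
)

root = et.fromstring(sample)

# Use a relative path instead of "//...", and read .text from each
# matching element instead of appending "text()" to the path.
general_timestamps = [el.text for el in root.findall("./general/timestamp")]
print(general_timestamps)
```

Here root is the objectdata element itself, so the path starts at its children.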


Anyway, I would have done it as shown below. I added the pretty-printing because I find it really useful for inspecting the XML.

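import xml.etree.ElementTree as et  # cElementTree is deprecated in Python 3
import xml.dom.minidom

data = """\
* copy-paste xml here
"""
# Pretty-print for inspection only; parse the original string below.
pretty = xml.dom.minidom.parseString(data).toprettyxml()
print(pretty)
root = et.fromstring(data)
#root = et.parse(xml_filename).getroot()
# findtext() returns the text of the first matching element only.
timestamp = root.findtext(".//general/timestamp")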
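
Since findtext() stops at the first match, here is a rough sketch of collecting the timestamp under general for every record. It assumes the objectdata elements sit inside a single wrapper element (a hypothetical <log> here) so that the input is well-formed XML. The second record and its values are made up for illustration.

```python
import xml.etree.ElementTree as et

# Hypothetical input: two trimmed records under an assumed <log> wrapper.
records = """\
<log>
<objectdata><timestamp>2015-06-18T14:28:06.570</timestamp>
<general><timestamp>2015-06-18T14:27:50.811</timestamp></general></objectdata>
<objectdata><timestamp>2015-06-18T14:29:10.120</timestamp>
<general><timestamp>2015-06-18T14:28:55.002</timestamp></general></objectdata>
</log>
"""

root = et.fromstring(records)

# iter() visits every objectdata element in the tree; findtext() returns
# the first general/timestamp text of each record, or None if it is missing.
timestamps = [obj.findtext("general/timestamp") for obj in root.iter("objectdata")]
for ts in timestamps:
    print(ts)
```

If the file really holds one objectdata document per line with no wrapper, each line would have to be parsed separately instead.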


Best wishes,
Albert-Jan





 		 	   		  

