[Tutor] find second occurrence of string in line

Alan Gauld alan.gauld at btinternet.com
Tue Sep 8 18:23:31 CEST 2015


On 08/09/15 17:00, richard kappler wrote:
> I need to find the index of the second occurance of a string in an xml file
> for parsing.

Do you want to find just the second occurrence in the *file*
or the second occurrence within a given tag in the file (and
there could be multiple such tags)?


>  I understand re well enough to do what I want to do

Using re to parse XML is usually the wrong way to go about it.
Fortunately you are not using re in the code below.
However, a real XML parser such as etree (from the std lib)
or lxml might work better.
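
As a rough sketch of the parser approach, something like the following
might do. The sample document is invented: I am assuming <objectdata> is
an element with <timestamp> children, which may not match your real file,
and second_timestamps is just a name I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's structure is not shown in the thread.
SAMPLE = (
    "<log>"
    "<objectdata><timestamp>2015-09-08T17:00:00</timestamp>"
    "<timestamp>2015-09-08T17:00:05</timestamp></objectdata>"
    "<other><timestamp>2015-09-08T17:01:00</timestamp></other>"
    "</log>"
)

def second_timestamps(xml_text):
    """Return the text of the second <timestamp> in each <objectdata> element."""
    root = ET.fromstring(xml_text)
    results = []
    for obj in root.iter("objectdata"):
        stamps = obj.findall("timestamp")  # direct children only (assumption)
        if len(stamps) >= 2:
            results.append(stamps[1].text)
    return results

seconds = second_timestamps(SAMPLE)
```

This gives you the content of the second timestamp rather than a character
index, which is usually what you want once a parser is doing the work.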

> first instance, but despite many Google searches have yet to find something
> to get the index of the second instance, because split won't really work on
> my xml file (if I understand split properly) as there are no spaces.

split can split on any character you want, whitespace
just happens to be the default.
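
For illustration, here is a small sketch of splitting on the tag itself.
The sample line and the helper name second_index_via_split are made up;
the index is rebuilt from the lengths of the pieces before the second
occurrence.

```python
# Hypothetical sample line; the real data is not shown in the thread.
line = "<objectdata><timestamp>A</timestamp><timestamp>B</timestamp></objectdata>"
marker = "<timestamp>"

# split accepts any separator string, not only whitespace
parts = line.split(marker)

def second_index_via_split(text, sep):
    """Return the index of the second occurrence of sep, or -1 if there is none."""
    pieces = text.split(sep)
    if len(pieces) < 3:  # fewer than two occurrences of sep
        return -1
    # text before the first sep + the first sep itself + text between the two
    return len(pieces[0]) + len(sep) + len(pieces[1])

second = second_index_via_split(line, marker)
```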

> Specifically I'm looking for the second <timestamp> in an objectdata line.

Is objectdata within a specific tag? Usually when parsing XML it's
the tags you look for first, since "lines" can be broken over
multiple lines and multiple tags can exist on one literal line.

> Not all lines are objectdata lines, though all objectdata lines do have
> more than one <timestamp>.

This implies there are many objectdata lines within your file? See the 
first comment above... do you want the second index for the first 
objectdata line or do you want it for every objectdata line?

> import re

You don't use this.

> with open("example.xml", 'r') as f:
>      for line in f:
>          if "objectdata" in line:
>              if "<timestamp>" in line:
>                  x = "<timestamp>"

You should assign this once, above the loop; it saves a lot of
duplicated work.

>                  first = x.index(line)

This is looking for the index of line within x.
I suspect you really want

first = line.index(x)

>                  second = x[first+1:].index(line)

You can specify a start position for index directly:

second = line.index(x, first+1)
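
Putting the two corrections together, a minimal sketch might look like
this. The function name first_two_indexes and the sample string are mine,
not from your code, and it still raises ValueError if the marker occurs
fewer than two times.

```python
def first_two_indexes(line, marker="<timestamp>"):
    """Return the indexes of the first and second occurrence of marker in line."""
    first = line.index(marker)              # search for marker inside line
    second = line.index(marker, first + 1)  # resume searching just past the first hit
    return first, second

# Hypothetical sample string
sample = "ab<timestamp>cd<timestamp>"
first, second = first_two_indexes(sample)
```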


>                  print first, second
>              else:
>                  print "no timestamp"
>          else:
>              print "no objectdata"
>
> my traceback:
>
> Traceback (most recent call last):
>    File "2iter.py", line 10, in <module>
>      first = x.index(line)
> ValueError: substring not found

That's what you get when the search fails.
You should use try/except when using index().

Alternatively try using str.find(), which returns -1
when the substring is not found. But you need to check the
result before using it, because -1 is, of course, a valid
string index!
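
Both styles can be sketched roughly as follows. The helper names
second_index_or_none and second_find are invented for illustration, and
returning None in the first one is just one possible choice.

```python
def second_index_or_none(line, marker):
    """index() version: catch the ValueError raised when a search fails."""
    try:
        first = line.index(marker)
        return line.index(marker, first + 1)
    except ValueError:
        return None  # fewer than two occurrences

def second_find(line, marker):
    """find() version: test for -1 before using the result as a position."""
    first = line.find(marker)
    if first == -1:
        return -1  # no occurrence at all
    return line.find(marker, first + 1)  # -1 if there is no second occurrence
```

Either way, test the result before slicing with it, since line[-1] is
perfectly legal and would silently give you the last character.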

HTH

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



