NewB question on text manipulation

ProvoWallis gshepherd281281 at
Wed May 3 19:29:55 CEST 2006

Thanks very much for this; I really appreciate it. I've pasted below what
I've got now, thanks to you.

I have only one issue that I can't figure out. When I print the new
string, I get all of the values in the lt list rather than just the one
that corresponds to the original entry.


My original data looks like this:

<1><SC>FAM LAW ENF<XC>259-232<LT>-687

<1><SC>APPEAL<XC>40-38; 40-44; 44-18; 45-15<LT>1

I want my output to look like this:

<1><SC>FAM LAW ENF<XC>259-232<LT>-687

But instead I'm getting this: all of the entries in the lt list are being
added to my string when I only want one. I'm not sure how to select just
the entry in the lt list that I want.

<1><SC>FAM LAW ENF<XC>259-232<LT>-687<LT>1
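A likely source of the extra <LT> value, assuming the whole input file is read into a single string `s` (the post doesn't show how `s` is built): removing every <SC>/<XC> match from the combined text leaves the <LT> parts of both records behind in one leftover string. That one string is what ends up in `lt`, and it is appended to every output line. A minimal sketch using the two sample records:

```python
import re

# Same pattern as in the post: group 1 is the <SC> title, group 2 the <XC> field.
pat = re.compile(r"\s*<SC>([^<]+)<XC>([^<]+)")

# Both sample records held in one string, as when a whole file is read at once.
s = ("<1><SC>FAM LAW ENF<XC>259-232<LT>-687\n"
     "\n"
     "<1><SC>APPEAL<XC>40-38; 40-44; 44-18; 45-15<LT>1\n")

# Removing every <SC>/<XC> match leaves the <LT> parts of *all* records behind.
leftover = pat.sub("", s).strip()
print(repr(leftover))
```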


Here's what I've got so far:

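import re

# The input text. How s was read isn't shown; the two sample
# records from above are used here so the script runs.
s = ("<1><SC>FAM LAW ENF<XC>259-232<LT>-687\n"
     "\n"
     "<1><SC>APPEAL<XC>40-38; 40-44; 44-18; 45-15<LT>1\n")

s_space = " "  # a single space
s_empty = ""  # empty string

# Group 1: the <SC> title; group 2: the <XC> field.
pat = re.compile(r"\s*<SC>([^<]+)<XC>([^<]+)")

lst = []  # collected (title, xc) pairs

while True:
    m = pat.search(s)
    if not m:
        break  # no more <SC>/<XC> pairs left
    title = m.group(1).strip()
    xc = m.group(2)
    xc = xc.replace(s_space, s_empty)
    tup = (title, xc)
    lst.append(tup)
    s = pat.sub(s_empty, s, 1)  # remove only the first match

# Whatever remains once all matches have been removed.
lt = s.strip()

for title, xc in lst:
    lst_pp = xc.split(";")
    for pp in lst_pp:
        print("<1><SC>%s<XC>%s%s" % (title, pp, lt))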
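One possible way to keep each record's own <LT> value, sketched under the assumption that every record sits on its own line as in the sample data: handle the input one line at a time, and take the text after each match (`m.end()`) as that record's tail instead of deleting matches from the whole string. The function name `expand_records` is hypothetical, and the `<1>` prefix is copied from the print statement above rather than read from the data.

```python
import re

# Same pattern as in the post: group 1 is the <SC> title, group 2 the <XC> field.
pat = re.compile(r"\s*<SC>([^<]+)<XC>([^<]+)")


def expand_records(text):
    """Hypothetical helper: pair each <XC> entry with its own record's <LT> tail."""
    out = []
    for line in text.splitlines():
        m = pat.search(line)
        if not m:
            continue  # skip blank or non-matching lines
        title = m.group(1).strip()
        xc = m.group(2).replace(" ", "")
        # The rest of *this* line after the match, e.g. "<LT>-687".
        lt = line[m.end():].strip()
        for pp in xc.split(";"):
            out.append("<1><SC>%s<XC>%s%s" % (title, pp, lt))
    return out


data = ("<1><SC>FAM LAW ENF<XC>259-232<LT>-687\n"
        "\n"
        "<1><SC>APPEAL<XC>40-38; 40-44; 44-18; 45-15<LT>1\n")

for line in expand_records(data):
    print(line)
```

Because `lt` is recomputed inside the loop for every line, the FAM LAW ENF record only ever sees `<LT>-687`, and each of the four APPEAL entries gets `<LT>1`. If records can span several lines, splitting on lines would no longer be enough.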

