NewB question on text manipulation

Steve R. Hastings steve at
Wed May 3 21:16:57 CEST 2006

On Wed, 03 May 2006 10:29:55 -0700, ProvoWallis wrote:
> I only have one issue that I can't figure out. When I print the new
> string I'm getting all of the values in the lt list rather than just
> the one that corresponds to the original entry.

I did not realize that each entry would have its own LT value.  I had
thought that there were several sets of <SC> and <XC> with one <LT>.  You
only showed one example...

I have modified the program to collect LT values at the same time it
collects SC and XC values. Also, it now collects whatever code appears
before the first SC code.  I don't know what this code is for so I just
called the variable "before".

Notes on the code:

* Instead of doing this:

    title = m.group(2)
    title = title.strip()

I just do this:

    title = m.group(2).strip()

You can call a string method on any expression that evaluates to a string,
and it's convenient to do it all in one line.  There are several lines like
that.

* There are two patterns to detect the LT code.  The first one is for
finding it, and the second one is only for removing it.  The second one
uses '^' to anchor the pattern, so it will only remove the LT code if the
LT code is the first thing in the string.  The first pattern does not have
the '^' anchor so it will look ahead, past any number of <SC> codes, to
find the next <LT> code.

* Otherwise this is pretty much like the first version.  It collects data,
saves it in a list, and then prints its output from the list.
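
The difference between the two LT patterns is easiest to see on a small
example.  This is only a sketch: the sample strings and the names "found",
"untouched" and "trimmed" are made up for illustration, while the two patterns
are the ones the program uses.

```python
import re

pat_lt = re.compile(r"<LT>([^<]+)")          # unanchored: may match anywhere
pat_lt_remove = re.compile(r"^<LT>([^<]+)")  # anchored: only at the very start

# Hypothetical sample: one entry whose LT code follows its SC and XC codes.
sample = "<1><SC>PROC GUIDE<XC>92<LT>1(b)(1)"

# search() with the unanchored pattern looks past the <SC> and <XC> codes.
found = pat_lt.search(sample).group(1).strip()

# The anchored pattern cannot match here, so sub() returns the string unchanged.
untouched = pat_lt_remove.sub("", sample, 1)

# Once the LT code is at the front of the string, the anchored pattern removes it.
trimmed = pat_lt_remove.sub("", "<LT>1(b)(1)<1><SC>FAM LAW ENF", 1)

print(found)                # 1(b)(1)
print(untouched == sample)  # True
print(trimmed)              # <1><SC>FAM LAW ENF
```

Because the search is unanchored, an entry with no LT code of its own picks up
the LT value of the next entry that has one.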

I am busy now, so I won't have any time to make any more versions of this
for you. I hope you can study what I have done and understand how to apply
the ideas to your problems.  Good luck!

-- cut here -- cut here -- cut here -- cut here -- cut here --
import re

s = "<1><SC>APPEAL<XC>40-24; 40-46; 42-46; 42-48; 42-62; 42-63 " + \
    "<1><SC>PROC GUIDE<XC>92<LT>1(b)(1)" + \
    "<1><SC>FAM LAW ENF<XC>259-232<LT>-687" + \
    "<1><SC>APPEAL<XC>40-38; 40-44; 44-18; 45-15<LT>1"

s_space = " "  # a single space
s_empty = ""  # empty string

# raw strings keep the backslash in \s from being treated as a string escape
pat_sc = re.compile(r"\s*(<[^<]+)<SC>([^<]+)<XC>([^<]+)")
pat_lt = re.compile(r"<LT>([^<]+)")
pat_lt_remove = re.compile(r"^<LT>([^<]+)")

lst = []
lt = None

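while True:
    # find the next <SC>...<XC>... entry; stop when none are left
    m = pat_sc.search(s)
    if not m:
        break

    before = m.group(1).strip()
    title = m.group(2).strip()
    xc = m.group(3).replace(s_space, s_empty)

    # remove the entry we just collected
    s = pat_sc.sub(s_empty, s, 1)

    # look ahead for the next LT code; if there is none, keep the last value
    m = pat_lt.search(s)
    if m:
        lt = m.group(1).strip()

    # remove the LT code only if it is now at the front of the string
    s = pat_lt_remove.sub(s_empty, s, 1)

    tup = (before, title, xc, lt)
    lst.append(tup)

# print one output line per page reference in each entry
for before, title, xc, lt in lst:
    lst_pp = xc.split(";")
    for pp in lst_pp:
        print("%s<SC>%s<XC>%s<LT>%s" % (before, title, pp, lt))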
-- cut here -- cut here -- cut here -- cut here -- cut here --

Steve R. Hastings    "Vita est"
steve at
