efficiently splitting up strings based on substrings

per perfreem at gmail.com
Sat Sep 5 17:54:41 EDT 2009


I'm trying to efficiently "split" strings based on what substrings
they are made up of.
i have a set of strings that are comprised of known substrings.
For example, a, b, and c are substrings that are not identical to each
other, e.g.:
a = "0" * 5
b = "1" * 5
c = "2" * 5

Then my_string might be:

my_string = a + b + c

i am looking for an efficient way to solve the following problem.
suppose i have a short
string x that is a substring of my_string.  I want to "split" the
string x into blocks based on
what substrings (i.e. a, b, or c) chunks of s fall into.

to illustrate this, suppose x = "00111". Then I can detect where x
starts in my_string
using my_string.find(x).  But I don't know how to partition x into
blocks depending
on the substrings.  What I want to get out in this case is: "00",
"111".  If x were "001111122",
I'd want to get out "00","11111", "22".

is there an easy way to do this?  i can't simply split x on a, b, or c
because these might
not be contained in x.  I want to avoid doing something inefficient
like looking at all substrings
of my_string etc.

i wouldn't mind using regular expressions for this but i cannot think
of an easy regular
expression for this problem.  I looked at the string module in the
library but did not see
anything that seemd related but i might have missed it.

any help on this would be greatly appreciated.  thanks.



More information about the Python-list mailing list