[Python-Dev] RFD: how to build strings from lots of slices?

"Fredrik Lundh" <effbot@telia.com>
Sun, 27 Feb 2000 13:01:38 +0100


when hacking on SRE's substitution code, I stumbled
upon a problem.  to do a substitution, SRE needs to
merge slices from the target string and from the
substitution pattern.

here's a simple example:

    import re
    re.sub(
        "(perl|tcl|java)",
        "python (not \\1)",
        "perl rules"
    )

this call contains a "substitution pattern" consisting
of three parts:

    "python (not " (a slice from the substitution string)
    group 1 (a slice from the target string)
    ")" (a slice from the substitution string)
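to check the decomposition, here's a small sketch that rebuilds the replacement from the three slices and compares the full result with what re.sub returns.  the slice boundaries in the template (0:12 and 14:15) are worked out by hand for this one pattern; a real engine would get them by parsing the template.  the list of (string, start, end) triples is just an illustration, not SRE's internal representation.

```python
import re

target = "perl rules"
template = "python (not \\1)"  # raw template: backslash followed by "1"

m = re.search("(perl|tcl|java)", target)

# the three parts of the substitution pattern, as (string, start, end) triples;
# the template offsets are hand-computed for this example only
parts = [
    (template, 0, 12),               # "python (not "
    (target, m.start(1), m.end(1)),  # group 1, i.e. "perl"
    (template, 14, 15),              # ")"
]

replacement = "".join(s[start:end] for s, start, end in parts)

# re.sub also keeps the unmatched text around the match
result = target[:m.start()] + replacement + target[m.end():]
expected = re.sub("(perl|tcl|java)", template, target)

print(replacement)  # python (not perl)
print(result)       # python (not perl) rules
```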

PCRE implements this by doing the slicing (thus creating
three new strings), and then doing a "join" by hand into
a PyString buffer.
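in Python terms, that strategy looks roughly like the sketch below (bytes standing in for 8-bit strings; join_by_hand is a hypothetical name, not the actual pcre code).  each part is sliced out first, which allocates a temporary object, and the temporaries are then copied into a preallocated buffer, so every character gets copied twice:

```python
def join_by_hand(parts):
    """parts: list of (source, start, end) triples over bytes objects."""
    # step 1: slice, creating one temporary bytes object per part
    pieces = [source[start:end] for source, start, end in parts]
    # step 2: allocate the result buffer and copy the temporaries into it
    buf = bytearray(sum(len(p) for p in pieces))
    pos = 0
    for p in pieces:
        buf[pos:pos + len(p)] = p
        pos += len(p)
    return bytes(buf)


template = b"python (not \\1)"
target = b"perl rules"
print(join_by_hand([(template, 0, 12), (target, 0, 4), (template, 14, 15)]))
# b'python (not perl)'
```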

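this isn't very efficient (every slice is copied twice:
once into a temporary string, and once into the buffer),
and it also doesn't work for unicode strings, since a
PyString buffer only holds 8-bit characters.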

in other words, this needs to be fixed.  but how?

...

here's one proposal, off the top of my head:

1. introduce a PySliceListObject, which behaves like a
simple sequence of strings, but stores them as slices.
the type structure looks something like this:

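    typedef struct {
        PyObject* string;   /* source string (8-bit or unicode) */
        int start;          /* slice start, normalized */
        int end;            /* slice end, normalized */
    } PySliceListItem;

    typedef struct {
        PyObject_VAR_HEAD   /* ob_size holds the number of items */
        PySliceListItem item[1];  /* variable-length array of slices */
    } PySliceListObject;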

where start and end are normalized (0..len(string))

    __len__ returns self->ob_size
    __getitem__ calls PySequence_GetSlice()

PySliceListObjects are only used internally; they
have no Python-level interface.
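since the real object would have no Python-level interface, here's a pure-Python model of it instead, just to pin down the intended behaviour (the class name SliceList is hypothetical).  it stores (string, start, end) triples, normalizes the bounds with slice.indices(), and only creates a new string when an item is fetched.  clamping end up to start is a choice made for this model, so that an "empty" slice always has end == start:

```python
class SliceList:
    """Pure-Python model of the proposed PySliceListObject (hypothetical name).

    Items are stored as (string, start, end) triples, not as new strings.
    """

    def __init__(self, items):
        self.items = []
        for string, start, end in items:
            # normalize the bounds into 0..len(string), the way slicing does
            start, end, _ = slice(start, end).indices(len(string))
            end = max(start, end)  # model choice: keep start <= end
            self.items.append((string, start, end))

    def __len__(self):
        # the C version returns self->ob_size
        return len(self.items)

    def __getitem__(self, index):
        # the C version calls PySequence_GetSlice(); this creates a new string
        string, start, end = self.items[index]
        return string[start:end]


template = "python (not \\1)"
sl = SliceList([(template, 0, 12), ("perl rules", 0, 4), (template, 14, 99)])
print(len(sl))      # 3
print(sl[2])        # )
print("".join(sl))  # python (not perl)
```

because __getitem__ raises IndexError past the end, the old sequence protocol lets join (and anything else that iterates) consume it without an __iter__ method.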

2. tweak string.join and unicode.join to look for
PySliceListObjects, and have special code that
copies slices directly from the source strings.

(note that a slice list can still be used with any
method that expects a sequence of strings, but
at a cost, since each item access creates a new
string)
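the join side of the proposal can be sketched the same way.  this is a model under assumptions: SliceList and join are hypothetical names, the items are bytes objects with bounds already normalized, and memoryview slicing plays the role of copying straight from the source strings.  ordinary sequences fall back to the regular join, and a SliceList still works there through __getitem__, at the cost of one temporary per item:

```python
class SliceList:
    """Hypothetical stand-in for PySliceListObject: (source, start, end)
    triples over bytes objects, with start and end already normalized."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        # generic path: each access creates a new bytes object
        source, start, end = self.items[index]
        return source[start:end]


def join(sep, seq):
    """Sketch of bytes.join with a fast path for SliceList (hypothetical)."""
    if not isinstance(seq, SliceList):
        return sep.join(seq)  # ordinary sequences: regular join
    items = seq.items
    # the result size is known from the slice bounds alone
    size = sum(end - start for _, start, end in items)
    size += len(sep) * max(len(items) - 1, 0)
    buf = bytearray(size)
    pos = 0
    for i, (source, start, end) in enumerate(items):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        # a memoryview slice shares the source's memory, so the bytes
        # go straight from the source string into the result buffer
        buf[pos:pos + end - start] = memoryview(source)[start:end]
        pos += end - start
    # bytes(buf) costs one final copy; C code would fill the result in place
    return bytes(buf)


template = b"python (not \\1)"
target = b"perl rules"
slices = SliceList([(template, 0, 12), (target, 0, 4), (template, 14, 15)])
print(join(b"", slices))  # b'python (not perl)'
```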

...

given the above, the substitution engine can now
create a slice list by combining slices from the match
object and the substitution object, and hand the
result off to the string implementation; e.g.:

    sep = PySequence_GetSlice(subst_string, 0, 0);
    result = PyObject_CallMethod(sep, "join", "O", slice_list);
    Py_DECREF(sep);

(can anyone come up with something more elegant
than the [0:0] slice?)
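at the Python level, the [0:0] trick looks like this.  an empty slice has the same type as the string it was taken from, so the join call doesn't need to know whether it's dealing with 8-bit or unicode strings (bytes and str play those roles in today's Python).  the slice boundaries are hardcoded for the example pattern, and the pieces are joined as ordinary strings rather than through a slice list:

```python
import re

cases = [
    ("(perl|tcl|java)", "python (not \\1)", "perl rules"),     # str
    (b"(perl|tcl|java)", b"python (not \\1)", b"perl rules"),  # bytes
]

results = []
for pattern, template, target in cases:
    m = re.search(pattern, target)
    # the three slices from the example: template, group 1, template
    pieces = [template[0:12], target[m.start(1):m.end(1)], template[14:15]]
    # [0:0] gives an empty separator of the same type as the template,
    # so the same join call works for both string types
    sep = template[0:0]
    results.append(sep.join(pieces))

print(results)  # ['python (not perl)', b'python (not perl)']
```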

comments?  better ideas?

</F>