[Python-Dev] RFD: how to build strings from lots of slices?
Fredrik Lundh <effbot@telia.com>
Sun, 27 Feb 2000 13:01:38 +0100
when hacking on SRE's substitution code, I stumbled
upon a problem. to do a substitution, SRE needs to
merge slices from the target string and from the
substitution pattern.
here's a simple example:
re.sub(
    "(perl|tcl|java)",
    "python (not \\1)",
    "perl rules"
)
contains a "substitution pattern" consisting of three
parts:
"python (not " (a slice from the substitution string)
group 1 (a slice from the target string)
")" (a slice from the substitution string)
PCRE implements this by doing the slicing (thus creating
three new strings), and then doing a "join" by hand into
a PyString buffer.
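as a rough Python-level illustration of that slice-then-join approach (a sketch only, not the actual PCRE code; the hard-coded template offsets and the variable names are assumptions made for this example):

```python
import re

pattern = "(perl|tcl|java)"
template = "python (not \\1)"
target = "perl rules"

m = re.search(pattern, target)

# the three parts of the substitution pattern; each slice
# creates a new intermediate string
parts = [
    template[0:12],                # "python (not " from the substitution string
    target[m.start(1):m.end(1)],   # group 1, from the target string
    template[14:15],               # ")" from the substitution string
]

# join the parts, then splice the replacement into the target
replacement = "".join(parts)
result = target[:m.start()] + replacement + target[m.end():]
```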
this isn't very efficient, and it also doesn't work for
unicode strings.
in other words, this needs to be fixed. but how?
...
here's one proposal, off the top of my head:
1. introduce a PySliceListObject, which behaves like a
simple sequence of strings, but stores them as slices.
the type structure looks something like this:
typedef struct {
    PyObject* string;
    int start;
    int end;
} PySliceListItem;
typedef struct {
    PyObject_VAR_HEAD
    PySliceListItem item[1];
} PySliceListObject;
where start and end are normalized (0..len(string))
__len__ returns self->ob_size
__getitem__ calls PySequence_GetSlice()
PySliceListObjects are only used internally; they
have no Python-level interface.
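to make the intended behaviour concrete, here's a minimal Python model of such a slice list. the proposal is for a C-level type with no Python interface, so the class name SliceList and its constructor are purely hypothetical stand-ins:

```python
class SliceList:
    """Read-only sequence of strings, stored as (string, start, end) slices."""

    def __init__(self, items):
        self._items = []
        for string, start, end in items:
            # normalize start and end into the range 0..len(string)
            start, end, _ = slice(start, end).indices(len(string))
            self._items.append((string, start, max(start, end)))

    def __len__(self):
        # corresponds to self->ob_size in the C struct
        return len(self._items)

    def __getitem__(self, index):
        # the slice is only materialized when an item is requested;
        # an out-of-range index raises IndexError, which ends iteration
        string, start, end = self._items[index]
        return string[start:end]


template = "python (not \\1)"
target = "perl rules"
slices = SliceList([(template, 0, 12), (target, 0, 4), (template, 14, 15)])
```

since the object behaves like an ordinary sequence of strings, "".join(slices) works here without any special support; it just pays for one temporary string per item.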
2. tweak string.join and unicode.join to look for
PySliceListObjects, and have special code that
copies slices directly from the source strings.
(note that a slice list can still be used with any
method that expects a sequence of strings, but
at a cost)
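the fast path in step 2 could look roughly like the sketch below. it uses Python bytes objects as a stand-in for 8-bit strings, and the function name join_slices is hypothetical; the real thing would live in the C implementations of string.join and unicode.join. start and end are assumed to be normalized already:

```python
def join_slices(items):
    """Join (data, start, end) slices without building intermediate strings."""
    # first pass: compute the size of the result
    total = sum(end - start for _, start, end in items)
    buf = bytearray(total)

    # second pass: copy each slice straight from its source buffer
    pos = 0
    for data, start, end in items:
        n = end - start
        # a memoryview slice refers to the source bytes without copying them
        buf[pos:pos + n] = memoryview(data)[start:end]
        pos += n
    return bytes(buf)


template = b"python (not \\1)"
target = b"perl rules"
joined = join_slices([(template, 0, 12), (target, 0, 4), (template, 14, 15)])
```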
...
given the above, the substitution engine can now
create a slice list by combining slices from the match
object and the substitution object, and hand the
result off to the string implementation; e.g.:
/* empty separator of the same type as subst_string */
sep = PySequence_GetSlice(subst_string, 0, 0);
result = PyObject_CallMethod(sep, "join", "O", slice_list);
Py_DECREF(sep);
(can anyone come up with something more elegant
than the [0:0] slice?)
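for reference, this is what the [0:0] trick does at the Python level: it yields an empty separator of the same type as the substitution string, so the same join code serves both string types. a small sketch (the helper name join_like is hypothetical, and str/bytes stand in for the two string types):

```python
def join_like(template, pieces):
    # template[0:0] is an empty string of the same type as template
    sep = template[0:0]
    return sep.join(pieces)


text_result = join_like("python (not \\1)", ["python (not ", "perl", ")"])
bytes_result = join_like(b"python (not \\1)", [b"python (not ", b"perl", b")"])
```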
comments? better ideas?
</F>