when hacking on SRE's substitution code, I stumbled upon a problem.
to do a substitution, SRE needs to merge slices from the target
string and from the substitution pattern.

here's a simple example:

    re.sub(
        "(perl|tcl|java)",
        "python (not \\1)",
        "perl rules"
        )

this call contains a "substitution pattern" consisting of three
parts:

    "python (not " (a slice from the substitution string)
    group 1 (a slice from the target string)
    ")" (a slice from the substitution string)

PCRE implements this by doing the slicing (thus creating three
new strings), and then doing a "join" by hand into a PyString
buffer.

this isn't very efficient, and it also doesn't work for unicode
strings.

in other words, this needs to be fixed.  but how?

...

here's one proposal, off the top of my head:

1. introduce a PySliceListObject, which behaves like a simple
   sequence of strings, but stores them as slices.

   the type structure looks something like this:

       typedef struct {
           PyObject* string;
           int start;
           int end;
       } PySliceListItem;

       typedef struct {
           PyObject_VAR_HEAD
           PySliceListItem item[1];
       } PySliceListObject;

   where start and end are normalized (0..len(string))

   __len__ returns self->ob_size
   __getitem__ calls PySequence_GetSlice()

   PySliceListObjects are only used internally; they have
   no Python-level interface.

2. tweak string.join and unicode.join to look for
   PySliceListObjects, and have special code that copies
   slices directly from the source strings.

   (note that a slice list can still be used with any method
   that expects a sequence of strings, but at a cost)

...

given the above, the substitution engine can now create a
slice list by combining slices from the match object and the
substitution object, and hand the result off to the string
implementation; e.g.:

    sep = PySequence_GetSlice(subst_string, 0, 0);
    result = PyObject_CallMethod(sep, "join", "O", slice_list);
    Py_DECREF(sep);

(can anyone come up with something more elegant than
the [0:0] slice?)

comments?  better ideas?

</F>
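
ps. to make the "copy slices directly from the source strings" step
a bit more concrete, here's a rough sketch in plain C.  it is not
the proposed implementation: SliceItem and slice_list_join are
hypothetical names, plain C strings stand in for the PyObject*
members so that it compiles without Python.h, and only the empty
separator case is handled.  the idea is simply one pass to size the
result and one pass to memcpy each slice into place, with no
temporary string per slice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical stand-in for PySliceListItem: a plain C string
   replaces the PyObject* so the sketch builds without Python.h */
typedef struct {
    const char *string;  /* source buffer (borrowed, not owned) */
    size_t start;        /* normalized: 0 <= start <= end <= strlen(string) */
    size_t end;
} SliceItem;

/* join the slices using an empty separator.
   first pass adds up the slice lengths, second pass copies each
   slice straight from its source buffer into the result.
   returns a malloc'ed, NUL-terminated string (caller frees),
   or NULL if the allocation fails. */
char *
slice_list_join(const SliceItem *items, size_t count)
{
    size_t total = 0;
    size_t i;
    char *result;
    char *p;

    for (i = 0; i < count; i++)
        total += items[i].end - items[i].start;

    result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    p = result;
    for (i = 0; i < count; i++) {
        size_t n = items[i].end - items[i].start;
        memcpy(p, items[i].string + items[i].start, n);
        p += n;
    }
    *p = '\0';

    return result;
}
```

for the re.sub example, the slice list would be template[0:12],
target[0:4] (group 1) and template[14:15], where template is the
15-character string python (not \1).  appending target[4:10] for
the unmatched tail (my addition, the three-part list in the post
only covers the replacement itself) gives the full result string.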