Insert characters into string based on re?

John Machin sjmachin at
Fri Oct 13 00:38:07 CEST 2006

Matt wrote:
> I am attempting to reformat a string, inserting newlines before certain
> phrases. For example, in formatting SQL, I want to start a new line at
> each JOIN condition. Noting that strings are immutable, I thought it
> best to split the string at the key points, then join with '\n'.
> Regexps seem the best way to identify the points in the string
> ('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
> to identify multiple locations in the string. However, the re.split
> method returns the list without the split phrases

Not without some minor effort on your part :-)
See below.

> and re.findall does
> not seem useful for this operation.
> Suggestions?

Read the fine manual:
split( pattern, string[, maxsplit = 0])

Split string by the occurrences of pattern. If capturing parentheses
are used in pattern, then the text of all groups in the pattern are
also returned as part of the resulting list. If maxsplit is nonzero, at
most maxsplit splits occur, and the remainder of the string is returned
as the final element of the list. (Incompatibility note: in the
original Python 1.5 release, maxsplit was ignored. This has been fixed
in later releases.)

>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

# Now see what happens when you use capturing parentheses:

>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']

# With maxsplit=1, only the first split is made:

>>> re.split(r'\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
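
Applied to the SQL case, the capturing-parentheses trick might look something like the sketch below. The pattern and the function names are my own assumptions for illustration, not anything from the original question. Note that 'LEFT.*JOIN' is greedy and would run from the first LEFT to the last JOIN in the string, so the sketch spells out the optional qualifiers instead. The second function uses re.sub rather than re.split, which is a different route to the same result.

```python
import re

# Hypothetical pattern: an optional LEFT/RIGHT/FULL/INNER/CROSS qualifier,
# an optional OUTER, then JOIN. The single capturing group makes re.split
# keep each matched phrase in the result list.
JOIN_RE = re.compile(
    r'(\b(?:(?:LEFT|RIGHT|FULL|INNER|CROSS)\s+)?(?:OUTER\s+)?JOIN\b)'
)

def break_before_joins(sql):
    """Split on the JOIN phrases, then rejoin with a newline before each one."""
    # parts alternates: text, phrase, text, phrase, ..., text
    parts = JOIN_RE.split(sql)
    lines = [parts[0].rstrip()]
    for i in range(1, len(parts), 2):
        # Glue each captured phrase back onto the text that followed it.
        lines.append(parts[i] + parts[i + 1].rstrip())
    return '\n'.join(lines)

def break_before_joins_sub(sql):
    """Alternative: let re.sub insert the newline, swallowing leading spaces."""
    return re.sub(r'\s*' + JOIN_RE.pattern, r'\n\1', sql)

sql = ("SELECT * FROM a LEFT OUTER JOIN b ON a.id = b.id "
       "LEFT JOIN c ON b.id = c.id")
print(break_before_joins(sql))
```

Both functions leave the SELECT clause on the first line and start a new line at each JOIN phrase. The pattern is case-sensitive as written; passing re.IGNORECASE to re.compile would relax that.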

