deleting texts between patterns
John Machin
sjmachin at lexicon.net
Sun Jun 4 19:36:14 EDT 2006
On 5/06/2006 2:51 AM, Baoqiu Cui wrote:
> John Machin <sjmachin at lexicon.net> writes:
>
>> Uh-oh.
>>
>> Try this:
>>
>> >>> pat = re.compile('(?<=abc\n).*?(?=xyz\n)', re.DOTALL)
>> >>> re.sub(pat, '', linestr)
>> 'blahfubarabc\nxyz\nxyzzy'
>
> This regexp still has a problem. It may remove the lines between two
> lines like 'aaabc' and 'xxxyz' (and also removes the first two 'x's in
> 'xxxyz').
>
> The following regexp works better:
>
> pattern = re.compile('(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE)
>
You are quite correct. Your reply, and the rejoinder below, only add
weight to the proposition that regexes are not necessarily the best
choice for every text-processing job :-)
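
For comparison, here is a minimal sketch of a non-regex, line-by-line approach to the same job. The function name delete_between, its parameters, and the sample text are hypothetical, made up for illustration; nobody in this thread posted them. It assumes the markers are whole lines equal to 'abc' and 'xyz', and that lines after an 'abc' with no later 'xyz' should be kept, which is also what the lazy regexes do.

```python
def delete_between(text, start="abc", end="xyz"):
    """Drop the lines strictly between a `start` line and the next `end` line."""
    out = []
    pending = []    # lines seen after a start marker, not yet known to be deletable
    inside = False
    for line in text.splitlines(keepends=True):
        bare = line.rstrip("\r\n")    # compare without the line terminator
        if inside:
            if bare == end:
                pending = []          # matching end found: discard the held lines
                inside = False
                out.append(line)
            else:
                pending.append(line)
        else:
            out.append(line)
            if bare == start:
                inside = True
    out.extend(pending)               # unmatched start marker: keep what followed it
    return "".join(out)


sample = "head\naaabc\nkeep\nxxxyz\nabc\ndrop 1\ndrop 2\nxyz"
cleaned = delete_between(sample)
print(repr(cleaned))
# 'head\naaabc\nkeep\nxxxyz\nabc\nxyz'
```

Because the comparison is against the whole line, 'aaabc' and 'xxxyz' are never mistaken for markers, and a final 'xyz' without a trailing newline needs no special case.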

Just in case the last line is 'xyz' but is not terminated by '\n':

pattern = re.compile('(?<=^abc\n).*?(?=^xyz$)', re.DOTALL | re.MULTILINE)

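
To see how the three patterns in this thread differ, here is a small sketch that runs each one over the same input. The sample text and the variable names (unanchored, anchored, final) are assumptions chosen for illustration; the original linestr is not shown in this message. The sample deliberately contains the 'aaabc' / 'xxxyz' decoys and ends with an 'xyz' that has no trailing newline.

```python
import re

# Hypothetical sample: decoy lines first, then a real abc ... xyz block,
# with the final 'xyz' NOT terminated by a newline.
text = "head\naaabc\nkeep\nxxxyz\nabc\ndrop 1\ndrop 2\nxyz"

# First attempt: markers are not anchored to the start of a line.
unanchored = re.compile(r'(?<=abc\n).*?(?=xyz\n)', re.DOTALL)
# Second attempt: anchored with ^, but still requires '\n' after 'xyz'.
anchored = re.compile(r'(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE)
# Final version: $ (under re.MULTILINE) accepts a newline or end of string.
final = re.compile(r'(?<=^abc\n).*?(?=^xyz$)', re.DOTALL | re.MULTILINE)

r_unanchored = unanchored.sub('', text)
r_anchored = anchored.sub('', text)
r_final = final.sub('', text)

print(repr(r_unanchored))
# 'head\naaabc\nxyz\nabc\ndrop 1\ndrop 2\nxyz'   (ate 'keep' and two 'x's)
print(repr(r_anchored))
# 'head\naaabc\nkeep\nxxxyz\nabc\ndrop 1\ndrop 2\nxyz'   (unchanged: last 'xyz' has no '\n')
print(repr(r_final))
# 'head\naaabc\nkeep\nxxxyz\nabc\nxyz'
```

With a trailing newline added to the sample, the second and final patterns give the same result; they differ only on an unterminated last line.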
Cheers,
John