python regex: variable length of positive lookbehind assertion
alister
alister.ware at ntlworld.com
Wed Jun 15 08:27:52 EDT 2016
On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
> Hi everyone,
> I am struggling to write a regex that matches what I want:
>
> Problem Description:
>
> Given a string like this:
>
> >>> string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
> ... true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> \
> ... true_tail"
>
> I want to match all the text surrounded by those "<a> </a>",
> but only if those "<a> </a>" appear **at some distance** after
> "true_head". That is, I expect the result to be like this:
>
> >>> import re
> >>> result = re.findall("the_regex", string)
> >>> print result
> ["ccc","ddd","eee"]
>
> How can I write a regex to match that?
> I have tried to use the **positive lookbehind assertion** in Python regex,
> but it does not allow a variable-length lookbehind.
>
> Thanks in advance,
> Ruan
Don't try to use regex to parse HTML; it won't work reliably.
I am surprised no one has mentioned BeautifulSoup yet, which is probably
what you require.
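
A minimal sketch of the parser-based approach, for illustration only. It
uses the standard library's html.parser rather than BeautifulSoup so that it
runs without installing anything. The names AnchorText and links_after are
made up for this example, and it assumes "true_head" and "true_tail" appear
as literal marker strings, as in the sample above.

```python
from html.parser import HTMLParser


class AnchorText(HTMLParser):
    """Collect the text inside every <a>...</a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_a = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_a = True
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_a = False

    def handle_data(self, data):
        if self.in_a:
            self.found[-1] += data


def links_after(text, head="true_head", tail="true_tail"):
    """Return the text of each <a> element found between head and tail."""
    # Assumption: the markers are plain substrings, so slicing is enough
    # and no lookbehind is needed.
    start = text.find(head)
    if start == -1:
        return []
    start += len(head)
    end = text.find(tail, start)
    if end == -1:
        end = len(text)
    parser = AnchorText()
    parser.feed(text[start:end])
    parser.close()
    return parser.found


string = ("false_head <a>aaa</a> <a>bbb</a> false_tail "
          "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> "
          "true_tail")
print(links_after(string))  # ['ccc', 'ddd', 'eee']
```

Slicing the string first sidesteps the fixed-width lookbehind limit, and the
parser then only ever sees the region of interest.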
--
What we anticipate seldom occurs; what we least expect generally happens.
-- Benjamin Disraeli