python regex: variable length of positive lookbehind assertion
Jussi Piitulainen
jussi.piitulainen at helsinki.fi
Wed Jun 15 08:55:42 EDT 2016
alister writes:
> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>
>> Hi everyone,
>> I am struggling to write a regex that matches what I want:
>>
>> Problem Description:
>>
>> Given a string like this:
>>
>> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>> true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> \
>> true_tail"
>>
>> I want to match all the text surrounded by those "<a> </a>",
>> but only if those "<a> </a>" appear **at some distance** after
>> "true_head". That is, I expect the result to be like this:
>>
>> >>>import re
>> >>>result = re.findall("the_regex",string)
>> >>>print result
>> ["ccc","ddd","eee"]
>>
>> How can I write a regex to match that?
>> I have tried to use a **positive lookbehind assertion** in Python regex,
>> but it does not allow a variable-length lookbehind.
>>
>> Thanks in advance,
>> Ruan
>
> Don't try to use regex to parse HTML; it won't work reliably.
> I am surprised no one has mentioned BeautifulSoup yet, which is probably
> what you require.
Nothing in the question indicates that the data is HTML.
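If the data really is just marked-up text, one way to get the requested result without a variable-length lookbehind is to do it in two steps: first cut out the region that follows "true_head", then collect the "<a> </a>" contents inside that region only. This is a minimal sketch, not the only approach. The function name extract_after_head is made up for illustration, and treating "true_tail" as the end of the region is an assumption taken from the sample string.

```python
import re

def extract_after_head(text):
    """Return the contents of <a>...</a> found between true_head and true_tail.

    Hypothetical helper: the name and the use of "true_tail" as the end
    marker are assumptions based on the sample string in the question.
    """
    # Step 1: isolate the region after "true_head". The non-greedy group
    # stops at the first "true_tail" that follows it.
    region = re.search(r"true_head(.*?)true_tail", text, re.DOTALL)
    if region is None:
        return []
    # Step 2: search for the tags inside that region only, so nothing
    # before "true_head" can match.
    return re.findall(r"<a>(.*?)</a>", region.group(1), re.DOTALL)

string = ("false_head <a>aaa</a> <a>bbb</a> false_tail "
          "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> "
          "true_tail")

result = extract_after_head(string)
print(result)  # ['ccc', 'ddd', 'eee']
```

Splitting the work this way avoids the restriction entirely, because neither pattern needs to look backwards: the first search fixes where matching may start, and the second never sees the text before "true_head".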