doubling the number of tests, but not taking twice as long

MRAB python at mrabarnett.plus.com
Wed Jul 18 19:59:42 EDT 2018


On 2018-07-18 22:40, Larry Martell wrote:
> On Tue, Jul 17, 2018 at 11:43 AM, Neil Cerutti <neilc at norwich.edu> wrote:
>> On 2018-07-16, Larry Martell <larry.martell at gmail.com> wrote:
>>> I had some code that did this:
>>>
>>> meas_regex = r'_M\d+_'
>>> meas_re = re.compile(meas_regex)
>>>
>>> if meas_re.search(filename):
>>>     stuff1()
>>> else:
>>>     stuff2()
>>>
>>> I then had to change it to this:
>>>
>>> if meas_re.search(filename):
>>>     if 'MeasDisplay' in filename:
>>>         stuff1a()
>>>     else:
>>>         stuff1()
>>> else:
>>>     if 'PatternFov' in filename:
>>>         stuff2a()
>>>     else:
>>>         stuff2()
>>>
>>> This code needs to process many tens of 1000's of files, and it
>>> runs often, so it needs to run very fast. Needless to say, my
>>> change has made it take 2x as long. Can anyone see a way to
>>> improve that?
>>
>> Can you expand/improve the regex pattern so you don't have to rescan
>> the string to check for the presence of MeasDisplay and
>> PatternFov? In other words, since you're already using the giant,
>> Swiss Army sledgehammer of the re module, go ahead and use enough
>> features to cover your use case.
> 
> Yeah, that was my first thought, but I haven't been able to come up
> with a regex that works.
> 
> There are 4 cases I need to detect:
> 
> case1 = 'spam_M123_eggs_MeasDisplay_sausage'
> case2 = 'spam_M123_eggs_sausage_and_spam'
> case3 = 'spam_spam_spam_PatternFov_eggs_sausage_and_spam'
> case4 = 'spam_spam_spam_eggs_sausage_and_spam'
> 
> I thought this regex would work:
> 
> '(_M\d+_){0,1}.*?(MeasDisplay|PatternFOV){0,1}'
> 
> And then I could look at the match objects and see which of the 4
> cases it was. But try as I might, I could not get it to work. Any
> regex gurus want to tell me what I am doing wrong here?
> 
The trick to capturing both parts when both are optional is to put each
one in its own lookahead and make the lookahead itself optional:

r'(?=.*?(_M\d+_))?(?=.*?(MeasDisplay|PatternFov))?'
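
As a rough sketch of how that pattern might be wired into the original
if/else, here is a small example. The names parts_re and classify are
made up for illustration, and the returned strings stand in for the
stuff1(), stuff1a(), stuff2() and stuff2a() calls. It assumes a filename
contains at most one of MeasDisplay and PatternFov, since group 2 only
captures whichever of them appears first.

```python
import re

# Each optional part sits inside its own optional lookahead, so both
# searches start from the beginning of the string and neither consumes
# any characters.
parts_re = re.compile(r'(?=.*?(_M\d+_))?(?=.*?(MeasDisplay|PatternFov))?')


def classify(filename):
    """Return the name of the branch the original if/else would take."""
    # The pattern can match the empty string, so match() never returns None.
    meas, keyword = parts_re.match(filename).groups()
    if meas is not None:
        return 'stuff1a' if keyword == 'MeasDisplay' else 'stuff1'
    return 'stuff2a' if keyword == 'PatternFov' else 'stuff2'


if __name__ == '__main__':
    for name in ('spam_M123_eggs_MeasDisplay_sausage',
                 'spam_M123_eggs_sausage_and_spam',
                 'spam_spam_spam_PatternFov_eggs_sausage_and_spam',
                 'spam_spam_spam_eggs_sausage_and_spam'):
        print(name, '->', classify(name))
```

On the four sample names from the question this should pick stuff1a,
stuff1, stuff2a and stuff2 respectively. Whether a single regex pass is
actually faster than the regex plus two "in" tests is something to
measure on real filenames rather than assume.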

