re.search much slower then grep on some regular expressions

Sat Jul 5 00:08:59 EDT 2008

On Fri, 4 Jul 2008 20:34:03 -0700 (PDT), Carl Banks wrote:
> On Jul 4, 4:43 pm, "Filipe Fernandes" <fernandes... at gmail.com> wrote:
>> On Fri, Jul 4, 2008 at 8:36 AM, Peter Otten <__pete... at web.de> wrote:
>> > Henning_Thornblad wrote:
>>
>> >> This script takes about 5 min to run on my computer:
>> >> #!/usr/bin/env python
>> >> import re
>>
>> >> row=""
>> >> for a in range(156000):
>> >>     row+="a"
>> >> print re.search('[^ "=]*/',row)
[snip]
>> >> Is this a bug in python?
>
> This behavior is showing that you're getting n-squared performance;
> the regexp seems to be checking 156000*(156000-1)/2 substrings for a
> match.

I did this:

  $ python -m timeit -s "import re" "re.search( '[^13]*x', 900*'a' )"
  100 loops, best of 3: 16.7 msec per loop

for values of 900 ranging from 300 to 1000, and the time taken
per loop was indeed quadratic.

-- 
To email me, substitute nowhere->spamcop, invalid->net.