mal wrote:
Just for comparison: would you mind running the search routines in mxTextTools on the same machine?
searching for "spam" in a string padded with "spaz" (1000 bytes on each side of the target):

    string.find     0.112 ms
    texttools.find  0.080 ms
    sre8.search     0.059 ms     pre.search    0.122 ms
    unicode.find    0.130 ms     sre16.search  0.065 ms

same test, without any false matches (padded with "-"):

    string.find     0.035 ms
    texttools.find  0.083 ms
    sre8.search     0.050 ms     pre.search    0.116 ms
    unicode.find    0.031 ms     sre16.search  0.055 ms
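For anyone wanting to rerun something similar, here is a minimal sketch of the test setup described above, assuming a modern Python where `str.find` and the `re` module stand in for the old string.find/unicode.find and pre/sre8/sre16 routines (mxTextTools is not included). The helper names `make_haystack` and `time_ms` are hypothetical, and the numbers it prints will not match the figures above; it only rebuilds the two haystacks ("spaz" padding with false matches, "-" padding without) and times a search in each.

```python
import re
import timeit

def make_haystack(pad_unit: str, pad_len: int = 1000, target: str = "spam") -> str:
    """Return `target` with exactly `pad_len` characters of padding on each side."""
    pad = (pad_unit * (pad_len // len(pad_unit) + 1))[:pad_len]
    return pad + target + pad

def time_ms(stmt, number: int = 1000) -> float:
    """Average wall-clock time per call of `stmt`, in milliseconds."""
    return timeit.timeit(stmt, number=number) / number * 1000.0

pattern = re.compile("spam")
results = {}
for label, unit in [("false matches ('spaz')", "spaz"), ("no false matches ('-')", "-")]:
    text = make_haystack(unit)
    # Each lambda runs inside timeit during this iteration, so it sees the current `text`.
    results[label] = {
        "str.find": time_ms(lambda: text.find("spam")),
        "re.search": time_ms(lambda: pattern.search(text)),
    }

for label, timings in results.items():
    print(label)
    for name, ms in timings.items():
        print(f"  {name:<10} {ms:.3f} ms")
```

The "spaz" padding is the interesting case: every "spa" prefix is a near miss, which is what makes a naive left-to-right comparison do extra work.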
Those results are probably due to the fact that string.find does a brute-force search. If it did a last-match-character-first search, or even Boyer-Moore (which only pays off for long search targets), it should be a lot faster than [s|p]re.
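To make the contrast concrete, here is a sketch of a brute-force search next to Boyer-Moore-Horspool, a simplified Boyer-Moore variant that compares the last pattern character first and uses only the bad-character shift table. This is an illustration of the general techniques, not a claim about how string.find or mxTextTools are actually implemented; the function names are hypothetical.

```python
def brute_force_find(text: str, pat: str) -> int:
    """Naive search: try every alignment, comparing left to right."""
    n, m = len(text), len(pat)
    if m == 0:
        return 0
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pat[j]:
            j += 1
        if j == m:
            return i
    return -1

def horspool_find(text: str, pat: str) -> int:
    """Boyer-Moore-Horspool: check the last pattern character first and
    use a bad-character table to skip ahead after a mismatch."""
    n, m = len(text), len(pat)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each character in pat[:-1], the distance from its last occurrence
    # to the end of the pattern; characters not in the table shift by m.
    # A dict (instead of a 256-entry array) works for any alphabet size.
    shift = {}
    for k, ch in enumerate(pat[:-1]):
        shift[ch] = m - 1 - k
    i = 0
    while i <= n - m:
        # Compare right to left, starting with the last pattern character.
        j = m - 1
        while j >= 0 and text[i + j] == pat[j]:
            j -= 1
        if j < 0:
            return i
        # Skip based on the text character under the pattern's last position.
        i += shift.get(text[i + m - 1], m)
    return -1
```

The shift can never exceed the pattern length, which is why the skipping only pays off for long search targets: with a 4-character pattern like "spam", the best case is advancing 4 positions per attempt. Keying the shift table by character rather than by byte value also means the same code handles 8-bit and 16-bit characters without a 65536-entry table.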
Does the TextTools algorithm work with arbitrary character set sizes, btw?

</F>