[Cython] How to improve the performance when doing string/unicode replace and search?

Stefan Behnel stefan_ml at behnel.de
Wed Mar 30 07:02:00 CEST 2011


Yunfan Jiang, 30.03.2011 05:33:
> Hi, I asked a string processing question here before and found a bug. It
> seems you guys fixed the bug but do not use it.

Not sure what you mean by "not use it".


> This time my problem is about performance.
> I need to write a filter that searches for various keywords in the target
> string and stops at the first match.
> This requires unicode input/output, so I wrote a trie-like module to
> do it. It works, but I found it is much slower than using the regex module.
> Could you guys give me some tips on string processing performance?

Note that the right place to ask usage-related questions is the Cython 
users mailing list, not the core developers mailing list. I set a follow-up 
to point you there.

Generally speaking, a trie isn't necessarily fast, and it's certainly not 
the best algorithmic approach for keyword search. You should read up on 
Aho-Corasick and friends. I also wrote a simple Cython module that 
implements a keyword search algorithm ("acora", it's on PyPI), but it's 
unusable for large sets of keywords due to state explosion. It's pretty 
fast for smaller sets though.
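
To illustrate the Aho-Corasick idea mentioned above, here is a minimal
pure-Python sketch that stops at the first keyword match and works on
unicode strings. It is only an illustration under stated assumptions: the
names build_automaton and find_first are made up for this example and are
not the acora API, and a plain Python version like this will not be as fast
as a compiled regex or a Cython implementation.

```python
from collections import deque


def build_automaton(keywords):
    """Build Aho-Corasick tables (hypothetical helper, not the acora API)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> keywords that end at this state

    # Phase 1: insert every keyword into a trie.
    for word in keywords:
        if not word:
            continue  # ignore empty keywords
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(word)

    # Phase 2: breadth-first pass that computes the failure links.
    # States directly below the root keep the root as their failure state.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_first(text, automaton):
    """Return (start_index, keyword) for the first match to end, or None."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists (or at root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if out[state]:
            word = out[state][0]  # longest keyword ending at position i
            return i - len(word) + 1, word
    return None
```

The automaton is built once and reused for every input string, for example
automaton = build_automaton(["he", "she", "his", "hers"]) followed by
find_first("ushers", automaton). The scan reads each character of the text
once and returns as soon as any keyword ends, which matches the "stop if
matched" requirement. Unlike a plain trie lookup restarted at every
position, the failure links avoid rescanning characters after a mismatch.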

Stefan

