[Cython] How to improve the performance when doing string/unicode replace and search?
stefan_ml at behnel.de
Wed Mar 30 07:02:00 CEST 2011
Yunfan Jiang, 30.03.2011 05:33:
> Hi, I asked a string processing question here before and found a bug. It
> seems you guys fixed the bug but do not use it.
Not sure what you mean by "not use it".
> This time, my problem is about performance.
> I need to write a filter that searches for all sorts of keywords in the
> target string and stops as soon as one matches.
> This requires unicode input/output, so I wrote a trie-like module to
> do it. It works, but I found it much slower than using the regex module.
> So could you guys give some tips on string processing performance?
Note that the right place to ask usage-related questions is the Cython
users mailing list, not the core developers mailing list. I set a follow-up
to point you there.
Generally speaking, a trie isn't necessarily fast, and it's certainly not
the best algorithmic approach for keyword search. You should read up on
Aho-Corasick and friends. I also wrote a simple Cython module that
implements a keyword search algorithm ("acora", it's on PyPI), but it's
unusable for large sets of keywords due to state explosion. It's pretty
fast for smaller sets though.
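
To illustrate the Aho-Corasick idea, here is a minimal pure-Python sketch that stops at the first match and works on unicode strings. It is not the code from "acora", and the names build_automaton and find_first are made up for this example. It assumes non-empty keywords, and it keeps one dict per state, so it is meant to show the algorithm rather than to be fast.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto/fail/output tables for a set of non-empty keywords."""
    goto = [{}]   # goto[state] maps a character to the next state
    fail = [0]    # fail[state] is the state to fall back to on a mismatch
    out = [[]]    # out[state] lists the keywords that end at this state

    # Phase 1: build a plain trie of the keywords.
    for word in keywords:
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(word)

    # Phase 2: breadth-first pass that fills in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_first(text, automaton):
    """Return (start_index, keyword) for the first match, or None."""
    goto, fail, out = automaton
    state = 0
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if out[state]:
            word = out[state][0]
            return pos - len(word) + 1, word
    return None
```

The search reads each character of the input once, regardless of how many keywords there are, which is the main advantage over testing the keywords one by one or restarting a trie walk at every position.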