Replace Several Items
fredrik at pythonware.com
Thu Aug 14 08:27:53 CEST 2008
Steven D'Aprano wrote:
> While I'm gratified that my prediction was so close to the results I
> found, I welcome any suggestions to better/faster/more efficient code.
> more things to try:
- Factor out the creation of the regular expression from the tests:
"escape" and "compile" are relatively expensive, and neither throw-away
code (using the RE function forms) nor production code will end up doing
them both for each string.
- Same w. the translation table for "translate"
- Use Unicode strings instead of byte strings (we're moving towards 3.0,
test data variations:
- Try dropping the number of actual replacements and see what happens --
if you're escaping user-provided data (e.g. HTML), for example, it's not
that unlikely that you end up doing only a few replacements for each
string you're processing, or no replacements at all.
- Also try shorter and longer strings ("human-sized" data is often
provided in shorter chunks than 216 characters per string; the typical
size and distribution depends on your actual application, of course).
Unicode will affect translate more than the others; the last two will
most likely affect in-replace instead (that approach gets faster the
shorter the strings are, and the fewer calls to replace that you
actually end up doing).
Finally, if you want the sub-lambda form to look better, try inserting a
character before or after each special character using a template string
or a lambda (e.g. a backslash).
More information about the Python-list