Replace Several Items

Fredrik Lundh fredrik at pythonware.com
Thu Aug 14 08:27:53 CEST 2008


Steven D'Aprano wrote:

 > While I'm gratified that my prediction was so close to the results I
 > found, I welcome any suggestions to better/faster/more efficient code.

more things to try:

code tweaks:

- Factor out the creation of the regular expression from the tests: 
"escape" and "compile" are relatively expensive, and neither throw-away 
code (using the RE function forms) nor production code will end up doing 
them both for each string.

- Same with the translation table for "translate".

- Use Unicode strings instead of byte strings (we're moving towards 3.0, 
after all).
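
A minimal sketch of the first two tweaks, assuming Python 3 (where str is
already Unicode). The REPLACEMENTS map and the function names are
hypothetical, chosen for illustration, and are not taken from the
benchmark under discussion. The point is only that the compiled pattern
and the translation table are built once, outside whatever gets timed:

```python
import re

# Hypothetical replacement map, for illustration only.
REPLACEMENTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

# Built once at import time, so "escape" and "compile" are not
# paid again for every string that gets processed.
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))

# Likewise, the translation table is created a single time.
TABLE = str.maketrans(REPLACEMENTS)


def replace_with_sub(text):
    # Look up each matched character in the replacement map.
    return PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], text)


def replace_with_translate(text):
    # str.translate maps single characters to replacement strings.
    return text.translate(TABLE)
```

A timing harness would then call only replace_with_sub or
replace_with_translate inside the timed statement.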

test data variations:

- Try dropping the number of actual replacements and see what happens -- 
if you're escaping user-provided data (e.g. HTML), for example, it's not 
that unlikely that you end up doing only a few replacements for each 
string you're processing, or no replacements at all.

- Also try shorter and longer strings ("human-sized" data is often 
provided in shorter chunks than 216 characters per string; the typical 
size and distribution depends on your actual application, of course).

Unicode will affect translate more than the others; the two test data 
variations will most likely affect in-replace instead (that approach 
gets faster the shorter the strings are, and the fewer calls to replace 
that you actually end up doing).
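
For reference, a rough sketch of what "in-replace" is assumed to mean
here: test for each special character with "in" first, and call replace
only when it is present. The REPLACEMENTS map and the function name are
hypothetical. The sketch relies on dict insertion order (Python 3.7+) so
that "&" is handled before the entities that contain it:

```python
# Hypothetical replacement map; "&" comes first so that the "&" in
# "&lt;" and "&gt;" is not escaped a second time.
REPLACEMENTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def replace_with_in_check(text):
    for old, new in REPLACEMENTS.items():
        # The membership test is cheap; replace is skipped entirely
        # when the character does not occur in the string.
        if old in text:
            text = text.replace(old, new)
    return text
```

With no special characters in the input, the function does only the
membership tests and returns the original string object, which is why
short strings with few or no replacements would favour this approach.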

Finally, if you want the sub-lambda form to look better, try inserting a 
character before or after each special character using a template string 
or a lambda (e.g. a backslash).
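
A small sketch of that last variation, with a hypothetical set of
special characters and hypothetical function names. Both functions
insert a backslash before each special character; one uses a template
string, the other a lambda:

```python
import re

# Hypothetical set of special characters, for illustration only.
SPECIAL = re.compile(r"[&<>]")


def escape_with_template(text):
    # In the template, "\\" is a literal backslash and "\g<0>" is
    # the whole match.
    return SPECIAL.sub(r"\\\g<0>", text)


def escape_with_lambda(text):
    # Same result, but the replacement is computed by a callable.
    return SPECIAL.sub(lambda match: "\\" + match.group(0), text)
```

Since both forms go through the same pattern and produce the same
output, timing them against each other isolates the cost of the
callable from the cost of the matching.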

</F>



