question about speed of sequential string replacement vs regex or

Xah Lee xahlee at gmail.com
Wed Sep 28 15:14:33 CEST 2011


here's more detail about the origin of this problem. Relevant to emacs
lisp only.

------------------------------

in the definition of “replace-regexp-in-string”, there's this comment:

  ;; To avoid excessive consing from multiple matches in long strings,
  ;; don't just call `replace-match' continually.  Walk down the
  ;; string looking for matches of REGEXP and building up a (reversed)
  ;; list MATCHES.  This comprises segments of STRING which weren't
  ;; matched interspersed with replacements for segments that were.
  ;; [For a `large' number of replacements it's more efficient to
  ;; operate in a temporary buffer; we can't tell from the function's
  ;; args whether to choose the buffer-based implementation, though it
  ;; might be reasonable to do so for long enough STRING.]

note: «For a `large' number of replacements it's more efficient to
operate in a temporary buffer».

my question is, anyone got some idea, how “large” is large?

currently, i have a function replace-pairs-in-string which is
implemented by repeatedly calling “replace-pairs-in-string” like this:

(while (< ii (length pairs))
      (setq mystr (replace-regexp-in-string
                   (elt tempMapPoints ii)
                   (elt (elt pairs ii) 1)
                   mystr t t))
      (setq ii (1+ ii))
      )

When there are 260 pairs of strings to be replaced on a file that's
26k in size, my function takes about 3 seconds (which i think is too
slow). I'm at pain deciding whether my function should be implemented
like this or whether it should create a temp buffer. The problem with
temp buffer is that, if you repeatedly call it, the overhead of
creating buffer is going to make it much slower.

i was actually surprised that replace-regexp-in-string isn't written
in C, which i thought it was.

is there technical reason the replace-regexp-in-string isn't C? (i
suppose only because nobody every did it and the need for speed didn't
suffice?) and also, shouldn't there also be a replace-in-string
(literal, not regex)? because i thought implementing replacement for
string should be much simpler and faster, because buffers comes with
it a whole structure such as “point”, text properties, buffer names,
buffier modifier, etc.

 Xah

On Sep 28, 5:28 am, Xah Lee <xah... at gmail.com> wrote:
> On Sep 28, 3:57 am, mer... at stonehenge.com (Randal L. Schwartz) wrote:
>
> > >>>>> "Xah" == Xah Lee <xah... at gmail.com> writes:
>
> > Xah> curious question.
> > Xah> suppose you have 300 different strings and they need all be replaced
> > Xah> to say "aaa".
>
> > And then suppose this isn't the *real* question, but one entirely of
> > Fiction by Xah Lee.
>
> > How helpful do you want to be?
>
> it's a interesting question anyway.
>
> the question originally came from when i was coding elisp of a
> function that changes html entities to unicode char literal. The
> problem is slightly complicated, involving a few questions about speed
> in emacs. e.g. string vs buffer, and much more... i spent several
> hours on this but it's probably too boring to detail (but i'll do so
> if anyone wishes). But anyway, while digging these questions that's
> not clear in my mind, i thought of why not generate a regex or
> construct and do it in one shot, and wondered if that'd be faster. But
> afterwards, i realized this wouldn't be applicable to my problem
> because for my problem each string needs to be changed to a unique
> string, not all to the same string.
>
>  Xah




More information about the Python-list mailing list