question about speed of sequential string replacement vs regex or
xahlee at gmail.com
Wed Sep 28 15:14:33 CEST 2011
here's more detail about the origin of this problem. Relevant to emacs
in the definition of “replace-regexp-in-string”, there's this comment:
;; To avoid excessive consing from multiple matches in long strings,
;; don't just call `replace-match' continually. Walk down the
;; string looking for matches of REGEXP and building up a (reversed)
;; list MATCHES. This comprises segments of STRING which weren't
;; matched interspersed with replacements for segments that were.
;; [For a `large' number of replacements it's more efficient to
;; operate in a temporary buffer; we can't tell from the function's
;; args whether to choose the buffer-based implementation, though it
;; might be reasonable to do so for long enough STRING.]
note: «For a `large' number of replacements it's more efficient to
operate in a temporary buffer».
my question is, anyone got some idea, how “large” is large?
currently, i have a function replace-pairs-in-string which is
implemented by repeatedly calling “replace-pairs-in-string” like this:
(while (< ii (length pairs))
(setq mystr (replace-regexp-in-string
(elt tempMapPoints ii)
(elt (elt pairs ii) 1)
mystr t t))
(setq ii (1+ ii))
When there are 260 pairs of strings to be replaced on a file that's
26k in size, my function takes about 3 seconds (which i think is too
slow). I'm at pain deciding whether my function should be implemented
like this or whether it should create a temp buffer. The problem with
temp buffer is that, if you repeatedly call it, the overhead of
creating buffer is going to make it much slower.
i was actually surprised that replace-regexp-in-string isn't written
in C, which i thought it was.
is there technical reason the replace-regexp-in-string isn't C? (i
suppose only because nobody every did it and the need for speed didn't
suffice?) and also, shouldn't there also be a replace-in-string
(literal, not regex)? because i thought implementing replacement for
string should be much simpler and faster, because buffers comes with
it a whole structure such as “point”, text properties, buffer names,
buffier modifier, etc.
On Sep 28, 5:28 am, Xah Lee <xah... at gmail.com> wrote:
> On Sep 28, 3:57 am, mer... at stonehenge.com (Randal L. Schwartz) wrote:
> > >>>>> "Xah" == Xah Lee <xah... at gmail.com> writes:
> > Xah> curious question.
> > Xah> suppose you have 300 different strings and they need all be replaced
> > Xah> to say "aaa".
> > And then suppose this isn't the *real* question, but one entirely of
> > Fiction by Xah Lee.
> > How helpful do you want to be?
> it's a interesting question anyway.
> the question originally came from when i was coding elisp of a
> function that changes html entities to unicode char literal. The
> problem is slightly complicated, involving a few questions about speed
> in emacs. e.g. string vs buffer, and much more... i spent several
> hours on this but it's probably too boring to detail (but i'll do so
> if anyone wishes). But anyway, while digging these questions that's
> not clear in my mind, i thought of why not generate a regex or
> construct and do it in one shot, and wondered if that'd be faster. But
> afterwards, i realized this wouldn't be applicable to my problem
> because for my problem each string needs to be changed to a unique
> string, not all to the same string.
More information about the Python-list