emacs lisp text processing example (html5 figure/figcaption)

Xah Lee xahlee at gmail.com
Tue Jul 5 16:37:18 EDT 2011

On Jul 5, 12:17 pm, Ian Kelly <ian.g.ke... at gmail.com> wrote:
> On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xah... at gmail.com> wrote:
> > So, a solution by regex is out.
> Actually, none of the complications you listed appear to exclude
> regexes.  Here's a possible (untested) solution:
> <div class="img">
> ((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+"
> height="[0-9]+">)+)
> \s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
> \s*</div>
> and corresponding replacement string:
> <figure>
> \1
> <figcaption>\2</figcaption>
> </figure>
> I don't know what dialect Emacs uses for regexes; the above is the
> Python re dialect.  I assume it is translatable.  If not, then the
> above should at least work with other editors, such as Komodo's
> "Find/Replace in Files" command.  I kept the line breaks here for
> readability, but for completeness they should be stripped out of the
> final regex.
> The possibility of nested HTML in the caption is allowed for by using
> a negative look-ahead assertion to accept any tag except a closing
> </p>.  It would break if you had nested <p> tags, but then that would
> be invalid html anyway.
> Cheers,
> Ian
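
Since the solution above is marked untested, here is a minimal sketch of how it could be tried in Python's re module. The names PATTERN, REPLACEMENT and sample are made up for illustration, and the single space between the width and height attributes is an assumption (that line break looks like e-mail wrapping, not part of the regex).

```python
import re

# Ian's pattern with the line breaks stripped out. Assumption: the break
# between width="..." and height="..." was e-mail wrapping, so one space
# is used there.
PATTERN = re.compile(
    r'<div class="img">'
    r'((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" height="[0-9]+">)+)'
    r'\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>'
    r'\s*</div>'
)

# The replacement string from the post; \n in the template becomes a newline.
REPLACEMENT = r'<figure>\n\1\n<figcaption>\2</figcaption>\n</figure>'

# Hypothetical input, in the shape the pattern expects.
sample = (
    '<div class="img">\n'
    '<img src="cat.jpg" alt="my cat" width="200" height="200">\n'
    '<p class="cpt">my <b>cat</b></p>\n'
    '</div>'
)

result = PATTERN.sub(REPLACEMENT, sample)
print(result)
```

Note that 「[^.]+」 in the src attribute means a path containing an extra dot (such as 「../i/cat.jpg」) would not match.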

emacs regex supports shy groups (the 「(?:…)」, written 「\(?:…\)」 in emacs syntax), but it
doesn't support the negative look-ahead assertion 「(?!…)」.

but in any case, i can't see how this part would work:
<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
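
A minimal sketch of how that fragment behaves in Python's re module, which is the dialect it was written in. The names CAPTION and caption_text are hypothetical, made up for this example. Each repetition of the inner group consumes either one character that is not 「<」, or a 「<」 that is not the start of 「</p>」, so the capture runs up to the first 「</p>」 and no further.

```python
import re

# The caption part of the pattern, unchanged from the post.
CAPTION = re.compile(r'<p class="cpt">((?:[^<]|<(?!/p>))+)</p>')

def caption_text(html):
    """Return the captured caption, or None when the pattern does not match."""
    m = CAPTION.search(html)
    return m.group(1) if m else None

# Nested tags are accepted, because their "<" is not followed by "/p>".
nested = caption_text('<p class="cpt">a <i>fine</i> day</p>')

# The capture stops at the first </p>, so a following paragraph is left alone.
first_only = caption_text('<p class="cpt">one</p><p>two</p>')

# An empty caption does not match: the "+" needs at least one character.
empty = caption_text('<p class="cpt"></p>')

print(nested, first_only, empty)
```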


