emacs lisp text processing example (html5 figure/figcaption)

Ian Kelly ian.g.kelly at gmail.com
Tue Jul 5 21:17:25 CEST 2011


On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xahlee at gmail.com> wrote:
> So, a solution by regex is out.

Actually, none of the complications you listed appear to exclude
regexes.  Here's a possible (untested) solution:

<div class="img">
((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" height="[0-9]+">)+)
\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
\s*</div>

and corresponding replacement string:

<figure>
\1
<figcaption>\2</figcaption>
</figure>

I don't know which regex dialect Emacs uses; the above is written in
the Python re dialect.  I assume it is translatable.  If not, then the
above should at least work with other editors, such as Komodo's
"Find/Replace in Files" command.  I kept the line breaks here for
readability, but they should be stripped out of the final regex.
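
For anyone who wants to try it from Python itself, here is a minimal
sketch of the same pattern applied with re.sub.  The names PATTERN,
REPLACEMENT and convert, and the sample markup, are made up for
illustration.  Adjacent string literals join the pieces without the
line breaks, and the space between the width and height attributes is
written out explicitly.  Group 1 already starts with the whitespace
before the first <img>, so the replacement here does not add another
newline after <figure>.

```python
import re

# The pattern from the message, joined into one string with the
# readability line breaks removed.
PATTERN = re.compile(
    r'<div class="img">'
    r'((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" '
    r'height="[0-9]+">)+)'
    r'\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>'
    r'\s*</div>'
)

# \1 is the run of <img> tags, \2 is the caption body.
REPLACEMENT = r'<figure>\1\n<figcaption>\2</figcaption>\n</figure>'


def convert(html):
    """Rewrite every matching div/img/caption block as figure/figcaption."""
    return PATTERN.sub(REPLACEMENT, html)


# Hypothetical input, shaped like the markup the pattern expects.
sample = '''<div class="img">
<img src="cat.jpg" alt="a cat" width="200" height="100">
<p class="cpt">A <b>sleepy</b> cat</p>
</div>'''

print(convert(sample))
```

Text that does not match the pattern is passed through unchanged, so
running this over a whole file only touches the image blocks.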

The possibility of nested HTML in the caption is allowed for by using
a negative look-ahead assertion to accept any tag except a closing
</p>.  It would break if you had nested <p> tags, but then that would
be invalid HTML anyway.
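
The look-ahead can be checked in isolation.  This is only a sketch: it
uses just the caption part of the pattern, and the name CAPTION and
both sample strings are invented for the test.

```python
import re

# Caption sub-pattern only: any character that is not "<", or a "<"
# that does not begin a closing </p>.
CAPTION = re.compile(r'<p class="cpt">((?:[^<]|<(?!/p>))+)</p>')

# Nested inline tags are accepted as part of the caption.
nested = '<p class="cpt">See <a href="x.html">this</a> page</p><p>next</p>'
inline_match = CAPTION.search(nested)
print(inline_match.group(1))

# A nested <p> cuts the capture short at the first </p>.
bad = '<p class="cpt">outer <p>inner</p> tail</p>'
nested_p_match = CAPTION.search(bad)
print(nested_p_match.group(1))
```

In the second case the capture stops at the inner </p>, so in the full
pattern the following \s*</div> would no longer line up and the block
would be left unconverted.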

Cheers,
Ian


