emacs lisp text processing example (html5 figure/figcaption)
Ian Kelly
ian.g.kelly at gmail.com
Tue Jul 5 15:17:25 EDT 2011
On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee <xahlee at gmail.com> wrote:
> So, a solution by regex is out.
Actually, none of the complications you listed appear to exclude
regexes. Here's a possible (untested) solution:
<div class="img">
((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+"
height="[0-9]+">)+)
\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>
\s*</div>
and corresponding replacement string:
<figure>
\1
<figcaption>\2</figcaption>
</figure>
I don't know what dialect Emacs uses for regexes; the above is the
Python re dialect. I assume it is translatable. If not, then the
above should at least work with other editors, such as Komodo's
"Find/Replace in Files" command. I kept the line breaks in the regex
for readability, but they are not part of the pattern and should be
stripped out of the final regex.
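As a rough sketch of how the pattern and replacement could be applied with Python's re module: the function name convert, the sample HTML, and the choice to join the wrapped width/height line with a single space are assumptions made for illustration, and the regex itself is still the untested one from above.

```python
import re

# The pattern from the post with its display line breaks stripped.
# Assumption: the wrapped break before height= stands for one space.
PATTERN = re.compile(
    r'<div class="img">'
    r'((?:\s*<img src="[^.]+\.(?:jpg|png|gif)" alt="[^"]+" width="[0-9]+" '
    r'height="[0-9]+">)+)'
    r'\s*<p class="cpt">((?:[^<]|<(?!/p>))+)</p>'
    r'\s*</div>'
)

# The replacement from the post, keeping its line breaks.
# \1 is the run of <img> tags, \2 is the caption body.
REPLACEMENT = "\n".join([
    "<figure>",
    r"\1",
    r"<figcaption>\2</figcaption>",
    "</figure>",
])


def convert(html):
    """Rewrite every matching div.img block in html as a <figure>."""
    return PATTERN.sub(REPLACEMENT, html)


if __name__ == "__main__":
    # Hypothetical input, not taken from the original thread.
    sample = (
        '<div class="img">'
        '<img src="cat.jpg" alt="a cat" width="200" height="100">'
        '<p class="cpt">A <b>cat</b></p>'
        '</div>'
    )
    print(convert(sample))
```

One thing this sketch makes easy to check: because the src part is written as [^.]+, an image path containing an extra dot (for example "../cat.jpg") does not match, and such a block is left unchanged.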
Nested HTML in the caption is allowed for by a negative look-ahead
assertion that accepts any tag except a closing </p>. The regex would
break if you had nested <p> tags, but then that would be invalid HTML
anyway.
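To see the look-ahead on its own, here is a small sketch that isolates the caption part of the pattern. The helper name caption_text and both sample fragments are hypothetical, chosen only to show the behaviour.

```python
import re

# Caption sub-pattern only: any character that is not "<", or a "<"
# that does not begin a closing </p> tag.
CAPTION = re.compile(r'<p class="cpt">((?:[^<]|<(?!/p>))+)</p>')


def caption_text(fragment):
    """Return the captured caption body, or None if nothing matches."""
    match = CAPTION.search(fragment)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Inline tags such as <b> pass through the look-ahead.
    print(caption_text('<p class="cpt">A <b>cat</b> naps</p>'))
    # With a nested <p>, the capture stops at the first closing </p>.
    print(caption_text('<p class="cpt">outer <p>inner</p> tail</p>'))
```

In the second fragment the capture ends at the inner </p>, so the caption is cut short. In the full regex the following \s*</div> would then fail to match, which is the breakage described above.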
Cheers,
Ian