Double replace or single re.sub?
iainking at gmail.com
Wed Oct 26 15:57:39 CEST 2005
Mike Meyer wrote:
> "Iain King" <iainking at gmail.com> writes:
> > I have some code that converts html into xhtml. For example, convert
> > all <i> tags into <em>. Right now I need to do two string.replace calls
> > for every tag:
> > html = html.replace('<i>','<em>')
> > html = html.replace('</i>','</em>')
> > I can change this to a single call to re.sub:
> > html = re.sub('<([/]*)i>', r'<\1em>', html)
> > Would this be a quicker/better way of doing it?
> Maybe. You could measure it and see. But neither will work in the face
> of attributes or whitespace in the tag.
> If you're going to parse [X]HTML, you really should use tools that are
> designed for the job. If you have well-formed HTML, you can use the
> htmllib parser in the standard library. If you have the usual crap one
> finds on the web, I recommend BeautifulSoup.
Thanks. My initial post overstates the program a bit - what I actually
have is a CGI script which outputs my LiveJournal, which I then
server-side include in my home page (so my home page also displays the
latest X entries in my LiveJournal). The only HTML I need to convert
is the stuff that LJ spews out, which, while bad, isn't terrible, and
is fairly consistent. The stuff I need to convert is mostly stuff I
write myself in journal entries, so it doesn't have to be so
comprehensive that I'd need something like BeautifulSoup. I'm not
trying to parse it, just clean it up a little.
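
For anyone wanting to follow the "measure it and see" advice, here is a minimal sketch that puts the two approaches from the thread side by side. The sample string, the function names, and the compare() helper are all made up for illustration, and the pattern uses (/?) rather than the ([/]*) from the original post, so it matches at most one slash. The numbers it prints depend on the machine and the input, so treat it as a way to measure rather than as an answer.

```python
import re
import timeit

# Hypothetical sample input standing in for the LiveJournal output.
SAMPLE = '<p>An <i>italic</i> word and <i>another</i>.</p>' * 100

# Compiled once, so the timing measures substitution rather than
# pattern compilation or cache lookup.
I_TAG = re.compile(r'<(/?)i>')


def with_replace(html):
    """Two str.replace calls, one per tag form."""
    html = html.replace('<i>', '<em>')
    return html.replace('</i>', '</em>')


def with_sub(html):
    """One re.sub call; group 1 carries the optional slash across."""
    return I_TAG.sub(r'<\1em>', html)


def compare(number=1000):
    """Return the wall-clock seconds for each approach over `number` runs."""
    return {
        func.__name__: timeit.timeit(lambda: func(SAMPLE), number=number)
        for func in (with_replace, with_sub)
    }


if __name__ == '__main__':
    # Both approaches should agree on plain tags.
    assert with_replace(SAMPLE) == with_sub(SAMPLE)
    for name, seconds in compare().items():
        print(name, seconds)
    # As pointed out in the reply, neither touches tags that carry
    # attributes or stray whitespace; this input comes back unchanged.
    print(with_sub('<i class="x">text</i >'))
```

If the input really is as consistent as described, either version should do; the last line is just a reminder of where both stop working.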