Newbie regular expression and whitespace question

googleboy mynews44 at
Thu Sep 22 21:58:49 CEST 2005


I am trying to collapse an HTML table into a single line. Basically,
any time I see ">" and "<" with nothing but whitespace between them, I'd
like to remove all the whitespace, including newlines. I've read the
how-to and I have tried a bunch of things, but nothing seems to work
for me:


table = open(r'D:\path\to\tabletest.txt', 'rb')
strTable = table.read()

# Below are the different things I have tried, one at a time:

strTable = strTable.replace(">\s<", "><") #I got this from the module

strTable = strTable.replace(">.<", "><")

strTable = ">\s+<".join(strTable)

strTable = ">\s<".join(strTable)

print strTable
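
One likely reason none of the attempts above works: str.replace() treats
its first argument as literal text, so ">\s<" is never interpreted as a
pattern, and str.join() inserts a separator between characters rather
than searching for anything. A minimal sketch of the regular-expression
route, using re.sub() from the standard library, might look like the
following. The function name collapse_between_tags and the short sample
string are hypothetical, and the sketch assumes the file contents are
already in an ordinary text string.

```python
import re

def collapse_between_tags(html):
    """Remove runs of whitespace (newlines included) that sit between '>' and '<'.

    Hypothetical helper for illustration; it only looks at the characters
    '>' and '<' and does not parse the HTML.
    """
    # \s+ matches one or more whitespace characters, including '\n' and '\r'.
    return re.sub(r'>\s+<', '><', html)

# Hypothetical sample in the style of the table from the question.
sample = '<td> </td>\n    <td colspan="2">Introduction</td>\n    <td>3</td>'
print(collapse_between_tags(sample))
```

Note that this also strips the single space inside an otherwise empty
cell such as "<td> </td>", which matches the stated goal but may not be
wanted in every table.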


The table in question looks like this:

<table width="80%"  border="0">
    <td> </td>
    <td colspan="2">Introduction</td>
    <td><div align="right">3</div></td>
    <td> </td>
    <td colspan="2">Childraising for Parrots</td>
    <td><div align="right">11</div></td>

For extra kudos (and I confess I have been so stuck on the above
problem I haven't put much thought into how to do this one) I'd like to
be able to measure the number of characters between the <p> and </p>
tags, and then insert a newline character at the end of the next word
after an arbitrary number of characters. I am reading a bunch of
paragraphs formatted for a webpage into a script, but they're all
on one big long line and I would like to split them for readability.
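
For the paragraph-splitting idea, one possible sketch uses the standard
textwrap module on the text found between each <p> and </p> pair. This
is an approximation of what is described above: textwrap.fill() breaks
before a line would exceed the width, rather than after the next word
past the limit. The function name wrap_paragraphs and the default width
of 70 are assumptions made for illustration, and matching tags with a
regular expression is only reasonable for simple, non-nested markup.

```python
import re
import textwrap

def wrap_paragraphs(html, width=70):
    """Re-wrap the text inside each <p>...</p> pair at word boundaries.

    Hypothetical helper for illustration. len(match.group(2)) would give
    the number of characters between the tags if that count is needed.
    """
    def rewrap(match):
        open_tag, body, close_tag = match.groups()
        # Break only at whitespace so long words are never split.
        wrapped = textwrap.fill(body, width=width,
                                break_long_words=False,
                                break_on_hyphens=False)
        return open_tag + wrapped + close_tag

    # re.S lets '.' span newlines; re.I accepts <P> as well as <p>.
    return re.sub(r'(<p\b[^>]*>)(.*?)(</p>)', rewrap, html,
                  flags=re.I | re.S)

# Hypothetical one-line paragraph, wrapped at 30 columns.
long_line = '<p>' + ' '.join(['word'] * 40) + '</p>'
print(wrap_paragraphs(long_line, width=30))
```

Text outside the <p> tags is left untouched, so the sketch can be run
over a whole page as long as the paragraphs are not nested.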


