# Newbie regular expression and whitespace question

Thu Sep 22 21:58:49 CEST 2005

Hi.

I am trying to collapse an HTML table into a single line. Basically,
any time I see ">" and "<" with nothing but whitespace between them,
I'd like to remove all of that whitespace, including newlines. I've
read the how-to and tried a bunch of things, but nothing seems to work
for me:

--

table = open(r'D:\path\to\tabletest.txt', 'rb')
strTable = table.read()  # read the whole file into one string
table.close()

#Below find the different sort of things I have tried, one at a time:

strTable = strTable.replace(">\s<", "><") #I got this from the module docs

strTable = strTable.replace(">.<", "><")

strTable = ">\s+<".join(strTable)

strTable = ">\s<".join(strTable)

print strTable

--

The table in question looks like this:

<table width="80%"  border="0">
<tr>
<td> </td>
<td colspan="2">Introduction</td>
<td><div align="right">3</div></td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td><i>ONE</i></td>
<td colspan="2">Childraising for Parrots</td>
<td><div align="right">11</div></td>
</tr>
</table>
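
A minimal sketch of the collapse described above, assuming the `re`
module's `sub` function is used instead of `str.replace` (which treats
its first argument as literal text, not as a pattern). The function name
`collapse_between_tags` and the short sample string are made up for
illustration; this is one possible approach, not a definitive one.

```python
import re

def collapse_between_tags(html):
    """Remove runs of whitespace that sit directly between '>' and '<'."""
    # \s matches spaces, tabs, carriage returns and newlines;
    # + means "one or more", so any amount of whitespace is removed.
    return re.sub(r'>\s+<', '><', html)

# Hypothetical sample modelled on the first rows of the table above.
sample = '<tr>\n<td> </td>\r\n</tr>'
print(collapse_between_tags(sample))  # prints <tr><td></td></tr>
```

Note that this also empties cells such as `<td> </td>`, since the single
space there is whitespace between ">" and "<". Text inside a cell, such
as "Introduction", is left alone because it contains non-whitespace
characters.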

For extra kudos (and I confess I have been so stuck on the above
problem that I haven't put much thought into this one), I'd like to
measure the number of characters between the <p> and </p> tags, and
then insert a newline character at the end of the next word after an
arbitrary number of characters. I am reading a bunch of paragraphs
formatted for a webpage into a script, but they're all on one big long
line and I would like to split them for readability.
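
A rough sketch of the paragraph splitting described above, under some
loud assumptions: the `<p>` tags carry no attributes, paragraphs are not
nested, and each paragraph sits on one line. The names
`paragraph_lengths`, `wrap_after` and `wrap_paragraphs` are hypothetical
helpers, and a regular expression is only a stand-in for a real HTML
parser here.

```python
import re

def paragraph_lengths(html):
    """Return the number of characters between each <p> and </p> pair."""
    # Assumes bare <p> tags with no attributes.
    return [len(body) for body in re.findall(r'<p>(.*?)</p>', html, flags=re.DOTALL)]

def wrap_after(text, width):
    """Insert a newline at the first whitespace at or after `width` characters."""
    # .{width,}? lazily takes at least `width` characters, so the match
    # stops at the end of the word that crosses the limit.
    pattern = re.compile(r'(.{%d,}?)\s+' % width)
    return pattern.sub(r'\1\n', text)

def wrap_paragraphs(html, width=60):
    """Apply wrap_after to the text inside every <p>...</p> pair."""
    def rewrap(match):
        return match.group(1) + wrap_after(match.group(2), width) + match.group(3)
    return re.sub(r'(<p>)(.*?)(</p>)', rewrap, html, flags=re.DOTALL)

# Made-up example text, with a small width to keep the output short.
html = '<p>aaaa bbbb cccc dddd</p>'
print(paragraph_lengths(html))   # prints [19]
print(wrap_paragraphs(html, 6))
```

With a width of 6, the break lands after "bbbb", the word that was in
progress when the sixth character was reached. If breaking before the
limit is acceptable instead, the standard library's `textwrap.fill` is
another option.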

TIA