Remove spaces and line wraps from html?

RiGGa rigga at hasnomail.com
Fri Jun 18 20:25:19 CEST 2004


Hi,

I have a html file that I need to process and it contains text in this
format:

<TD><SPAN class=xf id=EmployeeNo 
title="Employee Number">0123456</SPAN></TD></TR>

(Note split over two lines is as it appears in the source file.)

I would like to use Python (or anything else really) to have it all on one
line i.e.
 
<TD><SPAN class=xf id=EmployeeNo title="Employee
Number">0123456</SPAN></TD></TR>

(Note this has wrapped to the 2nd line)

Reason I would like to do this is so it is easier to pull back the
information from the file, I am interested in the contents of the title=
field and the data immediately after the >  (in this case 0123456).  I have
a basic Python program I have written to handle this however with the
script in its current format it goes wrong when its split over a line like
my first example.

Hope this all makes sense.

Any help appreciated.





More information about the Python-list mailing list