[newbie] how to remove empty lines from webpage/file

Dan Stromberg drsalists at gmail.com
Tue Feb 27 10:25:53 EST 2018


Perhaps replace:
lines=soup.get_text()
file.write(lines)

...with something like:
text = soup.get_text()
lines = text.split('\n')
for line in lines:
    if line.strip():
        file.write('%s\n' % (line, ))

(untested)
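
The same idea can be packaged as a small function, which is easier to try out without fetching the page. This is only a sketch: remove_blank_lines is a hypothetical helper name (not part of BeautifulSoup), and the sample string merely stands in for whatever soup.get_text() returns.

```python
def remove_blank_lines(text):
    """Return text with empty and whitespace-only lines dropped."""
    # splitlines() splits on '\n', '\r\n' and '\r' (among other line
    # boundaries), so Windows-style line endings are handled too.
    kept = [line for line in text.splitlines() if line.strip()]
    # Re-join with '\n' and end with a newline, unless nothing is left.
    return '\n'.join(kept) + '\n' if kept else ''


# Stand-in for the string returned by soup.get_text() (assumed data).
sample = "Jan  1\n\n   \nJan  2\r\n\r\nJan  3\n"
print(remove_blank_lines(sample))
```

In the original script this would be used as file.write(remove_blank_lines(soup.get_text())). Lines that contain only spaces or tabs are dropped as well, because line.strip() is empty for them.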


On Tue, Feb 27, 2018 at 2:50 AM,  <jenswaelkens at gmail.com> wrote:
> Dear all,
> I am trying to get the numerical data from the following webpage:
> http://www.astro.oma.be/GENERAL/INFO/nzon/zon_2018.html
>
> With the following code-fragment I was already able to get a partial result:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> #memo: install bs4 as follows: sudo easy_install bs4
> #3 lines below necessary to avoid encoding problem
> import sys
> reload(sys)
> sys.setdefaultencoding('utf8')
> import urllib2
> file = open("testfile.txt","w")
> source = "http://www.astro.oma.be/GENERAL/INFO/nzon/zon_2018.html"
> page = urllib2.urlopen(source)
> from bs4 import BeautifulSoup
> soup = BeautifulSoup(page,'lxml')
> lines=soup.get_text()
> file.write(lines)
> file.close()
>
> I tried to delete the empty lines, but I am totally stuck at the moment. Can anyone help me further?
>
> thanks in advance
> jens


