[Tutor] remove blank list items

Michael Langford mlangford.cs03 at gtalumni.org
Fri Sep 14 16:11:38 CEST 2007


First off:
  What you want is a set, not a list. Luckily for you, the keys of a Python
dict are unique, just like the members of a set, so cheat and use that.

change:
for j in input:
    print j,

to
mySet={}
for j in input:
   mySet[j]=j
for item in mySet.keys():
   print item
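
For what it's worth, Python 2.4 and later also ship a built-in set type, so the dict trick can be spelled more directly. A minimal sketch, with a hard-coded list (made-up sample data) standing in for the lines read from fqdns.txt; the names lines and unique_hosts are just for illustration:

```python
# Sample lines standing in for the contents of fqdns.txt (made-up data).
lines = ["chat.icq.com\n", "chat.icq.com\n", "labs.icq.com\n"]

# A set keeps only one copy of each value.
unique_hosts = set(lines)

# Sets have no defined order, so sort them for predictable output.
for host in sorted(unique_hosts):
    print(host.strip())
```

Note that neither a set nor the dict keys promise to come back in the order you added them, which is why the sketch sorts before printing.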

Secondly, to avoid printing blank lines, strip each item and only add it if
what's left isn't an empty string.


change:
for j in input:
    print j,

to
mySet={}
for j in input:
   cleaned = j.strip()
   if cleaned!="":
       mySet[cleaned]=cleaned

for item in mySet.keys():
   print item
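
If you also care about keeping the hosts in the order they first appeared in the file, here is one way it might be sketched. The helper name unique_nonblank is made up for this example, and the list passed to it is sample data rather than your real file:

```python
def unique_nonblank(lines):
    """Return stripped, non-empty lines with duplicates dropped, in first-seen order."""
    seen = set()      # values we have already kept
    result = []       # kept values, in original order
    for line in lines:
        cleaned = line.strip()
        # Skip lines that are empty after stripping, and skip repeats.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Made-up sample input, shaped like the lines of fqdns.txt.
sample = ["chat.icq.com\n", "\n", "chat.icq.com\n", "  \n", "labs.icq.com\n"]
for host in unique_nonblank(sample):
    print(host)
```

You could run the same helper over the hostnames before writing them to disk, so the blanks and duplicates never reach the file in the first place.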

On 9/14/07, sacha rook <sacharook at hotmail.co.uk> wrote:
>
> Hi
>
> i was expanding my program to write urls parsed from a html page and write
> them to a file so i chose www.icq.com to extract the urls from.
>
> when i wrote these out to a file and then read the file back I noticed a
> list of urls then some blank lines then some more urls then some blank
> lines, does this mean that one of the functions called has for some reason
> added some whitespace into some of the list items so that i wrote them out
> to disk?
>
> I also noticed that there are duplicate hosts/urls that have been written
> to the file.
>
> So my two questions are;
> 1. how and where do I tackle removing the whitespace from being written
> out to disk?
>
> 2. how do i tackle checking for duplicate entries in a list before writing
> them out to disk?
>
> My code is below
> from BeautifulSoup import BeautifulSoup
> import urllib2
> import urlparse
> file = urllib2.urlopen("http://www.icq.com")
> soup = BeautifulSoup(''.join(file))
> alist = soup.findAll('a')
> output = open("fqdns.txt","w")
> for a in alist:
>     href = a['href']
>     output.write(urlparse.urlparse(href)[1] + "\n")
> output.close()
> input = open("fqdns.txt","r")
> for j in input:
>     print j,
> input.close()
>
> the chopped output is here
>
> chat.icq.com
> chat.icq.com
> chat.icq.com
> chat.icq.com
> chat.icq.com
>
>
> labs.icq.com
> download.icq.com
> greetings.icq.com
> greetings.icq.com
> greetings.icq.com
> games.icq.com
> games.icq.com
>


-- 
Michael Langford
Phone: 404-386-0495
Consulting: http://www.TierOneDesign.com/
Entertaining: http://www.ThisIsYourCruiseDirectorSpeaking.com