[Tutor] String Attribute

Alan Gauld alan.gauld at btinternet.com
Fri Jul 31 18:08:49 CEST 2015


On 31/07/15 15:39, ltc.hotspot at gmail.com wrote:

> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>      if not line.startswith('From'): continue
>      line2 = line.strip()
>      line3 = line2.split()
>      line4 = line3[1]
>      addresses = set()

Notice I said you had to create and initialize the set
*above* the loop.
Here you are creating a new set every time round the
loop and throwing away the old one.

>      addresses.add(line4)
>      count = count + 1
>      print addresses

And notice I said to move the print statement
to *after* the loop so as to print the complete set,
not just the current status.

> print "There were", count, "lines in the file with From as the first word"
>
> The code produces the following out put:
>
> In [15]: %run _8_5_v_13.py
> Enter file name: mbox-short.txt
> set(['stephen.marquard at uct.ac.za'])
> set(['stephen.marquard at uct.ac.za'])
> set(['louis at media.berkeley.edu'])

That's correct, because you create a new set each time
and add precisely one element to it before throwing
it away and starting over next time round.

> Question no. 1: is there a build in function for set that parses the data for duplicates.

No, because that's what a set does. It is a collection of
unique items. It will not allow duplicates.

Your problem is you create a new set of one item for
every line. So you have multiple sets with the same
data in them.

>   Question no. 2: Why is there not a building function for append?

add() is the equivalent of append for a set.
If you try to add() a value that already exists it
will be ignored.
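
A minimal sketch of that behaviour, using made-up addresses
(they are not from your file) and written so it runs under
Python 2 or 3:

```python
# Illustrative only: the addresses below are invented for the example.
addresses = set()
addresses.add('a@example.com')
addresses.add('b@example.com')
addresses.add('a@example.com')   # already present, so the set is unchanged

unique_count = len(addresses)    # two distinct values were added
print(unique_count)
```

The third add() raises no error and makes no change; the set
still holds two items.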

> Question no. 3: If all else fails, i.e., append & set,  my only option is the slice the data set?

No, there are lots of other options, but none of them are necessary
because a set is a collection of unique values. You just need to
use it properly. Read my instructions again, carefully:

> You do that by first creating an empty set above
> the loop, let's call it addresses:
>
> addresses = set()
>
> Then replace your print statement with the set add()
> method:
>
> addresses.add(line4)
>
> This means that at the end of your loop you will have
> a set containing all of the unique addresses you found.
> You now print the set.
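
Put together, those steps might look something like the sketch
below. To keep it self-contained it reads from a small list of
made-up lines rather than from mbox-short.txt, and the function
name collect_addresses is just an illustrative choice, not
something from your script. It should run under Python 2 or 3:

```python
def collect_addresses(lines):
    """Return (set of unique addresses, count of 'From' lines)."""
    addresses = set()            # created once, *above* the loop
    count = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        words = line.split()     # split() also discards the trailing newline
        addresses.add(words[1])  # duplicates are silently ignored
        count = count + 1
    return addresses, count

# Invented sample data standing in for the real file.
sample = [
    'From alice@example.com Sat Jan  5 09:14:16 2008\n',
    'Subject: hello\n',
    'From bob@example.com Sat Jan  5 10:00:00 2008\n',
    'From alice@example.com Sat Jan  5 11:00:00 2008\n',
]

addresses, count = collect_addresses(sample)

# Printing happens *after* the loop, so the complete set is shown once.
print(sorted(addresses))
print(count)
```

With a real file you would pass the open file object in place
of the sample list, since a file can be iterated line by line
in the same way.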



-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



