[Tutor] String Attribute
Alan Gauld
alan.gauld at btinternet.com
Fri Jul 31 18:08:49 CEST 2015
On 31/07/15 15:39, ltc.hotspot at gmail.com wrote:
> fname = raw_input("Enter file name: ")
> if len(fname) < 1 : fname = "mbox-short.txt"
> fh = open(fname)
> count = 0
> for line in fh:
>     if not line.startswith('From'): continue
>     line2 = line.strip()
>     line3 = line2.split()
>     line4 = line3[1]
>     addresses = set()
Notice I said you had to create and initialize the set
*above* the loop.
Here you are creating a new set every time round the
loop and throwing away the old one.
>     addresses.add(line4)
>     count = count + 1
>     print addresses
And notice I said to move the print statement
to *after* the loop so as to print the complete set,
not just the current status.
> print "There were", count, "lines in the file with From as the first word"
>
> The code produces the following out put:
>
> In [15]: %run _8_5_v_13.py
> Enter file name: mbox-short.txt
> set(['stephen.marquard at uct.ac.za'])
> set(['stephen.marquard at uct.ac.za'])
> set(['louis at media.berkeley.edu'])
That's correct, because you create a new set each time
and add precisely one element to it before throwing
it away and starting over next time round.
> Question no. 1: is there a build in function for set that parses the data for duplicates.
No, because that's what a set does. It is a collection of
unique items. It will not allow duplicates.
Your problem is you create a new set of one item for
every line. So you have multiple sets with the same
data in them.
> Question no. 2: Why is there not a building function for append?
add() is the equivalent of append for a set.
If you try to add() a value that already exists it
will be ignored.
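
A quick way to convince yourself of this is to try it at the
interpreter. The snippet below is only an illustration; the
addresses in it are made up and are not taken from your file:

```python
# Create the set once, then add values to it.
addresses = set()
addresses.add('a@example.com')
addresses.add('b@example.com')
addresses.add('a@example.com')   # already present, so silently ignored

# The set still holds only the two distinct values.
print(len(addresses))
```

No error is raised for the repeated value; the set simply stays
the same size.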
> Question no. 3: If all else fails, i.e., append & set, my only option is the slice the data set?
No, there are lots of other options, but none of them is necessary
because a set is a collection of unique values. You just need to
use it properly. Read my instructions again, carefully:
> You do that by first creating an empty set above
> the loop, let's call it addresses:
>
> addresses = set()
>
> Then replace your print statement with the set add()
> method:
>
> addresses.add(line4)
>
> This means that at the end of your loop you will have
> a set containing all of the unique addresses you found.
> You now print the set.
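
Put together, those instructions might look something like the
sketch below. It is not the only way to do it. The function name
unique_senders and the sample lines are my own inventions for
illustration; in your script you would loop over the open file fh
instead of a list. The length check is an extra safety guard that
your original code did not have. The print calls are written so
they run under both Python 2 and Python 3.

```python
def unique_senders(lines):
    """Return (addresses, count) for lines that start with 'From'."""
    addresses = set()              # created once, *above* the loop
    count = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        words = line.strip().split()
        if len(words) < 2:         # extra guard: skip a bare 'From' line
            continue
        addresses.add(words[1])    # duplicates are ignored by the set
        count = count + 1
    return addresses, count

# Made-up sample input standing in for the contents of the file.
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "Subject: something unrelated",
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008",
]

addresses, count = unique_senders(sample)

# Print *after* the loop has finished, so the complete set is shown.
print(sorted(addresses))
print(count)
```

With this sample, three lines start with 'From' but only two
distinct addresses end up in the set.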
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos