<div dir="ltr"><br><div class="gmail_extra"><br clear="all"><div><div dir="ltr"><div><font face="courier new, monospace"><br></font></div><font face="courier new, monospace"><br></font><div><font face="courier new, monospace">I guess this may help you</font></div>

<div><font face="courier new, monospace">--------------------------</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"><div>

import operator</div><div><br></div><div>from string import whitespace as space</div><div>from string import punctuation as punc</div><div><br></div><div>class TextProcessing(object):</div><div>    """."""</div>

<div>    def __init__(self):</div><div>        """."""</div><div>        self.file = None</div><div>        self.sorted_list = []</div><div>        self.words_and_occurence = {}</div><div><br>

</div><div>    def __sort_dict_by_value(self):</div><div>        """Sort (word, count) pairs by count, most frequent first."""</div><div>        self.sorted_list = sorted(self.words_and_occurence.items(),</div><div>                                  key=lambda x: x[1], reverse=True)</div>

<div><br></div><div>    def __validate_words(self, word):</div><div>        """Increment the occurrence count for word."""</div><div>        if word in self.words_and_occurence:</div><div>            self.words_and_occurence[word] += 1</div>

<div>        else:</div><div>            self.words_and_occurence[word] = 1</div><div><br></div><div>    def __parse_file(self, file_name):</div><div>        """Count every non-empty word in the file, line by line."""</div><div>        with open(file_name, 'r') as fp:</div>

<div>            for line in fp:</div><div>                for word in line.split():</div><div>                    word = word.strip(punc + space)</div><div>                    if word:</div><div>                        self.__validate_words(word)</div><div><br></div><div>    def parse_file(self, file_name=None):</div>

<div>        """Validate file_name, count its words, and sort them by frequency."""</div><div>        if file_name is None:</div><div>            raise ValueError("Please pass the file to be parsed")</div><div>        if not file_name.endswith(".txt"):</div>

<div>            raise ValueError("*** Error *** Not a valid text file")</div><div><br></div><div>        self.__parse_file(file_name)</div><div>        </div><div>        self.__sort_dict_by_value()</div><div><br>

</div><div>    def print_top_n(self, n):</div><div>        """Print the n most frequent words (fewer if the file has fewer)."""</div><div>        print "Top {0} words:".format(n), [word for word, _ in self.sorted_list[:n]]</div><div><br></div>

<div>    def print_unique_words(self):</div><div>        """Print every distinct word, most frequent first."""</div><div>        print "Unique words:", [word for word, _ in self.sorted_list]</div><div><br>

</div><div>if __name__ == "__main__":</div><div>    # Demo: count the words in test_input.txt and report the results.</div><div>    obj = TextProcessing()</div><div>    obj.parse_file(r'test_input.txt')</div><div>    obj.print_top_n(4)</div>

<div>    obj.print_unique_words()</div><div><br></div></font></div><div><br></div><div><font face="courier new, monospace"><br></font></div><div><br></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"><b>-- Regards --</b></font></div>

<div><font face="courier new, monospace"><b><br></b></font></div><div><font face="courier new, monospace"><b>   Siva Cn</b></font></div><div><font face="courier new, monospace" size="1"><b>Python Developer</b></font></div>

<div><font face="courier new, monospace"><b><br></b></font></div><div><font face="courier new, monospace" size="1"><b>+91 9620339598</b></font></div><div><font face="courier new, monospace" size="1"><b><a href="http://www.cnsiva.com" target="_blank">http://www.cnsiva.com</a></b></font></div>

<div><font face="courier new, monospace">---------------------</font><br></div></div></div>

<br><br><div class="gmail_quote">On Thu, Oct 17, 2013 at 7:58 PM,  <span dir="ltr"><<a href="mailto:tutor-request@python.org" target="_blank">tutor-request@python.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Send Tutor mailing list submissions to<br>

        <a href="mailto:tutor@python.org">tutor@python.org</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="https://mail.python.org/mailman/listinfo/tutor" target="_blank">https://mail.python.org/mailman/listinfo/tutor</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:tutor-request@python.org">tutor-request@python.org</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:tutor-owner@python.org">tutor-owner@python.org</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of Tutor digest..."<br>

<br>

<br>

Today's Topics:<br>

<br>

   1. Re: Help please (Alan Gauld)<br>

   2. Re: Help please (Peter Otten)<br>

   3. Re: Help please (Dominik George)<br>

   4. Re: Help please (Kengesbayev, Askar)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Thu, 17 Oct 2013 14:13:07 +0100<br>

From: Alan Gauld <<a href="mailto:alan.gauld@btinternet.com">alan.gauld@btinternet.com</a>><br>

To: <a href="mailto:tutor@python.org">tutor@python.org</a><br>

Subject: Re: [Tutor] Help please<br>

Message-ID: <l3onop$oin$<a href="mailto:1@ger.gmane.org">1@ger.gmane.org</a>><br>

Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br>

<br>

On 16/10/13 19:49, Pinedo, Ruben A wrote:<br>

> I was given this code and I need to modify it so that it will:<br>

><br>

> #1. Error handling for the files to ensure reading only .txt file<br>

<br>

I'm not sure what is meant here since your code only ever opens<br>

'emma.txt', so it is presumably a text file... Or are you<br>

supposed to make the filename a user-provided value<br>

(using raw_input, maybe?)<br>

<br>

> #2. Print a range of top words... ex: print top 10-20 words<br>

<br>

I assume 'top' here means the most common? Whoever is writing the<br>

specification for this problem needs to be a bit more specific<br>

in their definitions.<br>

<br>

If so you need to fix the bugs in process_line() and<br>

process_file(). I don't know if these are deliberate bugs<br>

or somebody is just sloppy. But neither works as expected<br>

right now. (Hint: Consider the return values of each)<br>

<br>

Once you've done that you can figure out how to extract<br>

the required number of words from your (unsorted) dictionary<br>

and put that in a reporting function and print the output.<br>

You might be able to use the two common words functions,<br>

although watch out because they don't do exactly what<br>

you want and one of them is basically broken...<br>
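<br>

A range such as the top 10-20 can be taken as a slice of the sorted list.<br>

The following is only a sketch in Python 3 syntax: it assumes a hist dict<br>

like the one process_file() returns, and top_range is a made-up name.<br>

```python
def top_range(hist, start, stop):
    """Return (count, word) pairs ranked start..stop (1-based, inclusive)."""
    # Sort by count, highest first; ties are broken by word, in reverse order.
    ranked = sorted(((count, word) for word, count in hist.items()), reverse=True)
    return ranked[start - 1:stop]


hist = {"the": 5, "emma": 3, "was": 2, "a": 4}
print(top_range(hist, 2, 3))
```

Keeping the slice in its own function leaves the printing to the caller,<br>

so the same data can feed any report format.<br>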

<br>

> #3. Print only the words with > 3 characters<br>

<br>

Modify the above to discard words of 3 letters or less.<br>
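<br>

One possible way to discard the short words is to filter the dictionary<br>

before reporting. This is a sketch in Python 3 syntax; long_words and its<br>

min_length parameter are invented for illustration.<br>

```python
def long_words(hist, min_length=4):
    """Return a new dict holding only words with at least min_length letters."""
    return {word: count for word, count in hist.items() if len(word) >= min_length}


hist = {"the": 5, "emma": 3, "was": 2, "woodhouse": 1}
print(long_words(hist))
```

The original hist is left untouched, so the unfiltered totals remain available.<br>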

<br>

> #4. Modify the printing function to print top 1 or 2 or 3 ....<br>

<br>

I assume this means take a parameter that specifies the<br>

number of words to print. Or it could be the length of<br>

word to ignore. Again the specification is woolly.<br>

In either case it's a small modification to your<br>

reporting function.<br>

<br>

> #5. How many unique words are there in the book of length 1, 2, 3 etc<br>

<br>

This is slicing the data slightly differently but<br>

again not that different to the earlier requirement.<br>
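<br>

Since the keys of hist are already unique, counting them by length may be<br>

enough here. A sketch in Python 3 syntax, assuming the same hist dict;<br>

unique_words_by_length is a hypothetical helper name.<br>

```python
from collections import Counter


def unique_words_by_length(hist):
    """Map word length to the number of distinct words of that length."""
    # Each key of hist is one distinct word, so only its length matters.
    return Counter(len(word) for word in hist)


hist = {"a": 4, "i": 2, "the": 5, "was": 2, "emma": 3}
for length, total in sorted(unique_words_by_length(hist).items()):
    print(length, total)
```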

<br>

> I am fairly new to python and am completely lost, i looked in my book as<br>

> to how to do number one but i cannot figure out what to modify and/or<br>

> delete to add the print selection. This is the code:<br>

<br>

You need to modify the two broken functions and add a<br>

new reporting function. (Despite the reference to a<br>

printing function I'd suggest keeping the data extraction<br>

and printing separate.)<br>

<br>

> import string<br>

><br>

> def process_file(filename):<br>

>      hist = dict()<br>

>      fp = open(filename)<br>

>      for line in fp:<br>

>          process_line(line, hist)<br>

>      return hist<br>

><br>

> def process_line(line, hist):<br>

>      line = line.replace('-', ' ')<br>

>      for word in line.split():<br>

>          word = word.strip(string.punctuation + string.whitespace)<br>

>          word = word.lower()<br>

>          hist[word] = hist.get(word, 0) + 1<br>

><br>

> def common_words(hist):<br>

>      t = []<br>

>      for key, value in hist.items():<br>

>          t.append((value, key))<br>

>      t.sort(reverse=True)<br>

>      return t<br>

><br>

> def most_common_words(hist, num=100):<br>

>      t = common_words(hist)<br>

>      print 'The most common words are:'<br>

>      for freq, word in t[:num]:<br>

>          print freq, '\t', word<br>

> hist = process_file('emma.txt')<br>

> print 'Total num of Words:', sum(hist.values())<br>

> print 'Total num of Unique Words:', len(hist)<br>

> most_common_words(hist, 50)<br>

><br>

> Any help would be greatly appreciated because i am struggling in this<br>

> class. Thank you in advance<br>

<br>

hth<br>

--<br>

Alan G<br>

Author of the Learn to Program web site<br>

<a href="http://www.alan-g.me.uk/" target="_blank">http://www.alan-g.me.uk/</a><br>

<a href="http://www.flickr.com/photos/alangauldphotos" target="_blank">http://www.flickr.com/photos/alangauldphotos</a><br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Thu, 17 Oct 2013 15:37:49 +0200<br>

From: Peter Otten <__<a href="mailto:peter__@web.de">peter__@web.de</a>><br>

To: <a href="mailto:tutor@python.org">tutor@python.org</a><br>

Subject: Re: [Tutor] Help please<br>

Message-ID: <l3op59$8n6$<a href="mailto:1@ger.gmane.org">1@ger.gmane.org</a>><br>

Content-Type: text/plain; charset="ISO-8859-1"<br>

<br>

Alan Gauld wrote:<br>

<br>

[Ruben Pinedo]<br>

<br>

> def process_file(filename):<br>

>     hist = dict()<br>

>     fp = open(filename)<br>

>     for line in fp:<br>

>         process_line(line, hist)<br>

>     return hist<br>

><br>

> def process_line(line, hist):<br>

>     line = line.replace('-', ' ')<br>

><br>

>     for word in line.split():<br>

>         word = word.strip(string.punctuation + string.whitespace)<br>

>         word = word.lower()<br>

><br>

>         hist[word] = hist.get(word, 0) + 1<br>

<br>

[Alan Gauld]<br>

<br>

> If so you need to fix the bugs in process_line() and<br>

> process_file(). I don't know if these are deliberate bugs<br>

> or somebody is just sloppy. But neither works as expected<br>

> right now. (Hint: Consider the return values of each)<br>

<br>

I fail to see the bug.<br>

<br>

process_line() mutates its `hist` argument, so there's no need to return<br>

something. Or did you mean something else that escapes me?<br>
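<br>

The in-place mutation can be seen in a small, self-contained sketch<br>

(Python 3 syntax; add_word and counts are illustrative names, not part of<br>

the posted code).<br>

```python
def add_word(word, hist):
    """Record one occurrence of word by modifying hist in place."""
    hist[word] = hist.get(word, 0) + 1
    # No return statement: the caller's dict has already been changed.


counts = {}
for word in ["emma", "was", "emma"]:
    add_word(word, counts)
print(counts)
```

The function and its caller share one dict object, which is why the caller<br>

sees the updated counts without receiving a return value.<br>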

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Thu, 17 Oct 2013 16:17:27 +0200<br>

From: Dominik George <<a href="mailto:nik@naturalnet.de">nik@naturalnet.de</a>><br>

To: Todd Matsumoto <<a href="mailto:c.t.matsumoto@gmail.com">c.t.matsumoto@gmail.com</a>>,<a href="mailto:tutor@python.org">tutor@python.org</a><br>

Subject: Re: [Tutor] Help please<br>

Message-ID: <<a href="mailto:f310f0be-858d-48e2-ae88-5ad720518888@email.android.com">f310f0be-858d-48e2-ae88-5ad720518888@email.android.com</a>><br>

Content-Type: text/plain; charset=UTF-8<br>

<br>


<br>

Todd Matsumoto <<a href="mailto:c.t.matsumoto@gmail.com">c.t.matsumoto@gmail.com</a>> schrieb:<br>

>> #1. Error handling for the files to ensure reading only .txt file<br>

>Look up exceptions.<br>

>Find out what the string method endswith() does.<br>

<br>

One should note that the OP probably meant files of the type text/plain rather than .txt files. File name extensions are a convenience to identify a file on first glance, but they tell absolutely nothing about the contents.<br>


<br>

So, look up MIME types as well ;)!<br>
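<br>

Checking the contents rather than the name could look roughly like this.<br>

It is a heuristic sketch in Python 3 syntax, not a real MIME type detector:<br>

looks_like_text, the UTF-8 default and the 4096-byte sample are all assumptions,<br>

and a file in another encoding (ISO-8859-1, say) may need a different encoding argument.<br>

```python
import codecs


def looks_like_text(filename, encoding="utf-8", sample_size=4096):
    """Guess whether a file holds plain text by decoding its first bytes."""
    with open(filename, "rb") as fp:
        sample = fp.read(sample_size)
    if b"\x00" in sample:
        return False  # NUL bytes rarely occur in text files
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        # final=False tolerates a multi-byte character cut off by the sample.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

A caller might run this check before counting words and report an error<br>

when it returns False, whatever the file is called.<br>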

<br>

-nik<br>


<br>

<br>

<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Thu, 17 Oct 2013 14:21:17 +0000<br>

From: "Kengesbayev, Askar" <<a href="mailto:askar.kengesbayev@etrade.com">askar.kengesbayev@etrade.com</a>><br>

To: "Pinedo, Ruben A" <<a href="mailto:rapinedo@miners.utep.edu">rapinedo@miners.utep.edu</a>>, "<a href="mailto:tutor@python.org">tutor@python.org</a>"<br>

        <<a href="mailto:tutor@python.org">tutor@python.org</a>><br>

Subject: Re: [Tutor] Help please<br>

Message-ID:<br>

        <<a href="mailto:6FAD14604B087B438F6FF64D9875A40C68F5ADCA@atl1ex10mbx4.corp.etradegrp.com">6FAD14604B087B438F6FF64D9875A40C68F5ADCA@atl1ex10mbx4.corp.etradegrp.com</a>><br>

<br>

Content-Type: text/plain; charset="us-ascii"<br>

<br>

Ruben,<br>

<br>

#1 you can try something like this<br>

try:<br>

    with open('my_file.txt') as file:<br>

        pass<br>

except IOError as e:<br>

    print "Unable to open file"  #Does not exist or you do not have read permission<br>

<br>

#2. I would use a regular expression to push the words into an array (a list), which you can then manipulate. I am not sure it is the most efficient way, but it should work.<br>

#3. An easy way would be to use a regular expression (the re module).<br>

#4. Once you have the array from #2, you can sort it and print whatever top words you need.<br>

#5. I am not sure of the best way to do this, but you can experiment with the array from #2.<br>
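<br>

The regular expression idea from #2 and #3 might look like the sketch below<br>

(Python 3 syntax). The pattern and the name words_from_text are assumptions;<br>

here letters and apostrophes count as word characters and everything else separates words.<br>

```python
import re


def words_from_text(text):
    """Return the lowercase words found in text, in order of appearance."""
    return re.findall(r"[a-z']+", text.lower())


print(words_from_text("Emma Woodhouse, handsome, clever, and rich"))
```

The resulting list can then be sorted, filtered by length, or counted as needed.<br>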

<br>

Thanks,<br>

Askar<br>

<br>

From: Pinedo, Ruben A [mailto:<a href="mailto:rapinedo@miners.utep.edu">rapinedo@miners.utep.edu</a>]<br>

Sent: Wednesday, October 16, 2013 2:49 PM<br>

To: <a href="mailto:tutor@python.org">tutor@python.org</a><br>

Subject: [Tutor] Help please<br>

<br>

I was given this code and I need to modify it so that it will:<br>

<br>

#1. Error handling for the files to ensure reading only .txt file<br>

#2. Print a range of top words... ex: print top 10-20 words<br>

#3. Print only the words with > 3 characters<br>

#4. Modify the printing function to print top 1 or 2 or 3 ....<br>

#5. How many unique words are there in the book of length 1, 2, 3 etc<br>

<br>

I am fairly new to python and am completely lost, i looked in my book as to how to do number one but i cannot figure out what to modify and/or delete to add the print selection. This is the code:<br>

<br>

<br>

import string<br>

<br>

def process_file(filename):<br>

    hist = dict()<br>

    fp = open(filename)<br>

    for line in fp:<br>

        process_line(line, hist)<br>

    return hist<br>

<br>

def process_line(line, hist):<br>

    line = line.replace('-', ' ')<br>

<br>

    for word in line.split():<br>

        word = word.strip(string.punctuation + string.whitespace)<br>

        word = word.lower()<br>

<br>

        hist[word] = hist.get(word, 0) + 1<br>

<br>

def common_words(hist):<br>

    t = []<br>

    for key, value in hist.items():<br>

        t.append((value, key))<br>

<br>

    t.sort(reverse=True)<br>

    return t<br>

<br>

def most_common_words(hist, num=100):<br>

    t = common_words(hist)<br>

    print 'The most common words are:'<br>

    for freq, word in t[:num]:<br>

        print freq, '\t', word<br>

<br>

hist = process_file('emma.txt')<br>

print 'Total num of Words:', sum(hist.values())<br>

print 'Total num of Unique Words:', len(hist)<br>

most_common_words(hist, 50)<br>

<br>

Any help would be greatly appreciated because i am struggling in this class. Thank you in advance<br>

<br>

Respectfully,<br>

<br>

Ruben Pinedo<br>

Computer Information Systems<br>

College of Business Administration<br>

University of Texas at El Paso<br>


<br>

------------------------------<br>

<br>


<br>

End of Tutor Digest, Vol 116, Issue 37<br>

**************************************<br>

</blockquote></div><br></div></div>