[Tutor] NLTK

Sat Aug 29 20:08:48 CEST 2009

Hi,
    Yes, it works! What I am really asking is how you knew to use wordlists.words('IM50re.txt'). Is this a specific command? I don't believe it was in the book.
        Thanks.

________________________________
From: Kent Johnson <kent37 at tds.net>
To: Ishan Puri <ballerz4ishi at sbcglobal.net>
Cc: *tutor python <tutor at python.org>
Sent: Saturday, August 29, 2009 3:34:09 AM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 10:16 PM, Ishan Puri<ballerz4ishi at sbcglobal.net> wrote:

> >>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
> >>> len(emma)
> 192427
>
> So this is the number of words in the particular text 'austen-emma.txt'. How
> would I do this with my IM50re.txt? It seems the code
> "nltk.corpus.gutenberg.words" is specific to the Gutenberg corpus installed
> with NLTK. Many examples like this are given for different analyses that can
> be done with NLTK. However, they all seem to be specific to one of the texts
> above or another one already installed with NLTK. I am not sure how to apply
> these examples to my own corpus.

This is pretty much the next line in the "Loading your own Corpus" example.
After
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = r'C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']

you should be able to do
>>> my_words = wordlists.words('IM50re.txt')
>>> len(my_words)
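
To see the whole thing end to end, here is a minimal sketch. It assumes NLTK is installed, and it writes a small throwaway IM50re.txt into a temporary folder (the sample sentence is made up) so that it runs anywhere; in practice you would point corpus_root at your own folder. As far as I know, nltk.corpus.gutenberg is itself a PlaintextCorpusReader that NLTK sets up for you, which is why the same words() method should work on a reader you create yourself.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader

# Hypothetical stand-in for your Documents folder and IM50re.txt, so the
# sketch runs anywhere. Replace corpus_root with your real folder instead.
corpus_root = tempfile.mkdtemp()
with open(os.path.join(corpus_root, 'IM50re.txt'), 'w', encoding='utf-8') as f:
    f.write('Hello there. How are you today?\n')

# The second argument says which files under corpus_root belong to the corpus.
wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
print(wordlists.fileids())

# words() is a method of the reader object; pass it one of the file ids.
my_words = wordlists.words('IM50re.txt')
word_count = len(my_words)
print(list(my_words))
print(word_count)
```

Note that words() splits punctuation into separate tokens by default, so the count includes items such as '.' and '?' as well as the words themselves.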

Kent