[Tutor] simple python script for collocation discovery

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Mon Aug 18 02:47:21 CEST 2008


Hello Kent, Bob, Steve
Thank you all for your help and suggestions. I believe I'll have a good
program soon.

2008/8/17 Kent Johnson <kent37 at tds.net>

> On 8/16/08, Emad Nawfal (عماد نوفل) <emadnawfal at gmail.com> wrote:
> > #!/usr/bin/python
> > # Chi-squared collocation discovery
> > # Important definitions first. Let's suppose that we
> > # are trying to find whether "powerful computers" is a collocation
> > # N = The number of all bigrams in the corpus
> > # O11 = how many times the bigram "powerful computers" occurs in the corpus
> > # O22 = the number of bigrams not having either word in our collocation = N - O11
> > #  O12 = The number of bigrams whose second word is our second word
> > # but whose first word is not "powerful"
>
> This is just the number of occurrences of the second word - O11, isn't it?
>
> > # O21 = The number of bigrams whose first word is our first word, but whose
> > # second word is different from our second word
>
> This is the number of occurrences of the first word - O11.
>
> So one way to solve this would be to make two dictionaries - one which
> counts bigrams and one which counts words. Then you would get the
> numbers with just three dictionary lookups.
>
> Kent
>
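
A minimal sketch of the two-dictionary idea Kent describes, assuming the
corpus is already a list of tokens. The function names are hypothetical, and
collections.Counter (Python 2.7 and later) stands in for plain dictionaries.
Two assumptions to flag: O22 is taken as N - O11 - O12 - O21, which is the
usual 2x2 table and differs from the N - O11 in the quoted comment; and
using word counts for O12 and O21 can be off by one when the second word is
the very first token of the corpus or the first word is the very last.

```python
from collections import Counter


def count_words_and_bigrams(tokens):
    """Return (word_counts, bigram_counts) for a list of tokens."""
    word_counts = Counter(tokens)
    # Pair each token with its successor to get adjacent bigrams.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return word_counts, bigram_counts


def contingency(w1, w2, word_counts, bigram_counts):
    """Return (O11, O12, O21, O22) using the naming from the quoted script."""
    n = sum(bigram_counts.values())      # N: all bigrams in the corpus
    o11 = bigram_counts[(w1, w2)]        # the bigram itself
    o12 = word_counts[w2] - o11          # second word matches, first does not
    o21 = word_counts[w1] - o11          # first word matches, second does not
    o22 = n - o11 - o12 - o21            # assumption: neither position matches
    return o11, o12, o21, o22


def chi_squared(o11, o12, o21, o22):
    """Pearson chi-squared statistic for a 2x2 table."""
    n = o11 + o12 + o21 + o22
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    # An empty row or column makes the statistic undefined; return 0.0 here.
    return numerator / denominator if denominator else 0.0


tokens = "powerful computers need powerful chips and fast computers".split()
word_counts, bigram_counts = count_words_and_bigrams(tokens)
table = contingency("powerful", "computers", word_counts, bigram_counts)
score = chi_squared(*table)
```

Once the two counters are built, each candidate bigram costs three lookups
(one bigram count and two word counts), so scoring many candidates does not
require another pass over the corpus.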



-- 
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington
http://emnawfal.googlepages.com
--------------------------------------------------------