[Tutor] finding words that contain some letters in their respective order

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Sat Jan 24 00:02:13 CET 2009


Hello Tutors,
Arabic words are built around a root of 3 or 4 consonants, with other
letters in between, and also prefixes and suffixes.
The root ktb (write), for example, can be found in words like:
ktab : book
mktob: letter, written
wktabhm: and their book
yktb: to write
lyktbha: in order for him to write it

I need to find all the word forms derived from a certain root in a corpus. My
idea, which is not completely right but nonetheless works most of the
time, is to find words that have the letters of the root in their
respective order: for example, the words that contain k, followed by t,
then followed by b, no matter whether there is something in between. I came
up with the following, which works fine. For learning purposes, please let me
know whether this is a good way, and how else I can achieve that.
I appreciate your help, as always.



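def getRoot(root, word):
    # Keep only the letters of word that also occur in root, in their order.
    result = ""
    for letter in word:
        if letter not in root:
            continue
        result += letter
    return result

# main

# Read the corpus and split it into whitespace-separated words.
with open("myCorpus.txt") as f:
    infile = f.read().split()
query = "ktb"
# Keep a word when its root letters, read in order, spell the query exactly.
outcome = set(word for word in infile if query == getRoot(query, word))
for word in outcome:
    print(word)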
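
For comparison, here is a minimal sketch of a test that checks only whether
the root letters occur in order, with anything in between. It differs from
the filter above in one respect: the filter rejects a word in which a root
letter appears an extra time (a made-up string such as "tktb" filters to
"tktb", not "ktb"), while the sketch accepts it. The names contains_in_order
and root_pattern, and the sample list, are assumptions for illustration and
are not part of the code above.

```python
import re

def contains_in_order(root, word):
    """Return True if the letters of root occur in word in the same order."""
    remaining = iter(word)
    # `letter in remaining` consumes the iterator up to the first match,
    # so each root letter is searched for only after the previous one.
    return all(letter in remaining for letter in root)

def root_pattern(root):
    """Build a regex such as k.*t.*b that expresses the same test."""
    return re.compile(".*".join(re.escape(letter) for letter in root))

# The five words from the message plus two made-up strings.
sample = ["ktab", "mktob", "wktabhm", "yktb", "lyktbha", "tktb", "btk"]
matches = [w for w in sample if contains_in_order("ktb", w)]
print(matches)
```

Either helper could stand in for the getRoot comparison in the list
comprehension, for example with root_pattern(query).search(word). Like the
original idea, this may still match words that do not come from the root.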
-- 
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington
http://emnawfal.googlepages.com