[Tutor] finding words that contain some letters in their respective order

Andre Engels andreengels at gmail.com
Sat Jan 24 00:55:51 CET 2009


On Sat, Jan 24, 2009 at 12:02 AM, Emad Nawfal (عماد نوفل)
<emadnawfal at gmail.com> wrote:
> Hello Tutors,
> Arabic words are built around a root of 3 or 4 consonants with lots of
> letters in between, and also prefixes and suffixes.
> The root ktb (write) for example, could be found in words like:
> ktab : book
> mktob: letter, written
> wktabhm: and their book
> yktb: to write
> lyktbha: in order for him to write it
>
> I need to find all the word forms made up of a certain root in a corpus. My
> idea, which is not completely right, but nonetheless works most of the
> time,  is to find words that have the letters of the root in their
> respective order. For example, the  words that contain  k followed by  t
> then followed by b, no matter whether there is something in between. I came
> up with the following, which works fine. For learning purposes, please let me
> know whether this is a good way, and how else I can achieve that.
> I appreciate your help, as I always did.
>
>
>
> def getRoot(root, word):
>     result = ""
>
>     for letter in word:
>         if letter not in root:
>             continue
>         result +=letter
>     return result
>
> # main
>
> infile = open("myCorpus.txt").read().split()
> query = "ktb"
> outcome = set([word for word in infile if query == getRoot(query, word)])
> for word in outcome:
>
>     print(word)

This runs into problems if letters of the root also occur somewhere
else in the word. For example, if there were a word bktab, then
getRoot("ktb","bktab") would return "bktb", not "ktb", so the word
would be missed even though it contains k, t and b in order.
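To make that concrete, here is a small check using the same logic as
the quoted getRoot. The word bktab is only a made-up illustration, not
something taken from a real corpus:

```python
def getRoot(root, word):
    # Same logic as the quoted function: keep every letter of
    # word that also occurs somewhere in root.
    result = ""
    for letter in word:
        if letter in root:
            result += letter
    return result

print(getRoot("ktb", "ktab"))   # ktb
print(getRoot("ktb", "bktab"))  # bktb, so the comparison with "ktb" fails
```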

I would use the find method of the string class here - if A and B are
strings, and n is a number, then

A.find(B,n)

is the index of the first occurrence of B in A at or after position n,
or -1 if there isn't any.
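A few calls on one of the example words show how the start position
changes the result (the word and the positions are chosen purely for
illustration):

```python
word = "wktabhm"           # indices: w=0 k=1 t=2 a=3 b=4 h=5 m=6

print(word.find("k"))      # 1
print(word.find("t", 2))   # 2
print(word.find("b", 3))   # 4
print(word.find("k", 2))   # -1, there is no k at or after index 2
```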

Using this, I get:

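def hasRoot(word, root): # This order I find more logical
    loc = 0  # position in word from which the next root letter is searched
    for letter in root:
        loc = word.find(letter, loc)
        if loc == -1:
            return False  # this root letter does not occur after the previous one
        loc += 1  # go past the match, so a repeated root letter needs its own occurrence
    return True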

# main

infile = open("myCorpus.txt").read().split()
query = "ktb"
outcome = [word for word in infile if hasRoot(word,query)]

for word in outcome:
    print(word)
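Since you asked how else this could be done: one alternative, which is
not the approach above but gives the same kind of test, is a regular
expression that allows anything between the root letters. This is only
a sketch; the name has_root_re is made up for the example.

```python
import re

def has_root_re(word, root):
    # Build a pattern such as "k.*t.*b": the root letters in order,
    # with any number of characters allowed between them.
    # re.escape guards against letters that are special in a regex.
    pattern = ".*".join(re.escape(letter) for letter in root)
    return re.search(pattern, word) is not None

for w in ["ktab", "bktab", "lyktbha", "btk"]:
    print(w, has_root_re(w, "ktb"))
```

For a large corpus it would probably be better to compile the pattern
once with re.compile outside the loop rather than rebuilding it for
every word.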


-- 
André Engels, andreengels at gmail.com

