[Tutor] finding words that contain some letters in their respective order
Andre Engels
andreengels at gmail.com
Sat Jan 24 00:55:51 CET 2009
On Sat, Jan 24, 2009 at 12:02 AM, Emad Nawfal (عماد نوفل)
<emadnawfal at gmail.com> wrote:
> Hello Tutors,
> Arabic words are built around a root of 3 or 4 consonants with lots of
> letters in between, and also prefixes and suffixes.
> The root ktb (write) for example, could be found in words like:
> ktab : book
> mktob: letter, written
> wktabhm: and their book
> yktb: to write
> lyktbha: in order for him to write it
>
> I need to find all the word forms made up of a certain root in a corpus. My
> idea, which is not completely right, but nonetheless works most of the
> time, is to find words that have the letters of the root in their
> respective order. For example, the words that contain k followed by t
> then followed by b, no matter whether there is something in between. I came
> up with the following, which works fine. For learning purposes, please let me
> know whether this is a good way, and how else I can achieve that.
> I appreciate your help, as I always did.
>
>
>
> def getRoot(root, word):
>     result = ""
>
>     for letter in word:
>         if letter not in root:
>             continue
>         result += letter
>     return result
>
> # main
>
> infile = open("myCorpus.txt").read().split()
> query = "ktb"
> outcome = set([word for word in infile if query == getRoot(query, word)])
> for word in outcome:
>     print(word)
This runs into problems if a letter of the root also occurs somewhere
else in the word. For example, if there were a word bktab,
then getRoot("ktb","bktab") would be "bktb", not "ktb".
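To make the problem concrete, here is a small check using getRoot as posted
(only re-indented). The word bktab is the made-up example from above, not a
word taken from a real corpus:

```python
def getRoot(root, word):
    # Keep only the letters of word that occur anywhere in root.
    result = ""
    for letter in word:
        if letter not in root:
            continue
        result += letter
    return result

filtered = getRoot("ktb", "bktab")
print(filtered)           # bktb
print(filtered == "ktb")  # False, although k, t, b do occur in that order
```

So the word would be rejected even though it contains the root letters in
the right order.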
I would use the find method of the string class here - if A and B are
strings, and n is a number, then
A.find(B,n)
is the index of the first occurrence of B in A at or after position n,
or -1 if there isn't any.
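A quick illustration of how find behaves, using one of the sample words
from the question (the start positions are just picked by hand for the
example):

```python
word = "wktabhm"

print(word.find("k"))     # 1: first "k" anywhere in the word
print(word.find("t", 2))  # 2: first "t" at or after index 2
print(word.find("b", 3))  # 4: first "b" at or after index 3
print(word.find("k", 2))  # -1: there is no "k" at or after index 2
```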
Using this, I get:
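def hasRoot(word, root):  # This order I find more logical
    loc = 0
    for letter in root:
        # Search from loc onward, so the letters must appear in root order.
        loc = word.find(letter, loc)
        if loc == -1:
            return False
        loc += 1  # continue after the match, so a repeated root letter needs a new occurrence
    return True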
# main
infile = open("myCorpus.txt").read().split()
query = "ktb"
outcome = [word for word in infile if hasRoot(word, query)]
for word in outcome:
    print(word)
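Since you asked how else this could be done: another option is a regular
expression that allows anything between the root letters. This is only a
sketch; has_root_regex is a name I made up for the example, and the word
list is the sample from your mail plus bktab and btk as test cases:

```python
import re

def has_root_regex(word, root):
    # Hypothetical helper: turn a root such as "ktb" into the pattern k.*t.*b
    pattern = ".*".join(re.escape(letter) for letter in root)
    return re.search(pattern, word) is not None

words = ["ktab", "mktob", "wktabhm", "yktb", "lyktbha", "bktab", "btk"]
print([w for w in words if has_root_regex(w, "ktb")])
# ['ktab', 'mktob', 'wktabhm', 'yktb', 'lyktbha', 'bktab']
```

Like the find version, this only checks letter order, so it will still
accept words that happen to contain k, t and b in order without being
derived from the root.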
--
André Engels, andreengels at gmail.com