[Tutor] finding words that contain some letters in their respective order

Sat Jan 24 02:04:22 CET 2009

2009/1/24 Emad Nawfal (عماد نوفل) <emadnawfal at gmail.com>:
>
>
> 2009/1/23 Emad Nawfal (عماد نوفل) <emadnawfal at gmail.com>
>>
>>
>> On Fri, Jan 23, 2009 at 6:57 PM, Andre Engels <andreengels at gmail.com>
>> wrote:
>>>
>>> I made an error in my program... Sorry, it should be:
>>>
>>> def hasRoot(word, root): # This order I find more logical
>>>   loc = 0
>>>   for letter in root:
>>>        loc = word.find(letter,loc) # I missed the ,loc here...
>>>        if loc == -1:
>>>            return False
>>>        loc += 1 # resume after the match, so repeated root letters work
>>>   return True
>>>
>>> # main
>>>
>>> infile = open("myCorpus.txt").read().split()
>>> query = "ktb"
>>> outcome = [word for word in infile if hasRoot(word,query)]
>>>
>>>
>>> --
>>> André Engels, andreengels at gmail.com
>>
>>
>> Thank you so much.  bktab is a legal Arabic word. I also found the word
>> bmktbha in the corpus. I would have missed that.
>> Thank you again.
>> --
>> لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
>> الغزالي
>> "No victim has ever been more repressed and alienated than the truth"
>>
>> Emad Soliman Nawfal
>> Indiana University, Bloomington
>> http://emnawfal.googlepages.com
>> --------------------------------------------------------
>
> Hi again,
> If I want to use a regular expression to find the root ktb in all its
> derivations, would this be a good way around it:
>
> >>> import re
> >>> x = re.compile("[a-z]*k[a-z]*t[a-z]*b[a-z]*")
> >>> text = "hw syktbha ghda wlktab ktb"
> >>> re.findall(x, text)
> ['syktbha', 'wlktab', 'ktb']

Yes, that looks correct. A regular expression solution is also easier
to adapt. For example, the little that I know of Arabic makes me
believe that only vowels may occur _between_ the letters of a root.
If that's correct, the RE can be changed to

"[a-z]*k[aeiou]*t[aeiou]*b[a-z]*"
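
A rough sketch of how both patterns could be built for any root, so the
difference between them is easy to test. The names root_pattern and
between are made up for this illustration, and "kstb" is an invented
string rather than a word from the corpus. It also assumes the text is
transliterated into lowercase a-z, as in the examples above.

```python
import re

def root_pattern(root, between="[a-z]*"):
    """Compile a regex for words containing the root letters in order.

    `between` is the pattern allowed between consecutive root letters
    (hypothetical parameter, chosen for this sketch).
    """
    letters = [re.escape(ch) for ch in root]
    return re.compile("[a-z]*" + between.join(letters) + "[a-z]*")

text = "hw syktbha ghda wlktab ktb"

loose = root_pattern("ktb")                        # any letters in between
strict = root_pattern("ktb", between="[aeiou]*")   # only vowels in between

print(loose.findall(text))     # ['syktbha', 'wlktab', 'ktb']
print(strict.findall(text))    # ['syktbha', 'wlktab', 'ktb']

# An invented string with a consonant between k and t:
print(loose.findall("kstb"))   # ['kstb']
print(strict.findall("kstb"))  # []
```

On the sample text both patterns give the same result; they only differ
once a non-vowel appears between two root letters.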

-- 
André Engels, andreengels at gmail.com