<br><br><div class="gmail_quote">2009/12/21 Alan Gauld <span dir="ltr">&lt;<a href="mailto:alan.gauld@btinternet.com">alan.gauld@btinternet.com</a>&gt;</span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

&quot;Emad Nawfal (عمـ نوفل ـاد)&quot; &lt;<a href="mailto:emadnawfal@gmail.com" target="_blank">emadnawfal@gmail.com</a>&gt; wrote<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

def devocalize(word):<br>

    vowels = &quot;aiou&quot;<br>

</blockquote></div>

Should this include &#39;e&#39;?<div class="im"><br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

    return &quot;&quot;.join([letter for letter in word if letter not in vowels])<br>

</blockquote>

<br></div>

It&#39;s probably faster to use a regular expression replacement.<br>

Simply replace any vowel with the empty string.<div class="im"><br>
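A minimal sketch of that suggestion, assuming the English vowels aeiou (the quoted code used aiou, and a Semitic character set would differ). The pattern is compiled once so it is not re-parsed on every call. Whether this actually beats the join-based version is something to measure rather than assume.<br>

```python
import re

# Character class of vowels to strip; "aeiou" is an assumption for English examples.
VOWEL_RE = re.compile("[aeiou]")


def devocalize(word):
    """Return word with every vowel removed, using one regex substitution."""
    return VOWEL_RE.sub("", word)
```

For example, devocalize("fun") returns "fn".<br>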

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

vowelled = [&#39;him&#39;, &#39;ham&#39;, &#39;hum&#39;, &#39;fun&#39;, &#39;fan&#39;] # input, usually a large list of around 500,000 items<br>

vowelled = set(vowelled)<br>

</blockquote>

<br>

<br></div>

How do you process the file? Do you read it all into memory and<br>

then convert it to a set? Or do you process each line (one word<br>

per line?) and add the words to the set one by one? The latter<br>

is probably faster.<div class="im"><br>
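If the words are stored one per line, the second approach might look like the sketch below. The function name load_words and the file name in the comment are hypothetical. It accepts an open file or any other iterable of lines, so no intermediate list of the whole file is built before the set.<br>

```python
def load_words(lines):
    """Build a set of words from an iterable of lines (e.g. an open file).

    Each line is assumed to hold one word; surrounding whitespace is
    stripped and blank lines are skipped. Words are added to the set one
    at a time instead of first collecting them all in a list.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word:  # ignore blank lines
            words.add(word)
    return words


# Usage with a real file (the file name is hypothetical):
# with open("words.txt", encoding="utf-8") as f:
#     vowelled = load_words(f)
```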

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

unvowelled = set([devocalize(word) for word in vowelled])<br>

for lex in unvowelled:<br>

    d = {}<br>

    d[lex] = [word for word in vowelled if devocalize(word) == lex]<br>

</blockquote>

<br></div>

I think you could remove the comprehensions and do all of<br>

this inside a single loop. This is one of those cases where a single<br>

explicit loop is faster than two comprehensions and a loop.<br>
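One way to do the grouping in a single pass is sketched below, assuming the goal is a dictionary that maps each vowel-less form to the words that produce it. In the quoted code, the inner comprehension rescans the whole vowelled set and calls devocalize again for every unvowelled form, roughly (number of distinct forms) times (number of words) calls; it also creates a fresh d = {} on each pass of the for loop, so afterwards d holds only the last entry. Here each word is devocalized exactly once and appended to its group. The name group_by_skeleton and the default vowel string are assumptions for illustration.<br>

```python
from collections import defaultdict


def devocalize(word, vowels="aeiou"):
    """Remove the given vowels from word (the English default is an assumption)."""
    return "".join(letter for letter in word if letter not in vowels)


def group_by_skeleton(words):
    """Map each devocalized form to the list of words that reduce to it.

    Single loop: every word is devocalized once and appended to its group,
    instead of rescanning all words for each devocalized form.
    """
    groups = defaultdict(list)
    for word in words:
        groups[devocalize(word)].append(word)
    return dict(groups)
```

With the list [&#39;him&#39;, &#39;ham&#39;, &#39;hum&#39;, &#39;fun&#39;, &#39;fan&#39;] this gives {&#39;hm&#39;: [&#39;him&#39;, &#39;ham&#39;, &#39;hum&#39;], &#39;fn&#39;: [&#39;fun&#39;, &#39;fan&#39;]}. If the input is a set, the order of words inside each list is arbitrary.<br>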

<br>

But the only way to be sure is to test/profile to see where the slowdown occurs.<br>
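A rough way to test the regex-versus-join question with the standard timeit module is sketched below. The function names and the sample word list are made up for illustration, and the timings will depend on the machine, the Python version and the real data.<br>

```python
import re
import timeit

VOWELS = "aeiou"  # assumed vowel set for the English examples
VOWEL_RE = re.compile("[aeiou]")


def devocalize_join(word):
    """Comprehension-and-join version, as in the quoted code."""
    return "".join([letter for letter in word if letter not in VOWELS])


def devocalize_regex(word):
    """Regular expression substitution version."""
    return VOWEL_RE.sub("", word)


def compare(words, number=10, repeat=3):
    """Return the best total time in seconds of each approach over words."""
    timings = {}
    for name, func in (("join", devocalize_join), ("regex", devocalize_regex)):
        # Bind func as a default argument so each timer uses the right function.
        timer = timeit.Timer(lambda f=func: [f(w) for w in words])
        timings[name] = min(timer.repeat(repeat=repeat, number=number))
    return timings


if __name__ == "__main__":
    sample = ["him", "ham", "hum", "fun", "fan"] * 1000
    print(compare(sample))
```

For a whole-script view of where the time goes, the standard cProfile module does the same job at the function level.<br>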

<br>

HTH,<br>

<br>

-- <br>

Alan Gauld<br>

Author of the Learn to Program web site<br>

<a href="http://www.alan-g.me.uk/" target="_blank">http://www.alan-g.me.uk/</a> <br>

<br>

_______________________________________________<br>

Tutor maillist  -  <a href="mailto:Tutor@python.org" target="_blank">Tutor@python.org</a><br>

To unsubscribe or change subscription options:<br>

<a href="http://mail.python.org/mailman/listinfo/tutor" target="_blank">http://mail.python.org/mailman/listinfo/tutor</a><br>

</blockquote></div><br><br clear="all">Thank you so much, Bob and Alan.<br>The script is meant to process Semitic languages, so I was just giving examples from English. I totally forgot the &#39;e&#39;.<br><br>Bob&#39;s script runs perfectly. <br>

<br>I&#39;m a non-programmer in the sense that I know how to do basic things but am not a professional. For example, my script does what I want, but when I needed to look into efficiency, I got stuck.<br><br>Thank you all for the help. <br>

-- <br>&quot;No victim has ever been more repressed and alienated than the truth&quot; (Muhammad al-Ghazali)<br><br>Emad Soliman Nawfal<br>Indiana University, Bloomington<br>
