<br><br><div class="gmail_quote">2009/12/21 Alan Gauld <span dir="ltr"><<a href="mailto:alan.gauld@btinternet.com">alan.gauld@btinternet.com</a>></span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
"Emad Nawfal (عمـ نوفل ـاد)" <<a href="mailto:emadnawfal@gmail.com" target="_blank">emadnawfal@gmail.com</a>> wrote<div class="im"><br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
def devocalize(word):<br>
vowels = "aiou"<br>
</blockquote></div>
Should this include 'e'?<div class="im"><br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
return "".join([letter for letter in word if letter not in vowels])<br>
</blockquote>
<br></div>
It's probably faster to use a regular expression replacement.<br>
Simply replace any vowel with the empty string.<div class="im"><br>
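A minimal sketch of the regular expression approach, assuming the English vowels from the examples in this thread (with 'e' added). The name VOWEL_RE is made up for illustration, and whether this beats the join-based version should be measured rather than assumed:<br>

```python
import re

# Assumption: English vowels stand in for the real vowel inventory;
# 'e' is included, unlike the original "aiou" string.
VOWEL_RE = re.compile(r"[aeiou]")

def devocalize(word):
    """Return word with every vowel replaced by the empty string."""
    return VOWEL_RE.sub("", word)

print(devocalize("ham"))  # hm
```

Compiling the pattern once at module level avoids re-parsing it on every call, which matters when the function runs over hundreds of thousands of words.<br>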
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
vowelled = ['him', 'ham', 'hum', 'fun', 'fan'] # input, usually a large list of around 500,000 items<br>
vowelled = set(vowelled)<br>
</blockquote>
<br>
<br></div>
How do you process the file? Do you read it all into memory and<br>
then convert it to a set? Or do you process each line (one word<br>
per line?) and add the words to the set one by one? The latter<br>
is probably faster.<div class="im"><br>
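A sketch of the line-by-line option, assuming one word per line. The helper name load_words and the file name words.txt are hypothetical; an in-memory io.StringIO stands in for the real file so the example runs on its own:<br>

```python
import io

def load_words(lines):
    """Build a set of words from an iterable of lines, one word per line."""
    words = set()
    for line in lines:
        word = line.strip()
        if word:  # skip blank lines
            words.add(word)
    return words

# Stand-in for open("words.txt", encoding="utf-8"); the file name is hypothetical.
sample = io.StringIO("him\nham\nhum\nfun\nfan\nham\n")
vowelled = load_words(sample)
print(sorted(vowelled))  # ['fan', 'fun', 'ham', 'him', 'hum']
```

Iterating over a file object yields one line at a time, so the whole file never has to sit in memory as a list before the set is built.<br>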
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
unvowelled = set([devocalize(word) for word in vowelled])<br>
for lex in unvowelled:<br>
d = {}<br>
d[lex] = [word for word in vowelled if devocalize(word) == lex]<br>
</blockquote>
<br></div>
I think you could remove the comprehensions and do all of<br>
this inside a single loop. This is one of those cases where a single<br>
explicit loop is faster than two comprehensions and a loop.<br>
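<br>
A sketch of what that single loop might look like, assuming the goal is a dictionary mapping each devocalized form to the list of words that share it. The name group_by_skeleton is hypothetical, and devocalize is the join-based version from the quoted script with 'e' added:<br>

```python
from collections import defaultdict

def devocalize(word, vowels="aeiou"):
    """Drop every vowel from word (English vowels assumed for illustration)."""
    return "".join(letter for letter in word if letter not in vowels)

def group_by_skeleton(words):
    """Map each devocalized form to the words that reduce to it, in one pass."""
    groups = defaultdict(list)
    for word in words:
        groups[devocalize(word)].append(word)
    return dict(groups)

d = group_by_skeleton(['him', 'ham', 'hum', 'fun', 'fan'])
print(d)  # {'hm': ['him', 'ham', 'hum'], 'fn': ['fun', 'fan']}
```

The quoted version calls devocalize on every word once per unvowelled form, so its work grows roughly with the product of the two set sizes. Here each word is devocalized exactly once.<br>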
<br>
But the only way to be sure is to test/profile to see where the slowdown occurs.<br>
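<br>
One way to run such a test, sketched with the standard timeit module. The word list is a synthetic stand-in for the real 500,000-item input, the function names are invented for the comparison, and the timings will vary by machine, so none are shown:<br>

```python
import re
import timeit

VOWELS = "aeiou"
VOWEL_RE = re.compile(r"[aeiou]")

def devocalize_join(word):
    """Join-based version, as in the quoted script (with 'e' added)."""
    return "".join([letter for letter in word if letter not in VOWELS])

def devocalize_re(word):
    """Regular expression version."""
    return VOWEL_RE.sub("", word)

# Synthetic stand-in for the real input list.
words = ['him', 'ham', 'hum', 'fun', 'fan'] * 1000

for func in (devocalize_join, devocalize_re):
    seconds = timeit.timeit(lambda: [func(w) for w in words], number=20)
    print(func.__name__, round(seconds, 4))
```

For a whole script rather than a single function, running it under python -m cProfile gives a per-function breakdown of where the time goes.<br>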
<br>
HTH,<br>
<br>
-- <br>
Alan Gauld<br>
Author of the Learn to Program web site<br>
<a href="http://www.alan-g.me.uk/" target="_blank">http://www.alan-g.me.uk/</a> <br>
<br>
</blockquote></div><br><br clear="all">Thank you so much, Bob and Alan.<br>The script is meant to process Semitic languages, so I was just giving examples from English. I totally forgot the 'e'.<br><br>Bob's script runs perfectly.<br>
<br>I'm a non-programmer in the sense that I know how to do basic things, but I'm not a professional. For example, my script does what I want, but when I needed to look into efficiency, I got stuck.<br><br>Thank you all for the help.<br>
-- <br>لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد الغزالي<br>"No victim has ever been more repressed and alienated than the truth"<br><br>Emad Soliman Nawfal<br>Indiana University, Bloomington<br>