<html>
<head>
<meta content="text/html; charset=ISO-8859-15"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 15-Aug-2012 02:19, Steven D'Aprano
wrote:<br>
</div>
<blockquote
cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"
type="cite">
<pre wrap="">On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:
</pre>
<blockquote type="cite">
<pre wrap="">You might find the following useful:
def testFunc(startingList):
xOnlyList = []; j = -1
for xl in startingList:
if (xl[0] == 'x'):
</pre>
</blockquote>
<pre wrap="">
That's going to fail in the starting list contains an empty string. Use
xl.startswith('x') instead.</pre>
</blockquote>
Yes, but this was by design (tacitly assumed that <tt>startingList</tt>
was both a list and non-empty).<br>
<blockquote
cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"
type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap=""> xOnlyList.append(xl)
else:
j += 1
startingList[j] = xl
</pre>
</blockquote>
<pre wrap="">
Very cunning, but I have to say that your algorithm fails the "is this
obviously correct without needing to study it?" test. Sometimes that is
unavoidable, but for something like this, there are simpler ways to solve
the same problem.</pre>
</blockquote>
Sorry, but I do not sure what you mean here.<br>
<blockquote
cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"
type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap=""> if j == -1:
startingList = []
else:
del startingList[j:-1]
return(xOnlyList)
</pre>
</blockquote>
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap="">And here is another version using list comprehension that I prefer
</pre>
</blockquote>
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap="">def testFunc2(startingList):
return([x for x in startingList if x[0] == 'x'], [x for x in
startingList if x[0] != 'x'])
</pre>
</blockquote>
<pre wrap="">
This walks over the starting list twice, doing essentially the same thing
both times. It also fails to meet the stated requirement that
startingList is modified in place, by returning a new list instead. </pre>
</blockquote>
This can meet the requirement that <tt>startingList</tt> is
modified in place via the call to this function (see the attached
code). <br>
<blockquote
cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"
type="cite">
<pre wrap="">
Here's an example of what I mean:
py> mylist = mylist2 = ['a', 'x', 'b', 'xx', 'cx'] # two names for one
list
py> result, mylist = testFunc2(mylist)
py> mylist
['a', 'b', 'cx']
py> mylist2 # should be same as mylist
['a', 'x', 'b', 'xx', 'cx']</pre>
</blockquote>
Yes, I had a typo in my original posting --- sorry about that!<br>
<blockquote
cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"
type="cite">
<pre wrap="">
Here is the obvious algorithm for extracting and removing words starting
with 'x'. It walks the starting list only once, and modifies it in place.
The only trick needed is list slice assignment at the end.
def extract_x_words(words):
words_with_x = []
words_without_x = []
for word in words:
if word.startswith('x'):
words_with_x.append(word)
else:
words_without_x.append(word)
words[:] = words_without_x # slice assignment
return words_with_x
</pre>
</blockquote>
Suppose <tt>words</tt> was not a list --- you have tacitly assumed
that <tt>words</tt> is a list.<br>
<blockquote
cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"
type="cite">
<pre wrap="">
The only downside of this is that if the list of words is so enormous
that you can fit it in memory *once* but not *twice*, this may fail. But
the same applies to the list comprehension solution.</pre>
</blockquote>
But, this is not the only downside if speed is important --- it is
slower than the list comprehension method (see results that
follows).<br>
<br>
Here is a summary of three algorithms (algorithm-1, algorithm-2,
algorithm-2A) that I tested (see attached code). Note, algorithm-2A
was obtained by removing the slice assignment in the above code and
modifying the return as follows <br>
<pre wrap="">def extract_x_words(words):
words_with_x = []
words_without_x = []
for word in words:
if word.startswith('x'):
words_with_x.append(word)
else:
words_without_x.append(word)
#words[:] = words_without_x # slice assignment
return words_with_x, words_without_x
<font face="Helvetica, Arial, sans-serif">
</font></pre>
Of course, one needs to modify the call for "in-place" update of <tt>startingList</tt>
as follows:<br>
<br>
xOnlyList,startingList = extract_x_words(startingList) <br>
<br>
Here is a summary of my timing results obtained for 3 different
algorithms for lists with 100,000 strings of length 4 in each list:<br>
<br>
<table border="1" cellpadding="2" cellspacing="2" height="162"
width="490">
<tbody>
<tr>
<td valign="top">Method<br>
</td>
<td valign="top">average (sd) time in seconds<br>
</td>
</tr>
<tr>
<td valign="top">algorithm-1 (list comprehension)<br>
</td>
<td valign="top">0.11630 (0.0014)<br>
</td>
</tr>
<tr>
<td valign="top">algorithm-2 (S. D'Aprano)<br>
</td>
<td valign="top">0.17594 (0.0014)<br>
</td>
</tr>
<tr>
<td valign="top">algorithm-2A (modified S. D'Aprano)<br>
</td>
<td valign="top">0.18217 (0.0023)<br>
</td>
</tr>
</tbody>
</table>
<br>
These values were obtained from 100 independent runs (MC
simulations) on lists that contain 100,000 strings. Approximately
50% of these strings contained a leading 'x'. Note, that the results
show that algorithm-2 (suggested by S. D'Aprano) is approximately
51% slower than algorithm-1 (list comprehensions) and algorithm-2A
(simple modification of algorithm-2) is approximately 57% slower
than algorithm-1. Why is algorithm-2A slower than algorithm-2?<br>
<br>
I would be interested in seeing code that is faster than algorithm-1
--- any suggestions are welcomed. And of course, if there are any
errors in my attached code please inform me of them and I will try
to correct them as soon as possible. Note, some of the code is
actually irrelevant for the original "Strange behavior" post.<br>
<br>
Have a good day!<br>
<br>
</body>
</html>