<html>

  <head>

    <meta content="text/html; charset=ISO-8859-15"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 15-Aug-2012 02:19, Steven D'Aprano

      wrote:<br>

    </div>

    <blockquote

      cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"

      type="cite">

      <pre wrap="">On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">You might find the following useful:

def testFunc(startingList):

     xOnlyList = []; j = -1

     for xl in startingList:

         if (xl[0] == 'x'):

</pre>

      </blockquote>

      <pre wrap="">

That's going to fail in the starting list contains an empty string. Use 

xl.startswith('x') instead.</pre>

    </blockquote>

    Yes, but this was by design (tacitly assumed that <tt>startingList</tt>

    was both a list and non-empty).<br>

    <blockquote

      cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"

      type="cite">

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">             xOnlyList.append(xl)

         else:

             j += 1

             startingList[j] = xl

</pre>

      </blockquote>

      <pre wrap="">

Very cunning, but I have to say that your algorithm fails the "is this 

obviously correct without needing to study it?" test. Sometimes that is 

unavoidable, but for something like this, there are simpler ways to solve 

the same problem.</pre>

    </blockquote>

    Sorry, but I do not sure what you mean here.<br>

    <blockquote

      cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"

      type="cite">

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">     if j == -1:

         startingList = []

     else:

         del startingList[j:-1]

     return(xOnlyList)

</pre>

      </blockquote>

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">And here is another version using list comprehension that I prefer

</pre>

      </blockquote>

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">def testFunc2(startingList):

     return([x for x in startingList if x[0] == 'x'], [x for x in

startingList if x[0] != 'x'])

</pre>

      </blockquote>

      <pre wrap="">

This walks over the starting list twice, doing essentially the same thing 

both times. It also fails to meet the stated requirement that 

startingList is modified in place, by returning a new list instead. </pre>

    </blockquote>

    This can meet the requirement that <tt>startingList</tt> is

    modified in place via the call to this function (see the attached

    code). <br>

    <blockquote

      cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"

      type="cite">

      <pre wrap="">

Here's an example of what I mean:

py> mylist = mylist2 = ['a', 'x', 'b', 'xx', 'cx']  # two names for one 

list

py> result, mylist = testFunc2(mylist)

py> mylist

['a', 'b', 'cx']

py> mylist2  # should be same as mylist

['a', 'x', 'b', 'xx', 'cx']</pre>

    </blockquote>

    Yes, I had a typo in my original posting --- sorry about that!<br>

    <blockquote

      cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"

      type="cite">

      <pre wrap="">

Here is the obvious algorithm for extracting and removing words starting 

with 'x'. It walks the starting list only once, and modifies it in place. 

The only trick needed is list slice assignment at the end.

def extract_x_words(words):

    words_with_x = []

    words_without_x = []

    for word in words:

        if word.startswith('x'):

            words_with_x.append(word)

        else:

            words_without_x.append(word)

    words[:] = words_without_x  # slice assignment

    return words_with_x

</pre>

    </blockquote>

    Suppose <tt>words</tt> was not a list --- you have tacitly assumed

    that <tt>words</tt> is a list.<br>

    <blockquote

      cite="mid:502aeb2b$0$29978$c3e8da3$5496439d@news.astraweb.com"

      type="cite">

      <pre wrap="">

The only downside of this is that if the list of words is so enormous 

that you can fit it in memory *once* but not *twice*, this may fail. But 

the same applies to the list comprehension solution.</pre>

    </blockquote>

    But, this is not the only downside if speed is important --- it is

    slower than the list comprehension method (see results that

    follows).<br>

    <br>

    Here is a summary of three algorithms (algorithm-1, algorithm-2,

    algorithm-2A) that I tested (see attached code). Note, algorithm-2A

    was obtained by removing the slice assignment in the above code and

    modifying the return as follows <br>

    <pre wrap="">def extract_x_words(words):

    words_with_x = []

    words_without_x = []

    for word in words:

        if word.startswith('x'):

            words_with_x.append(word)

        else:

            words_without_x.append(word)

    #words[:] = words_without_x  # slice assignment

    return words_with_x, words_without_x

<font face="Helvetica, Arial, sans-serif">

</font></pre>

    Of course, one needs to modify the call for "in-place" update of <tt>startingList</tt>

    as follows:<br>

    <br>

       xOnlyList,startingList = extract_x_words(startingList)  <br>

    <br>

    Here is a summary of my timing results obtained for 3 different

    algorithms for lists with 100,000 strings of length 4 in each list:<br>

    <br>

    <table border="1" cellpadding="2" cellspacing="2" height="162"

      width="490">

      <tbody>

        <tr>

          <td valign="top">Method<br>

          </td>

          <td valign="top">average (sd) time in seconds<br>

          </td>

        </tr>

        <tr>

          <td valign="top">algorithm-1 (list comprehension)<br>

          </td>

          <td valign="top">0.11630 (0.0014)<br>

          </td>

        </tr>

        <tr>

          <td valign="top">algorithm-2 (S. D'Aprano)<br>

          </td>

          <td valign="top">0.17594 (0.0014)<br>

          </td>

        </tr>

        <tr>

          <td valign="top">algorithm-2A (modified S. D'Aprano)<br>

          </td>

          <td valign="top">0.18217 (0.0023)<br>

          </td>

        </tr>

      </tbody>

    </table>

    <br>

    These values  were obtained from 100 independent runs (MC

    simulations) on lists that contain 100,000 strings. Approximately

    50% of these strings contained a leading 'x'. Note, that the results

    show that algorithm-2 (suggested by S. D'Aprano) is approximately

    51% slower than algorithm-1 (list comprehensions) and algorithm-2A

    (simple modification of algorithm-2) is approximately 57% slower

    than algorithm-1. Why is algorithm-2A slower than algorithm-2?<br>

    <br>

    I would be interested in seeing code that is faster than algorithm-1

    --- any suggestions are welcomed.  And of course, if there are any

    errors in my attached code please inform me of them and I will try

    to correct them as soon as possible. Note, some of the code is

    actually irrelevant for the original "Strange behavior" post.<br>

    <br>

    Have a good day!<br>

    <br>

  </body>

</html>