Strange behavior

Virgil Stokes vs at it.uu.se
Thu Aug 16 13:18:59 CEST 2012

```On 15-Aug-2012 02:19, Steven D'Aprano wrote:
> On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:
>
>> You might find the following useful:
>>
>> def testFunc(startingList):
>>       xOnlyList = []; j = -1
>>       for xl in startingList:
>>           if (xl[0] == 'x'):
> That's going to fail in the starting list contains an empty string. Use
Yes, but this was by design (tacitly assumed that startingList was both a list
and non-empty).
>
>
>>               xOnlyList.append(xl)
>>           else:
>>               j += 1
>>               startingList[j] = xl
> Very cunning, but I have to say that your algorithm fails the "is this
> obviously correct without needing to study it?" test. Sometimes that is
> unavoidable, but for something like this, there are simpler ways to solve
> the same problem.
Sorry, but I do not sure what you mean here.
>
>
>>       if j == -1:
>>           startingList = []
>>       else:
>>           del startingList[j:-1]
>>       return(xOnlyList)
>
>> And here is another version using list comprehension that I prefer
>> def testFunc2(startingList):
>>       return([x for x in startingList if x[0] == 'x'], [x for x in
>> startingList if x[0] != 'x'])
> This walks over the starting list twice, doing essentially the same thing
> both times. It also fails to meet the stated requirement that
> startingList is modified in place, by returning a new list instead.
This can meet the requirement that startingList is modified in place via the
call to this function (see the attached code).
> Here's an example of what I mean:
>
> py> mylist = mylist2 = ['a', 'x', 'b', 'xx', 'cx']  # two names for one
> list
> py> result, mylist = testFunc2(mylist)
> py> mylist
> ['a', 'b', 'cx']
> py> mylist2  # should be same as mylist
> ['a', 'x', 'b', 'xx', 'cx']
Yes, I had a typo in my original posting --- sorry about that!
>
> Here is the obvious algorithm for extracting and removing words starting
> with 'x'. It walks the starting list only once, and modifies it in place.
> The only trick needed is list slice assignment at the end.
>
> def extract_x_words(words):
>      words_with_x = []
>      words_without_x = []
>      for word in words:
>          if word.startswith('x'):
>              words_with_x.append(word)
>          else:
>              words_without_x.append(word)
>      words[:] = words_without_x  # slice assignment
>      return words_with_x
Suppose words was not a list --- you have tacitly assumed that words is a list.
>
> The only downside of this is that if the list of words is so enormous
> that you can fit it in memory *once* but not *twice*, this may fail. But
> the same applies to the list comprehension solution.
But, this is not the only downside if speed is important --- it is slower than
the list comprehension method (see results that follows).

Here is a summary of three algorithms (algorithm-1, algorithm-2, algorithm-2A)
that I tested (see attached code). Note, algorithm-2A was obtained by removing
the slice assignment in the above code and modifying the return as follows

def extract_x_words(words):
words_with_x = []
words_without_x = []
for word in words:
if word.startswith('x'):
words_with_x.append(word)
else:
words_without_x.append(word)
#words[:] = words_without_x  # slice assignment
return words_with_x, words_without_x

Of course, one needs to modify the call for "in-place" update of startingList as
follows:

xOnlyList,startingList = extract_x_words(startingList)

Here is a summary of my timing results obtained for 3 different algorithms for
lists with 100,000 strings of length 4 in each list:

Method
average (sd) time in seconds
algorithm-1 (list comprehension)
0.11630 (0.0014)
algorithm-2 (S. D'Aprano)
0.17594 (0.0014)
algorithm-2A (modified S. D'Aprano)
0.18217 (0.0023)

These values  were obtained from 100 independent runs (MC simulations) on lists
that contain 100,000 strings. Approximately 50% of these strings contained a
leading 'x'. Note, that the results show that algorithm-2 (suggested by S.
D'Aprano) is approximately 51% slower than algorithm-1 (list comprehensions) and
algorithm-2A (simple modification of algorithm-2) is approximately 57% slower
than algorithm-1. Why is algorithm-2A slower than algorithm-2?

I would be interested in seeing code that is faster than algorithm-1 --- any
suggestions are welcomed.  And of course, if there are any errors in my attached
code please inform me of them and I will try to correct them as soon as
possible. Note, some of the code is actually irrelevant for the original
"Strange behavior" post.

Have a good day!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20120816/9a1f6a64/attachment.html>
-------------- next part --------------
'''
Purpose: Time three different algorithms for the same task

Author: V. Stokes (vs at it.uu.se, 2012-08-16)
Refs:
python-list at python.org list
* Strange behavior, 14-Aug-2012 17:38, light1quark at gmail.com
* Re: Strange behavior, 14-Aug-2012 21:40, Stokes, Virgil
* Re: Strange behavior, 15-Aug-2012 02:19, Steven D'Aprano

Note:
1. The mean and standard deviation over the runs (MC simulations)
are estimated using recursive equations.
2. A seed (syd) is used with the RNG for repeatability. Each run is
started with a new seed to force the generation of independent
random sequences.
3. Warning! No checks are made on the parameters passed to
the functions (this was by design).
4. No effort has been made to make this code elegant. My focus was
to make the code clear and easy to understand.
5. This was executed on a Windows Vista 32-bit platform with Python 2.6.6
Processor: Intel(R) core(TM)2 Duo CPU E8500 at 3.16GHz 3.17GHz
6. The estimated time to completion is displayed after each run.

'''
import random as random
import math as math
from time import clock # clock gives good resolution on MS Windows

def testFunc1(startingList):
'''
Algorithm-1
Note:
One should check for an empty startingList before
calling testFunc1 -- If this possibility exists!
'''
return([x for x in startingList if x[0] == 'x'],
[x for x in startingList if x[0] != 'x'])

def testFunc2(words):
'''
Algorithm-2
'''
words_with_x = []
words_without_x = []
for word in words:
if word.startswith('x'):
words_with_x.append(word)
else:
words_without_x.append(word)
words[:] = words_without_x  # slice assignment
return words_with_x

def testFunc2A(words):
'''
Algorithm-2A
'''
words_with_x = []
words_without_x = []
for word in words:
if word.startswith('x'):
words_with_x.append(word)
else:
words_without_x.append(word)
#words[:] = words_without_x  # slice assignment
return words_with_x, words_without_x

'''
Purpose: Generate a list of NStrng elements with each element a string
of length NChar and constrained such that approx. 50% of the
strings will begin with the character (symbol) leadChr
Inputs:
NChar -- number of characters in each element
NStrng -- number of elements in list (strgList)
Alph -- list of characters to be used (must contain the leadChr)
Otputs:
strgList -- list with NString strings of NChar characters (from Alph) in each
'''
NSym = len(Alph)
strgList = []
for i in range(NStrng):
if random.random() < 0.5:
else:
while True:
indx = random.randint(0,NSym-1)
char = Alph[indx]

for j in range(1,NChar):
indx = random.randint(0,NSym-1)
char = char + Alph[indx]

strgList.append(char)

return strgList

#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Alph = ['a','b','c','d','e','f','g','h','i','j','k','l','m',
'n','o','p','q','r','s','t','u','v','w','x','y','z']

NChar = 4         # 4 chars per string
NStrng = 100000   # 100,000 strings in list
NRuns = 100       # 100 random lists (number of MC simulations)

# For quick testing
#NChar = 4        # 4 chars per string
#NStrng = 10000   # 10,000 strings in list
#NRuns = 20       # 20 random lists

syd = 1071       # initial seed for RNG (for repeatability)

printRunningMean = False

mu1,mu2,mu3,mug = 0.0,0.0,0.0,0.0
x1,x2,x3 = 0.0,0.0,0.0
print '%5d runs with lists containing %7d elements' %(NRuns,NStrng)
for irun in range(NRuns):
tic = clock()
k = irun + 1
print ' run - %2d' %k
syd += 19    # new seed for each run
random.seed(syd)
#print 'Genrating list of %7d strings...' %NStrng
strgList = genStrList(NChar,NStrng,Alph,'x')

t0 = clock()
xOnlyList,strgList = testFunc1(strgList) # algorithm-1
t1 = clock()

#print 'Length of xOnlyList = %7d' %len(xOnlyList)
#print ' Length of strgList = %7d' %len(strgList)
etime = t1 - t0
delta = etime - mu1
mu1 += delta/k
if printRunningMean:
print '  algorithm-1 -- %9.5f' %mu1
x1 += delta*(etime-mu1)

#print
random.seed(syd)
#print 'Genrating list of %7d strings...' %NStrng
strgList = genStrList(NChar,NStrng,Alph,'x')

t0 = clock()
xOnlyList = testFunc2(strgList)          # algorithm-2
t1 = clock()

#print 'Length of xOnlyList = %7d' %len(xOnlyList)
#print ' Length of strgList = %7d' %len(strgList)
etime = t1 - t0
delta = etime - mu2
mu2 += delta/k
if printRunningMean:
print '  algorithm-2 -- %9.5f' %mu2
x2 += delta*(etime-mu2)

random.seed(syd)
#print 'Genrating list of %7d strings...' %NStrng
strgList = genStrList(NChar,NStrng,Alph,'x')

t0 = clock()
xOnlyList,strgList = testFunc2A(strgList)          # algorithm-2A
t1 = clock()

#print 'Length of xOnlyList = %7d' %len(xOnlyList)
#print ' Length of strgList = %7d' %len(strgList)
etime = t1 - t0
delta = etime - mu3
mu3 += delta/k
if printRunningMean:
print '  algorithm-2A -- %9.5f' %mu3
x3 += delta*(etime-mu3)

toc = clock()
loopTime = toc - tic
delta = loopTime - mug
mug += delta/k
runsR = NRuns - k
estToComplete = mug*runsR
print '   ETTC = %8.2f seconds' %estToComplete

print
print '        Timing in seconds'
print ' ************************************'
print '    method     |     avg time (sd)'
print ' ------------------------------------'
print ' algorithm-1   | %9.5f (%8.6f)' %(mu1,math.sqrt(x1/NRuns))
print ' algorithm-2   | %9.5f (%8.6f)' %(mu2,math.sqrt(x2/NRuns))
print ' algorithm-2A  | %9.5f (%8.6f)' %(mu3,math.sqrt(x3/NRuns))

print
print "Th-tha-tha-that's all folks!"
```