[Python-ideas] proposed methods: list.replace / list.indices

Sun Dec 30 23:59:53 CET 2012

On 12/30/2012 9:10 AM, Ned Batchelder wrote:
> On 12/30/2012 5:46 AM, Terry Reedy wrote:
>> On 12/29/2012 10:25 PM, David Kreuter wrote:
>>
>>> I think it would be nice to have a method in 'list' to replace certain
>>> elements by others in-place. Like this:
>>>
>>>      l = [x, a, y, a]
>>>      l.replace(a, b)
>>>      assert l == [x, b, y, b]
>>>
>>> The alternatives are longer than they should be, imo. For example:
>>>
>>>      for i, n in enumerate(l):

Note that enumerate is a generic function of iterables, not a specific 
list method.

>>>          if n == a:
>>>              l[i] = b
>>
>> I dont see anything wrong with this. It is how I would do it in
>> python. Wrap it in a function if you want. Or write it on two line ;-).

My deeper objection is that 'replace_all_in_place' is a generic mutable 
collection function, not a specific list or even mutable sequence 
function. Python 1 was stronger list oriented. Python 3 is mostly 
iterable oriented, with remnants of the Python 1 heritage.

> I wonder at the underlying philosophy of things being accepted or
> rejected in this way.  For example, here's a thought experiment: if
> list.count() and list.index() didn't exist yet, would we accept them as
> additions to the list methods?

I personally would have deleted list.find in 3.0. Count and index are 
not list methods but rather sequence methods, part of the sequence ABC. 
Tuples got them, as their only two public methods, in 3.0 to conform. 
This ties in to Nick's comment. (Actually counting a particular item in 
a collection is not specific to sequencess, but having multiple items to 
count tends to be.) It would be possible for count and index to be 
functions instead. But their definition as methods goes back to Python 1.

Also note that .index has a start parameter, making it useful to get all 
indexes. See the code below.

 > By Terry's reasoning, there's no need
> to, since I can implement those operations in a few lines of Python.

We constantly get proposals to add new functions and methods that are 
easily written in a few lines. Everyone thinks their proposal is useful 
because it is useful in their work. If we accepted all such proposals, 
Python would have hundreds more.

> Does that mean they persist only for backwards compatibility?

Backwards compatibility is important. Changing them to functions would 
be disruptive without sufficient gain.

> Was their initial inclusion a violation of some "list method philosophy"?

No, it was part of the Python 1 philosophy of lists as the common data 
interchange type. As I said, this has changed in Python 3.

> Or is
> there a good reason for them to exist, and if so, why shouldn't
> .replace() and .indexes() also exist?

Neither are list methods. Nicks gave a generic indexes generator. A 
specific list indexes generator can use repeated applications of .index 
with start argument. I do that 'inline' below.

> I would hate for the main
> criterion to be, "these are the methods that existed in Python 2.3,"

Then you are hating reality ;-). The .method()s of basic builtin classes 
is close to frozen.

>> There is a perfectly good python version above that does the necessary
>> search and replace as efficiently as possible. Thank you for posting it.

> You say "as efficiently as possible," but you mean, "as algorithmically
> efficient as possible," which is true, they are linear, which is as good
> as it's going to get.  But surely if coded in C, these operations would
> be faster.

You are right.
Lets do the next-item search in C with .index.
If the density of items to be replaces is low, as it would be for most 
applications, this should dominate.

def enum(lis):
       for i, n in enumerate(lis):
           if n == 1:
               lis[i] = 2

a, b = 100, 10000  # started with 2,1 for initial tests
start = a*([1]+b*[0])
after = a*([2]+b*[0])

# test that correct before test speed!
# since the list is mutated, it must be reset for each test
lis = start.copy()
enum(lis)
print('enum: ', lis == after)

def repin(lis):
     i = -1
     try:
         while True:
             i = lis.index(1, i+1)
             lis[i] = 2
     except:
         pass

lis = start.copy()
repin(lis)
print('repin: ', lis == after)

from timeit import timeit
# now for speed, remembering to reset for each test
# first measure the copy time to subtract from test times
print(timeit('lis = start.copy()',
              'from __main__ import start', number=10))
print(timeit('lis = start.copy(); enum(lis)',
              'from __main__ import start, enum', number=10))
print(timeit('lis = start.copy(); repin(lis)',
              'from __main__ import start, repin', number=10))
# measure scan without replace to give an upper limit to python-coded 
replace
# since lis is not mutated, it only needs to be defined once
print(timeit('repin(lis)',
              'from __main__ import a, b, repin; lis = a*(b+1)*[0]',
              number=10))

# prints
enum:  True
repin:  True
0.06801244890066886
0.849063227602523
0.2759397696510706
0.20790119084727898

After subtracting and dividing, enum take .078 seconds for 100 
replacements in 1000000 items, repin just .021, which is essentially the 
time it takes just to scan 1000000 items. So doing the replacements also 
in C would not be much faster. Rerunning with 10000 replacements (a,b = 
10000, 100), the times are .080 and .024.

-- 
Terry Jan Reedy