[Tutor] extracting numbers from a list

Wed Oct 18 04:23:25 CEST 2006

On Mon, 16 Oct 2006, kumar s wrote:

> I have a simple question to ask tutors:
>
> list A :
>
> a = [10,15,18,20,25,30,40]

Hi Kumar,

If you're concerned about correctness, I'd recommend that you try thinking 
about the problem inductively.  An inductive definition for what you're 
asking is straightforward to state in about three or four lines of code. 
I'll try to go through it slowly so you see what the reasoning behind it 
is.  The code sketch above uses a technique that you should already know 
called "mathematical induction."

     http://en.wikipedia.org/wiki/Mathematical_induction

Let's say we're designing a function called getSpans().  Here are some 
sample behavior we'd like from it:

     getSpans([10, 15]) = [(10, 15)]
     getSpans([10, 15, 18]) = [(10, 15), (16, 18)]
     getSpans([10, 15, 18, 20]) = [(10, 15), (16, 18), (19, 20)]

Would you agree that this is reasonable output for a function like this? 
getSpans() takes a list of numbers, and returns a list of pairs of 
numbers.

There is one "base" case to this problem.  The smallest list we'd like to 
consider is a list of two elements.  If we see that, then we're happy, 
because the answer is really simple:

     getSpans([a, b]) = [(a, b)]

Otherwise, let's imagine a list that's a bit longer, with three elements. 
Concretely, we know that this is going to look like:

     getSpans([a, b, c]) = [(a, b), (b+1, c)]

But another way to say this, though is that:

     getSpans([a, b, c]) = [(a, b)] + getSpans([b+1, c])

That is, we try to restate the problem in terms of smaller subproblems.

Let's look at what the case for four elements might look like:

     getSpans([a, b, c, d]) = [(a, b), (b+1, c), (c+1, d)]

Concretely, we know that that's the list of spans we'd like to see.  But 
if we think about it, we might also restate this as:

     getSpans([a, b, c, d]) = [a, b] + getSpans([b+1, c, d])

because getSpans([b+1, c, d]) is going to give us:

     [(b+1, c), (c+1, d)]

All we need to do is add on [(a, b)] to that to get the complete answer to 
getSpans([a, b, c, d]).

Generally, for any particular list L that's longer than two elements:

     getExons(L) = [L[0:2]] + getExons([L[1] + 1] + L[2:])

When we work inductively, all we really need to think about is "base case" 
and "inductive case": the solution will often just fall through from 
stating those two cases.  An inductively-designed function is going to 
look something like:

     def solve(input):
         if input looks like a base-case:
             handle that directly in a base-case way
         else:
             break up the problem into smaller pieces
             that we assume can be solve()d by induction

The inductive definition above is slightly inefficient because we're doing 
physical list slicing.  Rewriting it to use loops and list indicies 
instead of slicing is a little harder, but not much harder.

Another example: how do we add up a list of numbers?  If there's just one 
number, that must be the sum.  Otherwise, we can add up the first number 
to the sum of the rest of the numbers.

#################################
def mysum(L):
     if len(L) == 1:
         return L[0]
     else:
         return L[0] + mysum(L[1:])
#################################

It's a funky way of doing this, but this is a real definition that works 
(modulo limits in Python's recursion implementation).  It's inefficient, 
but it's easy to state and reason about.  I'm assuming you're more 
interested in correctness than efficiency at the moment.  Get it correct 
first, then if you really need to, work to get it fast.