Python and Schools

Thu Apr 17 13:18:39 EDT 2003

David Eppstein wrote:

> In article <_Fzna.27993$LB6.643892 at news1.tin.it>,
>  Alex Martelli <aleax at aleax.it> wrote:
> 
>> Steven Taschuk wrote:
>>    ...
>> > trade-off as being between, say, an algorithm which runs in O(n)
>> > time but needs O(n^2) memory to do it, and an alternative which
>> 
>> I don't think there can be any such algorithm -- in O(n) time,
>> the algoritm cannot USE more than O(n) memory, so how could it
>> NEED it...?  The reverse, sure, but not this way, I think.
> 
> A simple example would be an algorithm that takes as input a sequence of
> values that are all (for whatever reason) guaranteed to be integers in
> range(n**2), and tests whether any two of them are the same:
> 
> def anysame(seq):
>     allocate array A, of dimension n**2, WITHOUT INITIALIZING IT
>         (is this possible in pure python?)

I don't think it is, but probably you could write a C-coded
extension that does this.  It would have to be a C-coded type
in order to be able to ensure returning the x-th item as a
valid Python object without having undergone any initialization,
though.  Perhaps on some architectures mmap applied to some
appropriate /dev/??? would let you do this (but I'm not sure).

>     for i,x in enumerate(seq):
>         if A[x] is an integer and 0 <= A[x] < i and A[A[x]] == x:
>             return True
>         A[x] = i
>     return False

Neat!  Takes a while to convince oneself of this, but, yes,
if "allocating N cells of uninitialized memory" exists as a
O(1) primitive, this will work.  I had not considered this
possibility when I expressed my doubts on Steven's words!

> Initializing A would require too much time, but the algorithm is capable
> of working with an uninitialized array.  The usual trick for reducing
> the memory requirements of such an algorithm is to replace the array by
> a hash table, and of course in Python the hash table (dictionary) based
> duplicate detection algorithm is much more preferable, but this makes
> theoreticians unhappy by turning worst-case time bounds into randomized
> expected time bounds.

Oh yes, absolutely.

Alex