Python and Schools
Beni Cherniavsky
cben at techunix.technion.ac.il
Fri Apr 18 04:34:14 EDT 2003
In comp.lang.python, you wrote:
>[David Eppstein]
>> ...
>> A simple example would be an algorithm that takes as input a sequence of
>> values that are all (for whatever reason) guaranteed to be integers in
>> range(n**2), and tests whether any two of them are the same:
>>
>> def anysame(seq):
>> allocate array A, of dimension n**2, WITHOUT INITIALIZING IT
>> (is this possible in pure python?)
>
>It shouldn't be possible: if you find a way to do it, it would be
>considered a security bug. No memory is really uninitialized -- if Python
>gave you read access to n**2 bytes in constant time, it would be revealing
>to you the state left behind by some other function (or user, or process).
>If you created a file with at least n**2 bytes beforehand, then you could
>mmap the file in constant time, and that acts like an array object
>initialized to whatever bytes happen to be sitting in the file. But the O()
>behavior of access to an mmap'ed region depends on the platform
>implementation of mmap'ed regions.
>
It's impossible in plain portable C but an operating system can do this using
memory mapping hardware. It's possible to mark the required range of memory
pages to cause an exception and be initialized [to zeros] only when they are
accessed. Since the algorithm only accesses a few pages, this should work
fine. This reveals a little clarification to the original claim: the
algorithm doesn't use O(N**2) memory, it eats O(N**2) of your *address space*.
In fact, that's what mmaping /dev/zero probably does on most platforms...
There is a catch: for large enough N, you need to mark O(N**2) pages in this
way, so it's O(N**2) time to allocate again :-(. This can be overcome by
hierarchical page tables. To be precise, you would need an arbitrarily deep
hierarchy rather than one of fixed depth that real computers tend to have
<wink>. Note that such a hierarchy is actually a way of implementing a
dictionary from digital trees :-).
You would actually need infinite memory to have an unbounded hierarchy. Which
exposes a small catch with application of asymptotic analysis once you
approach limits of address space, integer size, etc. No algorithm takes
O(f(N)) when N approaches infinity - instead it just runs out of resources
:-)! So many algorithms can be pervertly claimed to be O(1) on a given
architecture. For example one can sort any array of 16 bit integers in O(1)
by just allocating a O(2**16) array counting how many times each number
appeared and then scanning the array. Of course this is worse than any
sensible algorithm one would write - because of the huge factor. In truth
this is not O(1) - it's O(M) where M is your address space or "integer space"
size. But omitting such considerations can lead to funny "paradoxes"...
P.S. Cute algorithm indeed!
Beni Cherniavsky <cben at tx.technion.ac.il>
More information about the Python-list
mailing list