Surprising (for me) benchmark results...

Cedric Adjih adjih at crepuscule.com
Thu May 3 04:30:35 EDT 2001


  I was ready to post an answer to Daniel along the same
lines, but you already did, so I've tried to find arguments
for the opposite point of view :-)

John J. Lee <phrxy at csv.warwick.ac.uk> wrote:
> On Wed, 2 May 2001, Daniel Berlin wrote:
> [...]
>> Typically, an item is multiple bytes. Let's say the average english word
>> is 5 characters ( I can't remember what the actual figure is).
>> This would mean Disk accesses are O(n), Sorting is O(n/5 log n/5) where
>> both n's are directly comparable. Once again, disk access clearly
>> dominates.
>
> O(N/5 log N/5) = O(N log N)
>
> The constant factors etc. only affect the *point* at which disk access
> becomes unimportant, not *whether* it does (assuming you can get to big
> enough N without running out of memory).

Very true. However, the value of N needed to compensate for a fixed
initial constant factor grows exponentially with that factor when
comparing O(N) to O(N log N).

For instance, to compensate for a factor of 30 you need log N = 30
(natural log), i.e. N ~= e^30 ~= 10^13, which is somewhat big. A
factor of 10 is more easily offset (N ~= e^10 ~= 20000).
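
Just to make this concrete, here's a quick back-of-the-envelope check
(a sketch only: I'm assuming natural log, and cost models of c*N
versus N*ln N in arbitrary units):

import math

# Break-even point between an O(N) algorithm carrying an extra
# constant factor c and an O(N log N) algorithm:
#   c*N = N*ln(N)  =>  N = e**c
for c in (10, 30):
    print("factor %2d is offset around N ~= %.3g" % (c, math.exp(c)))

which prints roughly 2.2e+04 and 1.07e+13, matching the figures above.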

>> Until you are sorting a very large number of single byte items with
>> quicksort, disk accesses will dominate.
>
> Yes, that may well be true.  I don't know.
>
>> Now, this is even assuming disk accesses are O(n), which they aren't.
>
> They are O(N), but that doesn't mean that the time it takes is simply
> proportional to N, does it?  Or that there won't be some variation about
> the O(N) behaviour for small changes in N?
>
> I think we're just not using the same terminology.  Rainer was just saying
> that, for large enough N, disc access will not dominate (as long as
> everything fits in memory).  This is true.  I think (correct me if I'm
> wrong) you're saying that we won't get to that point before we run out of
> memory.  Or perhaps you're just saying that for some small N and small
> change in N, the trend goes in the other direction, and in fact as N
> increases disk access goes from being unimportant to being important.
> These statements may or may not be true, I don't know.  It seems unlikely
> that O(N) could be too far wrong if the storage required is close to
> the amount of memory in your machine and N isn't too small, at least.

  Well, actually, file access is not necessarily O(N). Linux, for
instance, uses a tree of depth 0 (direct blocks), then depth 1 (single
indirect), depth 2 (double indirect), and finally depth 3 (triple
indirect) to find the disk block number of a given file block.
Since IIRC you need a regular tree of depth O(log N) to address N
elements, it could be argued that file access is O(N log N). Of course,
since Linux and most implementations don't allow arbitrarily deep
trees, it becomes difficult to talk about O(N) or O(N log N) :-)
And you could always access the hard disk blocks directly.
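
To sketch what I mean (illustrative numbers only: 12 direct pointers
and 1024 pointers per indirect block, roughly the classic ext2 layout
with 4KB blocks; lookup_depth is just a name I made up):

# How many indirections an ext2-style inode needs to locate a
# given block of a file (depth 0 = direct pointer in the inode).
DIRECT = 12      # direct block pointers in the inode
PTRS = 1024      # block pointers per indirect block

def lookup_depth(block_no):
    if block_no < DIRECT:
        return 0                       # direct block
    block_no -= DIRECT
    if block_no < PTRS:
        return 1                       # single indirect
    block_no -= PTRS
    if block_no < PTRS ** 2:
        return 2                       # double indirect
    block_no -= PTRS ** 2
    if block_no < PTRS ** 3:
        return 3                       # triple indirect
    raise ValueError("file too large for this scheme")

for b in (0, 100, 2000, 2 * 10 ** 6):
    print("block %7d needs depth %d" % (b, lookup_depth(b)))

The depth is capped at 3, so the per-block lookup cost is effectively
O(1) but the file size is bounded; with arbitrarily deep trees it
would be O(log N) per block instead.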

  After all, it seems that comparing O(1) and O(log N) per-access
costs is difficult in practice.

-- Cedric


