Best search algorithm to find condition within a range
Ian Kelly
ian.g.kelly at gmail.com
Tue Apr 7 22:15:35 CEST 2015
On Tue, Apr 7, 2015 at 9:40 AM, <jonas.thornvall at gmail.com> wrote:
> Well of course you use same principles like a binary search setting min and max, closing in on the digit. In this case the searched numbers > base^exp and number< base^exp+1.
>
> But since the search is within large bases upto 32-bit space, so base 4294967295 is the biggest allowed. I need to find the nearest less exp in base for each (lets call them pseudo digits). But as you see there it will take time to add them up. So better doing a binary search, you know min-max half (iteration). You can do the same for a random oracle min max within range, and if the number of tries in general over 22 i think a random oracle do it better than a binary search.
Okay, so you're talking about doing a binary search but picking the
partition point randomly instead of evenly?
> It was a long time since i did this, but i do know there is a threshold where searching min max with the oracle will be faster than the binary search.
I don't know where you got this idea from, but it's fiction. If you
split the space in half, then you are always reducing the amount of
work remaining by a factor of 2, and the total work is O(log n). If
you do an uneven split, then while you will sometimes make a good
partition and reduce the amount of work by more than a factor of two,
more often the target will be in the larger subspace and you will have
reduced the amount of work by less than a factor of two. In the worst
case of this, you only remove one element from the search space at a
time and end up with a linear search.
As a demonstration of this, here's a function which does a binary
search for a number in a range and returns the number of partitions it
had to perform to find the number:
>>> def binsearch(n, min, max):
... if min == max: return 0
... med = (max + min) // 2
... if n <= med:
... return 1 + binsearch(n, min, med)
... else:
... return 1 + binsearch(n, med+1, max)
...
And here's the average number of partitions needed to find a number in
various ranges:
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
16.68928
>>> n = 1000000
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
19.951424
>>> n = 10000000
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
23.3222784
Now let's try a variant function that splits the range at one third
instead of evenly:
>>> def trisearch(n, min, max):
... if min == max: return 0
... tern = (min + min + max) // 3
... if n <= tern:
... return 1 + trisearch(n, min, tern)
... else:
... return 1 + trisearch(n, tern+1, max)
...
>>> n = 100000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
17.88387
>>> n = 1000000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
21.502276
>>> n = 10000000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
25.1157592
The ternary version invariably takes more partitions on average to
find the goal number. Now let's try a third variant that partitions
the range randomly:
>>> def randsearch(n, min, max):
... if min == max: return 0
... part = randrange(min, max)
... if n <= part:
... return 1 + randsearch(n, min, part)
... else:
... return 1 + randsearch(n, part+1, max)
...
>>> n = 100000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
22.17863
>>> n = 1000000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
26.78371
>>> n = 10000000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
31.3916557
As you can see, this one does a lot worse on average, and at n=1e6
it's already performing worse than either of the previous versions did
at n=1e7. I'm not even going to try this for n=1e8 because it would
take too long to run.
More information about the Python-list
mailing list