Mapping with continguous ranges of keys

Thu Dec 15 15:42:00 EST 2016

On 12/15/2016 12:06 PM, Steve D'Aprano wrote:
> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
>
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo",
> }

For this particular example (untested):

def lookup(n):
     F, B, FB = 'foo', 'bar', 'foobar'  # Ensure value objects reused
     if 1 <= n <= 100:
         return F
     elif 101 <= n <= 105:
         return (B, B, FB, B, F')[n - 101]
     else:
         raise ValueError('n must be in range(1, 106)')

> So in this case, the keys 1 through 100, plus 105, all have the same value.
>
> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.

> Has anyone dealt with data like this and could give a recommendation of the
> right data structure to use?
>
> The obvious way is to explicitly list each key, in a regular dict. But I
> wonder whether there are alternatives which may be better (faster, more
> memory efficient)?

If, as in the example, the keys comprise a contiguous sequence of ints, 
the 'obvious' way to me is sequence of values.  A tuple of constants 
gets compiled and will load quickly from a .pyc file.  4 or 8 bytes per 
entry times 1 or 2 million entries is usually tolerable on a gigabyte 
machine.

Or trade time for space with binary search or search in explicit binary 
tree.  Or combine binary search and indexed lookup as I did above.

-- 
Terry Jan Reedy