Mapping with continguous ranges of keys
Terry Reedy
tjreedy at udel.edu
Thu Dec 15 15:42:00 EST 2016
On 12/15/2016 12:06 PM, Steve D'Aprano wrote:
> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
>
> {1: "foo",
> 2: "foo",
> 3: "foo",
> # same for keys 4 through 99
> 100: "foo",
> 101: "bar",
> 102: "bar",
> 103: "foobar",
> 104: "bar",
> 105: "foo",
> }
For this particular example (untested):
def lookup(n):
F, B, FB = 'foo', 'bar', 'foobar' # Ensure value objects reused
if 1 <= n <= 100:
return F
elif 101 <= n <= 105:
return (B, B, FB, B, F')[n - 101]
else:
raise ValueError('n must be in range(1, 106)')
> So in this case, the keys 1 through 100, plus 105, all have the same value.
>
> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.
> Has anyone dealt with data like this and could give a recommendation of the
> right data structure to use?
>
> The obvious way is to explicitly list each key, in a regular dict. But I
> wonder whether there are alternatives which may be better (faster, more
> memory efficient)?
If, as in the example, the keys comprise a contiguous sequence of ints,
the 'obvious' way to me is sequence of values. A tuple of constants
gets compiled and will load quickly from a .pyc file. 4 or 8 bytes per
entry times 1 or 2 million entries is usually tolerable on a gigabyte
machine.
Or trade time for space with binary search or search in explicit binary
tree. Or combine binary search and indexed lookup as I did above.
--
Terry Jan Reedy
More information about the Python-list
mailing list