fastest data structure for retrieving objects identified by (x,y) tuple?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Oct 3 21:58:16 EDT 2012
On Wed, 03 Oct 2012 18:30:24 -0400, Benjamin Jessup wrote:
> I have a group of objects identified by unique (x,y) pairs and I want to
> find out an object's "neighbors" in a matrix of size 2400 x 2400.
[...]
> There is either a neighbor, or a null value. I always know the (x,y)
> pair to check the neighbors of, so is doing,
> >>> obj = grid[x][y]  #lists, doesn't scale with num of objects
> or,
> >>> obj = grid.get((x,y), None)  #dictionary, scales with num of objects
> the fastest? I can't seem to find a conclusion by testing each alone,
> then in the full environment. Is it that, depending on the number of
> objects, each has an advantage?
Almost certainly not. Since you don't show how you tested each, I have no
idea why you couldn't reach a conclusion.
Let's start off with the question you don't ask: which uses less memory?
The answer is, the dict solution by far. Here is a test:
# populate a random matrix using both dict and list
adict = {}
alist = [[None]*2400 for i in range(2400)]

from random import randrange
for i in range(1000):
    x = randrange(2400)
    y = randrange(2400)
    adict[(x, y)] = "something"
    alist[x][y] = "something"

import sys
print(sys.getsizeof(adict))
print(sys.getsizeof(alist) + sum(sys.getsizeof(L) for L in alist))
The actual sizes printed will depend on how sparse the matrices are, but
for the sample above (only 1000 of the 5,760,000 cells filled, so very
sparse indeed), using Python 2.7, the figures I get are:
adict: 24712
alist: 23127324
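To see why the gap is that large, a back-of-the-envelope calculation
helps. This assumes CPython, where each list slot holds one pointer
(4 bytes on a 32-bit build, 8 on a 64-bit one), so treat the totals as
rough estimates:

# every slot in the list of lists exists up front; the dict only
# stores the cells that are actually occupied
cells = 2400 * 2400               # 5760000 preallocated list slots
filled = 1000                     # entries the dict holds
print(float(filled) / cells)      # ~0.00017, about 0.02% occupancy
print(cells * 4)                  # ~23 MB of pointers on a 32-bit
                                  # build, close to the figure above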
Now let's test the speed:
# randomly select matrix coordinates to look up
test_coords = []
for i in range(1000):
    x = randrange(2400)
    y = randrange(2400)
    test_coords.append((x, y))
# build some test code
from timeit import Timer
setup = "from __main__ import adict, alist, test_coords"
t1 = Timer("for p in test_coords: obj = adict.get(p)", setup)
t2 = Timer("for p in test_coords: obj = alist[p[0]][p[1]]", setup)
# run the test code
print(min(t1.repeat(number=10000, repeat=7)))
print(min(t2.repeat(number=10000, repeat=7)))
Again, on my system using Python 2.7, I get:
3.13823986053
2.97539305687
which shows that the two are very close, with the list solution about
6% faster. So in this situation, a list of lists uses roughly 900 times
more memory than a dict in exchange for look-ups that are about 6% faster.
I would be very surprised if the timing results depended on the number of
objects in the matrices.
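If you want to check that for yourself rather than take my word for it,
the same timeit pattern extends naturally. Here is a sketch (the sizes
are arbitrary, and the absolute times will differ from machine to
machine):

from random import randrange
from timeit import Timer

# time 1000 lookups against dicts of increasing size; if dict lookups
# scaled with the number of stored objects, these times would grow too
for n in (1000, 10000, 100000):
    d = dict(((randrange(2400), randrange(2400)), "something")
             for i in range(n))
    probes = [(randrange(2400), randrange(2400)) for i in range(1000)]
    t = Timer("for p in probes: obj = d.get(p)",
              setup="from __main__ import d, probes")
    best = min(t.repeat(number=1000, repeat=3))
    print("%-7d %f" % (n, best))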
In case you are not familiar with timeit, let me explain what I have done:
* I pre-select 1000 random coordinates.
* I write some test code inside a Timer object that looks up each of
those coordinates, plus some setup code.
* timeit runs the setup code once.
* Then it runs my test code 10000 times as a single trial, timing
it as accurately as possible.
* It does seven trials, and I report the best of the seven.
The time printed is measured in seconds. In this case, I get 3 seconds
per trial, or 3e-7 seconds = 0.3 microseconds per lookup.
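Spelled out as arithmetic:

# one trial = 10000 runs of the test statement, each doing 1000 lookups
lookups_per_trial = 10000 * 1000    # ten million lookups
print(3.0 / lookups_per_trial)      # 3e-07 s, 0.3 microseconds each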
> I know the fastest way to retrieve them would be to have them store
> pointers to their neighbors, then use those for retrieval.
How do you know that?
No offence, but if you can't even work out whether lookups in a dict or a
list are faster, I can't imagine why you think you can intuit what the
fastest way to retrieve the nearest neighbours would be.
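If you do want to pursue that idea, don't guess, measure: build the
simplest version and run it through the same timeit harness as above.
A minimal sketch of what I mean (the Cell class and the wiring function
are my own illustration, not anything from your code):

class Cell(object):
    """One occupied cell that caches references to its neighbours."""
    __slots__ = ("x", "y", "neighbours")
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.neighbours = []    # direct references, filled in below

def link_neighbours(cells):
    # cells maps (x, y) -> Cell; wire each cell to the (up to) eight
    # occupied cells surrounding it
    for (x, y), cell in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx or dy) and (x + dx, y + dy) in cells:
                    cell.neighbours.append(cells[(x + dx, y + dy)])

Then time iterating over cell.neighbours against eight dict.get() calls
on your actual data. Whichever wins is your answer; intuition is no
substitute for the measurement.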
--
Steven