parallelizing cKDTRee
Speed is very important when searching kd-trees; otherwise we should not be using kd-trees but brute force. Thus exploiting multiple processors are important as well. 1. Multiprocessing: Must add support for pickling and unpickling to cKDTree (i.e. __reduce__ and __setstate__ methods). This would be useful for saving to disk as well. 2. Multithreading (Python): cKDTree.query calls cKDTree.__query with the GIL released (i.e. a 'with nogil:' block). I think this will be safe. 3. Multithreading (Cython): We could simply call cKDTree.__query in parallel using OpenMP pragmas. It would be a simple and quite portable hack. Which do you prefer? All three? (Forgive me for cross-posting. I did not know which list is the more appropriate.) Regards, Sturla Molden
2009/1/7 Sturla Molden <sturla@molden.no>:
Speed is very important when searching kd-trees; otherwise we should not be using kd-trees but brute force. Thus exploiting multiple processors are important as well.
Very sensible.
1. Multiprocessing: Must add support for pickling and unpickling to cKDTree (i.e. __reduce__ and __setstate__ methods). This would be useful for saving to disk as well.
This would be a good idea. One labor-saving device would be to take advantage of the fact that construction is pretty cheap and have the pickle method pickle only the construction parameters and the underlying array. Not that pickling the tree structure would be very hard either. One could also implement conversion beween C and python kdtrees, which would make it easier to modify existing kdtrees or implement new algorithms.
2. Multithreading (Python): cKDTree.query calls cKDTree.__query with the GIL released (i.e. a 'with nogil:' block). I think this will be safe.
This looks very easy and should pose no problems.
3. Multithreading (Cython): We could simply call cKDTree.__query in parallel using OpenMP pragmas. It would be a simple and quite portable hack.
Are there compilation issues with using OpenMP? Otherwise this should be fairly easy too, although selecting the right degree of parallelism may be a problem (I have no experience with OpenMP).
Which do you prefer? All three?
I think all three are good ideas; I'd start with the low-hanging fruit, which is just releasing the GIL. Then pickling, which is useful for other purposes, and then OpenMP if it's useful. I won't have time to do anything about this for a week or two. Anne
On 1/7/2009 4:55 PM, Anne Archibald wrote:
This would be a good idea. One labor-saving device would be to take advantage of the fact that construction is pretty cheap and have the pickle method pickle only the construction parameters and the underlying array. Not that pickling the tree structure would be very hard either.
Yes. I have an example of that in the Cookbook. I'm actually looking at adding __reduce__ and __setstate__ now. I'll post the code tomorrow or on friday.
3. Multithreading (Cython): We could simply call cKDTree.__query in parallel using OpenMP pragmas. It would be a simple and quite portable hack.
Are there compilation issues with using OpenMP? Otherwise this should be fairly easy too, although selecting the right degree of parallelism may be a problem (I have no experience with OpenMP).
Yes. There are compilers that don't support it (but gcc does). And there is the Cyton to C compilation, which tempers with the variable names. The pragma must thus be added to the C code and possibly use variable names present in the autogenerated C. Since there is no need for any synchronization among OpenMP threads here, I guess this should suffice: #pragma omp parallel \ private(__pyx_v_c, __pyx_v_k, __pyx_v_dd, \ __pyx_v_ii, __pyx_v_xx, \ __pyx_v_self, \ __pyx_v_eps, __pyx_v_p, \ __pyx_v_distance_upper_bound) It should be placed right before the line that says: for (__pyx_v_c = 0; __pyx_v_c < __pyx_8; __pyx_v_c+=1) { /* "/usr/data/david/src/dsp/scipy/trunk/scipy/spatial/ckdtree.pyx":583 Regards, Sturla Molden
On 1/7/2009 6:39 PM, Sturla Molden wrote:
This would be a good idea. One labor-saving device would be to take advantage of the fact that construction is pretty cheap and have the pickle method pickle only the construction parameters and the underlying array. Not that pickling the tree structure would be very hard either.
Yes. I have an example of that in the Cookbook.
I'm actually looking at adding __reduce__ and __setstate__ now. I'll post the code tomorrow or on friday.
Oh, I did not think of that :) I pickled the whole kd-tree, but it would be even cheaper to just have the worker processes build their own from the data. If the workers equal the number of processors, the extra overhead will be zero. Which means there would be no need for supporting the pickle protocol at all. Sturla Molden
participants (2)
-
Anne Archibald
-
Sturla Molden