ebuchman ethanbuchman at hotmail.com
Wed Dec 18 21:17:11 EST 2013

I'm learning CUDA and decided to use Python with ctypes to call all the CUDA
functions, but I'm running into some memory issues.  I've boiled it down to
the simplest scenario.  I use ctypes to call a CUDA function which allocates
memory on the device and then frees it.  This works fine, but if I then call
np.dot on a completely unrelated array declared in Python, I get a
segmentation fault.  Note that this only happens if the numpy array is
sufficiently large.  If I change the CUDA mallocs to plain C mallocs, all
the problems go away, but that's not really helpful.  Any ideas what's going
on here?
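
For reference, the plain-malloc control can be approximated from Python alone.
This is only a rough sketch, not the code I actually ran: it assumes a
Linux/glibc system where ctypes.CDLL(None) exposes malloc and free, and the
helper name all_together_host is made up for illustration.

```python
import ctypes

# Assumption: Linux/glibc, where CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)

# Declare the signatures so pointers are not truncated to a C int.
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def all_together_host(n):
    """Hypothetical host-only stand-in: allocate n floats, then free them.

    Returns 0 on success and 1 if malloc fails.
    """
    size = n * ctypes.sizeof(ctypes.c_float)
    ptr = libc.malloc(size)
    if not ptr:
        return 1  # malloc returned NULL
    libc.free(ptr)
    return 0


if __name__ == '__main__':
    print(all_together_host(5001))
```

Calling all_together_host(N) in place of the CUDA wrapper keeps the rest of
the script unchanged, so the np.dot line can be toggled the same way.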

CUDA CODE (debug.cu): 

#include <stdio.h>
#include <stdlib.h>

extern "C" {
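// Allocate N floats on the device, then free them immediately.
void all_together(size_t N)
{
    float *d;  // device pointer; type assumed from sizeof(float)
    size_t size = N * sizeof(float);
    cudaError_t err;

    err = cudaMalloc(&d, size);
    if (err != cudaSuccess) printf("cuda malloc error: %d\n", (int)err);

    err = cudaFree(d);
    if (err != cudaSuccess) printf("cuda free error: %d\n", (int)err);
}
}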

PYTHON CODE (master.py):

import numpy as np
import ctypes
from ctypes import *

dll = ctypes.CDLL('./cuda_lib.so', mode=ctypes.RTLD_GLOBAL)

def build_all_together_f(dll):
    func = dll.all_together
    func.argtypes = [c_size_t]
    func.restype = None  # the C function returns void
    return func

__pycu_all_together = build_all_together_f(dll)

if __name__ == '__main__':
    N = 5001 # if this is less, the error doesn't show up

    a = np.random.randn(N).astype('float32')

    da = __pycu_all_together(N)

    # toggle this line on/off to get error
    #np.dot(a, a)

    print('end of python')

COMPILE: nvcc -Xcompiler -fPIC -shared -o cuda_lib.so debug.cu

RUN: python master.py

