I have some speed-critical loops involving numarray arrays and need to see which part is taking all the time. Is there a seperate profiling tool or how do you go about profiling everything (broken down by module/function) in an interpreted language like python?