I just opened a PR adding the basic setup for performance benchmarking using Wes McKinney's `vbench`: PR #180. (You'll need vbench installed to actually run the benchmarks.)

This PR adds the basic infrastructure for benchmarking; I'm thinking that the benchmarks themselves can be added incrementally. Right now it takes a surprisingly long time to run; this may be an issue with how I've set it up, but I'm not sure.

These files are adapted from the pandas vbench suite with a bit of modification. For anyone comparing directly against the pandas suite, note that I've renamed:

    suite.py -> settings.py
    run_suite.py -> run_benchmarks.py

and, instead of keeping the benchmark files in the same directory as these files, I've moved them into the "suite" subdirectory.
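To give a feel for what goes in that subdirectory: each vbench-style benchmark pairs a setup snippet with a statement to be timed. The real suite files use vbench's `Benchmark` class (which re-times the statement at each repository revision); the sketch below only approximates a single measurement with the stdlib's `timeit`, and the statement itself is a hypothetical stand-in, not code from the PR.

```python
import timeit

# A vbench-style benchmark: a setup string that builds the inputs,
# plus a statement string that gets timed repeatedly.
setup = """
data = list(range(10000))
"""

# Hypothetical stand-in for a conversion like img_as_ubyte.
stmt = "converted = [x & 0xFF for x in data]"

# Take the best of three repeats, as timing harnesses typically do,
# to reduce noise from other processes.
elapsed = min(timeit.repeat(stmt, setup=setup, number=50, repeat=3))
```

vbench collects one such timing per revision of the repository, which is how the per-commit plots below are produced.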

If you want to try it out, you'll probably want to uncomment the alternate start date in settings.py (even so, it still takes a long time to run on my system). Running `make` generates a "source" directory containing reStructuredText files for displaying the benchmarking output (source->vbench->figures contains the relevant plots).

As an example, I've attached the result for calling `img_as_ubyte` on an `int` image. Note the sharp drop in execution time in early February: this is when Christophe Gohlke's PR for dtype conversion (PR #99) was finally merged.