Data peeping function?

Thor Whalen thorwhalen at gmail.com
Sun Jan 12 12:36:08 EST 2014


The first thing I do once I import new data (as a pandas dataframe) is to .head() it, .describe() it, and then kick around a few specific stats according to what I see.

But I'm not satisfied with .describe(). Among other things, non-numerical columns are ignored, and the same off-the-shelf stats are computed for every numerical column, whether or not they are meaningful for it.
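
A small illustration of both complaints, using a made-up toy frame (the column names and values are invented for the example):

```python
import pandas as pd

# Made-up toy data: a numeric ID, a string column, and a numeric measurement.
df = pd.DataFrame({
    "user_id": [1001, 1002, 1003, 1004],
    "country": ["US", "FR", "US", "DE"],
    "price": [9.99, 14.50, 9.99, 20.00],
})

summary = df.describe()

# Only the numeric columns show up; "country" is silently left out.
print(list(summary.columns))

# "user_id" gets a mean even though that is meaningless for an identifier.
print(summary.loc["mean", "user_id"])
```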

I've been shopping around for a "data peeping" function that would:

(1) Have a hands-off mode where I simply type
       diagnose_this(data)
and the function figures things out on its own, notifying me when in doubt. For example, it would assume that any string column with not too many unique values should be treated as categorical, and compute the appropriate statistics for it.

(2) Perform standard diagnoses and print them out. For example, (a) missing values? (b) heterogeneously formatted data? (c) columns with only one unique value? etc.

(3) Be parametrizable, if I so choose.
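
To make this concrete, here is a rough sketch of the kind of thing I have in mind. It is only an illustration: the name diagnose_this, the max_categories parameter (and its default of 20), and the layout of the report are placeholders of mine, not an existing pandas API. It also skips the "notify me when in doubt" part entirely.

```python
import pandas as pd


def diagnose_this(data, max_categories=20):
    """Print and return a per-column diagnosis of a DataFrame (illustrative sketch)."""
    report = {}
    for col in data.columns:
        series = data[col]
        non_null = series.dropna()
        n_unique = non_null.nunique()
        info = {
            # (a) missing values
            "missing": int(series.isnull().sum()),
            "n_unique": int(n_unique),
            # (b) more than one Python type among the values hints at
            #     heterogeneously formatted data
            "mixed_types": non_null.map(type).nunique() > 1,
            # (c) columns with at most one unique value
            "constant": n_unique <= 1,
        }
        if pd.api.types.is_numeric_dtype(series):
            info["kind"] = "numeric"
        elif n_unique <= max_categories:
            # Assumption: few distinct non-numeric values means "categorical".
            info["kind"] = "categorical"
            info["top_values"] = non_null.value_counts().head(5).to_dict()
        else:
            info["kind"] = "other"
        report[col] = info
        print(col, info)
    return report


# Made-up data exercising each check.
data = pd.DataFrame({
    "country": ["US", "FR", None, "US"],
    "code": ["a1", 7, "b2", "c3"],
    "source": ["web", "web", "web", "web"],
    "price": [9.99, 14.50, 9.99, 20.00],
})
report = diagnose_this(data)
```

Point (3) would then amount to exposing thresholds such as max_categories, and perhaps the list of checks to run, as keyword arguments.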

Does anyone know of such a function?


