frequency analysis of a DB column

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Aug 1 23:38:28 EDT 2007


On Wed, 01 Aug 2007 23:21:53 -0300, goldtech <goldtech at worldpost.com>
wrote:

> In Python 2.1 are there any tools to take a column from a DB and do a
> frequency analysis - a breakdown of the values for this column?
>
> Possibly a histogram or a table saying out of 500 records I have one
> hundred and two "301" ninety-eight "212" values and three-hundred
> "410"?
> Is SQL the way to go for this?

I'd start with:

select column, count(column), min(column), max(column)
 from table
group by column
order by count(column) desc

and then build a histogram from that (using PyChart, for instance). Based
on this distribution, one can refine the analysis in a lot of ways...
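
As a rough sketch of the idea (not tested against your setup): the example below uses the standard sqlite3 module, which needs a much newer Python than 2.1, and a made-up table "records" with a column "code" filled with the counts from your question. The helper names frequency_table and text_histogram are mine, and the text bars only stand in for a real chart. With another DB-API driver the same query should work, though identifier quoting may differ.

```python
import sqlite3


def frequency_table(conn, table, column):
    """Return [(value, count), ...] ordered by descending count."""
    # Table and column names cannot be passed as bound parameters, so they
    # are interpolated here; only do this with trusted identifiers.
    sql = (
        'SELECT "{c}", COUNT(*) FROM "{t}" '
        'GROUP BY "{c}" ORDER BY COUNT(*) DESC, "{c}"'
    ).format(c=column, t=table)
    return conn.execute(sql).fetchall()


def text_histogram(rows, width=40):
    """Render (value, count) rows as text bars scaled to the largest count."""
    if not rows:
        return []
    top = max(count for _, count in rows)
    lines = []
    for value, count in rows:
        bar = "#" * max(1, count * width // top)
        lines.append("%-8s %5d %s" % (value, count, bar))
    return lines


# Hypothetical sample data: 500 records, as in the original question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (code TEXT)")
data = ["301"] * 102 + ["212"] * 98 + ["410"] * 300
conn.executemany("INSERT INTO records (code) VALUES (?)", [(v,) for v in data])

rows = frequency_table(conn, "records", "code")
for line in text_histogram(rows):
    print(line)
```

The grouping and counting stay in the database, so only one row per distinct value comes back to Python, which matters when the table is large.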

> Of course there'd be 1000's of values....

Should not be a problem for today's DBMS and hardware...

-- 
Gabriel Genellina



