Clustering technique
Jon Clements
joncle at googlemail.com
Tue Dec 22 06:59:52 EST 2009
On Dec 22, 11:12 am, Luca <nioski... at yahoo.it> wrote:
> Dear all, excuse me if I post a simple question. I am trying to find
> software or an algorithm that can "cluster" simple data on an Excel sheet.
>
> Example:
>          Variable a  Variable b  Variable c
> Case 1       1           0           0
> Case 2       0           1           1
> Case 3       1           0           0
> Case 4       1           1           0
> Case 5       0           1           1
>
> The system recognizes that there are 3 possible clusters:
>
> the first with cases that have Variable a as true,
> the second with cases that have Variables b and c as true,
> the third is "all the rest"
>
>          Variable a  Variable b  Variable c
>
> Case 1       1           0           0
> Case 3       1           0           0
>
> Case 2       0           1           1
> Case 5       0           1           1
>
> Case 4       1           1           0
>
> Thank you in advance
If you haven't already, download and install xlrd from http://www.python-excel.org,
a library that can read Excel workbooks (but not the 2007 format yet).
Or export the sheet as CSV...
Then using either the csv module/xlrd (both well documented) or any
other way of reading the data, you effectively want to end up with
something like this:
rows = [
    # A        B  C  D
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]
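For the CSV route, a minimal sketch of getting to that 'rows' structure might look like the following. The CSV text, its header names and the load_rows helper are assumptions made up for illustration; a real export would be opened with open() instead of io.StringIO.

```python
import csv
import io

# Hypothetical CSV export of the sheet; the header names are assumptions.
CSV_TEXT = """Case,Variable a,Variable b,Variable c
Case 1,1,0,0
Case 2,0,1,1
Case 3,1,0,0
Case 4,1,1,0
Case 5,0,1,1
"""

def load_rows(fileobj):
    """Return rows as [label, int, int, int], skipping the header line."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    # Keep column A as text, convert the remaining columns to integers.
    return [[row[0]] + [int(cell) for cell in row[1:]] for row in reader if row]

# io.StringIO stands in for a real file here so the example is self-contained.
rows = load_rows(io.StringIO(CSV_TEXT))
```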
One approach is to sort 'rows' by B,C & D. This will bring the
identical elements adjacent to each other in the list. Then you need
an iterator to group them... take a look at itertools.groupby.
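A rough sketch of the sort-then-group idea, using the sample data from above. The names 'key' and 'clusters' are just illustrative, and this only groups rows whose B, C and D values are identical:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]

# Columns B, C and D (indexes 1, 2, 3) form both the sort key and the group key.
key = itemgetter(1, 2, 3)

# groupby only merges adjacent items, so the rows must be sorted by the same key first.
clusters = [
    (values, [row[0] for row in group])
    for values, group in groupby(sorted(rows, key=key), key=key)
]

for values, cases in clusters:
    print(values, cases)
```

The same key function has to be passed to both sorted() and groupby(); otherwise identical rows may end up in separate groups.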
Another is to use a defaultdict(list), found in the collections module:
just loop over the rows, using the tuple (B, C, D) as the key and
appending A to the list stored under that key.
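The defaultdict version could be sketched like this (Python 3 syntax; 'groups' is just an illustrative name). No sorting is needed, since each row goes straight into the bucket for its values:

```python
from collections import defaultdict

rows = [
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]

# Missing keys start out as an empty list, so append() always works.
groups = defaultdict(list)

for label, *values in rows:
    # Lists are not hashable, so the B, C, D values are turned into a tuple.
    groups[tuple(values)].append(label)

for values, cases in groups.items():
    print(values, cases)
```

Both approaches group rows with exactly matching values; anything fuzzier than that (e.g. an "all the rest" bucket) would need an extra rule on top.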
hth
Jon.