Cluster Analysis Flashcards

Flashcards in Cluster Analysis Deck (20)
1

When to look at cluster patterns

the PT wants to group patients according to their attributes in order to treat them better

the PT wants to classify patients based on their individual health records in order to develop specific, appropriate management strategies

2

Hierarchical clustering

a set of nested clusters organized as a hierarchical tree (dendrogram)

individuals or clusters are progressively nested within larger clusters until only one cluster remains

3

Non-Hierarchical clustering

groups individuals into clusters so that each object belongs to exactly one cluster

divides a data set of 'n' individuals into 'm' clusters

K-means clustering is the most commonly used type

4

Hierarchical Clustering:
Bottom-up (agglomerative)

starts with each data point as its own cluster and then merges clusters step by step to form larger groups

5

Hierarchical Clustering:
Top down (divisive)

starts with all data in one group and then partitions it step by step using a flat clustering algorithm
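A minimal divisive (top-down) sketch in Python. It assumes 2-means as the flat algorithm (a "bisecting" approach), assumes the group with the largest within-cluster sum of squares (SSE) is split next, and starts each split from the two points farthest apart; these choices and the helper names (`two_means`, `divisive`) are illustrative, not a standard API.

```python
import math

def centroid(cluster):
    # Component-wise mean of the points in a cluster.
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def sse(cluster):
    # Within-cluster sum of squared distances to the centroid.
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)

def two_means(points, iters=20):
    # Flat step: split one group in two with a small 2-means.
    # Assumed deterministic start: the two points farthest apart.
    a, b = max(((p, q) for p in points for q in points), key=lambda pq: math.dist(*pq))
    left, right = [], []
    for _ in range(iters):
        new_left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        new_right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not new_left or not new_right:
            break  # keep the last split in which both halves are non-empty
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right

def divisive(points, k):
    clusters = [list(points)]  # top-down: everyone starts in one group
    while len(clusters) < k:
        worst = max(clusters, key=sse)  # split the least compact group next
        if sse(worst) == 0:
            break  # only identical points remain; nothing can be split
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return clusters

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
print(divisive(data, 3))
```

On this made-up data the three well-separated groups come back as the three clusters; stopping at a chosen `k` is what turns the full top-down tree into a usable partition.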

6

Agglomerative Clustering:
Procedure

1. assign each item to its own cluster (n items start as n clusters)

2. find the closest pair of clusters and merge them into a single cluster

3. compute the distances (similarities) between the new cluster and each of the old clusters

4. repeat steps 2 and 3 until all items are merged into a single cluster containing the whole sample
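The four agglomerative steps can be sketched in plain Python as below. Single linkage (distance between the closest pair of points) is assumed as the default because the card does not name a linkage; passing `max` gives complete linkage. Distances are recomputed from the raw points on each pass instead of updating a stored distance matrix, which is simpler but slower. The function name `agglomerative` is hypothetical.

```python
import math

def agglomerative(points, linkage=min):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    `linkage` turns the pairwise point distances between two clusters into one
    cluster distance: min = single linkage (assumed default), max = complete.
    """
    # Step 1: every item starts in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(c1, c2):
        # Step 3: distance between two clusters, recomputed from the points.
        return linkage(math.dist(points[i], points[j]) for i in c1 for j in c2)

    # Step 4: repeat until a single cluster holds every item.
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters ...
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda xy: cluster_distance(clusters[xy[0]], clusters[xy[1]]))
        d = cluster_distance(clusters[x], clusters[y])
        merges.append((clusters[x], clusters[y], d))
        # ... and merge them (y > x, so deleting y leaves index x valid).
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return merges

pts = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
for a, b, d in agglomerative(pts):
    print(a, "+", b, "at distance", d)
# [0] + [1] at distance 1.0
# [2] + [3] at distance 1.0
# [0, 1] + [2, 3] at distance 4.0
# [0, 1, 2, 3] + [4] at distance 14.0
```

The merge list is a text form of the dendrogram: reading it top to bottom shows which groups join and at what height.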

7

Limitations of Hierarchical Clustering

it is necessary to specify both a distance metric and a linkage criterion, without any strong theoretical basis for either choice

choosing the number of clusters from the dendrogram can be misleading
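To illustrate why the linkage choice matters, the toy sketch below (made-up clusters A, B and C, Euclidean distance assumed) asks which pair of clusters an agglomerative step would merge next under three common linkage criteria. Single linkage picks A and B, while complete and average linkage pick A and C, so the same data can yield different trees.

```python
import math
from itertools import combinations

# Three made-up clusters of 2-D points (all on the x-axis for easy checking).
clusters = {"A": [(0, 0), (1, 0)], "B": [(3, 0), (10, 0)], "C": [(-5, 0)]}

def linkage_distance(c1, c2, how):
    # All pairwise Euclidean distances between the two clusters' points.
    d = [math.dist(p, q) for p in c1 for q in c2]
    if how == "single":
        return min(d)        # closest pair of points
    if how == "complete":
        return max(d)        # farthest pair of points
    return sum(d) / len(d)   # "average": mean over all pairs

def closest_pair(how):
    # The pair of clusters that would be merged next under this linkage.
    return min(combinations(clusters, 2),
               key=lambda pair: linkage_distance(clusters[pair[0]], clusters[pair[1]], how))

for how in ("single", "complete", "average"):
    print(how, closest_pair(how))
# single ('A', 'B')
# complete ('A', 'C')
# average ('A', 'C')
```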

8

K-Means Clustering

data are partitioned into K clusters

each data point is assigned to the cluster with the nearest mean (centroid)

9

K-Means Clustering:
Procedure

1. select K points as the initial centroids

2. assign each point to its nearest centroid

3. recompute the centroid of each group

4. repeat steps 2 and 3 until the centroids are stable (the solution has converged, possibly to a local rather than the best overall solution)
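The k-means procedure can be sketched in plain Python (Lloyd's algorithm). Assumptions: Euclidean distance, initial centroids drawn at random from the data with a fixed `seed`, and an empty group keeping its previous centroid. Library implementations such as scikit-learn's `KMeans` add smarter initialization (k-means++) and several restarts; this sketch does not.

```python
import math
import random

def mean_point(group):
    # Component-wise mean of the points in a group.
    return tuple(sum(coord) / len(group) for coord in zip(*group))

def kmeans(points, k, max_iter=100, seed=0):
    # Step 1: select K data points at random as the initial centroids.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its group
        # (an empty group keeps its old centroid; one of several possible fixes).
        new_centroids = [mean_point(g) if g else centroids[j] for j, g in enumerate(groups)]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups

data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, groups = kmeans(data, k=2)
print(centroids, groups)
```

Because step 1 is random, different seeds can converge to different local solutions on harder data, which is why running several starts is common.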

10

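K-Means Clustering:
Limitations

the researcher must choose the number of clusters (K) in advance

more Ks = shorter distances from points to their centroids, so distance alone cannot pick K

when every data point is its own centroid (K = n), the distance is 0, but the clustering is useless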
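The claim that a larger K always shrinks distances to centroids can be checked on a tiny 1-D example. This sketch brute-forces the smallest possible within-cluster sum of squares (SSE) for each K, relying on the fact that optimal 1-D clusters are contiguous runs of the sorted data. The data values are illustrative, and real k-means is a heuristic that may not reach this optimum.

```python
from itertools import combinations

def sse(group):
    # Sum of squared distances from each value to the group mean.
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)

def best_sse(values, k):
    """Smallest total within-cluster SSE achievable with k clusters (1-D, brute force)."""
    xs = sorted(values)
    best = float("inf")
    # Optimal 1-D clusters are contiguous runs of the sorted values,
    # so trying every placement of k - 1 cut points covers all candidates.
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0, *cuts, len(xs))
        total = sum(sse(xs[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best

data = [1, 2, 3, 10, 11, 12, 20, 21]  # illustrative values
for k in range(1, len(data) + 1):
    print(f"K={k}: SSE={best_sse(data, k):.2f}")
```

The SSE keeps falling as K grows and reaches 0.00 at K = 8 (one centroid per point), so the "best" K cannot simply be the one with the smallest distance; a common heuristic is to look for where the curve stops dropping sharply (here at K = 3, SSE 4.50).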

11

Two Step Clustering

runs a pre-clustering step first and then applies hierarchical methods

- can handle both categorical AND continuous variables
- selects the number of clusters automatically
- can analyze large data sets efficiently

12

Two Step Clustering:
Procedure

1. a sequential approach is used to pre-cluster the cases, condensing them into many small sub-clusters

2. the pre-clusters are merged with a hierarchical (agglomerative) method into the desired number of clusters
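A rough sketch of the two steps for continuous variables only. The pre-clustering pass here is a simple leader-style rule with a made-up distance `threshold`, and the merge step is size-weighted centroid linkage with K supplied by hand. Implementations such as IBM SPSS TwoStep instead build a cluster-feature (CF) tree, handle categorical variables, and can choose the number of clusters automatically; none of that is reproduced, and the function names are hypothetical.

```python
import math

def pre_cluster(points, threshold):
    """Step 1 (simplified): one sequential pass over the cases. Each case joins
    the nearest pre-cluster whose centroid lies within `threshold`; otherwise it
    starts a new pre-cluster. Returns (centroid, size) pairs."""
    sums, counts = [], []
    for p in points:
        best, best_d = None, None
        for idx, (s, n) in enumerate(zip(sums, counts)):
            d = math.dist(p, [v / n for v in s])
            if d <= threshold and (best_d is None or d < best_d):
                best, best_d = idx, d
        if best is None:
            sums.append(list(p))
            counts.append(1)
        else:
            sums[best] = [v + x for v, x in zip(sums[best], p)]
            counts[best] += 1
    return [(tuple(v / n for v in s), n) for s, n in zip(sums, counts)]

def merge_pre_clusters(pre, k):
    """Step 2: agglomerative merging of pre-cluster centroids until k remain
    (size-weighted centroid linkage, an assumed choice)."""
    clusters = list(pre)
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: math.dist(clusters[ij[0]][0], clusters[ij[1]][0]))
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = tuple((a * ni + b * nj) / (ni + nj) for a, b in zip(ci, cj))
        clusters[i] = (merged, ni + nj)
        del clusters[j]
    return clusters

cases = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5), (10, 0), (10.5, 0)]
pre = pre_cluster(cases, threshold=1.0)
final = merge_pre_clusters(pre, k=2)
print(len(pre), "pre-clusters ->", [n for _, n in final], "cluster sizes")
```

Only the pre-cluster centroids enter the hierarchical step, which is what keeps this style of method fast on large data sets: the distance comparisons run over pre-clusters, not over every case.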

13

Cluster Quality Validation Index:
Silhouette coefficient

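measures how well an individual data point is clustered by comparing its average distance to the other points in its own cluster with its average distance to the points in the nearest neighboring cluster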
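The silhouette coefficient of a point i is usually computed as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster. A plain-Python sketch, assuming Euclidean distance, at least two clusters, and the common convention that a point alone in its cluster gets s(i) = 0:

```python
import math

def silhouette_values(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b). Needs >= 2 clusters."""
    def mean_dist(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)

    values = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            values.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = mean_dist(p, own)  # cohesion: mean distance within own cluster
        # separation: mean distance to the nearest other cluster
        b = min(mean_dist(p, [q for j, q in enumerate(points) if labels[j] == other])
                for other in set(labels) if other != labels[i])
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(silhouette_values(pts, [0, 0, 1, 1]))  # well separated: every value is about 0.85 to 0.87
print(silhouette_values(pts, [0, 0, 1, 0]))  # (5, 6) mislabelled: its value is negative
```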

14

Cluster Quality Validation Index:
Silhouette plot

displays a measure of how close each point in one cluster is to points in the neighboring cluster

15

Interpreting the Silhouette coefficient:

individual data point with a large Silhouette coefficient, close to 1

very well clustered

16

Interpreting the Silhouette coefficient:

individual data point with a small Silhouette coefficient, around 0

lies between two clusters

17

Interpreting the Silhouette coefficient:

individual data point with a negative Silhouette coefficient

probably placed in the wrong cluster

18

Silhouette coefficient value

0.5 to 1.0

Good

19

Silhouette coefficient value

0.2 to 0.5

Fair

20

Silhouette coefficient value

-1.0 to 0.2

Poor
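The three ranges can be turned into a small lookup. This sketch assumes they are applied to the average Silhouette coefficient over all data points, and assumes boundary values go to the higher band (0.5 counts as Good, 0.2 as Fair), since the ranges overlap at 0.5 and 0.2. The function name `rate_clustering` is hypothetical.

```python
def rate_clustering(silhouettes):
    """Rate a clustering from its per-point Silhouette coefficients."""
    avg = sum(silhouettes) / len(silhouettes)
    if not -1.0 <= avg <= 1.0:
        raise ValueError("silhouette coefficients must lie between -1 and 1")
    # Boundary handling is an assumption: 0.5 -> Good, 0.2 -> Fair.
    if avg >= 0.5:
        return "Good"
    if avg >= 0.2:
        return "Fair"
    return "Poor"

print(rate_clustering([0.9, 0.7, 0.6]))  # Good (average about 0.73)
print(rate_clustering([0.3, 0.4]))       # Fair (average 0.35)
print(rate_clustering([0.1, -0.2]))      # Poor (average -0.05)
```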