L10: Clustering Flashcards

1
Q

clustering

A

Clustering is used to group data into subsets (clusters) of objects with similar attributes.
It works by calculating the similarity between objects.
Similarity is often treated as the inverse of distance: the closer two objects are, the more similar they are.

Limitation: similarity is sometimes difficult to define, and different similarity criteria can lead to different clustering results.
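To make "similarity as the inverse of distance" concrete, here is a minimal Python sketch. It assumes Euclidean distance and the transform 1 / (1 + d); both are illustrative choices, not definitions from the module, and the helper names are hypothetical. Swapping in a different distance or transform can change which objects count as most similar, which is exactly the limitation on this card.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """Hypothetical similarity score: 1 / (1 + d).
    Identical points score 1.0; the score falls towards 0 as distance grows."""
    return 1.0 / (1.0 + euclidean_distance(a, b))

p, q, r = (1.0, 2.0), (2.0, 2.0), (8.0, 9.0)
print(similarity(p, q))  # 0.5 (distance 1)
print(similarity(p, r))  # much smaller, since r is far from p
```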

2
Q

clustering types

A
  1. Hierarchical clustering
  2. Non-hierarchical clustering

3
Q

hierarchical clustering (2)

A

Algorithms:

  1. Agglomerative clustering (bottom up)
  2. Divisive clustering (top down)
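A minimal sketch of agglomerative (bottom-up) clustering in Python: every point starts as its own cluster, and the two closest clusters are merged repeatedly until a chosen number remain. It assumes Euclidean distance, single linkage (one of the linkage methods on card 4) and a fixed stopping target `n_clusters`; these are illustrative choices, and `agglomerative` is a hypothetical helper. Real implementations usually record the full merge tree (a dendrogram) and cut it afterwards. Divisive clustering works the other way, starting from one cluster and splitting it, and is not shown.

```python
import math
from itertools import combinations

def single_link(c1, c2):
    """Single linkage: distance between the closest pair of points across two clusters."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerative(points, n_clusters):
    """Bottom-up clustering: start with one cluster per point and
    merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs with i < j
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so deleting j leaves index i valid
    return clusters

points = [(1, 1), (1.5, 1), (5, 5), (5.5, 5.2), (9, 1)]
print(agglomerative(points, 3))
# [[(1, 1), (1.5, 1)], [(5, 5), (5.5, 5.2)], [(9, 1)]]
```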
4
Q

three ways of measuring the distance between clusters (linkage)

A
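Single linkage (nearest neighbour): the distance between the closest pair of points, one from each cluster.
Complete linkage (furthest neighbour): the distance between the furthest pair of points, one from each cluster.
Average linkage: the distance between every pair of points, one from each cluster, is calculated and then averaged.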
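The three linkage definitions can be written directly as Python functions. This is a minimal sketch assuming Euclidean distance between points; the function names are hypothetical. The small example shows how the same two clusters get three different distances depending on the linkage chosen.

```python
import math
from statistics import mean

def single_linkage(c1, c2):
    """Closest pair of points, one from each cluster."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def complete_linkage(c1, c2):
    """Furthest pair of points, one from each cluster."""
    return max(math.dist(a, b) for a in c1 for b in c2)

def average_linkage(c1, c2):
    """Mean distance over every pair of points, one from each cluster."""
    return mean(math.dist(a, b) for a in c1 for b in c2)

A = [(0, 0), (1, 0)]
B = [(4, 0), (6, 0)]
print(single_linkage(A, B))    # 3.0
print(complete_linkage(A, B))  # 6.0
print(average_linkage(A, B))   # 4.5
```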
5
Q

K-means clustering

A

K-means is one of the most commonly used clustering algorithms; it is a non-hierarchical method.

K refers to the number of clusters you want to group your data into.

6
Q

procedure of k-means clustering

A
  1. Choose a value for K, the number of clusters.
  2. Randomly choose K points as the initial centroids.
  3. Assign each item to the cluster with the nearest centroid (mean).
  4. Recalculate each centroid as the average of all data points in its cluster.
  5. Repeat steps 3 and 4 until there are no more reassignments or the maximum number of iterations is reached.
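The procedure above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: it assumes Euclidean distance and points given as tuples, `kmeans` and its `max_iter` and `seed` parameters are hypothetical names, and keeping the old centroid when a cluster becomes empty is one choice among several. Step 1 corresponds to passing `k`.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Hypothetical k-means sketch following steps 2-5 of the card."""
    rng = random.Random(seed)
    # Step 2: randomly choose K of the data points as the initial centroids
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 3: assign each point to the cluster with the nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop when no point changes cluster
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: recalculate each centroid as the mean of its points
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2, seed=0)
print(labels)
print(centroids)
```

On well-separated data like this the result does not depend much on the starting points; on messier data, different `seed` values can give different clusters.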
7
Q

k-means clustering limitations

A

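It is difficult to choose K; this needs human inspection or additional algorithms.
Results depend on the initial seeds (the starting centroid positions).
It is sensitive to outliers, because each centroid is a mean and can be pulled towards extreme values.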
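A small illustration of outlier sensitivity, assuming Euclidean space: since a k-means centroid is the per-dimension mean of its cluster, adding one extreme point drags it well away from the other members. `centroid` is a hypothetical helper.

```python
def centroid(points):
    """K-means style centroid: the mean of each dimension across the points."""
    return tuple(sum(dim) / len(points) for dim in zip(*points))

cluster = [(1, 1), (2, 1), (1, 2), (2, 2)]
print(centroid(cluster))               # (1.5, 1.5)
print(centroid(cluster + [(20, 20)]))  # (5.2, 5.2): one outlier moves it far from the group
```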

8
Q

variable reduction

A

Variable reduction techniques can be used to reduce the number of dimensions (variables/columns) of a dataset before applying clustering methods.

This allows clusters found in multidimensional data to be visualised in 2- or 3-dimensional space.

Principal Components Analysis and Exploratory Factor Analysis will be covered in the Forecasting and Advanced Business Analytics module.
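A minimal sketch of PCA used for variable reduction, assuming NumPy is available: centre each variable, take the singular value decomposition, and keep the first two principal components so the data can be clustered and plotted in 2-D. `pca_2d` is a hypothetical helper, the random data is only a placeholder, and Exploratory Factor Analysis is not shown; the module's own treatment of both may differ.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each variable (column)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T     # scores on PC1 and PC2

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))  # placeholder: 10 observations, 5 variables
Z = pca_2d(X)
print(Z.shape)  # (10, 2)
```

The 2-column result `Z` can then be passed to a clustering method and plotted as a scatter chart.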
