CS1003: Data Mining and Visualisation Flashcards

1

What are the three areas of big data application?

Scientific
Medical
Commercial

2

State the stages in the basic scientific process

1. Observe data about the world
2. Notice patterns in data
3. Devise a hypothesis which explains data
4. Run an experiment on unseen data
5. Refine or reject hypothesis

3

What are the stages in the knowledge discovery pipeline?

Acquisition
Cleaning
Selection
Processing
Data Mining
Visualisation
Interpretation/Knowledge
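
To make the stages concrete, here is a toy Python sketch. It assumes a hypothetical list of records with made-up `age` and `income` fields, and each function stands in for one stage. The "mining" step is deliberately trivial (a mean). Interpretation is left out because it is a human step.

```python
# Toy knowledge discovery pipeline; the records and field names are hypothetical.
from statistics import mean

def acquire():
    # Acquisition: stands in for reading files, sensors or databases.
    return [
        {"age": "34", "income": "52000"},
        {"age": "", "income": "61000"},  # missing age
        {"age": "29", "income": "48000"},
        {"age": "41", "income": "75000"},
    ]

def clean(records):
    # Cleaning: drop incomplete records and convert text to numbers.
    return [{k: float(v) for k, v in r.items()}
            for r in records if all(v != "" for v in r.values())]

def select(records, fields):
    # Selection: keep only the attributes relevant to the question.
    return [{f: r[f] for f in fields} for r in records]

def process(records, field):
    # Processing: rescale one field to [0, 1] (assumes the field is not constant).
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    return [{**r, field: (r[field] - lo) / (hi - lo)} for r in records]

def mine(records, field):
    # Data mining: a deliberately trivial summary (the mean).
    return mean(r[field] for r in records)

def visualise(value, width=20):
    # Visualisation: a text bar standing in for a chart.
    return "#" * round(value * width)

data = process(select(clean(acquire()), ["income"]), "income")
result = mine(data, "income")
print(f"{result:.3f} {visualise(result)}")
```

The order of calls in the last lines mirrors the order of the stages: each stage only sees the output of the one before it.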

4

What is a risk of a deep neural net?

The model is so flexible that it can fit any training data, noise included, and so predicts nothing useful about unseen data (overfitting)
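
The risk can be demonstrated without a neural network. This sketch swaps in a simpler, very flexible model: a polynomial that passes through every training point (Lagrange interpolation). It compares that model with a straight line on made-up data generated as y = 2x + 1 plus hand-picked noise. The data, the noise values and the choice of models are illustrative assumptions.

```python
# Overfitting sketch: a maximally flexible model versus a simple line (made-up data).

def lagrange_fit(xs, ys):
    """Return a model that passes exactly through every (x, y) training pair."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def line_fit(xs, ys):
    """Ordinary least-squares straight line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    return 2 * x + 1  # the underlying pattern

train_x = [0, 1, 2, 3, 4, 5]
noise = [0.3, -0.4, 0.5, -0.2, 0.4, -0.5]
train_y = [truth(x) + e for x, e in zip(train_x, noise)]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5, 6.0]  # unseen data
test_y = [truth(x) for x in test_x]

flexible = lagrange_fit(train_x, train_y)
simple = line_fit(train_x, train_y)
print("flexible: train", mse(flexible, train_x, train_y), "test", mse(flexible, test_x, test_y))
print("simple:   train", mse(simple, train_x, train_y), "test", mse(simple, test_x, test_y))
```

The flexible model reproduces the training data perfectly, noise and all. It then misses badly on the unseen points, especially at x = 6 just outside the training range. The line has some training error but generalises far better.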

5

Describe the steps in a k-Means algorithm

1. Pick k points at random as the initial means
2. Assign each point to its nearest mean
3. Replace each mean by the actual mean of the points assigned to it
4. Repeat steps 2 and 3 until the assignments no longer change
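
Below is a minimal Python sketch of these steps for 2-D points. Euclidean distance, a seeded random choice of initial means and an iteration cap (`max_iter`) are choices made here, not part of the card.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Plain k-means on 2-D points (tuples). Returns (means, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k points at random as the initial means.
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest mean.
        new_labels = [min(range(k), key=lambda j: math.dist(p, means[j])) for p in points]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: replace each mean by the actual mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = k_means(points, k=2)
print(means, labels)
```

On these made-up points, which form two well-separated groups, each group ends up in its own cluster. On less tidy data the result can depend on the random starting means, which is why k-means is often rerun with several seeds.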

6

Describe what k-Means clustering algorithm does

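Discovers groups of similar points in the data: the data is split into k clusters, each represented by its mean, and every point belongs to the cluster whose mean is nearest. A clustering is evaluated by the total distance from each point to its nearest mean, and the best clustering has the least total distance (standard k-means uses squared distances)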
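
The evaluation criterion on this card can be written as a cost function. This sketch uses made-up points and two hypothetical sets of means. Standard k-means minimises the sum of squared distances, so a `squared` option is included for that variant.

```python
import math

def clustering_cost(points, means, squared=False):
    """Total distance from each point to its nearest mean (lower is better).
    With squared=True this is the within-cluster sum of squares."""
    total = 0.0
    for p in points:
        d = min(math.dist(p, m) for m in means)
        total += d * d if squared else d
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [(0, 1), (10, 1)]  # one mean per natural group
bad = [(5, 0), (5, 2)]    # means that cut across the groups
print(clustering_cost(points, good), clustering_cost(points, bad))
```

The "good" means give a total of 4.0 and the "bad" ones 20.0, so the cost prefers the clustering that matches the natural groups.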

7

What is statistics used to do?

Extract patterns from data
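
As one small example of a statistic that extracts a pattern, the Pearson correlation coefficient measures how closely two variables follow a straight-line relationship. The study-hours and exam-mark figures below are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 for a perfect increasing line, -1 for a perfect
    decreasing line, near 0 when there is no linear pattern."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
marks = [52, 55, 61, 64, 70]   # hypothetical exam marks
print(round(pearson_r(hours, marks), 3))
```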

8

Describe the p value

How likely it is that a result at least this unusual could have occurred by chance alone
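
The "by chance" idea can be simulated directly. This sketch assumes a hypothetical experiment in which a coin lands heads 15 times in 20 flips. It asks how often a fair coin would do at least that well: a one-sided p-value estimated by Monte Carlo simulation. The trial count and seed are arbitrary choices.

```python
import random

def simulated_p_value(observed_heads, n_flips, trials=20_000, seed=1):
    """Estimate a one-sided p-value: the fraction of simulated fair-coin
    experiments giving at least as many heads as observed."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            extreme += 1
    return extreme / trials

print(simulated_p_value(15, 20))
```

The estimate should come out close to the exact value, 21700 / 2^20 ≈ 0.0207. So a fair coin produces a result this unusual only about 2% of the time.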

9

What is the p value used to do?

Assess the significance of a result
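
In practice the p-value is compared with a significance level chosen in advance, conventionally α = 0.05. That level is a common convention, not something this card specifies. This sketch applies an exact one-sided binomial test to the same kind of hypothetical coin experiment.

```python
from math import comb

def binomial_p_value(k, n, p=0.5):
    """Exact one-sided p-value: P(X >= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def is_significant(p_value, alpha=0.05):
    """Conventional decision rule: significant if p <= alpha."""
    return p_value <= alpha

for heads in (13, 15):
    pv = binomial_p_value(heads, 20)
    print(heads, round(pv, 4), is_significant(pv))
```

With 20 flips, 15 heads gives p ≈ 0.021 and counts as significant at the 5% level. 13 heads gives p ≈ 0.132 and does not.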

10

Describe statistical power

The probability that your test detects an effect when that effect is real

11

What does the statistical power depend on?

Size of the effect and the sample size
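
Both dependencies show up when power is computed exactly for a one-sided binomial test of "the coin is fair" at α = 0.05. The sketch assumes the real coin lands heads with probability `true_p`, so the effect size is how far `true_p` is from 0.5. It handles this one simple test only and is not a general power calculator.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def power(n, true_p, alpha=0.05):
    """Power of a one-sided exact test of a fair coin against a bias towards heads."""
    # Smallest head count that would be declared significant for a fair coin.
    critical = next(k for k in range(n + 1) if upper_tail(k, n, 0.5) <= alpha)
    # Probability of reaching that count if the coin really has bias true_p.
    return upper_tail(critical, n, true_p)

for n in (20, 100):
    for true_p in (0.6, 0.75):
        print(n, true_p, round(power(n, true_p), 3))
```

Power rises both when the bias is larger (0.75 rather than 0.6) and when more flips are made (100 rather than 20).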

12

What should graphical displays do?

- Show the data
- Induce the viewer to think about the substance
- Avoid distorting what the data has to say
- Present many numbers in a small space
- Make large data sets coherent
- Reveal the data at several levels of detail

13

What is graphical excellence?

The well-designed presentation of interesting data: complex ideas communicated with clarity, precision and efficiency