Flashcards in Data Mining and Visualisation Deck (13)

1

## What are the three areas of big data application?

###
Scientific

Medical

Commercial

2

## State the stages in the basic scientific process

###
1. Observe data about the world

2. Notice patterns in data

3. Devise a hypothesis which explains data

4. Run an experiment on unseen data

5. Refine or reject hypothesis

3

## What are the stages in the knowledge discovery pipeline?

###
Acquisition

Cleaning

Selection

Processing

Data Mining

Visualisation

Interpretation/Knowledge

4

## What is a risk of a deep neural net?

### The model is so flexible that it can fit any data, noise included (overfitting), and so predicts nothing about unseen data

5

## Describe the steps in a k-Means algorithm

###
Pick k points at random as initial means

Assign each point to the nearest mean

Replace each mean with the actual mean of the points assigned to it

Repeat until nothing changes
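
The four steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name `k_means`, the `seed` and `max_iter` parameters, the use of Euclidean distance, and the choice to keep the old mean when a cluster ends up empty are all assumptions that the card does not specify.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of numbers) into k groups.

    Returns (means, assignment), where assignment[i] is the index of the
    mean nearest to points[i].
    """
    rng = random.Random(seed)
    # Step 1: pick k points at random as the initial means.
    means = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest mean (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        # Step 4: stop once nothing changes between two passes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: replace each mean with the actual mean of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment
```

Calling `k_means([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` should place the first three points in one cluster and the last three in the other. In general the result depends on the random starting means, so different seeds can give different clusterings.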

6

## Describe what k-Means clustering algorithm does

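### Discovers groups of similar points in data. The data falls into k clusters, each represented by its mean, and every point belongs to the cluster of its nearest mean. A clustering is evaluated by the total distance from each point to its nearest mean: the smaller the total, the better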

7

## What is statistics used to do?

### Extract patterns from data

8

## Describe the p value

### How likely it is that a result at least this unusual could have occurred by chance alone, that is, if there were no real effect
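
One way to make this definition concrete is a permutation test, sketched below. It is an illustration under assumptions, not a method named in the deck: the function name `permutation_p_value`, the choice of the difference in group means as the test statistic, and the `n_permutations` and `seed` parameters are all hypothetical. The idea is to shuffle the group labels many times and count how often chance alone produces a difference at least as large as the observed one.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling mimics "no real effect": the group labels are arbitrary.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1
    # Fraction of shuffles that look at least as unusual as the real data.
    return at_least_as_extreme / n_permutations
```

Two identical groups give a p-value of 1.0, because every shuffle is at least as extreme as an observed difference of zero. Two groups that barely overlap give a value close to 0.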

9

## What is the p value used to do?

### Assess the significance of a result

10

## Describe statistical power

### The probability that your test detects an effect if it is real

11

## What does the statistical power depend on?

### Size of the effect and the sample size
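
The dependence on effect size and sample size can be illustrated with the textbook power formula for a two-sided one-sample z-test. This is a sketch under stated assumptions: the test choice, the function name `z_test_power`, and the `alpha` parameter are not from the deck. It also makes visible that the chosen significance level affects power, a factor the card does not list.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known unit variance.

    effect_size is the true shift of the mean, in standard deviations.
    n is the sample size and alpha the significance level.
    """
    std_normal = NormalDist()
    # Critical value: reject when the |z statistic| exceeds this.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the z statistic is centred on this shift.
    shift = effect_size * n ** 0.5
    # Probability of landing in either rejection tail.
    return std_normal.cdf(shift - critical) + std_normal.cdf(-shift - critical)
```

For a fixed effect size, `z_test_power(0.5, 64)` is larger than `z_test_power(0.5, 16)`, and for a fixed sample size a larger effect gives higher power. With no effect at all, the function returns `alpha`, the false-positive rate.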

12

## What should graphical displays do?

###
- Show the data

- Induce the viewer to think about the substance

- Avoid distorting what the data has to say

- Present many numbers in a small space

- Make large data sets coherent

- Reveal the data at several levels of detail

13