Book - Chapter 6 Analytical Theory Regression Flashcards Preview

Flashcards in Book - Chapter 6 Analytical Theory Regression Deck (35)
1
Q

Linear regression is a useful tool for answering what question

A

What is a person's expected income

2
Q

Logistic regression is a popular method for answering what question

A

What is the probability that an applicant will default on the loan

3
Q

In a linear regression what is the output

A

Continuous variable

4
Q

In linear regression what is the input

A

Continuous or discrete variables

5
Q

What is a key assumption of linear regression

A

That the relationship between an input variable and an output variable is linear

6
Q

What is a linear regression model

A

A probabilistic one that accounts for the randomness that can affect any particular outcome

7
Q

Where would you use linear regression

A

Real estate, demand forecasting, and medical applications, for example analysing the effect of a proposed radiation treatment on reducing tumour sizes

8
Q

What is the model outcome of linear regression

A

A set of estimated coefficients that indicate the relative impact of each input variable

9
Q

In linear regression what is a common technique to estimate the parameters

A

Ordinary least squares (OLS)

10
Q

What is the goal of OLS

A

Find the line that best approximates the relationship between the outcome variable and the input variables, by minimizing the sum of squared differences between the observed and fitted values
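
As a minimal sketch of what OLS computes for a single input variable, the closed-form slope and intercept below minimize the sum of squared residuals. The function name `fit_ols` and the sample data are illustrative assumptions, not part of the deck.

```python
def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line for one input variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x
intercept, slope = fit_ols([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the fitted line would not pass through every point; the coefficients would still be the ones that make the total squared residual as small as possible.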

11
Q

What is a categorical variable

A

A variable that takes one of a limited set of discrete values, for example gender (female or male)

12
Q

In regression what is the proper way to implement a categorical variable that can take on M different values

A

Add M - 1 binary (indicator) variables to the model
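
One way to picture the M - 1 encoding is the sketch below, which treats the first level as the reference level (all indicators zero). The function name `encode_categorical` and the colour levels are hypothetical, chosen only for illustration.

```python
def encode_categorical(values, levels):
    """Encode each value as M - 1 binary indicators; levels[0] is the reference level."""
    indicator_levels = levels[1:]  # the reference level gets no indicator of its own
    return [[1 if value == level else 0 for level in indicator_levels] for value in values]


# Hypothetical variable with M = 3 levels, so 2 indicators per row
rows = encode_categorical(["red", "green", "blue"], ["red", "green", "blue"])
```

Here "red" becomes `[0, 0]`, "green" becomes `[1, 0]`, and "blue" becomes `[0, 1]`. Using all M indicators together with an intercept term would make the inputs perfectly collinear, which is why one level is dropped.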

13
Q

What confidence level is commonly used for intervals in linear regression

A

95%

14
Q

In linear regression what are confidence intervals used for

A

To draw inferences on the population's expected outcome, whereas prediction intervals are used to draw inferences on the next possible outcome
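
For the special case of simple linear regression (one input variable), the two intervals at a new input value $x_0$ can be written as below. The notation is an assumption introduced here: $\hat{y}_0 = b_0 + b_1 x_0$ is the fitted value, $s$ is the residual standard error, $S_{xx} = \sum_i (x_i - \bar{x})^2$, and $t_{n-2,\,1-\alpha/2}$ is the Student t quantile (with $\alpha = 0.05$ for a 95% interval).

```latex
% Confidence interval for the expected outcome at x_0
\hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Prediction interval for the next observed outcome at x_0
\hat{y}_0 \pm t_{n-2,\,1-\alpha/2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root accounts for the random error of a single new observation, so the prediction interval is always wider than the confidence interval at the same $x_0$.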

15
Q

What is a major assumption in linear regression modelling

A

That the relationship between the input variables and the output variable is linear

16
Q

How would you evaluate whether the relationship between an input variable and the output variable is linear

A

Plot the output variable against each input variable

17
Q

What are common transformations in linear regression

A

Taking the square root or the logarithm of the variables
Creating new input variables, such as age squared, and adding them to the linear regression model to fit a quadratic relationship between an input variable and the output
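
A small sketch of how such derived inputs might be prepared before fitting is shown below. The column names `age` and `income`, the helper `add_transformed_inputs`, and the records are hypothetical.

```python
import math


def add_transformed_inputs(rows):
    """Return copies of rows with age_squared and log_income columns added."""
    return [
        {
            **row,
            "age_squared": row["age"] ** 2,          # lets the model fit a quadratic in age
            "log_income": math.log(row["income"]),   # requires income > 0
        }
        for row in rows
    ]


# Hypothetical records
records = [{"age": 20, "income": 1000.0}, {"age": 40, "income": 50000.0}]
transformed = add_transformed_inputs(records)
```

The model stays linear in its coefficients: the quadratic shape comes from treating `age_squared` as just another input variable.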

18
Q

What is N fold cross validation

A

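A model evaluation technique that extends the common practice of randomly splitting the entire dataset into a training set and a testing set: the data is split into N folds and each fold serves once as the testing set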

19
Q

What occurs in N fold cross validation

A

The entire dataset is randomly split into N datasets (folds) of approximately equal size
A model is trained against N - 1 of these datasets and tested against the remaining dataset. A measure of the model error is obtained.
This process is repeated a total of N times across the various combinations of the N datasets taken N - 1 at a time
The observed N model errors are averaged over the N folds
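
The steps above can be sketched in plain Python as follows. All names (`n_fold_indices`, `cross_validate`, `fit_mean`, `mean_squared_error`) are hypothetical, and the "model" is deliberately trivial (it predicts the training mean) so the example stays self-contained.

```python
import random


def n_fold_indices(n_rows, n_folds, seed=0):
    """Randomly split row indices into n_folds folds of approximately equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def fit_mean(train_x, train_y):
    """Toy model: ignore the inputs and predict the mean training outcome."""
    return sum(train_y) / len(train_y)


def mean_squared_error(model, test_x, test_y):
    """Model error on the held-out fold."""
    return sum((y - model) ** 2 for y in test_y) / len(test_y)


def cross_validate(xs, ys, n_folds, fit, error, seed=0):
    """Train on N - 1 folds, test on the remaining fold, and average the N errors."""
    errors = []
    for test_idx in n_fold_indices(len(xs), n_folds, seed):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors.append(error(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(errors) / n_folds


# Hypothetical data with a constant outcome, so the toy model makes no error
average_error = cross_validate(list(range(10)), [5.0] * 10, 5, fit_mean, mean_squared_error)
```

Any fitting routine and error measure with the same call shape could be passed in place of the toy ones.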

20
Q

What are outliers

A

Observations that differ markedly from the rest of the data; they can result from bad data collection, data processing errors, or an actual rare occurrence

21
Q

What is the input of logistic regression

A

Continuous or discrete variables

22
Q

What is the output of logistic regression

A

Coefficients that indicate the impact of each driver

23
Q

What are the use cases for logistic regression

A

Medical: to measure the likelihood of a patient's response to a treatment
Finance: to determine the probability that an applicant will default on a loan
Marketing: to determine the probability that a customer will switch carriers
Engineering: to determine the probability that a mechanical part will experience a malfunction

24
Q

In logistic regression, as the value of y (the linear combination of the input variables) increases, what happens to the probability

A

The probability of the outcome occurring increases
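
The behaviour follows from the logistic function f(y) = e^y / (1 + e^y), where y is the linear combination of the input variables. A minimal sketch, with the function name `logistic` chosen here for illustration:

```python
import math


def logistic(y):
    """Logistic function: maps any real y to a probability between 0 and 1."""
    # Algebraically equal to e^y / (1 + e^y); fine for moderate y,
    # though very large negative y would overflow in this simple form.
    return 1.0 / (1.0 + math.exp(-y))


# Probabilities rise steadily as y increases
probabilities = [logistic(y) for y in (-2.0, 0.0, 2.0)]
```

At y = 0 the probability is exactly 0.5, and it approaches 1 as y grows and 0 as y becomes very negative.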

25
Q

What is MLE

A

Maximum likelihood estimation; it is used to estimate the model parameters

26
Q

In logistic regression what is the null deviance

A

The deviance value where the likelihood function is based only on the intercept term

27
Q

What is the residual deviance in logistic regression

A

The value where the likelihood function is based on the parameters in the specified logistic model
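
To make the two deviances concrete, the sketch below computes deviance as -2 times the log-likelihood of a binary model. The helper name `deviance`, the labels, and the fitted probabilities are hypothetical; the intercept-only model is assumed to predict the overall proportion of positives for every observation.

```python
import math


def deviance(labels, probabilities):
    """Deviance of a binary model: -2 times the log-likelihood."""
    log_likelihood = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probabilities)
    )
    return -2.0 * log_likelihood


labels = [1, 1, 0, 0]

# Null model: intercept only, so every observation gets the overall positive rate
positive_rate = sum(labels) / len(labels)
null_deviance = deviance(labels, [positive_rate] * len(labels))

# Hypothetical probabilities from a fitted model that uses the input variables
residual_deviance = deviance(labels, [0.9, 0.8, 0.2, 0.1])
```

A residual deviance well below the null deviance suggests the input variables improve the fit over the intercept alone.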

28
Q

What is pseudo-R squared

A

A measure of how well the fitted model explains the data as compared to the default model of no predictor variables and only an intercept term

29
Q

If the pseudo R squared value is near one what does that indicate

A

A good fit over the simple null model
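
One common definition of pseudo-R squared is 1 minus the ratio of the residual deviance to the null deviance. The sketch below assumes that definition; the function name and the deviance values are hypothetical.

```python
def pseudo_r_squared(residual_deviance, null_deviance):
    """Pseudo-R squared as 1 - residual deviance / null deviance."""
    return 1.0 - residual_deviance / null_deviance


# Hypothetical deviances: the fitted model removes three quarters of the null deviance
value = pseudo_r_squared(25.0, 100.0)
```

When the residual deviance is small relative to the null deviance the value is near one; when the input variables add little, the two deviances are similar and the value is near zero.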

30
Q

How is logistic regression used as a classifier

A

To assign class labels to a person, item, or transaction based on the predicted probability provided by the model

31
Q

What is the default probability threshold when logistic regression is used as a classifier

A

0.5
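
A minimal sketch of turning predicted probabilities into class labels with a threshold is given below. The function name `classify` is hypothetical, and treating a probability equal to the threshold as the positive class is an assumption made here.

```python
def classify(probabilities, threshold=0.5):
    """Assign label 1 when the predicted probability meets the threshold, else 0."""
    # Assumption: a probability exactly at the threshold counts as positive
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities
default_labels = classify([0.2, 0.5, 0.9])
stricter_labels = classify([0.2, 0.5, 0.9], threshold=0.6)
```

Raising the threshold makes the classifier more conservative about assigning the positive label, which trades false positives for missed positives.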

32
Q

How do you work out the false positive rate

A

Number of false positives divided by the number of negatives

33
Q

How do you work out the true positive rate

A

Number of true positives divided by the number of positives

34
Q

What is the receiver operating characteristic (ROC) curve

A

It is the plot of the true positive rate against the false positive rate
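
The points of such a curve can be sketched by sweeping the classification threshold and computing both rates at each setting. The names `rates` and `roc_points`, the labels, the probabilities, and the threshold list are all hypothetical.

```python
def rates(labels, probabilities, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    true_pos = sum(1 for y, pred in zip(labels, predictions) if y == 1 and pred == 1)
    false_pos = sum(1 for y, pred in zip(labels, predictions) if y == 0 and pred == 1)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return false_pos / negatives, true_pos / positives


def roc_points(labels, probabilities, thresholds):
    """One (FPR, TPR) point per threshold; plotting them traces the ROC curve."""
    return [rates(labels, probabilities, t) for t in thresholds]


# Hypothetical labels and predicted probabilities
labels = [1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.7, 0.2]
points = roc_points(labels, probabilities, [1.1, 0.8, 0.65, 0.5, 0.0])
```

A threshold above every probability gives the point (0, 0) and a threshold of 0 gives (1, 1); a better classifier bends the curve between them toward the top-left corner.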

35
Q

When is the ROC curve useful

A

For evaluating the performance of classifiers, including classifiers other than logistic regression