Book - Chapter 6 Analytical Theory Regression Flashcards Preview

Flashcards in Book - Chapter 6 Analytical Theory Regression Deck (35)
1

Linear regression is a useful tool for answering what question

What is a person's expected income

2

Logistic regression is a popular method for answering what question

What is the probability that an applicant will default on the loan

3

In a linear regression what is the output

Continuous variable

4

In linear regression what is the input

Continuous or discrete variables

5

What is a key assumption of linear regression

That the relationship between an input variable and an output variable is linear

6

What is a linear regression model

A probabilistic one that accounts for the randomness that can affect any particular outcome

7

Where would you use linear regression

Real estate, demand forecasting, and medicine (for example, analysing the effect of a proposed radiation treatment on reducing tumour sizes)

8

What is the model outcome of linear regression

A set of estimated coefficients that indicate the relative impact of each input variable

9

In linear regression what is a common technique to estimate the parameters

Ordinary least squares (OLS)

10

What is the goal of OLS

Find the line that best approximates the relationship between the outcome variable and the input variables, by minimizing the sum of squared differences between the observed and fitted outcomes
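
As a minimal sketch of what OLS does with a single input variable, the closed-form slope and intercept can be computed directly. The function name `ols_fit` and the education/income figures are made up for illustration and are not from the deck.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of the input.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: years of education vs. income in thousands.
education = [10, 12, 14, 16, 18]
income = [25.0, 31.0, 35.0, 41.0, 45.0]

intercept, slope = ols_fit(education, income)
print(f"income ~ {intercept:.2f} + {slope:.2f} * education")
```

The slope is the estimated coefficient: the expected change in the outcome for a one-unit change in the input.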

11

What is a categorical variable

A variable that takes one of a fixed set of labels rather than a numeric value, for example gender (female or male)

12

In regression what is the proper way to implement a categorical variable that can take on M different values

Add M - 1 binary (indicator) variables to the model; the remaining value serves as the reference case
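
A small sketch of the M - 1 encoding, assuming the first level in sorted order is taken as the reference case. The helper `dummy_encode` and the state codes are hypothetical.

```python
def dummy_encode(values, reference=None):
    """Encode a categorical column as M - 1 binary indicator columns."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first sorted level is the reference
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Hypothetical categorical input with M = 3 values.
states = ["CA", "NY", "TX", "NY", "CA"]
columns, encoded = dummy_encode(states)

print(columns)  # the two indicator columns that are kept
for state, row in zip(states, encoded):
    print(state, row)
```

Rows for the reference level are all zeros, so its effect is absorbed into the intercept.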

13

What confidence level is commonly used for intervals in linear regression

95%

14

In linear regression what are confidence intervals used for

To draw inferences on the population's expected outcome; prediction intervals, by contrast, are used to draw inferences on the next possible outcome
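
For the single-input case, the difference between the two intervals can be written out as below. The notation is assumed here for illustration and does not appear in the cards: n observations, fitted value \hat{y}_0 at a new input x_0, residual standard error s, input mean \bar{x}, S_{xx} = \sum_i (x_i - \bar{x})^2, and the t quantile for confidence level 1 - \alpha (0.95 for a 95% interval).

```latex
\text{Confidence interval for the expected outcome at } x_0:\quad
\hat{y}_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

\text{Prediction interval for the next outcome at } x_0:\quad
\hat{y}_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,
\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root accounts for the randomness of an individual outcome, which is why a prediction interval is always wider than the confidence interval at the same x_0.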

15

What is a major assumption in linear regression modelling

That the relationship between the input variables and the output variable is linear

16

How would you evaluate the relationship between an input variable and the output variable

Plot the output variable against each input variable

17

What are common transformations in linear regression

Taking the square root or the logarithm of the variables
Creating a new input variable, such as age squared, and adding it to the linear regression model to fit a quadratic relationship between an input variable and the output
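
A brief sketch of both transformations, using made-up ages and incomes. Each design row holds a constant term, age, and age squared, so a linear model fitted on these rows can capture a quadratic relationship.

```python
import math

# Hypothetical observations.
ages = [25, 35, 45, 55]
incomes = [30000.0, 52000.0, 61000.0, 58000.0]

# New input variable: age squared, added alongside age itself.
design_rows = [[1.0, age, age ** 2] for age in ages]

# Transformed output variable: natural logarithm of income.
log_incomes = [math.log(income) for income in incomes]

for row, log_income in zip(design_rows, log_incomes):
    print(row, round(log_income, 3))
```

The model stays linear in its coefficients; only the variables fed into it change.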

18

What is N fold cross validation

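A way to estimate how well a model performs on unseen data. It is helpful when there is not enough data for the common practice of a single random split of the entire dataset into a training set and a testing set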

19

What occurs in N fold cross validation

The entire dataset is randomly split into N datasets (folds) of approximately equal size
A model is trained against N - 1 of these datasets and tested against the remaining dataset. A measure of the model error is obtained.
This process is repeated a total of N times across the various combinations of N datasets taken N - 1 at a time
The observed N model errors are averaged over the N folds
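
The steps above can be sketched as follows. This is an illustrative implementation under stated assumptions: mean squared error as the error measure, a one-input least-squares line as the model, and hypothetical names (`n_fold_indices`, `cross_validate`, `fit_line`, `predict_line`) and data.

```python
import random


def n_fold_indices(n_rows, n_folds, seed=0):
    """Randomly split row indices into n_folds groups of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def fit_line(xs, ys):
    """Least-squares line with one input; returns (intercept, slope)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x


def cross_validate(xs, ys, n_folds, fit, predict, seed=0):
    """Average the held-out mean squared error over all folds."""
    fold_errors = []
    for held_out in n_fold_indices(len(xs), n_folds, seed):
        held = set(held_out)
        # Train on the other N - 1 folds.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        # Test on the remaining fold.
        squared = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / n_folds


# Hypothetical, perfectly linear data, so the averaged error is about zero.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
cv_error = cross_validate(xs, ys, 5, fit_line, predict_line)
print(cv_error)
```

Every observation is used for testing exactly once and for training N - 1 times.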

20

What are outliers

Observations that differ markedly from the rest of the data; they can result from bad data collection, data processing errors, or an actual rare occurrence

21

What is the input of logistic regression

Continuous or discrete variables

22

What is the output of logistic regression

Coefficients that indicate the impact of each driver

23

What are the use cases for logistic regression

Medical: estimating the likelihood of a patient's response to a treatment
Finance: determining the probability that an applicant will default on a loan
Marketing: determining the probability that a customer will switch carriers
Engineering: determining the probability that a mechanical part will experience a malfunction

24

In logistic regression as the value of y increases what happens to the probability

The probability of the outcome occurring increases
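
A quick sketch of the logistic function, where y stands for the linear combination of the input variables. The sample values of y are arbitrary.

```python
import math


def logistic(y):
    """Map any real y to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))


# Arbitrary values of y, in increasing order.
y_values = [-4, -2, 0, 2, 4]
probabilities = [logistic(y) for y in y_values]

for y, p in zip(y_values, probabilities):
    print(y, round(p, 3))
```

The printed probabilities rise with y and pass through 0.5 at y = 0.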

25

What is MLE

Maximum likelihood estimation; it is used to estimate the model parameters
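
To make the idea concrete, here is a rough sketch that maximizes the log-likelihood of a one-input logistic model by plain gradient ascent. This is a swapped-in technique chosen for brevity, not necessarily what any statistics package uses; the function names, learning rate, step count, and data are all assumptions.

```python
import math


def log_likelihood(b0, b1, xs, labels):
    """Log-likelihood of the labels under intercept b0 and slope b1."""
    total = 0.0
    for x, y in zip(xs, labels):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_logistic_mle(xs, labels, learning_rate=0.1, steps=5000):
    """Climb the log-likelihood surface with fixed-size gradient steps."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            grad0 += y - p        # derivative with respect to b0
            grad1 += (y - p) * x  # derivative with respect to b1
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical data; the classes overlap, so the estimates stay finite.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic_mle(xs, labels)
print(b0, b1, log_likelihood(b0, b1, xs, labels))
```

The fitted parameters give a higher log-likelihood than the all-zero starting point, which is the sense in which they are "most likely" given the data.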

26

In logistic regression what is null deviance

The value of the deviance when the likelihood function is based only on the intercept term

27

What is the residual deviance in logistic regression

The value where the likelihood function is based on the parameters in the specified logistic model

28

What is pseudo-R squared

A measure of how well the fitted model explains the data as compared to the default model of no predictor variables and only an intercept term
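
The null deviance, residual deviance, and pseudo-R squared can be tied together in a short sketch. It assumes deviance is taken as -2 times the log-likelihood and pseudo-R squared as 1 minus the ratio of residual to null deviance. The labels and fitted probabilities are invented stand-ins, not the output of an actual model fit.

```python
import math


def deviance(labels, probabilities):
    """-2 times the log-likelihood of binary labels under the probabilities."""
    log_lik = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probabilities)
    )
    return -2.0 * log_lik


# Hypothetical outcomes and hypothetical fitted probabilities.
labels = [0, 0, 1, 1, 1, 0]
fitted = [0.1, 0.3, 0.8, 0.9, 0.6, 0.2]

# An intercept-only model predicts the overall proportion of 1s for every row.
base_rate = sum(labels) / len(labels)
null_deviance = deviance(labels, [base_rate] * len(labels))
residual_deviance = deviance(labels, fitted)
pseudo_r2 = 1.0 - residual_deviance / null_deviance

print(round(null_deviance, 3), round(residual_deviance, 3), round(pseudo_r2, 3))
```

A residual deviance far below the null deviance pushes the ratio toward zero and pseudo-R squared toward one.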

29

If the pseudo R squared value is near one what does that indicate

A good fit over the simple null model

30

How is logistic regression used as a classifier

To assign class labels to a person, item, or transaction based on the predicted probability provided by the model
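
A minimal sketch of the classification step, assuming a cutoff of 0.5. The cutoff is a choice rather than something fixed by the model, and the function name `classify` and the probabilities are hypothetical.

```python
def classify(probabilities, threshold=0.5):
    """Assign label 1 when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities of default for four applicants.
predicted = [0.05, 0.48, 0.50, 0.93]

print(classify(predicted))                 # default cutoff of 0.5
print(classify(predicted, threshold=0.4))  # a stricter lender's cutoff
```

Lowering the threshold flags more cases as positive, trading fewer missed positives for more false alarms.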