Book - Chapter 6 Analytical Theory Regression Flashcards Preview

Flashcards in Book - Chapter 6 Analytical Theory Regression Deck (35)

Linear regression is a useful tool for answering what question

What is a person's expected income


Logistic regression is a popular method for answering what question

What is the probability that an applicant will default on the loan


In a linear regression what is the output

Continuous variable


In linear regression what is the input

Continuous or discrete variables


What is a key assumption of linear regression

That the relationship between an input variable and an output variable is linear


What is a linear regression model

A probabilistic one that accounts for the randomness that can affect any particular outcome


Where would you use linear regression

Real estate, demand forecasting, and medicine, for example analyzing the effect of a proposed radiation treatment on reducing tumour sizes


What is the model outcome of linear regression

A set of estimated coefficients that indicate the relative impact of each input variable


In linear regression what is a common technique to estimate the parameters

Ordinary least squares (OLS)


What is the goal of OLS

To find the line that best approximates the relationship between the outcome variable and the input variables, by minimizing the sum of squared differences between the observed and fitted values
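As a minimal sketch of what OLS computes for a single input variable, the closed-form slope and intercept below minimize the sum of squared residuals. The function name `ols_fit` and the data values are made up for illustration and do not come from the deck.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: one input variable and one continuous outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = ols_fit(x, y)
```

The slope is the estimated coefficient for the input variable: the expected change in the outcome for a one-unit change in the input.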


What is a categorical variable

A variable that takes one of a limited set of category labels rather than a numeric measurement, for example female or male


In regression what is the proper way to implement a categorical variable that can take on M different values

As M - 1 binary variables
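The M - 1 encoding can be sketched as follows: one level is chosen as the reference and gets no column, so it is represented by all zeros. The helper `dummy_encode` and the colour values are hypothetical, not taken from the deck.

```python
def dummy_encode(values, reference):
    """Encode a categorical column as M - 1 binary indicator columns.

    The reference level gets no column of its own; a row holding the
    reference level is encoded as all zeros.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference level does not occur in the data")
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Three levels (M = 3) give two binary columns.
columns, rows = dummy_encode(["red", "green", "blue", "green"], reference="blue")
```

Using M columns instead would make the indicators sum to one in every row, which duplicates the intercept term.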


What is the confidence percentage for linear regression



In linear regression what are confidence intervals used for

Confidence intervals are used to draw inferences on the population's expected outcome, and prediction intervals are used to draw inferences on the next possible outcome


What is a major assumption in linear regression modelling

That the relationship between the input variables and the output variable is linear


How would you evaluate a relationship between the input variable and the output variable

Plot the output variable against each input variable and check whether the trend looks linear


What are common transformations in linear regression

Taking square roots or the logarithm of the variables
Creating a new input variable, such as age squared, and adding it to the linear regression model to fit a quadratic relationship between an input variable and the output
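A small sketch of the logarithm transformation, assuming a curved relationship of the form y = a * x^b (an assumption chosen for illustration). Taking logs of both variables gives log y = log a + b log x, which is linear and can be fitted with ordinary least squares. The data are made up.

```python
import math


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical curved data generated from y = 3 * x^2.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 2 for xi in x]

# Transform both variables, then fit a straight line to the logs.
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]
intercept, slope = ols_fit(log_x, log_y)

exponent = slope                  # estimate of b
multiplier = math.exp(intercept)  # estimate of a
```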


What is N fold cross validation

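A way to estimate how well a model performs on data it was not trained on. It builds on the common practice of randomly splitting the entire dataset into a training set and a testing set, repeating the split so that each of N subsets serves once as the testing set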


What occurs in N fold cross validation

The entire dataset is randomly split into N datasets of approximately equal size
A model is trained against N - 1 of these datasets and tested against the remaining dataset. A measure of the model error is obtained.
This process is repeated a total of N times across the various combinations of N datasets taken N - 1 at a time
The observed N model errors are averaged over the N folds
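The steps above can be sketched in plain Python. This is an illustration under assumptions: the model is a one-variable least-squares line, the error measure is mean squared error, and the function names and data are made up.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def n_fold_cv_error(x, y, n_folds, seed=0):
    """Average the test mean squared error over n_folds train/test splits."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)  # random assignment of rows to folds
    folds = [indices[k::n_folds] for k in range(n_folds)]
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        # Train on the other N - 1 folds, test on the held-out fold.
        train_idx = [i for i in indices if i not in held_out]
        intercept, slope = ols_fit([x[i] for i in train_idx],
                                   [y[i] for i in train_idx])
        squared = [(y[i] - (intercept + slope * x[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / n_folds


# Made-up data lying exactly on y = 2x + 1, so the error should be about zero.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
cv_error = n_fold_cv_error(x, y, n_folds=5)
```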


What are outliers

Observations that lie far from the rest of the data. They can result from bad data collection, data processing errors, or an actual rare occurrence


What is the input of logistic regression

Continuous or discrete variables


What is the output of logistic regression

Coefficients that indicate the impact of each driver


What are the use cases for logistic regression

Medical: to estimate the likelihood of a patient's response to a treatment
Finance: to determine the probability that an applicant will default on a loan
Marketing: to determine the probability that a customer will switch carriers
Engineering: to determine the probability that a mechanical part will experience a malfunction


In logistic regression, as the value of y (the linear combination of the input variables) increases, what happens to the probability

The probability of the outcome occurring increases
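A short sketch of the logistic function that produces this behaviour, assuming the quantity that increases is the linear combination of the input variables, called `y` here. The function maps any real number to a probability, and larger inputs give larger probabilities.

```python
import math


def logistic(y):
    """Logistic function f(y) = e^y / (1 + e^y), written to avoid overflow."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


# The probability rises from near 0 to near 1 as y increases.
probs = [logistic(y) for y in (-4.0, -1.0, 0.0, 1.0, 4.0)]
```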


What is MLE

Maximum likelihood estimation, a technique used to estimate the model parameters by choosing the values that make the observed data most likely
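One way to illustrate MLE for a one-variable logistic model is plain gradient ascent on the average log-likelihood. This is only a sketch: statistical software typically uses faster iterative methods, and the function name, learning rate, step count, and data here are all assumptions made for the example.

```python
import math


def fit_logistic(x, y, learning_rate=0.1, steps=20000):
    """Estimate (b0, b1) by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            # Gradient of the log-likelihood: (observed - predicted) * input.
            grad0 += yi - p
            grad1 += (yi - p) * xi
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical 0/1 outcomes that become more likely as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
```

The data are deliberately not perfectly separable; with perfectly separable outcomes the likelihood has no finite maximum and the coefficients would keep growing.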


In logistic regression what is null deviance

The value where the likelihood function is based only on the intercept term


What is the residual deviance in logistic regression

The value where the likelihood function is based on the parameters in the specified logistic model
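To make the two deviances concrete, the sketch below computes each as -2 times the log-likelihood of 0/1 outcomes. For the intercept-only model the fitted probability is the overall proportion of ones. The outcomes and the model's fitted probabilities are made up; in practice the latter would come from a fitted logistic regression.

```python
import math


def deviance(y, p):
    """Return -2 * log-likelihood of 0/1 outcomes y under probabilities p."""
    log_likelihood = sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )
    return -2.0 * log_likelihood


y = [0, 0, 1, 1, 1, 0, 1, 1]

# Null model: intercept only, so every observation gets the same probability.
p_null = [sum(y) / len(y)] * len(y)
null_deviance = deviance(y, p_null)

# Hypothetical fitted probabilities from a model with predictor variables.
p_model = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.6, 0.9]
residual_deviance = deviance(y, p_model)
```

A residual deviance well below the null deviance suggests the predictors add explanatory value over the intercept alone.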


What is pseudo-R squared

A measure of how well the fitted model explains the data as compared to the default model of no predictor variables and only an intercept term


If the pseudo R squared value is near one what does that indicate

A good fit over the simple null model
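One common definition of pseudo-R squared is one minus the ratio of residual deviance to null deviance. The sketch below assumes that definition; the function name and the deviance values are hypothetical.

```python
def pseudo_r_squared(null_deviance, residual_deviance):
    """Return 1 - residual deviance / null deviance."""
    return 1.0 - residual_deviance / null_deviance


# The fitted model removes three quarters of the null deviance.
good_fit = pseudo_r_squared(null_deviance=100.0, residual_deviance=25.0)

# The fitted model is no better than the intercept-only model.
no_gain = pseudo_r_squared(null_deviance=100.0, residual_deviance=100.0)
```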


How is logistic regression used as a classifier

To assign class labels to a person, item, or transaction based on the predicted probability provided by the model
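A minimal sketch of turning predicted probabilities into class labels. The 0.5 threshold is an assumption for illustration; a different cut-off may suit cases where one kind of error is more costly than the other.

```python
def classify(probabilities, threshold=0.5):
    """Assign label 1 when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities of default for three applicants.
labels = classify([0.1, 0.5, 0.9])
strict_labels = classify([0.1, 0.5, 0.9], threshold=0.8)
```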