Flashcards in Semester 1 Deck (52)
What is a random variable?
A random variable is a variable whose numerical value is determined by chance, that is, by the outcome of a random phenomenon.
What is the difference between a discrete random variable and a continuous random variable?
- Discrete random variable has a countable number of possible values
- Continuous random variable can take on any value in an interval e.g. time
What is a standardised variable?
A standardised variable measures how many standard deviations X is above or below its mean: Z = (X - mean) / standard deviation. Standardised random variables always have a mean of 0 and a standard deviation of 1.
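A minimal sketch of standardising a small data set, assuming the population standard deviation is the right divisor. The `scores` values and the `standardise` helper are made up for illustration.

```python
import statistics

def standardise(values):
    """Return z-scores: (x - mean) / standard deviation for each x."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(x - mean) / sd for x in values]

scores = [52.0, 61.0, 70.0, 79.0, 88.0]  # hypothetical data
z = standardise(scores)

# The standardised values should have mean 0 and standard deviation 1
# (up to floating-point rounding).
z_mean = statistics.fmean(z)
z_sd = statistics.pstdev(z)
```

Each entry of `z` says how many standard deviations the matching score lies above (positive) or below (negative) the mean.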
What does the central limit theorem state?
The central limit theorem states that if Z is a standardised sum of N independent, identically distributed (discrete or continuous) random variables with a finite, non-zero standard deviation, the probability distribution of Z approaches the normal distribution as N increases
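One way to see the theorem at work is a small simulation. The sketch below assumes Uniform(0, 1) draws (mean 1/2, variance 1/12) and N = 30. Both choices are illustrative, not part of the theorem.

```python
import math
import random

def standardised_sum(n, rng):
    """Standardised sum of n independent Uniform(0, 1) draws."""
    total = sum(rng.random() for _ in range(n))
    mean = n * 0.5              # E[sum] = n * 1/2
    sd = math.sqrt(n / 12.0)    # Var[sum] = n * 1/12
    return (total - mean) / sd

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [standardised_sum(30, rng) for _ in range(20_000)]

# For a standard normal variable, roughly 95% of values lie within +/-1.96.
share_within = sum(abs(z) < 1.96 for z in draws) / len(draws)
```

If the normal approximation is working, `share_within` should land close to 0.95 even though each individual draw is uniform rather than normal.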
What is statistical inference?
Using a sample to draw conclusions about the characteristics of the population from which it came.
What is a biased sample?
A sample that differs systematically from the population that it is intended to represent.
What is selection bias?
When a sample is biased because the selection of the sample systematically excludes or underrepresents certain groups. It often happens when we use a convenience sample.
What is a retrospective study?
A study that looks at past data for a contemporaneously selected sample. It may suffer from survivor bias: members of the past population who are no longer around are necessarily excluded, e.g. an examination of the medical records of 65-year-olds leaves out everyone who did not live to 65.
What is a prospective study?
A study that selects a sample and then tracks its members over time. It may suffer from non-response bias: the systematic refusal of some groups to participate in the study.
What is a simple random sample?
A sample of size N taken from a given population in which each member of the population is equally likely to be included in the sample and every possible sample of size N from the population has an equal chance of being selected.
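As a rough illustration, Python's `random.sample` draws without replacement so that every subset of the requested size is equally likely. The population of ID numbers 1 to 100 is a made-up example.

```python
import random

population = list(range(1, 101))  # hypothetical population: IDs 1..100
rng = random.Random(42)           # fixed seed so the run is repeatable

# Draw a simple random sample of size 10 without replacement.
sample = rng.sample(population, k=10)
```

No member can appear twice in `sample`, and no member is favoured over any other.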
What is a parameter?
A characteristic of the population whose value is unknown but can be estimated
What is an estimator?
A sample statistic that will be used to estimate the value of the population parameter
What is sampling variation?
The notion that, because samples are chosen randomly, the sample average will vary from sample to sample around the population mean.
What is a sampling distribution?
The probability distribution that describes the population of all possible values of a sample statistic. Even if the population does not have a normal distribution, the sampling distribution of the sample mean will approach the normal distribution as the sample size increases (by the central limit theorem).
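A sampling distribution can be approximated by drawing many samples and recording the statistic each time. The sketch below assumes a skewed Exponential(1) population (mean 1, standard deviation 1) and a sample size of 50. Both are illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
n, reps = 50, 5_000

# Record the sample mean of each of many independent samples.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)  # should sit near the population mean, 1
sd_of_means = statistics.stdev(sample_means)    # should sit near sigma / sqrt(n)
theoretical_sd = 1.0 / math.sqrt(n)
```

The spread of `sample_means` is the sampling variation of the sample mean, and it shrinks as `n` grows.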
What is an unbiased estimator?
A sample statistic is an unbiased estimator of a population parameter if the mean of the sampling distribution of this statistic is equal to the value of the population parameter. We can gauge the precision of the estimator by examining the size of its standard deviation (its standard error): the smaller it is, the more tightly the estimates cluster around the parameter.
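A common textbook illustration of unbiasedness, not taken from this deck, compares two estimators of the population variance: dividing by n - 1 versus dividing by n. The sketch assumes a Normal population with standard deviation 2 (variance 4) and samples of size 5.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_VARIANCE = 4.0     # population is Normal(mean 0, sd 2), chosen for illustration
n, reps = 5, 20_000

unbiased, biased = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(statistics.variance(sample))  # divides by n - 1
    biased.append(statistics.pvariance(sample))   # divides by n

# Averages over many samples approximate the mean of each sampling distribution.
mean_unbiased = statistics.fmean(unbiased)  # expected to be near 4.0
mean_biased = statistics.fmean(biased)      # expected to be near 4.0 * (n - 1) / n = 3.2
```

The n - 1 version centres on the true variance, while the n version is systematically too small.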
What is the t-distribution?
The sampling distribution of the variable that is created when the mean of a sample from a normal distribution is standardised using its standard error (the estimated standard deviation of the sample mean). The exact shape of the t-distribution depends on the sample size, through the degrees of freedom.
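A minimal sketch of computing a t-statistic for a sample mean, assuming a hypothesised population mean `mu0`. The data values and the `t_statistic` helper are made up for illustration.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for the sample mean against mu0."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error, n - 1

sample = [4.8, 5.1, 5.3, 4.9, 5.4]    # hypothetical measurements
t, df = t_statistic(sample, mu0=5.0)
```

Here one parameter (the mean) is estimated before the standard deviation is computed, so `df` is the number of observations minus 1.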
What are degrees of freedom?
The number of observations in the data that are free to vary when estimating statistical parameters.
Degrees of freedom = #observations - #estimated parameters
What is a confidence interval?
A confidence interval measures the reliability of a given estimate. It gives a range within which we can say, with a certain level of confidence (e.g. 95%), that the true value of the population parameter lies.
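A rough sketch of a 95% confidence interval for a population mean, built as the sample mean plus or minus a critical value times the standard error. The data are made up, and the critical value 2.776 is the standard two-sided 95% t-table entry for 4 degrees of freedom, supplied by hand because the Python standard library has no t-distribution.

```python
import math
import statistics

def confidence_interval(sample, critical_value):
    """Return (lower, upper) = sample mean -/+ critical_value * standard error."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = critical_value * standard_error
    return xbar - margin, xbar + margin

sample = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measurements
T_CRIT_95_DF4 = 2.776               # t-table value: 95% two-sided, 4 degrees of freedom
lower, upper = confidence_interval(sample, T_CRIT_95_DF4)
```

A larger sample or a lower confidence level would give a narrower interval.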
What is econometrics?
The quantitative measurement and analysis of actual economic and business phenomena. It is used to describe economic reality, test hypotheses about economic theory and forecast future economic activity.
What is regression analysis?
A statistical technique that attempts to explain movements in one variable (dependent) as a function of movements in a set of other variables (independent) through the quantification of a single equation
What is B0?
The intercept term, also known as the constant. It is the expected value of Y when all independent variables are equal to 0.
What is B1?
The slope coefficient. The amount by which Y changes when X increases by 1 unit, holding all else constant.
What are potential sources of variation in Y other than the included independent variables?
1. Other potentially important explanatory variables
2. Measurement error
3. Incorrect functional form
4. Purely random and unpredictable occurrences
What is the stochastic error term?
The stochastic error term encompasses all other sources of variation in Y that are not captured by the model
What is the deterministic component of a regression equation?
B0 + B1X, etc. It is the expected value of Y given X, also known as the conditional expectation E(Y|X).
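Written out for the single-variable case, the split between the deterministic component and the stochastic error term might look like the following. The subscript i for observations and the symbol ε for the error term are notational assumptions, with β0 and β1 standing for B0 and B1.

```latex
Y_i = \underbrace{\beta_0 + \beta_1 X_i}_{\text{deterministic component } E(Y_i \mid X_i)}
      + \underbrace{\varepsilon_i}_{\text{stochastic error term}}
```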
What makes the fit of a regression equation better?
The smaller the residuals e (the estimated error terms), the closer the estimated values of Y are to the observed values of Y, and so the better the fit
What is OLS?
Ordinary least squares is an estimation technique that chooses the coefficient estimates so as to minimise the sum of the squared residuals, i.e. the squared vertical distances between the actual observed data and the estimated regression line
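For a single independent variable, the least-squares estimates have a well-known closed form: the slope is the sum of cross-deviations divided by the sum of squared X deviations, and the intercept makes the line pass through the point of means. The sketch below uses made-up data, and `ols_fit` is a hypothetical helper name.

```python
import statistics

def ols_fit(x, y):
    """Return (b0, b1) minimising the sum of squared residuals of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(x, y)

# Residuals: observed Y minus the fitted value on the estimated line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

With an intercept in the model, the residuals sum to zero up to rounding, which is a quick sanity check on the fit.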
What is the decomposition of variance?
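The variation of Y around its mean, the total sum of squares (TSS), can be decomposed into 2 parts: the explained sum of squares, ESS, which sums the squared (estimated value - mean), and the residual sum of squares, RSS, which sums the squared (actual value - estimated value), so that TSS = ESS + RSS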
What is the coefficient of determination?
R^2 is the proportion of the variation in Y around its mean that is explained by the model: R^2 = ESS/TSS = 1 - RSS/TSS
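The decomposition and R^2 can be checked numerically. The sketch below fits a one-variable regression with an intercept by least squares on made-up data, then computes TSS, ESS and RSS directly from their definitions.

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares fit of y = b0 + b1 * x.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation around the mean
ess = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by the regression
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left in the residuals

r_squared = ess / tss
```

For a least-squares fit with an intercept, `tss` equals `ess + rss` up to rounding, so `ess / tss` and `1 - rss / tss` give the same R^2.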