Misc econometrics Flashcards

1
Q

p-value

A

(In statistical hypothesis testing) The probability of obtaining test results at least as extreme as the results observed, assuming that the null hypothesis is correct.

In other words, the p-value is the smallest significance level at which we could carry out our test and still reject H₀

In still other words, it is the probability associated with our calculated test statistic (eg, the Z-statistic corresponding to our observed value, evaluated under the distribution that holds if H₀ is true)
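A minimal numerical sketch (assuming Python with scipy is available; the z value is made up purely for illustration):

```python
from scipy.stats import norm

z = 2.1  # illustrative observed z-statistic

# One-sided (upper-tail) p-value: P(Z >= z) under H0
p_one_sided = 1 - norm.cdf(z)

# Two-sided p-value: P(|Z| >= |z|) under H0
p_two_sided = 2 * (1 - norm.cdf(abs(z)))

print(p_one_sided, p_two_sided)  # ~0.018, ~0.036
```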

2
Q

Z-statistic

A

Z-statistic (or Z-score or Standard score) is a number representing how many standard deviations an observed value (raw score) is away from the mean (of what is being observed).

Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.

The Z-statistic is distributed normally with mean = 0 and variance = 1 (ie, it has a Standard Normal Distribution)

Z ~ N(0, 1)

So Z = (Observed Sample Value − Assumed Population Mean) / Standard Deviation of the Sampling Distribution

(Note: in hypothesis tests, the observed value is often the mean observed from our sample - in other words, we are testing a hypothesis about the mean. The Z-statistic may also be used to estimate the probability that X could take a certain value (the observed value, x), given the assumed population mean value.)

(Note: Calculating z using this formula requires the population mean and the population standard deviation, not the sample mean or sample standard deviation. But knowing the true mean and standard deviation of a population is often unrealistic except in cases such as standardized testing, where the entire population is measured.

When the population mean and the population standard deviation are unknown, the standard score may be calculated using the sample mean and sample standard deviation as estimates of the population values.)

3
Q

Z-statistic (conversion)

A

If X ~ N (μ, σ²) then

Z = (X − μ)/σ ~ N(0, 1)

In other words: For any normally distributed variable X with mean μ and variance σ² (X ~ N(μ, σ²)), all probabilities can be converted to the Standard Normal Distribution using the Z Normal (0, 1) transformation: Z = (X − μ)/σ

The Z-statistic is distributed normally with mean = 0 and variance = 1 (ie, it has a Standard Normal Distribution)

Therefore the Z Normal transformation, Z = (X − μ)/σ, converts the variable X into a Standard Normal distribution. We can thus use standard normal tables to find relevant probabilities for P(X ≤ x).

Note: the Z Normal (0, 1) transformation is so called because it transforms the distribution of X into a normal distribution centred on 0 with a variance of 1, by shifting the distribution leftward by μ units (centring the mean at 0) and compressing it horizontally by a factor of σ (stretching, if σ < 1), which sets the variance to 1

Thus, Z ~ N(0, 1)

So, to convert any value of X to its corresponding Z value, subtract the value of the mean and divide by the standard deviation.
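As a small illustration of the transformation (the numbers and the use of scipy are my own, not from the notes):

```python
from scipy.stats import norm

mu, sigma = 100, 15   # illustrative: X ~ N(100, 15^2)
x = 120               # value whose cumulative probability we want

z = (x - mu) / sigma  # the Z transformation: Z = (X - mu) / sigma
p = norm.cdf(z)       # P(X <= 120) = P(Z <= 1.33)

print(z, p)           # 1.33..., ~0.909
```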

4
Q

Z-statistic (hypothesis tests)

A

In a hypothesis test, the Z-statistic is a number representing how many standard deviations (of the sampling distribution) the observed sample value lies away from the assumed population mean.

5
Q

Random Sample

A

A sample of n observations of a RV Y, denoted Y₁, Y₂, …, Yₙ, is said to be a random sample if the n observations are drawn independently from the same population and each element in the population is equally likely to be selected

6
Q

Random Sample as ‘A set of Independently and Identically Distributed (IID) RVs’

A

We describe such a sample as being a set of Independent and Identically distributed (IID) Random Variables (RVs)

So, if a random sample of n elements is taken,
the sample elements constitute a set of IID RVs, Y₁, Y₂, …, Yₙ, each of which have the same PDF as that of Y

The random nature of Y₁, Y₂, …, Yₙ reflects the fact that many different outcomes are possible before the sampling is actually carried out. In other words, each element of the sample is an IID RV with the same PDF as Y (the population) precisely because the elements are randomly and independently selected from the population, so each element of the sample follows a PDF identical to the population's.

7
Q

Sample data

A

Once the sample is obtained, we have a set of numbers, say y₁, y₂, …, yₙ which constitute the data we work with.

There are different types of data:
• Cross-sectional data
• Time-series data
• Panel data

8
Q

Sample Statistics

A

A sample statistic is any quantity computed from values in a sample that is used for a statistical purpose.

(Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis)

The two most often used sample statistics are the sample mean, denoted by Y̅, and the sample variance, denoted by S².

9
Q

Sampling Distribution

A

A sample statistic (eg the sample mean) will have its own probability distribution called the sampling distribution.

Since each observation in a random sample is itself a RV, any statistic calculated from a sample (a sample statistic) is also a RV.

And since the sample statistic is a RV, it will have its own probability distribution.

The sampling distribution reflects the fact that a random sample (of size n) drawn from the population could materialise in a range of different ways, each with a corresponding probability. It is this probability distribution, over all the possible samples of size n that we could draw from the population, that we call the sampling distribution (and we will see shortly that, for the sample mean, it is normal with mean μ and variance σ²/n)

10
Q

Population > Sampling Distributions

A

So, to tie sampling distributions in with their wider context,

  • There is a POPULATION (of size N)
  • Y is a RV representing this population, with a PDF
  • θ is an unknown population parameter (such as the expected value E(Y), or the variance V(Y) = σ²)
  • Note: these population parameters are unknown, fixed values
  • A random sample (of n observations) of the RV Y is drawn, denoted Y₁, Y₂, …, Yₙ
  • (Once the sample is obtained, we have a set of numbers, say y₁, y₂, …, yₙ, which constitute the data we work with)
  • Each Yᵢ has a PDF (identical to the PDF of Y)
  • From the sample we can calculate sample statistics
  • (Two sample statistics of interest: sample mean, Y̅ and the sample variance, S²)
  • Note: these sample statistics are RVs, with their own probability distribution, the SAMPLING DISTRIBUTIONS
11
Q

The sampling distribution of the sample mean (Y̅)

A

Suppose Y ~ N (μ, σ²) and we have an IID sample of n observations from it: {Y₁, Y₂, …, Yₙ},

Then we say that Yᵢ ~ IIDN (μ, σ²)

In other words, each element of the sample is a RV with the same PDF as Y.

From these observations we can calculate the sample mean, Y̅, as: Y̅ = (1/n) Σ Yᵢ

Since Y̅ is a RV itself, it has a probability distribution.

It turns out that the sampling distribution of the sample mean is: Y̅ ~ N (μ, σ²/n)

(We’ll break this down in the next three cards)
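A quick simulation sketch of this result (numpy assumed; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 25      # illustrative population parameters
reps = 100_000                   # number of repeated samples

# Draw many IID samples of size n and compute each sample mean
ybar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(ybar.mean())   # ~5.0   (= mu)
print(ybar.var())    # ~0.16  (= sigma^2 / n = 4 / 25)
```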

12
Q

The mean (or expected value) of the sampling distribution of Y̅

A

The mean of the sampling distribution of Y̅ is defined as:

E[Y̅] = μ

Interpretation:
If samples of n random and independent observations are repeatedly and independently drawn from a population, then as the number of samples becomes very large (approaches infinity), the average of the sample means (Y̅) approaches the population mean

13
Q

The variance of the sampling distribution of Y̅

A

The (population) variance of the sampling distribution of Y̅ is defined as:

V[Y̅] = σ²/n

Interpretation:
As the sample size (n) increases, the variance of Y̅ decreases. So the sampling distribution of the sample mean will have lower variance the larger the sample size.

14
Q

The Sampling Distribution (of Y̅ ~ )

A

Thus, if we assume that the samples are taken from a normal RV, Y, we can deduce that:

Y̅ ~ N (μ, σ²/n)

15
Q

Standardisation of Y̅

A

We can standardise Y̅ (convert it to a standard normal) to calculate probabilities:

Z = [ ( Y̅ - μ ) / ( σ/√n ) ] ~ N(0, 1)
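A worked example of this standardisation in a test setting (the data values are invented; scipy assumed):

```python
import math
from scipy.stats import norm

mu0, sigma, n = 50, 10, 36   # hypothesised mean, known sd, sample size
ybar = 53.2                  # observed sample mean (illustrative)

z = (ybar - mu0) / (sigma / math.sqrt(n))  # standardised sample mean
p = 2 * (1 - norm.cdf(abs(z)))             # two-sided p-value

print(z, p)   # 1.92, ~0.055
```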

16
Q

The Central Limit Theorem

A

What about the shape of the sampling distribution of Y̅ if the population from which it is constructed is not normally distributed?

Use the Central Limit Theorem (CLT): as the sample size gets large enough, the sampling distribution of Y̅ can be approximated by the normal distribution even if the population itself is not normal.

Therefore, given the CLT, we can apply rules about normal distribution to the sampling distribution of the sample mean even when the population is not distributed normally
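A sketch of the CLT in action, under assumed illustrative parameters (numpy assumed): even though an exponential population is strongly skewed, the sample means behave approximately like N(μ, σ²/n).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100_000

# Skewed, clearly non-normal population: exponential with mean 1, variance 1
ybar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Moments match the normal approximation N(1, 1/50)
print(ybar.mean(), ybar.var())   # ~1.0, ~0.02
```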

17
Q

Population variance/sample variance unknown

A

It thus follows that we can make inferences about the population mean based on the sample mean using the standard normal distribution (Z-statistic)
Z = [ ( Y̅ - μ ) / ( σ/√n ) ] ~ N(0, 1)

However, notice that the distribution of the sample mean depends not only on the population mean but also on the population variance (divided by the sample size).

It is quite likely that we will not know the population variance

If this is the case we can use the sample variance, S², as an approximation, and it can be shown that:

T = [ ( Y̅ - μ ) / ( S/√n ) ] ~ t (n - 1)

Thus, we can use the sample variance and the tables from the t distribution to make inferences when σ² is unknown.
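A minimal sketch of this t-based inference (illustrative data; scipy's one-sample t-test is one standard way to do it, not necessarily the course's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(5.3, 2.0, size=12)   # small illustrative sample, sigma unknown
mu0 = 5.0                           # H0: mu = 5

# Manual t-statistic using the sample standard deviation S (ddof=1)
t_manual = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(len(y)))

# The same test via scipy, which also returns the p-value from t(n - 1)
t_scipy, p = stats.ttest_1samp(y, popmean=mu0)

print(t_manual, t_scipy, p)   # the two t values agree
```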

18
Q

Estimator

A

A sample statistic which is constructed to provide information about the unknown population parameters of a probability distribution is called an estimator, and we denote it by θ̂ (theta-hat)

(To place in context:)

Let Y be a RV representing a population with a PDF, f (y; θ), which depends on the unknown population parameter θ

Example: if Y ~ N(μ, σ²), then θ = (μ, σ²)

Note: we will generally assume that there is only one parameter

If we can obtain random samples, then we can learn something about θ

(Refer to first point)

So a sample statistic is an estimator, and so the probability distribution of the estimator is the sampling distribution

19
Q

Estimator as a rule

A

More generally, an estimator θ̂ (theta-hat) of a population parameter θ can be expressed as a mathematical formula (rule):

θ̂ = g(Y₁, Y₂, …, Yₙ)

In other words, regardless of the outcome of the RVs (the sample that happens to be drawn from the population), we apply this same rule to estimate the population parameter

An estimator of θ is a rule that assigns to each possible outcome of the sample a value for θ
(remember any sample drawn is one manifestation of the many possible samples that could have been drawn from the population (with corresponding probabilities))

For example: a natural estimator of µ (population mean) is Y̅ (sample mean)
where Y̅ = (1/n) Σ Yᵢ

  • Given any outcome of the RVs {Y₁, Y₂, … , Yₙ } (ie: the sample drawn) the rule to estimate the population mean is the same: we simply take the average of {Y₁, Y₂, … , Yₙ }
  • For a particular outcome of the RVs {y₁, y₂, …, yₙ}, the estimate is just the average in the sample: y̅ = (1/n) Σ yᵢ
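A tiny sketch of the "estimator as a rule" idea (the function name and numbers are my own, for illustration only):

```python
import numpy as np

def estimator_mean(sample):
    # The rule g(Y1, ..., Yn): whatever sample is drawn, return its average
    return np.mean(sample)

rng = np.random.default_rng(0)
sample_1 = rng.normal(10, 3, size=20)  # one possible outcome of the RVs
sample_2 = rng.normal(10, 3, size=20)  # another possible outcome

# The same rule applied to different outcomes gives different estimates
print(estimator_mean(sample_1), estimator_mean(sample_2))
```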
20
Q

Quality of Estimate vs Quality of Estimator

A

Question:
Suppose that we want to estimate the average salary of university graduates in the UK. Suppose that we take one sample from the population and use the sample mean to estimate the average population salary. Suppose that we find that the sample mean is y̅ = £15,000. How close is this value (estimate) to the true population mean, µ?

Answer:
We don’t know, as µ is unknown!

➢ Instead of asking about the quality of the estimate, we should ask about the quality of the estimation procedure or estimator!
➢ ie How good is the sample mean as an estimator of the population mean?

➢ What are some (desirable) properties that an estimator may (or may not) possess?

Such properties are most often divided into:
• small sample (or finite) properties - desirable properties for when the sample size is finite
• large sample (or asymptotic) properties - desirable properties for when the sample size becomes infinite

We will briefly consider the two main 'finite or small sample' properties of estimators:

1) Unbiasedness
2) Minimum variance

21
Q

Unbiasedness

A

An estimator is unbiased if:
E[θ̂] = θ

So if the mean of the sampling distribution of the estimator (which reflects all the different possible values that the sample statistic could assume when the estimating procedure is applied to whatever sample happens to be drawn, with corresponding probabilities) is equal to the population parameter θ, then the estimator is unbiased.

In other words, if you independently draw a large number of random samples from the population, compute the sample statistic for each, and then take the mean of these sample statistics, for an unbiased estimator this mean will approach the population parameter.
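A short simulation sketch of unbiasedness for the sample mean (illustrative values; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 7.0, 3.0, 10, 200_000

# Many independent samples; one sample mean per sample
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# The average of the sample means is very close to mu, since E[Ybar] = mu
print(means.mean())   # ~7.0
```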

(Part 1; topic 5 shows really clear graph to demonstrate)

22
Q

Minimum Variance Unbiased Estimator

A

Consider the set of all possible unbiased estimators for θ, which we will label θ̂₁, θ̂₂, …, θ̂ₖ. One of these, θ̂ⱼ, is said to be the Minimum Variance Unbiased Estimator if:

V(θ̂ⱼ) ≤ V(θ̂ᵢ)

for i = 1, …, k and i ≠ j

(Part 1; topic 5 shows really clear graph to demonstrate)

23
Q

Efficient

A

If an estimator is unbiased AND minimum variance, we say it is efficient (or the best)

24
Q

How to construct estimators with good properties for unknown parameters?

A

There are various approaches based on observed samples. Three common methods are:
• Least Squares
• Method of Moments
• Maximum Likelihood

➢ In this course we will focus on the Least Squares Estimation.

25
Q

Hypothesis testing

A

See LUBS2570; Part 1; topic 6 for really good notes

26
Q

Critical value

A

The Z-score that cuts the distribution off at the significance level; the z-score that corresponds to the significance level of the test

27
Q

Test statistic

A

The Z-statistic associated with the observed sample mean, computed assuming H₀ is correct

H₀ can be rejected (or not) solely by comparing the test statistic with the critical value

28
Q

Decision rule can be based on one of 2 methods:

A

1) Reject H₀ if the calculated test statistic is more extreme than the critical value (eg, for a two-sided test):

|z| > z꜀

2) Reject H₀ if the p-value associated with the test statistic is less than the significance level:

p_value < α
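A sketch of both decision rules for a two-sided z-test (alpha and z are illustrative; scipy assumed):

```python
from scipy.stats import norm

alpha = 0.05
z = 2.3   # illustrative calculated test statistic

# Method 1: compare the test statistic with the critical value
z_crit = norm.ppf(1 - alpha / 2)       # ~1.96
reject_1 = abs(z) > z_crit

# Method 2: compare the p-value with the significance level
p_value = 2 * (1 - norm.cdf(abs(z)))
reject_2 = p_value < alpha

print(reject_1, reject_2)   # the two methods always agree
```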

29
Q

Hypothesis testing when σ is unknown

A

When

a) the population variance σ² is unknown AND
b) the sample size is small

the t distribution must be used rather than the normal Z distribution, so a t-test should be conducted instead of a Z-test.

➢ Hypothesis testing with small samples and an unknown population variance proceeds as before, but now we need to consult the t distribution to obtain the critical values.

➢ For large samples t is typically not required; the Z-test can be used instead.

(Recall that if n is large, the t distribution approaches the standard normal distribution (diagram in LecNotes))

30
Q

Confidence Interval

A

There are two ways in which an estimate of a population parameter (using random samples) can be presented:

  1. As a Point Estimate: a single value is used to estimate an unknown population parameter (like how we have seen that we can use the sample mean to estimate the population mean)
  2. As an Interval Estimate or Confidence Interval: a range of values is used to estimate an unknown population parameter; the range is constructed so that, with a stated level of confidence, it contains the true parameter. (A sketch follows below.)
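As a sketch, a 95% confidence interval for µ when σ is known (all numbers invented; scipy assumed):

```python
import math
from scipy.stats import norm

ybar, sigma, n = 15_000, 4_000, 100   # illustrative sample mean, known sd, n
alpha = 0.05

z_crit = norm.ppf(1 - alpha / 2)      # ~1.96
half_width = z_crit * sigma / math.sqrt(n)

print(ybar - half_width, ybar + half_width)   # ~(14216, 15784)
```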

(Part 1; Topic 7 explains confidence intervals of population mean µ (when σ is known), the general random interval estimator, interpretations, and confidence intervals of population mean µ when σ is not known)

31
Q

Variance formula

A

σ² = Σ(xᵢ - μ)² / N

σ² = population variance
μ = population mean
N = population size
(xᵢ = value of ith element)

The variance (σ²) is defined as the sum of the squared distances (squared to make all distances positive, so they do not cancel each other out) of each term in the distribution from the mean (μ), divided by the number of terms in the distribution (N); in other words, the average squared distance of a value in the distribution from its mean.

32
Q

Sample Variance formula

A

S² = Σ(xᵢ - x̅)² / (n - 1)

S² = sample variance
x̅ = sample mean
n = sample size
(xᵢ = value of ith element)

https://www.onlinemathlearning.com/variance.html
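A quick check of the two formulas (numpy assumed; numpy's ddof argument selects the denominator):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.var(x))           # 4.0    population form: divide by N (ddof=0)
print(np.var(x, ddof=1))   # ~4.57  sample form: divide by n - 1 (ddof=1)
```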

33
Q

Why n-1 in sample variance?

A

I’ll have a go at explaining the intuition behind the “n” and “n-1”:

Think of the whole equation as the average amount of variation. If this is truly what the equation is measuring, then it should be (total amount of variation)/(number of things that can vary), since an average is always total/(number of things).

Look at the numerator and the denominator in the sample variance equation. Is the following true?

  • The numerator is a measure of the total amount of variation
  • The denominator is the number of things that are able to vary.

Yes. Why!? I mean, surely there are n things that can vary about x̄, i.e. the sample mean. Well, actually, no there aren’t. There are n things that can vary about the population mean, but only n-1 that can vary about the sample mean. Here’s an example of why this is so:

Say you have 3 data points.

  • You calculate the sample mean and it comes out to be 2.
  • The first data point could be anything, let’s say it is 1.
  • The second data point could be anything, let’s say it is 3.
  • What can the third data point be? It absolutely MUST be 2. It is not free to vary - the sum of the three scores must be 6 or else the sample mean is not 2.

Knowing n-1 scores and the sample mean uniquely determines the last score, so it is NOT free to vary. This is why only “n-1” things can vary. So the average variation is (total variation)/(n-1).

Does this also have a connection with degrees of freedom?

Yes. The reason n-1 is used is that it is the number of degrees of freedom in the sample. The sum of the deviations of the sample values from the sample mean must equal 0, so if you know what all the values except one are, you can calculate the value of the final one.
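A simulation sketch of this point (parameters are illustrative): dividing by n systematically underestimates σ², while dividing by n-1 does not.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 4.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

# Averaging each estimator over many samples approximates its expectation
print(samples.var(axis=1, ddof=0).mean())  # ~3.2  biased: (n-1)/n * sigma^2
print(samples.var(axis=1, ddof=1).mean())  # ~4.0  unbiased: sigma^2
```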

34
Q

Covariance

A

In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, the covariance is positive.

35
Q

Correlation

A

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It’s a common tool for describing simple relationships without making a statement about cause and effect.

More generally, the correlation between two variables is 1 (or –1) if one of them always takes on a value that is given exactly by a linear function of the other with respectively a positive (or negative) slope.

https://rb.gy/f4yofh
https://rb.gy/nuwdto

36
Q

Covariance formula

A

Population covariance:
Cov(x,y) = Σ(xᵢ - μₓ)(yᵢ - μᵧ) / N

Sample covariance:
Cov(x,y) = Σ(xᵢ - x̅)(yᵢ - ȳ) / (n - 1)

https://byjus.com/covariance-formula/

Notice how the covariance formula relates to the variance formula. They are essentially the same but for one variable vs for two:

Cov(x,y) = Σ(xᵢ - μₓ)(yᵢ - μᵧ) / N
Cov(x,x) = Σ(xᵢ - μₓ)(xᵢ - μₓ) / N
= Σ(xᵢ - μₓ)² / N
= Var(x) = σ²

So σ² (pop var) is essentially the covariance of the variable with itself

Another way of presenting the covariance formula is:
Cov (X, Y) = E [(X - μₓ)(Y - μᵧ)]

(This can lead us to the result:)
= E [(X - μₓ)(Y - μᵧ)]
= E [(X - E[X])(Y - E[Y])]
= E [XY - XE[Y] - YE[X] + E[X]E[Y]]
= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y]
= E[XY] - E[X]E[Y]

So given that we know that Cov(X,Y) = E[XY] - E[X]E[Y], and that Cov(X,X) = Var(X),
Var(X) = E[X²] - (E[X])²

Note: E is the expected value operator. (In probability theory, the expected value of a random variable X is intuitively the arithmetic mean of a large number of independent realizations of X; by definition, the expected value of a constant random variable X = c is c.)
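A numerical sanity check of these identities (simulated data; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # y partly driven by x

# Cov(X, Y) = E[XY] - E[X]E[Y]  (population form, dividing by N)
cov_direct = np.mean((x - x.mean()) * (y - y.mean()))
cov_identity = np.mean(x * y) - x.mean() * y.mean()
print(cov_direct, cov_identity)          # agree, ~0.5

# Cov(X, X) = Var(X)
print(np.mean((x - x.mean()) ** 2), np.var(x))   # agree, ~1.0
```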

37
Q

Correlation formula

A

The correlation coefficient is also known as the Pearson product-moment correlation coefficient, or Pearson’s correlation coefficient. As mentioned earlier, it is obtained by dividing the covariance of the two variables by the product of their standard deviations. Therefore the correlation between two variables is a normalised version of their covariance. The mathematical representation of the same can be shown in the following manner:

ρₓᵧ = Cov(x,y) / σ(X)σ(Y)
ρₓᵧ = σₓᵧ / σₓσᵧ

= [Σ(xᵢ - μₓ)(yᵢ - μᵧ) / N] / [√{Σ(xᵢ - μₓ)² / N} √{Σ(yᵢ - μᵧ)² / N}]

(Note that the N denominators (or n-1 denominators in the sample case) cancel out)
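A sketch verifying that the normalised covariance matches numpy's built-in correlation (simulated data; parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50_000)
y = 2.0 * x + rng.normal(size=50_000)

cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / (x.std() * y.std())     # correlation = normalised covariance

print(rho)                          # ~0.894
print(np.corrcoef(x, y)[0, 1])      # numpy agrees
```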

https://rb.gy/f4yofh
https://rb.gy/nuwdto

38
Q

Difference between covariance and correlation

A
  • In simple words, both the terms measure the relationship and the dependency between two variables
  • “Covariance” indicates the direction of the linear relationship between variables
  • “Correlation” on the other hand measures both the strength AND direction of the linear relationship between two variables
  • Correlation is a function of the covariance
  • What sets them apart is the fact that correlation values are standardized, whereas covariance values are not
  • You can obtain the correlation coefficient of two variables by dividing the covariance of these variables by the product of their standard deviations
  • If we revisit the definition of Standard Deviation, it essentially measures the absolute variability of a dataset’s distribution
  • When you divide the covariance values by the standard deviation, it essentially scales the value down to a limited range of -1 to +1
  • This is precisely the range of the correlation values.

Also:
• Notably, correlation is dimensionless while covariance is in units obtained by multiplying the units of the two variables
• Although the values of the theoretical covariances and correlations are linked in the above way, the probability distributions of sample estimates of these quantities are not linked in any simple way and they generally need to be treated separately

As we see from the formula of covariance, it takes its units from the product of the units of the two variables. Correlation, on the other hand, is dimensionless: it is a unit-free measure of the relationship between variables, because we divide the covariance by the product of standard deviations, which carry the same units. The value of covariance is also affected by a change in the scale of the variables: if all the values of one variable are multiplied by a constant, and all the values of the other variable are multiplied by the same or a different constant, then the value of the covariance changes. On doing the same, however, the value of the correlation is not influenced by the change in scale. A final difference between covariance and correlation is the range of values they can assume: while correlation coefficients lie between -1 and +1, covariance can take any value between -∞ and +∞.
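A small demonstration of the scale point (simulated data; numpy assumed): rescaling a variable rescales the covariance but leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = x + rng.normal(size=10_000)

x_scaled = 100.0 * x   # eg, a change of measurement units

print(np.cov(x, y)[0, 1], np.cov(x_scaled, y)[0, 1])            # second ~100x first
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x_scaled, y)[0, 1])  # identical
```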

https://rb.gy/nuwdto
https://rb.gy/weuttg
https://rb.gy/7jgsb8

39
Q

Degrees of Freedom

A

Each of a number of independently variable factors affecting the range of states in which a system may exist; in particular, any of the directions in which independent motion can occur.

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. The number of independent ways by which a dynamic system can move, without violating any constraint imposed on it, is called number of degrees of freedom.