(R10) Sampling and Estimation Flashcards

1
Q

Define Simple Random Sampling and provide two methods

A

Each element of the population has an equal probability of being chosen; 1) use a random number generator or 2) select every kth element (systematic sampling)
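
A minimal Python sketch of the two methods on this card. The function names, the 100-element population, and the seed are illustrative assumptions, not part of the original card.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=None):
    """Pick a random start in [0, k), then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: 100 items labelled 1..100.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=42)
sys_sample = systematic_sample(population, 10, seed=42)
```

Both calls return 10 elements here; the systematic sample is always spaced exactly k apart, which is why it only approximates simple random sampling.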

2
Q

Sampling Distribution

A

The distribution of all distinct possible values that a statistic can assume when computed from samples of the same size randomly drawn from the same population

3
Q

Sampling error

A

The difference between the observed value of a statistic and the quantity it is intended to estimate (Sample mean - population mean)

4
Q

Stratified Random Sampling

A
  • Separate the population into smaller groups (strata) based on one or more distinguishing characteristics; then use simple random sampling within each stratum
  • Provides more precise estimates of the mean and variance than simple random sampling
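
The procedure can be sketched in Python as follows. This is an illustration under assumed inputs: the `stratified_sample` helper, the bond universe, and the rating labels are all hypothetical, and the sketch samples the same fraction from every stratum.

```python
import random
from collections import defaultdict

def stratified_sample(items, key, fraction, seed=None):
    """Group items into strata by key(item), then draw a simple random
    sample of the given fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Take at least one element so no stratum is left out.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical universe: 60 AAA, 30 BBB, and 10 CCC bonds.
bonds = [
    {"id": i, "rating": "AAA" if i < 60 else "BBB" if i < 90 else "CCC"}
    for i in range(100)
]
picked = stratified_sample(bonds, key=lambda b: b["rating"], fraction=0.1, seed=1)
```

With a 10% fraction the sample holds 6 AAA, 3 BBB, and 1 CCC bond, so every rating class is represented in proportion to its size.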
5
Q

Three Data Types

A
  1. Time Series
  2. Cross-Sectional
  3. Panel
6
Q

Time Series Data

A

Take one or more variables and observe how they change over a period of time
e.g. Monthly returns on Microsoft stock from Jan 1994 to Dec 2004.

7
Q

Cross-Sectional Data

A

Multiple observational units at a point in time

e.g. Sales for 30 different companies for a particular quarter

8
Q

Longitudinal Data

A

Observations over time of multiple characteristics of the same entity, such as unemployment, inflation, and GDP growth rates for a country over 10 years.

9
Q

Panel Data

A

Time series + cross sectional. Data that contains observations over time of the same characteristic for multiple entities, such as debt/equity ratios for 20 companies over 24 quarters.

10
Q

Standard Error Formula

A

Standard deviation divided by the square root of n, i.e. sigma / n^(1/2) (use the sample standard deviation s when sigma is unknown); this is the standard deviation of the distribution of the sample means.
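
As a small worked sketch in Python: the `standard_error` helper and the return figures are made up for illustration, and the sample standard deviation (n - 1 divisor) stands in for an unknown sigma.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(len(sample))

# Hypothetical monthly returns.
returns = [0.02, -0.01, 0.03, 0.015, -0.005, 0.01, 0.025, 0.0]
se = standard_error(returns)
```

Quadrupling the sample size halves the standard error, since n enters through its square root.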

11
Q

Central Limit Theorem

A

Theorem stating that for simple random samples of size n from a population with mean mu and finite variance sigma^2, the sampling distribution of the sample mean, Xbar, approaches a normal distribution with mean mu and variance sigma^2 / n as the sample size becomes large.

12
Q

Properties of CLT

A
  • If the sample size, n, is sufficiently large (n >= 30), the sampling distribution of the sample means will be approximately normal.
  • The mean of the population, mu, and the mean of the distribution of all possible sample means are equal.
  • The variance of the distribution of sample means is sigma^2 / n, the population variance divided by the sample size.
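
These properties can be checked with a short simulation. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1), chosen only because it is clearly non-normal; the seed and the number of repetitions are arbitrary.

```python
import random
import statistics

rng = random.Random(7)
n = 30              # size of each sample
num_samples = 5000  # number of repeated samples

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)    # should be close to mu = 1
var_of_means = statistics.pvariance(sample_means) # should be close to 1 / n
```

Even though the population is heavily skewed, the sample means cluster around 1 with variance near 1/30.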
13
Q

Desired Properties of an Estimator

A
  1. Unbiasedness - the expected value of the estimator is equal to the parameter you are trying to estimate.
  2. Efficiency - the variance of its sampling distribution is smaller than that of every other unbiased estimator of the same parameter.
  3. Consistency - the accuracy of the parameter estimate increases as the sample size increases.
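
Unbiasedness can be illustrated with the two usual variance estimators. The simulation below is a sketch under assumed values (a normal population with variance 4, samples of size 5, an arbitrary seed): dividing by n - 1 is unbiased, while dividing by n underestimates on average.

```python
import random
import statistics

rng = random.Random(11)
true_variance = 4.0  # population is normal with sigma = 2
n = 5
trials = 20000

unbiased, biased = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(statistics.variance(sample))  # divides by n - 1
    biased.append(statistics.pvariance(sample))   # divides by n

avg_unbiased = statistics.fmean(unbiased)  # close to 4.0
avg_biased = statistics.fmean(biased)      # close to 4.0 * (n - 1) / n = 3.2
```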
14
Q

Point Estimates

A

A single value used to estimate a population parameter; the sample mean and sample variance are point estimates of the population mean and variance

15
Q

Confidence Interval Formula

A

Point estimate +/- reliability factor * standard error

C.I. = Xbar +/- z_(alpha/2) * (sigma / n^(1/2))
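
A minimal Python sketch of the z-based interval, assuming a known population standard deviation. The function name and the input numbers are illustrative only; the reliability factor comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided interval: x_bar +/- z_(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability factor, about 1.96 at 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 5.0, sigma 2.0, n = 100.
low, high = z_confidence_interval(x_bar=5.0, sigma=2.0, n=100, confidence=0.95)
```

With these inputs the standard error is 0.2, so the 95% interval runs from roughly 4.61 to 5.39.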

16
Q

Distribution with known variance, which table should be used to create confidence interval?

A

Use Z score

17
Q

Distribution with unknown variance, which table should be used to create confidence interval?

A

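Use the t score if the sample size is less than 30 (assuming the population is approximately normal); with a sample size of 30 or more, use the t score, or the z score as an acceptable approximation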

18
Q

Level of Significance

A

The probability, denoted by alpha, that the confidence interval does not contain the true parameter; the degree of confidence is 1 - alpha

19
Q

Characteristics of T-Distribution

A
  1. Centered at Zero
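  2. Lower peak and fatter tails than a normal distribution
  3. As df increases, the shape becomes more peaked and the tails become thinner, approaching the standard normal distribution.
  4. t-table levels of significance typically correspond to one-tail probabilities, so use alpha/2 for a two-sided confidence interval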
20
Q

Confidence intervals are affected by:

A
  • z score or t score
  • alpha - level of significance (the confidence level is 1 - alpha)
  • n - sample size
21
Q

Data mining bias

A

Bias that refers to results where the statistical significance of the pattern is overestimated because the results were found through data-mining (the practice of hitting a data set over and over again until you hit gold)

22
Q

Sample selection bias

A

Bias which occurs when some data is systematically excluded from the analysis, usually because of the lack of availability (survivorship bias in mutual funds)

23
Q

Look-ahead bias

A

Occurs when a study tests a relationship using sample data that was not available on the test date (e.g. stock prices/returns vs. accounting data that had not yet been released)

24
Q

Time-period bias

A

Occurs when results hold only for the specific time period tested and may not apply to other periods

25
Q

Unbiased estimator

A

When the expected value of the estimator is equal to the parameter you are trying to estimate.

26
Q

Efficient Estimator

A

If the variance of its sampling distribution is smaller than all the other unbiased estimators of the parameter you are trying to estimate.

27
Q

Consistent Estimator

A

The accuracy of the parameter estimate increases as the sample size increases.