4.3 Hypothesis Tests - χ²-Test Flashcards

normal approximation to the binomial distribution, χ² test for model fit, χ² test for independence

1
Q

Motivation Behind the χ²-Test

A

-consider a test for attribute data
-assume we have n independent observations of a variate which can take K different values
-to build a model for these observations, we can use i.i.d. random variables X1,…,Xn∈{1,…,K} with:
P(Xi=k) = pk
-for all i∈{1,…,n} and k∈{1,…,K}, where the probabilities pk satisfy Σpk=1 (sum over k=1 to K)
-since the observations are independent, the order does not matter and we only need to consider how often each class occurs; let:
Yk = |{i | Xi=k}| = Σ 1_{k}(Xi)
-where the sum is over i=1 to n, for each k∈{1,…,K}
-if the model is correct, then Yk~B(n,pk) for all k∈{1,…,K}, but since ΣYk=n (sum over k) the counts Y1,…,YK are not independent of each other
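
As a small illustration of the class counts Yk, the sketch below tallies a made-up sample (the data and K=3 are hypothetical, chosen only for the example) and confirms that the counts add up to n, which is why they cannot be independent.

```python
from collections import Counter

# Hypothetical attribute data with K = 3 classes (not from the source).
x = [1, 3, 2, 1, 1, 2, 3, 1]
K = 3

tally = Counter(x)
# y[k-1] is the sample count for class k.
y = [tally[k] for k in range(1, K + 1)]

print(y)                 # [4, 2, 2]
print(sum(y) == len(x))  # True: the counts always sum to n
```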

2
Q

Multinomial Distribution

Definition

A

-the joint distribution of (Y1,…,YK) is called a multinomial distribution with parameters n and p1,…,pK

3
Q

Normal Distribution as an Approximation to the Binomial

Overview

A

-for large n, we can approximate the distribution of Yk using a normal distribution

4
Q

Normal Distribution as an Approximation to the Binomial

Proof

A

-use the fact that Yk is a sum of n independent random variables 1_{k}(Xi), so we can apply the central limit theorem
-the central limit theorem states that for any i.i.d. sequence Zi, i∈ℕ, of random variables with mean μ=E(Zi) and variance σ²=Var(Zi), we can conclude:
Z1+…+Zn ~ N(nμ, nσ²)
-approximately for large n
-with Zi=1_{k}(Xi) we have μ=pk and σ²=pk(1-pk), so Yk~N(npk, npk(1-pk)) approximately

5
Q

Normal Distribution as an Approximation to the Binomial

Summary

A

-for large n, we can approximate a B(n,p) distribution by a N(np,np(1-p))

6
Q

Normal Distribution as an Approximation to the Binomial

Rule of Thumb

A

-the normal approximation for B(n,p) can be used if np≥5 and n(1-p)≥5
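
To get a feel for the rule of thumb, the sketch below compares the exact B(n,p) CDF with the N(np,np(1-p)) CDF. The helper names are hypothetical, and the comparison uses a continuity correction (evaluating the normal CDF at k+0.5), which is an assumption added for the example rather than something the card states.

```python
import math
from statistics import NormalDist

def rule_of_thumb_ok(n, p):
    """Check np >= 5 and n(1-p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def max_cdf_error(n, p):
    """Largest gap between the exact B(n,p) CDF and the normal CDF
    over k = 0..n, using a continuity correction (assumption)."""
    normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        worst = max(worst, abs(cdf - normal.cdf(k + 0.5)))
    return worst

for n, p in [(100, 0.5), (10, 0.05)]:
    print(n, p, rule_of_thumb_ok(n, p), max_cdf_error(n, p))
```

In this comparison the case that satisfies the rule (n=100, p=0.5) shows a much smaller error than the case that violates it (n=10, p=0.05).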

7
Q

χ²-Test for Model Fit

Comparing Ho to the Observed Data

A

-assume that we have observed attribute data x1,…,xn∈{1,…,K} and we want to test the hypothesis Ho:P(Xi=k)=pk for all k∈{1,…,K}
-let:
yk = |{i|xi=k}| = Σ1_{k}(xi)
-be the sample count for class k∈{1,…,K}
-if Ho is true, we expect yk≈npk for all k
-so we can use:
c = Σ (yk-npk)²/(npk)
-sum over k=1 to K
-as a measure of how far away from Ho the data is
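
A minimal sketch of this distance measure follows. The function name, the counts, and the hypothesised probabilities are all made up for illustration.

```python
def chi2_statistic(counts, probs):
    """Sum over classes of (y_k - n*p_k)^2 / (n*p_k)."""
    n = sum(counts)
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

# Hypothetical sample counts and null-hypothesis probabilities (K = 3).
counts = [18, 30, 52]
probs = [0.2, 0.3, 0.5]

c = chi2_statistic(counts, probs)
print(c)  # expected counts are 20, 30, 50, so c = 4/20 + 0 + 4/50, about 0.28
```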

8
Q

χ²-Test for Model Fit

Lemma

A

-assume Ho is true, let:
C = Σ (Yk-npk)²/(npk)
-sum from k=1 to k=K
-then C converges in distribution to χ²(K-1) as n->∞
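
The lemma can be checked informally by simulation. The sketch below assumes hypothetical class probabilities and sample sizes, draws many samples under Ho, and looks at two properties a χ²(K-1) variable should have: a mean of K-1 and roughly 5% of values above 5.991 (the standard table value for the 0.95-quantile of χ²(2)).

```python
import random

def simulate_C(n, probs, rng):
    """Draw n observations under H0 and return the statistic C."""
    K = len(probs)
    draws = rng.choices(range(K), weights=probs, k=n)
    counts = [0] * K
    for d in draws:
        counts[d] += 1
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

rng = random.Random(0)       # fixed seed so the run is repeatable
probs = [0.2, 0.3, 0.5]      # hypothetical null probabilities, K = 3
values = [simulate_C(200, probs, rng) for _ in range(2000)]

mean_C = sum(values) / len(values)
frac_above = sum(v > 5.991 for v in values) / len(values)
print(mean_C)      # should be close to K - 1 = 2
print(frac_above)  # should be close to 0.05
```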

9
Q

χ²-Test for Model Fit

Lemma Proof for K=2

A
-we have:
Y1 + Y2 = n
-and
p1 + p2 = 1
-substitute these into the formula for C
-take the limit as n tends to infinity, remembering to apply the normal approximation to the binomial
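
The substitution can be written out as follows; this is a sketch of the steps the card describes, using Y2 = n - Y1 and p2 = 1 - p1.

```latex
\begin{align*}
C &= \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_2 - np_2)^2}{np_2} \\
  &= \frac{(Y_1 - np_1)^2}{np_1} + \frac{(Y_1 - np_1)^2}{n(1-p_1)}
     && \text{since } Y_2 - np_2 = -(Y_1 - np_1) \\
  &= \frac{(Y_1 - np_1)^2}{np_1(1-p_1)}
   = \left( \frac{Y_1 - np_1}{\sqrt{np_1(1-p_1)}} \right)^{2}
\end{align*}
```

By the normal approximation, the term in brackets is approximately N(0,1) for large n, and the square of a standard normal variable has a χ²(1) distribution, which is χ²(K-1) for K=2.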
10
Q

χ²-Test for Model Fit

Construct the Test for the Null Hypothesis

A

-if we write c_ν(α) for the (1-α)-quantile of the χ²(ν)-distribution, then assuming Ho, we have:
P(C > c_{K-1}(α)) ≈ α
-for large n, and thus we can reject Ho at significance level α if the observed test statistic c satisfies c > c_{K-1}(α)
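
The critical value is normally read from a table or a statistics library. As a self-contained sketch, the code below computes the χ² upper tail from the series for the incomplete gamma function and finds the quantile by bisection; the function names and the observed statistic are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(C > x) for chi-squared(df), via the power series
    for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def chi2_critical(df, alpha):
    """(1 - alpha)-quantile of chi-squared(df), found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

K, alpha = 3, 0.05
c_observed = 7.2                      # hypothetical observed test statistic
critical = chi2_critical(K - 1, alpha)
reject = c_observed > critical
print(critical, reject)
```

For K-1=2 degrees of freedom the quantile has the closed form -2·ln(α), which gives a quick sanity check on the numerical result.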

11
Q

χ²-Test for Model Fit

Summary

A

data: x1,…,xn∈{1,…,K}
model: X1,…,Xn∈{1,…,K} i.i.d. with P(Xi=k)=pk for all i∈{1,…,n}, k∈{1,…,K}
test: Ho: pk=πk for all k∈{1,…,K} vs H1: pk≠πk for at least one k∈{1,…,K}
test statistic: c=Σ (yk-nπk)²/(nπk), sum from k=1 to K, where yk=|{i | xi=k}|=Σ 1_{k}(xi)
critical value: c_K-1(α), the (1-α)-quantile of the χ²(K-1)-distribution

12
Q

χ²-Test for Model Fit

Rule of Thumb

A

-the χ²-test can be applied if n*πk≥5 for all k∈{1,…,K}

13
Q

χ²-Test for Independence

Purpose

A

-tests whether two categorical variates are independent

14
Q

χ²-Test for Model Fit

Number of Degrees of Freedom

A

K-1

15
Q

χ²-Test for Independence

Description

A

a) create a contingency table of the two variates
b) estimate the marginal probability of each row and each column from the row and column totals
c) use these probabilities to estimate an expected count for each cell, assuming independence (row total × column total / n)
d) compute the test statistic using these expected values and the observed values
e) find the critical value using the chosen significance level, with degrees of freedom = (no. of rows - 1)(no. of columns - 1)
f) if the test statistic is less than the critical value, we can't reject the null hypothesis that the variates are independent
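
Steps a) to f) can be sketched in a few lines. The function name and the 2×2 table are hypothetical, and the critical value 3.841 is the standard table value for the 0.95-quantile of χ²(1), hard-coded here to keep the example dependency-free.

```python
def chi2_independence(table):
    """Return (test statistic, degrees of freedom) for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2x2 table of observed counts.
table = [[30, 20],
         [20, 30]]

stat, df = chi2_independence(table)
critical = 3.841          # table value for chi-squared(1), alpha = 0.05
reject = stat > critical
print(stat, df, reject)   # 4.0 1 True
```

Every expected count here is 25, so each cell contributes 25/25 = 1 to the statistic, giving 4.0 with 1 degree of freedom.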

16
Q

Type I Errors

A

-reject Ho when Ho is true

17
Q

Type II Errors

A

-fail to reject (accept) Ho when Ho is false

18
Q

Which test to use for numerical (quantitative) data?

A
  • for normally distributed data with known variance, use the z-test
  • for normally distributed data with unknown variance, use the t-test
  • for large sample size of any distribution, we can use the z-test
19
Q

Which test to use for attribute (qualitative) data?

A

-chi-squared test

20
Q

When is the hypothesis Ho rejected?

A
  • if the test statistic exceeds a critical value
  • the choice of critical value determines the significance level of the test
  • a test has significance level α if P(reject Ho | Ho is true) ≤ α