Smoothing Parameter Flashcards

Flashcards in Smoothing Parameter Deck (28)

1
Q

The Interpolating Spline

A

- a spline curve passing through all the data points
- so it minimises: Σ [yi - μi^]² (sum from i=1 to n)

2
Q

Smoothing Spline Definition

A
  • given data: D = { (ti,yi), i=1,…,n }
  • with model: yi = f(ti) + εi , εi ~ N(0,σ²)
  • where f(t) is a smooth function
  • given knot positions ti, i=1,…,n, we can estimate f with the smoothing spline fλ^(t), calculated using the matrix solution to the smoothing spline

3
Q

Methods for Choosing the Optimal Lambda

A

1) training and test
2) cross-validation / ‘leave-one-out’
3) generalised cross-validation

4
Q

Training and Test

A

1) partition the indices I = {1,…,n} into subsets I1 & I2 such that I1 ⋃ I2 = I
- this gives a training set: D1 = { (ti,yi), i∈I1 }
- and a test set: D2 = { (ti,yi), i∈I2 }
2) fit the smoothing spline to D1 to find fλ,I1^ for some specified λ
3) calculate the goodness of fit to D2: QI1:I2(λ) = Σ [yi - fλ,I1^(ti)]²
- sum over i in I2 and choose λ to minimise QI1:I2(λ)
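
A minimal R sketch of this procedure (an illustration only, not from the course notes: the data, the 70/30 split and the λ grid are all made up, and base R's smooth.spline stands in for the matrix solution):

```r
# Hypothetical illustration: choosing lambda by a training/test split
set.seed(1)
n <- 100
t <- sort(runif(n))
y <- sin(2 * pi * t) + rnorm(n, sd = 0.3)

# 1) partition the indices into a training set I1 and a test set I2
I1 <- sample(1:n, size = 70)
I2 <- setdiff(1:n, I1)

lambdas <- 10^seq(-8, -1, length.out = 30)
Q <- sapply(lambdas, function(lam) {
  # 2) fit the smoothing spline to the training data only
  fit <- smooth.spline(t[I1], y[I1], lambda = lam)
  # 3) goodness of fit to the test data
  sum((y[I2] - predict(fit, t[I2])$y)^2)
})

lambda.opt <- lambdas[which.min(Q)]   # lambda minimising Q_{I1:I2}(lambda)
```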

5
Q

Cross Validation / Leave One Out

A

- same as training and test, but using only one observation in the test set:

- training set: D1 = D-j = { (ti,yi), i∈I-j } (i.e. all observations except the jth)

- test set: D2 = { (tj,yj) } for given j

-then calculate:

Q-j:j(λ) = [yj - fλ,-j^(tj)]²

-repeat for each j∈{1,…,n} then average to form the ordinary cross-validation criterion:

Qocv(λ) = 1/n Σ [yj - fλ,-j^(tj)]²
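
A minimal R sketch of brute-force leave-one-out CV for a single λ (an illustration, again using base R's smooth.spline as a stand-in); it needs n separate spline fits, which is the disadvantage flagged on the next card:

```r
# Hypothetical illustration: brute-force OCV for one value of lambda
ocv_brute <- function(t, y, lam) {
  n   <- length(y)
  err <- numeric(n)
  for (j in 1:n) {
    # fit the smoothing spline with observation j left out
    fit_mj <- smooth.spline(t[-j], y[-j], lambda = lam)
    # squared prediction error at the held-out point
    err[j] <- (y[j] - predict(fit_mj, t[j])$y)^2
  }
  mean(err)   # Qocv(lambda)
}
```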

6
Q

Disadvantage of Cross Validation / Leave One Out

A
  • very computationally intensive (requires n separate spline fits)
  • can instead be written in terms of the smoothing matrix

7
Q

Matrix Form of the Smoothing Spline Description

A

- for given λ and order ν≥1, the fitted value fλ^(tk) at each knot tk may be written as a linear combination of the observations y1,…,yn

8
Q

Matrix Form of the Smoothing Spline Coefficients

A

- the smoothing spline which minimises the penalised sum of squares for given λ has coefficients a^ and b^:

[a^ b^]^t = {Mλ}^(-1) [y 0]^t

9
Q

Matrix Form of the Smoothing Spline f

A

f = [f1 … fn]^t = K b^ + L a^ = [K L] [Mλ(11) Mλ(21)]^t y

10
Q

Matrix Form of the Smoothing Spline Smoothing Matrix

A

-can show:

S = Sλ = [K L] [Mλ(11) Mλ(21)]^t

=>

f = S y

-where S, the smoothing matrix, is a symmetric, positive definite matrix

11
Q

Cross Validation / Leave One Out Smoothing Matrix

A

- to speed up cross-validation, Qocv can be computed directly from the spline fλ^ fitted once to the full dataset:

Qocv(λ) = 1/n Σ [(yj - fλ^(tj)) / (1 - sjj)]²

- where fλ^(tj) is the full-data fitted spline at tj and sjj is the jth diagonal element of Sλ
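
A minimal R sketch of this shortcut (an illustration, assuming distinct knots ti): base R's smooth.spline reports the diagonal of its smoothing matrix in the lev component, so only one fit to the full data is needed:

```r
# Hypothetical illustration: OCV via the smoothing-matrix shortcut
ocv_shortcut <- function(t, y, lam) {
  fit  <- smooth.spline(t, y, lambda = lam)   # one fit to the full data set
  fhat <- predict(fit, t)$y                   # full-data fitted spline at each tj
  sjj  <- fit$lev                             # diagonal elements of S_lambda
  mean(((y - fhat) / (1 - sjj))^2)            # Qocv(lambda)
}
```

For distinct ti this should agree with the brute-force leave-one-out loop sketched earlier, at a fraction of the cost.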

12
Q

Generalised Cross-Validation

A
  • a computationally efficient approximation to cross-validation
  • replaces sjj with the average of the diagonal elements of Sλ:

Qgcv(λ) = 1/n Σ [(yj - fλ^(tj)) / (1 - (1/n) trace(Sλ))]²

- this is the criterion used for choosing the optimal amount of smoothing in the mgcv package in R
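
A hedged R sketch of GCV in practice (made-up data): base R's smooth.spline minimises the GCV criterion when cv = FALSE, and mgcv's gam can be asked for GCV explicitly via method = "GCV.Cp":

```r
# Hypothetical illustration: GCV-based choice of the smoothing parameter
set.seed(1)
t <- sort(runif(100))
y <- sin(2 * pi * t) + rnorm(100, sd = 0.3)

# base R: smooth.spline with cv = FALSE minimises the GCV criterion
fit_gcv <- smooth.spline(t, y, cv = FALSE)
fit_gcv$lambda      # lambda chosen by GCV

# mgcv: GCV requested explicitly when fitting a smooth of t
library(mgcv)
fit_gam <- gam(y ~ s(t), method = "GCV.Cp")
fit_gam$sp          # smoothing parameter chosen by GCV
```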

13
Q

How many degrees of freedom are in a smoothing spline? Outline

A

- there are (n+ν) parameters in (b, a) but not all of them are completely free

14
Q

How many degrees of freedom are in a smoothing spline? λ -> ∞

A

- the smoothing spline f^(t) becomes the least-squares regression solution for the model formula y ~ 1 when ν=1, or y ~ 1 + t when ν=2, so the number of degrees of freedom becomes ν

15
Q

How many degrees of freedom are in a smoothing spline? λ -> 0

A

- the number of degrees of freedom becomes n, since the smoothing spline f^(t) becomes the interpolating spline as λ -> 0

16
Q

Ordinary Least Squares Regression Fitted Values

A

y^ = X [X^t X]^(-1) X^t y

17
Q

Ordinary Least Squares Regression Hat Matrix

A

y^ = H y

- where H, the hat matrix, linearly maps the data y onto the fitted values y^:

H = X [X^t X]^(-1) X^t

18
Q

Ordinary Least Squares Regression Hat Matrix & DoF

A

-for ordinary least squares regression:

trace(H) = p

- the trace of the hat matrix is equal to the number of model parameters (the number of degrees of freedom)
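
A quick R check of this identity on a made-up design matrix (an illustration, not from the notes):

```r
# Hypothetical illustration: trace(H) = p for ordinary least squares
set.seed(1)
x <- runif(50)
X <- cbind(1, x, x^2)                     # design matrix with p = 3 columns
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix H = X (X'X)^(-1) X'
sum(diag(H))                              # trace(H) = 3 = p
```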

19
Q

Smoothing Matrix Hat Matrix

A
  • for the smoothing spline, the smoothing matrix takes on the role of the hat matrix
  • it linearly maps the data onto the fitted values

20
Q

Smoothing Matrix Effective Degrees of Freedom

A

edf_λ = trace(Sλ)

-can show that:

edf_∞ = ν and edf_0 = n
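
A hedged R sketch on made-up data: smooth.spline reports trace(Sλ) as its df component, and the two limits above can be seen by making λ very large or very small:

```r
# Hypothetical illustration: effective degrees of freedom = trace(S_lambda)
set.seed(1)
t <- sort(runif(100))
y <- sin(2 * pi * t) + rnorm(100, sd = 0.3)

fit <- smooth.spline(t, y, lambda = 1e-4, all.knots = TRUE)
fit$df         # edf reported by smooth.spline
sum(fit$lev)   # the same value: sum of the diagonal of the smoothing matrix

# large lambda: edf tends towards nu = 2 (straight-line fit)
smooth.spline(t, y, lambda = 1e6, all.knots = TRUE)$df
# small lambda: edf tends towards n (the interpolating spline)
smooth.spline(t, y, lambda = 1e-10, all.knots = TRUE)$df
```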

21
Q

Penalised Sum of Squares

A

Rλ(f) = Σ[yi - f(ti)]² + λ J(f)

-sum from i=1 to i=n

22
Q

When can the penalised sum of squares be used?

A

- the penalised sum of squares is fine for Gaussian data BUT for non-Gaussian data or non-identity link functions it needs to be replaced with the penalised deviance

23
Q

Penalised Deviance

Definition

A

Rλ(f,β) = D(y,f,β) + λ J(f)

  • where D is the deviance for the vector y of observations, modelled by a linear predictor comprising a spline function f of order ν (& possibly covariate main effects and interactions, β)
  • the penalised deviance is then minimised with respect to the spline coefficients b and a (& the regression parameters β, if any)

24
Q

Penalised Deviance Roughness Penalty

A

- when there are several smooth terms of order ν in the model, f1,…,fm, each may be assigned its own roughness penalty:

Rλ1,…,λm(y, f1,…,fm, β) = D(y, f1,…,fm, β) + Σ λn J(fn)

  • sum from n=1 to n=m
  • or the same λ can be used for all of them
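
A hedged R sketch (made-up Poisson data, with mgcv's gam as the fitting engine): each s() term receives its own smoothing parameter, and the sp argument can be used to fix them by hand:

```r
# Hypothetical illustration: two smooth terms, each with its own penalty,
# fitted by penalised deviance (Poisson response)
library(mgcv)
set.seed(1)
dat   <- data.frame(t1 = runif(200), t2 = runif(200))
dat$y <- rpois(200, lambda = exp(1 + sin(2 * pi * dat$t1) + dat$t2))

# smoothing parameters chosen automatically, one per smooth term
fit <- gam(y ~ s(t1) + s(t2), family = poisson, data = dat)
fit$sp   # the two estimated smoothing parameters (lambda_1, lambda_2)

# alternatively, supply a common (or any fixed) value for both penalties
fit2 <- gam(y ~ s(t1) + s(t2), family = poisson, data = dat, sp = c(0.1, 0.1))
```
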
25
Q

Penalised Sum of Squared Residuals

A

Rλ(f) = Σ [yi - f(ti)]² + λ J(f)

  • where the first term is the sum of squared residuals (sum from i=1 to i=n)
  • and λ≥0 is the smoothness parameter
  • and J(f) is the roughness penalty
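
A common choice of roughness penalty (an assumption here; the card itself does not define J(f)) is the integrated squared ν-th derivative:

```latex
J(f) = \int \left\{ f^{(\nu)}(t) \right\}^2 \, dt ,
\qquad \text{e.g. } J(f) = \int \left\{ f''(t) \right\}^2 \, dt \ \text{ for } \nu = 2 .
```
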
26
Q

Which spline minimises the penalised sum of squares?

A

-f^, the function that minimises Rλ(f), is a natural spline:

f^(t) = Σ bi^ |t-ti|^p + { a0^ if ν=1 OR (a0^ + a1^ t) if ν=2 }

  • where p = (2ν-1)
  • and IF ν=1 then Σ bi^ = 0
  • or IF ν=2 then Σ bi^ = Σ ti bi^ = 0

27
Q

Penalised Sum of Squares

λ->0

A

-f^ is rougher and converges to the interpolating spline

28
Q

Penalised Sum of Squares

λ->∞

A

- f^ is smoother; in the limit, regardless of where the data points lie, f^ becomes a straight line (the ν=2 case)