The Interpolating Spline

-curve passing through all data points -so minimises (to zero): Σ [yi - μi^]²

Smoothing Spline Definition

-given data: D = { (ti,yi), i=1,...,n} -with model: yi = f(ti) + εi , εi~N(0,σ²)

-where f(t) is smooth -given knot positions ti, i=1,...,n we can estimate f with the smoothing spline fλ^(t), calculated using the matrix solution for the spline coefficients

Methods for Choosing the Optimal Lambda

1) training and test

2) cross-validation / 'leave-one-out'

3) generalised cross-validation

Training and Test

1) partition indices I={1,...,n} into disjoint subsets I1 & I2 such that I1⋃I2=I

-this gives a training set: D1 = { (ti,yi), i∈I1} -and test set: D2 = { (ti,yi), i∈I2}

2) fit smoothing spline to D1 to find fλ,I1 for some specified λ

3) calculate the goodness of fit to D2: QI1:I2(λ) = Σ [yi - fλ,I1^(ti)]²

-sum over i in I2 and choose λ to minimise QI1:I2(λ)
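
Steps 1) to 3) can be sketched in Python with NumPy. Everything named here is an assumption made for illustration: the helper fit_spline, the simulated data, the 30/10 split and the λ grid. The linear system used to fit the spline is also assumed, since these notes do not write Mλ out: coefficients (b, a) are solved from [[K + cλI, L], [L^t, 0]] with c = (-1)^ν 2(2ν-1)!.

```python
import numpy as np
from math import factorial

def fit_spline(t, y, lam, nu=2):
    """Fit a smoothing spline of order nu; return a function evaluating it."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = t.size, 2 * nu - 1
    K = np.abs(t[:, None] - t[None, :]) ** p        # K[i, k] = |t_i - t_k|^p
    L = np.vander(t, nu, increasing=True)           # columns 1, t, ..., t^(nu-1)
    c = (-1) ** nu * 2 * factorial(p)               # assumed penalty constant
    M = np.block([[K + c * lam * np.eye(n), L],
                  [L.T, np.zeros((nu, nu))]])
    coef = np.linalg.solve(M, np.concatenate([y, np.zeros(nu)]))
    b, a = coef[:n], coef[n:]

    def f_hat(s):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        basis = np.abs(s[:, None] - t[None, :]) ** p
        return basis @ b + np.vander(s, nu, increasing=True) @ a

    return f_hat

# Simulated data (illustrative only)
rng = np.random.default_rng(0)
n = 40
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)

# 1) partition the indices into a training set I1 and a test set I2
idx = rng.permutation(n)
I1, I2 = np.sort(idx[:30]), np.sort(idx[30:])

# 2) and 3) fit on D1 for each lambda, then score the fit on D2
lambdas = 10.0 ** np.arange(-8, 1)
Q = np.array([np.sum((y[I2] - fit_spline(t[I1], y[I1], lam)(t[I2])) ** 2)
              for lam in lambdas])
best_lam = lambdas[np.argmin(Q)]
print(best_lam)
```

The chosen λ depends on the particular split, which is one motivation for averaging over many splits as in cross-validation.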

Cross Validation / Leave One Out

-same as the training and test method, but with only one observation in the test set:

--training set: D1 = D-j = {(ti,yi), i∈I-j}

--test set: D2 = {(tj,yj)} for given j

-then calculate:

Q-j:j(λ) = [yj - fλ,-j^(tj)]²

-repeat for each j∈{1,...,n} then average to form the ordinary cross-validation criterion:

Qocv(λ) = 1/n Σ [yj - fλ,-j^(tj)]²

Disadvantage of Cross Validation / Leave One Out

-very computationally intensive

-can be avoided by writing the criterion in terms of the smoothing matrix instead

Matrix Form of the Smoothing Spline Description

-for given λ and index ν≥1 the fitted value fλ^(tk) at each knot tk may be written as a linear combination of the observations y1,...,yn

Matrix Form of the Smoothing Spline Coefficients

-the smoothing spline which minimises the penalised sum of squares for given λ has coefficients a^ and b^:

[a^ b^]^t = {Mλ}^(-1) [y 0]^t

Matrix Form of the Smoothing Spline f

f = [f1 ... fn]^t = K b^ + L a^ = [K L] [Mλ(11) Mλ(21)]^t y

Matrix Form of the Smoothing Spline Smoothing Matrix

-can show:

S = Sλ = [K L] [Mλ(11) Mλ(21)]^t

=>

f = S y

-where S, the smoothing matrix, is a symmetric, positive definite matrix
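
A minimal numerical check of f = S y, assuming a particular form for Mλ that these notes do not write out: unknowns ordered as (b, a), with Mλ = [[K + cλI, L], [L^t, 0]], K[i,k] = |ti - tk|^(2ν-1), L holding the polynomial columns, and c = (-1)^ν 2(2ν-1)!. Under that assumption, the (11) and (21) blocks of the inverse are its first n columns. The data and λ are made up.

```python
import numpy as np
from math import factorial

nu, lam = 2, 1e-3                       # spline order and smoothing parameter (illustrative)
rng = np.random.default_rng(1)
n = 20
t = np.linspace(0.0, 1.0, n)
y = np.cos(3 * t) + rng.normal(scale=0.2, size=n)

p = 2 * nu - 1
K = np.abs(t[:, None] - t[None, :]) ** p
L = np.vander(t, nu, increasing=True)   # columns 1, t for nu = 2
c = (-1) ** nu * 2 * factorial(p)       # assumed penalty constant
M = np.block([[K + c * lam * np.eye(n), L],
              [L.T, np.zeros((nu, nu))]])
Minv = np.linalg.inv(M)

# Coefficients from the first n columns of the inverse (the other columns multiply the zeros)
coef = Minv[:, :n] @ y
b_hat, a_hat = coef[:n], coef[n:]
f_direct = K @ b_hat + L @ a_hat

# Smoothing matrix S = [K L] [M(11); M(21)] and fitted values f = S y
S = np.hstack([K, L]) @ Minv[:, :n]
f_hat = S @ y

eigvals = np.linalg.eigvalsh((S + S.T) / 2)
print(np.allclose(f_hat, f_direct), eigvals.min() > 0)
```

For finite λ > 0 the eigenvalues of S computed this way lie in (0, 1], consistent with S being symmetric and positive definite.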

Cross Validation / Leave One Out Smoothing Matrix

-to speed up cross-validation, Qocv can be computed directly from a single spline fλ^ fitted to the full dataset:

Qocv(λ) = 1/n Σ [(yj - fλ^(tj))/(1-sjj)]²

-where fλ^ is the full data fitted spline at tj and sjj is the jth diagonal element of Sλ
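
The shortcut can be checked against brute-force leave-one-out refitting. This is a sketch under stated assumptions: the helper spline_parts, the simulated data and λ are illustrative, and the fit assumes the system [[K + cλI, L], [L^t, 0]] with c = (-1)^ν 2(2ν-1)!, which these notes do not write out.

```python
import numpy as np
from math import factorial

def spline_parts(t, lam, nu=2):
    """Return K, L and the first n columns of the inverse of the assumed system matrix."""
    n, p = t.size, 2 * nu - 1
    K = np.abs(t[:, None] - t[None, :]) ** p
    L = np.vander(t, nu, increasing=True)
    c = (-1) ** nu * 2 * factorial(p)
    M = np.block([[K + c * lam * np.eye(n), L],
                  [L.T, np.zeros((nu, nu))]])
    return K, L, np.linalg.inv(M)[:, :n]

nu, lam = 2, 1e-3
rng = np.random.default_rng(2)
n = 25
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)

# Shortcut: one fit to the full data, residuals rescaled by 1 - s_jj
K, L, W = spline_parts(t, lam, nu)
S = np.hstack([K, L]) @ W
q_fast = np.mean(((y - S @ y) / (1.0 - np.diag(S))) ** 2)

# Brute force: refit n times, each time predicting the omitted point
sq_err = np.empty(n)
for j in range(n):
    keep = np.arange(n) != j
    _, _, Wj = spline_parts(t[keep], lam, nu)
    coef = Wj @ y[keep]
    b, a = coef[:n - 1], coef[n - 1:]
    pred = np.abs(t[j] - t[keep]) ** (2 * nu - 1) @ b + t[j] ** np.arange(nu) @ a
    sq_err[j] = (y[j] - pred) ** 2
q_slow = sq_err.mean()

print(q_fast, q_slow)
```

The brute-force loop solves n linear systems per value of λ, while the shortcut solves one.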

Generalised Cross-Validation

-a computationally efficient approximation to cross-validation

-replaces sjj with the average of the diagonal elements of Sλ:

Qgcv(λ) = 1/n Σ [(yj - fλ^(tj)) / (1 - 1/n trace(Sλ))]²

-this is a criterion used to select the smoothing parameter in the mgcv package in R
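
A sketch of choosing λ by minimising Qgcv over a grid. It does not reproduce what mgcv does internally; the helper smoothing_matrix, the data and the grid are assumptions, as is the system [[K + cλI, L], [L^t, 0]] with c = (-1)^ν 2(2ν-1)! used to build Sλ.

```python
import numpy as np
from math import factorial

def smoothing_matrix(t, lam, nu=2):
    """Smoothing matrix S_lambda under the assumed form of the spline system."""
    n, p = t.size, 2 * nu - 1
    K = np.abs(t[:, None] - t[None, :]) ** p
    L = np.vander(t, nu, increasing=True)
    c = (-1) ** nu * 2 * factorial(p)
    M = np.block([[K + c * lam * np.eye(n), L],
                  [L.T, np.zeros((nu, nu))]])
    return np.hstack([K, L]) @ np.linalg.inv(M)[:, :n]

rng = np.random.default_rng(4)
n = 30
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)

def gcv(lam):
    S = smoothing_matrix(t, lam)
    resid = y - S @ y
    # every residual is divided by the same factor, 1 - trace(S)/n
    return np.mean((resid / (1.0 - np.trace(S) / n)) ** 2)

lambdas = 10.0 ** np.arange(-8.0, 1.0)
Q_gcv = np.array([gcv(lam) for lam in lambdas])
best_lam = lambdas[np.argmin(Q_gcv)]
print(best_lam)
```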

How many degrees of freedom are in a smoothing spline? Outline

-there are (n+ν) parameters in (b, a) but not all are completely free

How many degrees of freedom are in a smoothing spline? λ -> ∞

-number of degrees of freedom becomes ν, since smoothing spline f^(t) becomes the least squares regression solution for model formula y~1 when ν=1, OR y~1+t when ν=2

How many degrees of freedom are in a smoothing spline? λ -> 0

-number of degrees of freedom becomes n, since smoothing spline f^(t) becomes the interpolating spline when λ=0

Ordinary Least Squares Regression Fitted Values

y^ = X [X^t X]^(-1) X^t y

Ordinary Least Squares Regression Hat Matrix

y^ = H y

-where H, the hat matrix, linearly maps data y onto fitted values y^:

H = X [X^t X]^(-1) X^t

Ordinary Least Squares Regression Hat Matrix & DoF

-for ordinary least squares regression:

trace(H) = p

-the trace of the hat matrix is equal to the number of model parameters (the number of degrees of freedom)
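
A quick numerical check of trace(H) = p, using a made-up quadratic design matrix with p = 3 columns (the data and design are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
t = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), t, t ** 2])     # design matrix with p columns
y = 1.0 + 2.0 * t + rng.normal(scale=0.1, size=n)

# H = X (X^t X)^(-1) X^t, formed with a linear solve rather than an explicit inverse
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

# Cross-check the fitted values against a least squares solver
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.trace(H))
```

H is a projection matrix (H H = H), so its eigenvalues are 0 or 1 and its trace counts the p columns of X.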

Smoothing Matrix Hat Matrix

-for the smoothing spline, the smoothing matrix takes on the role of the hat matrix

-it linearly maps the data onto the fitted values

Smoothing Matrix Effective Degrees of Freedom

edf_λ = trace(Sλ)

-can show that:

edf_∞ = ν

edf_0 = n
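
The two limits can be checked numerically by taking λ very small and very large. As elsewhere, this is a sketch: the helper smoothing_matrix and the knots are illustrative, and the spline system [[K + cλI, L], [L^t, 0]] with c = (-1)^ν 2(2ν-1)! is an assumed form not written out in these notes.

```python
import numpy as np
from math import factorial

def smoothing_matrix(t, lam, nu):
    """Smoothing matrix S_lambda under the assumed form of the spline system."""
    n, p = t.size, 2 * nu - 1
    K = np.abs(t[:, None] - t[None, :]) ** p
    L = np.vander(t, nu, increasing=True)
    c = (-1) ** nu * 2 * factorial(p)
    M = np.block([[K + c * lam * np.eye(n), L],
                  [L.T, np.zeros((nu, nu))]])
    return np.hstack([K, L]) @ np.linalg.inv(M)[:, :n]

n = 20
t = np.linspace(0.0, 1.0, n)

def edf(lam, nu):
    return np.trace(smoothing_matrix(t, lam, nu))

# edf at a tiny, a moderate and a huge lambda, for nu = 1 and nu = 2
edf_table = {nu: (edf(1e-10, nu), edf(1e-3, nu), edf(1e3, nu)) for nu in (1, 2)}
for nu, (small, mid, large) in edf_table.items():
    print(nu, round(small, 3), round(mid, 3), round(large, 3))
```

Note that the edf depends only on the knots and λ, not on the observations y.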

Penalised Sum of Squares

Rλ(f) = Σ[yi - f(ti)]² + λ J(f)

-sum from i=1 to i=n

When can the penalised sum of squares be used?

-the penalised sum of squares is fine for Gaussian data BUT for non-Gaussian data or non-identity link functions it needs to be replaced with the penalised deviance

Penalised Deviance

Definition

Rλ(f,β) = D(y,f,β) + λ J(f)

-where D is the deviance for the vector y of observations modelled by a linear predictor comprising a spline function, f, of order ν (& possibly covariate main effects and interactions, β)

-penalised deviance is then minimised with respect to spline coefficients b and a (& regression parameters β, if any)

Penalised Deviance Roughness Penalty

-when there are several smooth terms of order ν in the model, f1,...,fm, each may be assigned its own roughness penalty:

Rλ1,..,λm(y,f1,...,fm,β) = D(y,f1,...,fm,β) + Σ λk J(fk)

-sum from k=1 to k=m

-or the same one can be used for all of them

Penalised Sum of Square Residuals

Rλ(f) = Σ [yi - f(ti)]² + λ J(f)

-where the first term is the sum of square residuals, sum from i=1 to i=n

-and λ≥0 is the smoothness parameter

-and J(f) is the roughness penalty

Which spline minimises the penalised sum of squares?

-f^, the function that minimises Rλ(f), is a natural spline:

f^(t) = Σ bi^ |t-ti|^p + {a0^ if ν=1 OR (a0^ + a1^ t) if ν=2}

-where p=(2ν-1)

-and IF ν=1 then Σ bi^ = 0

-or IF ν=2 then Σ bi^ = Σ ti bi^ = 0
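
A sketch for ν=2 that fits f^ and checks both constraints on b^, together with a consequence of them: the cubic terms cancel outside the knots, so f^ is linear there. The data, λ and the helper f_hat are illustrative; the fit assumes the system [[K + cλI, L], [L^t, 0]] with c = (-1)^ν 2(2ν-1)!, which is not written out in these notes.

```python
import numpy as np
from math import factorial

nu, lam = 2, 1e-3
p = 2 * nu - 1
rng = np.random.default_rng(5)
n = 20
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)

K = np.abs(t[:, None] - t[None, :]) ** p
L = np.vander(t, nu, increasing=True)
c = (-1) ** nu * 2 * factorial(p)       # assumed penalty constant
M = np.block([[K + c * lam * np.eye(n), L],
              [L.T, np.zeros((nu, nu))]])
coef = np.linalg.solve(M, np.concatenate([y, np.zeros(nu)]))
b, a = coef[:n], coef[n:]

def f_hat(s):
    """Evaluate sum_i b_i |s - t_i|^p + a_0 + a_1 s at the points s."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return (np.abs(s[:, None] - t[None, :]) ** p) @ b + np.vander(s, nu, increasing=True) @ a

# Constraints for nu = 2
sum_b = b.sum()
sum_tb = (t * b).sum()

# Three equally spaced points beyond the last knot: a zero second difference means f^ is linear there
vals = f_hat([1.5, 2.0, 2.5])
second_diff = vals[0] - 2.0 * vals[1] + vals[2]
print(sum_b, sum_tb, second_diff)
```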

Penalised Sum of Squares

λ->0

-f^ is rougher and converges to the interpolating spline

Penalised Sum of Squares

λ->∞

-f^ is smoother, regardless of where the points are, f^ becomes the least squares straight line when ν=2 (or a constant when ν=1)