Module 16: Statistical Distributions Flashcards

1
Q

Binomial distribution description

A

Bin(n, p)

Models the number of successes in n independent trials, where p is the probability of success on each trial.

2
Q

Negative binomial distribution description

A

NBin(r, p)

  • Models the number of trials needed until there have been r successes.
  • if r=1, the distribution is known as the geometric distribution
3
Q

Poisson distribution description

A

Poi(λ)

  • Models the number of independent events occurring in a specified time period
  • used as an approximation to the binomial distribution for small p
4
Q

Normal distribution

A
  • mathematically tractable distribution (easy to parameterise and use), useful when little is known about the data
  • used as an approximation to the binomial and Poisson distributions when the sample size is large
  • used to model the error terms in a random walk
  • symmetrical and mesokurtic
5
Q

Central Limit Theorem

A

By the Central Limit Theorem, the average, X_bar, of a large sample of n iid random variables with finite mean, μ, and finite variance, σ², is approximately normally distributed:

X_bar ~ N(μ, σ²/n) (approximately, for large n)
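
The statement above can be checked by simulation. The sketch below is illustrative only: the choice of Exponential(1) draws (mean 1, variance 1), the sample size n = 50 and the helper name sample_means are assumptions, not part of the card.

```python
import random
import statistics

def sample_means(n, num_samples, seed=0):
    """Return num_samples averages, each taken over n iid Exponential(1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

# Exponential(1) has mean 1 and variance 1, so X_bar should be roughly N(1, 1/n).
n = 50
means = sample_means(n, num_samples=5000)
print("mean of X_bar:    ", statistics.fmean(means))      # should be close to 1
print("variance of X_bar:", statistics.variance(means))   # should be close to 1/n
```

Although the underlying exponential variables are strongly skewed, the averages cluster symmetrically around μ with variance close to σ²/n.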

6
Q

2 Tests for normality

A
  • QQ plots
  • Jarque-Bera test
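
As a rough illustration of the second test, the Jarque-Bera statistic combines sample skewness S and sample kurtosis K as JB = n/6 × (S² + (K − 3)²/4); under normality it is approximately chi-squared with 2 degrees of freedom for large n. The sketch below computes it by hand; the function name and the simulated samples are assumptions made for the example.

```python
import random

def jarque_bera(data):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(data)
    mean = sum(data) / n
    # Central sample moments (dividing by n).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

jb_normal = jarque_bera(normal_sample)   # expected to be small
jb_skewed = jarque_bera(skewed_sample)   # expected to be very large
print(jb_normal, jb_skewed)
```

A large statistic (relative to the chi-squared critical value with 2 degrees of freedom) is evidence against normality.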

7
Q

Generalised Student’s t-distribution

A
  • Used to model symmetric data sets where the tails are fatter than implied by a normal distribution (leptokurtic) - important distribution for modelling risks
  • Can be derived as a normal mean-variance mixture distribution
8
Q

Lognormal distribution

A
  • frequently used to model financial data that takes positive values only, eg asset prices, or insurance claim amounts
  • positively skewed
9
Q

Wald (or inverse Gaussian) distribution

A
  • models the time taken for a random walk with drift to reach a particular level
  • positively skewed with useful properties in terms of aggregation
10
Q

Chi-square distribution

A
  • Used for goodness of fit
  • represents the sum of the squares of ν independent standard normal random variables
  • positively skewed
11
Q

Exponential Distribution

A
  • Models the waiting time between events under a Poisson process
  • monotonically decreasing, positively skewed, tail decreases exponentially
  • inflexible due to single parameter and unlikely to provide a good fit to data
12
Q

Gamma Distribution

A
  • extension of exponential distribution
  • flexible and has useful properties in terms of aggregation
  • if X has a gamma distribution then Y = 1/X has an inverse gamma distribution
13
Q

Generalised Inverse Gamma distribution

A
  • flexible: its three parameters allow it to produce a wide range of shapes
14
Q

Pareto distribution

A
  • used for modelling variables where the probability of an event falls in proportion to the magnitude of the event raised to a power, eg the distribution of wealth or the population of cities
  • the tail of the distribution follows the power law
15
Q

Generalised Pareto Distribution

A

Flexible distribution used in extreme value theory

16
Q

Triangular Distribution

A

Useful when the following limited data is available:

  • the minimum value
  • the maximum value
  • the mode
17
Q

Multivariate Distribution

A

A way of modelling several random variables at once

18
Q

Multivariate Normal Distribution

A

A column vector random variable, X, has a multivariate normal distribution if X = α + CZ where

  • α is a column vector of location parameters (ie means),
  • Z is a k-dimensional vector of iid standard normal random variables and
  • Σ is the covariance matrix and C is a matrix of constants such that CC’ = Σ
19
Q

2 Useful tests for testing whether observations are from a multivariate normal distribution

A
  • Mahalanobis distance
  • Mardia’s test, based on the Mahalanobis angle

20
Q

2 Common approaches for generating correlated multivariate normal random variables

A
  • Cholesky decomposition
  • Principal components

21
Q

Cholesky decomposition

A

A way of “square-rooting” a matrix.

It is used to derive the matrix C, such that CC’ = Σ.

If a vector, Z of iid standard normal random variables is generated, then a vector, X, of correlated normal random variables can be generated as X = μ + CZ where μ is the vector of means.
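
The recipe X = μ + CZ can be sketched in plain Python as follows. The 2×2 covariance matrix, the zero mean vector and the function names are assumptions chosen for illustration; the decomposition shown is the standard lower-triangular Cholesky factor and requires Σ to be symmetric and positive definite.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular C with C C' = sigma (sigma symmetric positive definite)."""
    k = len(sigma)
    c = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(c[i][m] * c[j][m] for m in range(j))
            if i == j:
                c[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                c[i][j] = (sigma[i][j] - s) / c[j][j]
    return c

def correlated_normals(mu, c, rng):
    """One draw of X = mu + C Z, with Z a vector of iid standard normals."""
    k = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    return [mu[i] + sum(c[i][j] * z[j] for j in range(k)) for i in range(k)]

mu = [0.0, 0.0]
sigma = [[1.0, 0.8], [0.8, 1.0]]   # unit variances, correlation 0.8
c = cholesky(sigma)

rng = random.Random(1)
draws = [correlated_normals(mu, c, rng) for _ in range(20000)]
sample_cov = sum(x * y for x, y in draws) / len(draws)
print(c)            # lower-triangular factor of sigma
print(sample_cov)   # should be close to 0.8
```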

22
Q

Principal Component Analysis

A

a.k.a. eigenvalue decomposition

Provides a way of decomposing the covariance matrix, Σ, as Σ = VΛV’ where Λ is the diagonal matrix of eigenvalues and V is the matrix of corresponding eigenvectors.

Each pair consisting of an eigenvalue and its corresponding eigenvector is called a principal component. These can be derived iteratively.
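
One way to derive the components iteratively is power iteration with deflation: find the largest eigenvalue and its eigenvector, subtract that component from the matrix, and repeat. The sketch below is one possible approach, not necessarily the one intended by the card; the example matrix, the starting vector and the function names are assumptions, and the method assumes the starting vector is not orthogonal to the eigenvector being sought.

```python
import math

def mat_vec(a, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def dominant_eigenpair(a, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [1.0 / (i + 1) for i in range(len(a))]   # arbitrary starting vector
    for _ in range(iterations):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = mat_vec(a, v)
    eigenvalue = sum(v[i] * w[i] for i in range(len(v)))   # Rayleigh quotient v'Av
    return eigenvalue, v

def principal_components(sigma, count):
    """Return the first `count` (eigenvalue, eigenvector) pairs, largest first."""
    k = len(sigma)
    a = [row[:] for row in sigma]
    components = []
    for _ in range(count):
        lam, v = dominant_eigenpair(a)
        components.append((lam, v))
        # Deflate: remove the component just found before searching for the next.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return components

sigma = [[2.0, 1.0], [1.0, 2.0]]
components = principal_components(sigma, 2)
for lam, v in components:
    print(lam, v)
```

Summing λ·v·v’ over all the components recovers Σ, which is the decomposition Σ = VΛV’ written term by term.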

23
Q

List 2 univariate discrete distributions

A
  • binomial and negative binomial distributions
  • the Poisson distribution

24
Q

List 2 univariate continuous distributions taking values from -∞ to +∞, and a variation of each

A
  • the normal distribution
  • normal mixture distribution
  • Student’s t-distribution
  • the skewed t-distribution
25
Q

List 9 univariate continuous distributions taking only non-negative values

A
  • lognormal distribution
  • Wald distribution
  • Chi-squared distribution
  • gamma and inverse gamma distributions
  • generalised inverse gamma distribution
  • exponential distribution
  • Frechet distribution
  • Pareto distribution
  • generalised Pareto distribution
26
Q

Outline what the binomial distribution aims to model

A

A Bin(n, p) distribution is the sum of n independent and identical Bernoulli(p) trials.

Random variable X ~ Bin(n, p) is the number of successes that occur in the n trials.

As n → ∞ (for fixed p), the limiting distribution of the suitably standardised binomial distribution is the normal distribution.

27
Q

Outline what the negative binomial distributions (Type 1 and Type 2) aim to model

A

Type 1: Random variable X is the number of the trial on which the rth success occurs, where r is a positive integer.

Type 2: Let Y be the number of failures before the rth success. Y = X - r, where X is defined as above.
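
The two definitions can be compared by simulation. In this sketch the values r = 3 and p = 0.4 and the function name are assumptions for illustration; the theoretical means used in the comments are E[X] = r/p and E[Y] = r(1 − p)/p.

```python
import random

def trials_until_rth_success(r, p, rng):
    """Type 1: number of the trial on which the r-th success occurs."""
    trials = 0
    successes = 0
    while successes < r:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

rng = random.Random(42)
r, p = 3, 0.4
x = [trials_until_rth_success(r, p, rng) for _ in range(20000)]
y = [xi - r for xi in x]   # Type 2: failures before the r-th success

mean_x = sum(x) / len(x)   # theoretical value r / p = 7.5
mean_y = sum(y) / len(y)   # theoretical value r * (1 - p) / p = 4.5
print(mean_x, mean_y)
```

Type 1 takes values r, r+1, ... while Type 2 takes values 0, 1, 2, ...; they differ only by the shift of r.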

28
Q

Outline what the Poisson distribution aims to model

A

The Poisson distribution models the number of events (eg claims) that occur in a specified interval of time, when the events occur one after another in time in a well-defined manner.

This manner presumes that the events occur singly at a constant rate, and that the numbers of events that occur in separate (ie non-overlapping) time intervals are independent of one another.

These conditions can be described by saying that the events occur “randomly, at a rate of λ per period”.

Such events are said to occur according to a Poisson process.
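
A Poisson process can be simulated by drawing exponential waiting times between events and counting how many events fall in each period. The sketch below assumes a rate of λ = 4 per period and a period of length 1; the function name is hypothetical. The counts should then have mean and variance both close to λ.

```python
import random
import statistics

def events_in_period(lam, rng):
    """Count events in one unit period when gaps between events are Exponential(lam)."""
    t = rng.expovariate(lam)   # time of the first event
    count = 0
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)   # waiting time until the next event
    return count

rng = random.Random(7)
lam = 4.0
counts = [events_in_period(lam, rng) for _ in range(20000)]

count_mean = statistics.fmean(counts)
count_var = statistics.variance(counts)
print(count_mean, count_var)   # both should be close to lam
```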

29
Q

State the location and scaling parameters of the standard normal distribution

A

The standard normal distribution has a location parameter (and mean) of 0, and a scaling parameter (and standard deviation) of 1.

30
Q

State why the t-distribution is an important distribution for risk modelling

A

The kurtosis of the standard t-distribution is greater than that of the normal distribution.

The fact that the t-distribution is leptokurtic (relatively fatter tails) makes this an important distribution for risk modelling.

31
Q

State what is meant by X having a lognormal distribution

A

If Y = lnX (the natural log) has a normal distribution, then X is said to have a lognormal distribution.

32
Q

Outline 2 specific applications of the lognormal distribution, in the context of modelling financial risks

A
  1. Since it takes only positive values, and is skewed, it is applicable to many insurance situations, eg claim size.
  2. It can be used to model financial variables, eg asset returns, on the assumptions that the natural logarithm of the variable follows a random walk with drift, ln Xₜ = μ + ln Xₜ₋₁ + eₜ, and that the returns are iid.
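
The random walk in the second application can be sketched as below. The error terms eₜ are assumed here to be iid N(0, σ²), and the starting value, μ, σ and the number of steps are all hypothetical values chosen for the example. Under these assumptions ln X_T is normal with mean ln X₀ + Tμ, so X_T is lognormal and always positive.

```python
import math
import random
import statistics

def simulate_path(x0, mu, sigma, steps, rng):
    """Simulate ln X_t = mu + ln X_{t-1} + e_t with e_t iid N(0, sigma^2)."""
    log_x = math.log(x0)
    path = [x0]
    for _ in range(steps):
        log_x += mu + rng.gauss(0.0, sigma)
        path.append(math.exp(log_x))
    return path

rng = random.Random(11)
x0, mu, sigma, steps = 100.0, 0.01, 0.05, 12
terminal = [simulate_path(x0, mu, sigma, steps, rng)[-1] for _ in range(5000)]

mean_log_terminal = statistics.fmean(math.log(v) for v in terminal)
expected_log_mean = math.log(x0) + steps * mu
print(mean_log_terminal, expected_log_mean)   # should be close to each other
```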
33
Q

State what the Wald distribution describes in terms of a probability

A

The Wald distribution describes the time taken for a Brownian motion process with positive drift to first reach a given value.

34
Q

State what the chi-squared distribution describes in terms of a probability

A

The chi-squared distribution with γ degrees of freedom is the distribution of the sum of γ squared independent variables taken from a standard normal distribution, and so can be simulated as such.
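
A minimal sketch of that simulation, assuming 5 degrees of freedom and a hypothetical helper name. The sample mean and variance are compared with the theoretical values of γ and 2γ.

```python
import random
import statistics

def chi_squared_draw(df, rng):
    """One draw: the sum of df squared independent standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(5)
df = 5
chi_draws = [chi_squared_draw(df, rng) for _ in range(20000)]

chi_mean = statistics.fmean(chi_draws)     # theoretical mean is df
chi_var = statistics.variance(chi_draws)   # theoretical variance is 2 * df
print(chi_mean, chi_var)
```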

35
Q

State what the exponential distribution models

A

The exponential distribution models the waiting times between the events of a Poisson process.

36
Q

List the characteristics of the exponential distribution that limit its application to ERM

A

The exponential distribution’s application is limited by:

  • its monotonically-decreasing nature
  • its single parameter
  • the low probabilities associated with extreme events
37
Q

State what is meant by X having an inverse-gamma distribution

A

If Y ~ Gamma, then X = 1/Y ~ InverseGamma

38
Q

State how the gamma (and inverse-gamma) can be fitted to a sample

A

Both gamma and inverse-gamma can be fitted by equating sample and population moments and solving for the distribution’s parameters.
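
For the gamma case, the method of moments can be sketched as follows. The shape/scale parameterisation (mean = shape × scale, variance = shape × scale²) is an assumption, since the card does not state one, and the function name and test data are hypothetical. The inverse-gamma is fitted in the same way using its own moment formulas, which are not shown here.

```python
import random
import statistics

def fit_gamma_by_moments(data):
    """Return (shape, scale) that match the sample mean and sample variance."""
    m = statistics.fmean(data)
    v = statistics.variance(data)
    # Solve mean = shape * scale and variance = shape * scale**2.
    shape = m * m / v
    scale = v / m
    return shape, scale

rng = random.Random(2)
# random.gammavariate(alpha, beta) uses alpha as the shape and beta as the scale.
sample = [rng.gammavariate(2.0, 3.0) for _ in range(20000)]

shape_hat, scale_hat = fit_gamma_by_moments(sample)
print(shape_hat, scale_hat)   # should be close to 2 and 3
```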

39
Q

2 Key features of the Pareto distribution

A

The Pareto distribution is monotonically decreasing and, like the tails of the t-distribution, follows a power law with the shape parameter (γ) determining the power.

40
Q

Uniform distribution

A

Assigns an equal probability to all outcomes in a range

41
Q

State the key features of the triangular distribution

A

The triangular distribution can be used in cases where, in addition to the upper and lower values, the most likely value is known. The distribution has lower limit β₁, mode α, and upper limit β₂.

The mean is the average of the parameter values:
μ = ⅓( β₁ + α + β₂)
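
The formula for the mean can be checked numerically. The parameter values below (β₁ = 10, α = 15, β₂ = 40) are assumptions for illustration; note that Python's random.triangular takes its arguments in the order (low, high, mode).

```python
import random
import statistics

beta1, alpha, beta2 = 10.0, 15.0, 40.0   # lower limit, mode, upper limit

rng = random.Random(3)
tri_draws = [rng.triangular(beta1, beta2, alpha) for _ in range(50000)]

theoretical_mean = (beta1 + alpha + beta2) / 3.0
simulated_mean = statistics.fmean(tri_draws)
print(theoretical_mean, simulated_mean)   # should be close to each other
```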

42
Q

Outline 3 key limitations that mean the multivariate normal distribution is not a good description of reality in many risk management applications

A
  • the tails of the univariate marginal distributions are too thin
  • the joint tails do not assign enough weight to joint extreme outcomes
  • the distribution has a strong form of symmetry, known as elliptical symmetry.
43
Q

Define what is meant by a multivariate spherical distribution, and name a specific example

A

A multivariate spherical distribution is one where the marginal distributions are:

  • identical
  • symmetric
  • uncorrelated with each other (note, however, that lack of correlation does not necessarily imply independence)
44
Q

Define what is meant by a multivariate elliptical distribution and name a specific example

A

If the set of points with any chosen (fixed) probability density can be described by an elliptical relationship between the variables, then the distribution is said to be elliptical.