Probability Flashcards Preview

Flashcards in Probability Deck (175)
1
Q

What is a sample space and how do you write it?

A

The set of all possible outcomes, eg.
throwing two dice: Ω = {(i, j) : 1 ≤ i, j ≤ 6}
tossing a coin: Ω = {H, T}

2
Q

What is a subset of Ω (sample space) called?

A

An event

3
Q

When are two events disjoint?

A

A ∩ B = ∅

When they cannot both occur

4
Q

What is Stirling’s formula for the approximation of n!?

A

n! ∼ √(2π) n^(n+1/2) e^(−n)
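
A quick numerical check of the approximation (a minimal Python sketch, standard library only; the values of n are arbitrary):

import math

# Compare n! with Stirling's approximation sqrt(2*pi) * n^(n + 1/2) * e^(-n).
for n in (5, 10, 20):
    exact = math.factorial(n)
    approx = math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)
    print(n, exact / approx)  # the ratio tends to 1 as n grows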

5
Q

What is the formula for the number of arrangements of n objects, with repeats?
Eg,
a₁, …, a₁, a₂,…, a₂,…aₖ, …, aₖ
where a₁ is repeated m₁ times etc.

A

n!/(m₁!m₂!…mₖ!)

6
Q

What is the multinomial coefficient?

A

The coefficient of a₁ᵐ¹…aₖᵐᵏ in (a₁ + … + aₖ)ⁿ, where m₁ + … + mₖ = n.
It is written nC(m₁, m₂, …, mₖ)

7
Q
How many distinct non-negative integer-valued solutions of the equation
x₁ + x₂ + · · · + xₘ = n
are there?
A

(n+m-1)Cn
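
A brute-force check of this count against the formula (a Python sketch; n and m are small arbitrary values):

from itertools import product
from math import comb

# Count non-negative integer solutions of x1 + ... + xm = n directly,
# then compare with the formula (n + m - 1) choose n.
n, m = 5, 3
brute = sum(1 for xs in product(range(n + 1), repeat=m) if sum(xs) == n)
print(brute, comb(n + m - 1, n))  # both print 21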

8
Q

What is Vandermonde’s identity?

A

For k, m, n ≥ 0
(m+n)Ck = ᵏΣⱼ₌₀(mCj)(nC(k-j))

with the convention that mCj = 0 for j > m
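
A quick numerical check of the identity (Python sketch; m, n, k are arbitrary small values, and math.comb already returns 0 when the lower index exceeds the upper, matching the convention above):

from math import comb

m, n, k = 6, 4, 5
lhs = comb(m + n, k)
rhs = sum(comb(m, j) * comb(n, k - j) for j in range(k + 1))
print(lhs, rhs)  # both print 252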

9
Q

Prove Vandermonde’s identity

A

Suppose we choose a committee consisting of k people from a group of m men and n women.
There are (m+n)Ck ways of doing this, which is the left-hand side.
Now the number of men on the committee is some j ∈ {0, 1, . . . , k}, and then it contains k − j women.
The number of ways of choosing the j men is mCj, and for each such choice there are nC(k-j) choices for the women who make up the rest of the committee.
So there are mCj · nC(k-j) committees with exactly j men, and summing over j we get that the total number of committees is given by the right-hand side.

10
Q

A probability space is a triple (Ω, F, P)
(Fancy F and P).
What do these symbols mean?

A
  1. Ω is the sample space
  2. F is a collection of subsets of Ω, called events, satisfying axioms F1–F3
  3. P is a probability measure, which is a function P : F → [0, 1] satisfying axioms P1–P3
11
Q

What is the probability of the union of two disjoint events?

eg, P(A ∪ B)

A

P(A ∪ B) = P (A) + P (B)

12
Q

What are the axioms on F (a collection of subsets of Ω)?

A

F1: ∅ ∈ F.
F2: If A ∈ F, then also Aᶜ ∈ F.
F3: If {Aᵢ, i ∈ I} is a finite or countably infinite collection of members of F, then ∪ᵢ∈I Aᵢ ∈ F

13
Q

What are the axioms of P, where P is a function from F to R?

A
P1: For all A ∈ F, P(A) ≥ 0.
P2: P(Ω) = 1.
P3: If {Aᵢ, i ∈ I} is a finite or countably infinite collection of members of F, and Aᵢ ∩ Aⱼ = ∅ for i ≠ j, then P(∪ᵢ∈I Aᵢ) = Σᵢ∈I P(Aᵢ)
14
Q

When Ω is finite or countably infinite, what do we usually take F to be?

A

We normally take F to be the set of all subsets of Ω (the power set of Ω)

15
Q
Suppose that (Ω, F, P) is a probability space and that A, B ∈ F.
If A ⊆ B then P (A) ≤ …
A

If A ⊆ B then P (A) ≤ P (B)

16
Q

Prove that P (A’) = 1 − P (A) using the probability axioms

A

Since A ∪ A’ = Ω and A ∩ A’ = ∅, by P3, P (Ω) = P (A) + P (A’). By P2, P (Ω) = 1 and so P(A) + P (A’) = 1, which entails the required result

17
Q

Prove that if A ⊆ B then P (A) ≤ P (B), using the probability axioms

A

Since A ⊆ B, we have B = A ∪ (B ∩ A’). Since B ∩ A’ ⊆ A’, it is disjoint from A. So by P3, P(B) = P(A) + P(B ∩ A’). Since by P1, P(B ∩ A’) ≥ 0, we thus have P (B) ≥ P(A)

18
Q

Conditional Probability

What is the probability of A given B?

A

P(A|B) = P(A ∩ B)/P(B)

19
Q
Let (Ω, F, P) be a probability space and let B ∈ F satisfy P(B) > 0. Define a new
function Q : F → R by Q(A) = P(A|B)

Is (Ω, F, Q) a probability space?
Prove your result

A

Yes

Proof pg 12

20
Q

When are events A and B independent?

A

Events A and B are independent if P(A ∩ B) = P(A)P(B)

21
Q

More generally, a family of events A = {Aᵢ : i ∈ I} is independent if…

A

P(∩ᵢ∈ⱼ Aᵢ) = Πᵢ∈ⱼ P(Aᵢ)

for all finite subsets J of I

22
Q

When is a family of events pairwise independent?

A

A family A of events is pairwise independent if P(Aᵢ ∩ Aⱼ ) = P(Aᵢ)P(Aⱼ ) whenever i ≠ j.

23
Q

Does Pairwise Independence imply independence?

A

NO!!!!

24
Q

Given A and B are independent, are A and B’, and A’ and B’ independent?

A
Both A and B’, and A’ and B’, are independent
25
Q

Prove that A and B’ are independent given A and B are independent

A

We have A = (A ∩ B) ∪ (A ∩ B’), where A ∩ B and A ∩ B’ are disjoint, so using the
independence of A and B, P(A ∩ B’) = P (A) − P(A ∩ B) = P(A) − P(A) P(B) = P (A) (1 − P(B)) = P(A)P(B’)

26
Q

When is a family of events {B1, B2, . . .} a partition of Ω?

A

if

  1. Ω = ∪ᵢ≥₁ Bᵢ (so that at least one Bi must happen), and
  2. Bᵢ ∩ Bⱼ = ∅ whenever i ≠ j (so that no two can happen together)
27
Q

What is the law of total probability/partition theorem?

A

Suppose {B1, B2, . . .} is a partition of Ω by sets from F,
such that P (Bᵢ) > 0 for all i ≥ 1. Then for any A ∈ F
P(A) = ᵢ≥₁ΣP(A|Bᵢ)P(Bᵢ)

28
Q

Prove the partition theorem

A

P(A) = P(A ∩ (∪ᵢ≥₁Bᵢ)), since ∪ᵢ≥₁Bᵢ = Ω
= P(∪ᵢ≥₁(A ∩ Bᵢ))
= ᵢ≥₁Σ P (A ∩ Bᵢ) by axiom P3, since A ∩ Bᵢ, i ≥ 1 are disjoint
= ᵢ≥₁Σ P (A|Bᵢ)P(Bᵢ)

29
Q

What is Bayes’ Theorem?

A

Suppose that {B1, B2, . . .} is a partition of Ω by sets from F such that P (Bi) > 0 for all i ≥ 1. Then for any A ∈ F such that P (A) > 0

P(Bₖ|A) = P(A|Bₖ)P(Bₖ)/(ᵢ≥₁Σ P (A|Bᵢ)P(Bᵢ))
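
A small worked example (the numbers are made up purely for illustration): suppose B₁ = “has a condition”, B₂ = B₁’, A = “tests positive”, with P(B₁) = 0.01, P(A|B₁) = 0.9 and P(A|B₂) = 0.05.

# Hypothetical illustrative numbers; {B1, B2} is a partition of the sample space.
p_b1, p_b2 = 0.01, 0.99
p_a_given_b1, p_a_given_b2 = 0.9, 0.05

# Denominator via the law of total probability, then Bayes' theorem.
p_a = p_a_given_b1 * p_b1 + p_a_given_b2 * p_b2
p_b1_given_a = p_a_given_b1 * p_b1 / p_a
print(p_b1_given_a)  # about 0.154, so most positive tests are false positives here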

30
Q

Prove Bayes’ theorem

A

We have P(Bₖ|A) = P(Bₖ ∩ A)/P(A)
= P(A|Bₖ)P(Bₖ)/P(A)
Now substitute for P(A) using the law of total probability

31
Q

What is Simpson’s paradox?

A

It consists of the fact that for events E, F, G, we can have

P(E|F ∩ G) > P(E|F’ ∩ G)
P(E|F ∩ G’) > P(E|F’ ∩ G’)
and yet
P(E|F) < P(E|F’).
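
A concrete set of counts exhibiting the paradox (the counts are constructed purely for illustration), checked in Python:

from fractions import Fraction

# (successes, trials) for F and F' within groups G and G'.
counts = {
    ("F", "G"): (9, 10),    ("F'", "G"): (80, 100),
    ("F", "G'"): (20, 100), ("F'", "G'"): (1, 10),
}

def rate(*cells):
    # Empirical conditional probability of E given the union of the listed cells.
    s = sum(counts[c][0] for c in cells)
    t = sum(counts[c][1] for c in cells)
    return Fraction(s, t)

assert rate(("F", "G")) > rate(("F'", "G"))      # P(E | F and G)  > P(E | F' and G)
assert rate(("F", "G'")) > rate(("F'", "G'"))    # P(E | F and G') > P(E | F' and G')
# ...yet overall the inequality reverses, because F is mostly used where E is rare anyway.
assert rate(("F", "G"), ("F", "G'")) < rate(("F'", "G"), ("F'", "G'"))
print("Simpson's paradox reproduced")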

32
Q

What is the multiplication rule?

Eg, P(A ∩ B) = …

A

P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)

33
Q

What is the generalisation of the multiplication rule for n events?

A

P(A₁ ∩ A₂ ∩ … ∩ Aₙ) = P(A₁) P(A₂|A₁) … P(Aₙ|A₁ ∩ A₂ ∩ … ∩ Aₙ₋₁)

34
Q

inclusion-exclusion formula

P (A1 ∪ A2 ∪ . . . ∪ An) = ⁿΣᵢ₌₁ P(Aᵢ) - ….

A

P(A₁ ∪ A₂ ∪ … ∪ Aₙ) = ⁿΣᵢ₌₁ P(Aᵢ) − Σᵢ>ⱼ P(Aᵢ ∩ Aⱼ) + … + (−1)ⁿ⁺¹ P(A₁ ∩ A₂ ∩ … ∩ Aₙ)
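
A brute-force check of the formula for n = 3 on a small uniform sample space (Python sketch; the three events are arbitrary):

from fractions import Fraction

omega = set(range(12))
A1, A2, A3 = {0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 7, 8}

def P(event):
    # Uniform probability measure on omega.
    return Fraction(len(event), len(omega))

lhs = P(A1 | A2 | A3)
rhs = (P(A1) + P(A2) + P(A3)
       - P(A1 & A2) - P(A1 & A3) - P(A2 & A3)
       + P(A1 & A2 & A3))
assert lhs == rhs
print(lhs)  # 3/4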

35
Q

What is a discrete random variable?

A
A discrete random variable X on a probability space (Ω, F, P) is a function X : Ω → R
such that
(a) {ω ∈ Ω : X(ω) = x} ∈ F for each x ∈ R,
(b) ImX := {X(ω) : ω ∈ Ω} is a finite or countable subset of R
36
Q

What is the more common/shorter way of writing P({ω ∈ Ω : X(ω) = x})?

A

P(X = x)

37
Q

How is the probability mass function defined?

A

The probability mass function (p.m.f.) of X is the function pₓ : R → [0, 1] defined by
pₓ(x) = P(X = x)

38
Q

What is the pmf when x ∉ ImX?

A

If x ∉ ImX (that is, X(ω) never equals x) then pₓ(x) = P ({ω : X(ω) = x}) = P (∅) = 0.

39
Q

What does Σₓ∈ᵢₘₓ pₓ(x) = ?

why?

A

ₓ∈ᵢₘₓΣ pₓ(x) = ₓ∈ᵢₘₓΣ P ({ω : X(ω) = x})
=P(ₓ∈ᵢₘₓ ∪ {ω : X(ω) = x}) since the events are disjoint
= P (Ω) since every ω ∈ Ω gets mapped somewhere in ImX
= 1

40
Q

X has the Bernoulli distribution with parameter p (where 0 ≤ p ≤ 1) if…

A

P(X = 0) = 1 − p, P(X = 1) = p

41
Q

X has a binomial distribution with parameters n and p (where n
is a positive integer and p ∈ [0, 1]) if…

A

P (X = k) = nCk pᵏ (1-p)ⁿ⁻ᵏ, k = 0, 1, …, n

42
Q

If X has the Bernoulli distribution, how do we write this?

A

X ∼ Ber(p)

43
Q

If X has the binomial distribution, how do we write this?

A

X ∼ Bin(n, p)

44
Q

If X has the geometric distribution, how do we write this?

A

X ∼ Geom(p)

45
Q

If X has the Poisson distribution, how do we write this?

A

X ∼ Po(λ)

46
Q

X has a geometric distribution with parameter p if….

A
P(X = k) = p(1 − p)ᵏ⁻¹, k = 1, 2, …
47
Q

What can the geometric distribution model?

A

We can use X to model the number of independent trials needed until we see the first success,
where p is the probability of success on a single trial

48
Q

If you want to use the geometric distribution to model the number of failures before the first success, which formula do you use?

A

P (Y = k) = p(1 − p)ᵏ,

k = 0, 1, …

49
Q

X has the Poisson distribution with parameter λ ≥ 0 if…

A

P (X = k) = λᵏ e^(−λ)/k!, k = 0, 1, …

50
Q

Define the expectation of X

A
The expectation (or expected value or mean) of X is
E[X] = ₓ∈ᵢₘₓΣ xP(X=x)
provided that ₓ∈ᵢₘₓΣ |x|P(X=x) < ∞
51
Q

What is the expectation of the Poisson distribution?

A

λ

52
Q

What is the expectation of the Geometric distribution?

A

1/p

53
Q

What is the expectation of the Binomial distribution?

A

np

54
Q

What is the expectation of the Bernoulli distribution?

A

p
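
A Monte Carlo sanity check of the last four expectations (a sketch assuming numpy is available; the parameters and sample size are arbitrary, and the sample means only approximate the exact values):

import numpy as np

rng = np.random.default_rng(0)
n, lam, p, trials = 10, 2.5, 0.3, 200_000

print(rng.poisson(lam, trials).mean(), "vs", lam)        # Poisson: lambda
print(rng.geometric(p, trials).mean(), "vs", 1 / p)      # Geometric (support 1, 2, ...): 1/p
print(rng.binomial(n, p, trials).mean(), "vs", n * p)    # Binomial: np
print((rng.random(trials) < p).mean(), "vs", p)          # Bernoulli: p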

55
Q

Let h : R → R

If X is a discrete random variable, is Y = h(X) also a discrete random variable?

A

Yes

56
Q

If h : R → R, then

E [h(X)] = ….

A

E [h(X)] = ₓ∈ᵢₘₓΣ h(x)P (X = x)

provided that ₓ∈ᵢₘₓΣ |h(x)|P (X = x) < ∞.

57
Q

Prove the theorem that

E [h(X)] = ₓ∈ᵢₘₓΣ h(x)P (X = x)

A

Let A = {y : y = h(x) for some x ∈ ImX}.
Start from the RHS and write it as a double sum: the outer sum over y ∈ A, the inner sum over x ∈ ImX with h(x) = y.
pg 22

58
Q
Take h(x) = x^k
What is E[X^k] called?
A

The kth moment of X, when it exists

59
Q

Let X be a discrete random variable such that E [X] exists.
Describe the expectation when X is non-negative
Prove it

A

If X is non-negative then E [X] ≥ 0
We have ImX ⊆ [0, ∞) and so
E [X] = ₓ∈ᵢₘₓΣ xP (X = x) is a sum whose terms are all non-negative and so must itself be non-negative.

60
Q

Let X be a discrete random variable such that E [X] exists.
If a, b ∈ R then E [aX + b] = …
Prove it

A

E [aX + b] = aE [X] + b

Proof: E[aX + b] = ₓ∈ᵢₘₓΣ (ax + b)P(X = x) = a ₓ∈ᵢₘₓΣ xP(X = x) + b ₓ∈ᵢₘₓΣ P(X = x) = aE[X] + b, since the probabilities sum to 1

61
Q

For a discrete random variable X, define the variance

A
For a discrete random variable X, the variance of X is defined by
var (X) = E[(X − E[X])²] = E[X²] − (E[X])²
provided that this quantity exists.
62
Q

What is variance a measure of?

A

The variance is a measure of how much the distribution of X is spread out about its mean: the more
the distribution is spread out, the larger the variance.

63
Q

Is var(X) ≥ 0 always? Why?

A

Yes

Since (X − E[X])² is a non-negative random variable, var (X) ≥ 0

64
Q

How are standard deviation and variance related?

A

(Standard deviation)² = var (X), i.e. the standard deviation is √var(X)

65
Q

Suppose that X is a discrete random variable whose variance exists. Then if a and b
are (finite) fixed real numbers, then the variance of the discrete random variable Y = aX + b is given by ….
Prove it

A

var (Y ) = var (aX + b) = a² var (X)

Proof: E[aX + b] = aE[X] + b, so var(aX + b) = E[(aX + b − aE[X] − b)²] = E[a²(X − E[X])²] = a² var(X)

66
Q

Suppose that B is an event such that P (B) > 0. Then the conditional distribution of
X given B is…
P(X = x|B) =

A

P(X = x|B) = P({X = x} ∩ B) / P(B), for x ∈ R

67
Q

Suppose that B is an event such that P (B) > 0,

The conditional expectation of X given B is…

A

ₓΣxP(X = x|B),
whenever the sum converges absolutely

We write pₓ|ᵦ(x) = P(X=x|B)

68
Q

What is the Partition theorem for expectations?

A

If {B1, B2, . . .} is a partition of Ω such that
P (Bi) > 0 for all i ≥ 1 then
E [X] = ᵢ≥₁ΣE [X | Bᵢ] P(Bᵢ),
whenever E [X] exists.

69
Q

Prove the Partition theorem for expectations

A

Use the law of total probability to split it into two sums, one over x and one over i.
pg24

70
Q

Given two random variables X and Y their joint distribution (or joint probability
mass function) is
pₓ,ᵧ (x, y) =

A

pₓ,ᵧ (x, y) = P ({X = x} ∩ {Y = y})
= P(X = x, Y = y)
x, y ∈ R

71
Q

Is pₓ,ᵧ (x, y) always non-negative?

A

Yes, pₓ,ᵧ(x, y) ≥ 0 for all x, y ∈ R (it can equal 0, e.g. when x ∉ ImX)

72
Q

What does ₓΣᵧΣpₓ,ᵧ (x, y) = ??

A

ₓΣᵧΣpₓ,ᵧ (x, y) = 1

73
Q

Joint distributions:

What is the marginal distribution of X?

A

pₓ(x) = ᵧΣpₓ,ᵧ (x, y)

74
Q

Joint distributions:

marginal distribution of Y?

A

pᵧ(y) = ₓΣpₓ,ᵧ (x, y)

75
Q
Whenever pX(x) > 0 for some x ∈ R, we can also write down the conditional distribution of Y given
that X = x:
pᵧ|ₓ₌ₓ(y) =
A

pᵧ|ₓ₌ₓ(y) = P (Y = y|X = x)

= pₓ,ᵧ(x,y)/pₓ(x) for y ∈ R

76
Q

The conditional expectation of Y given that X = x is

E [Y |X = x] = …

A

E [Y |X = x] = ᵧΣypᵧ|ₓ₌ₓ(y)

whenever the sum converges absolutely

77
Q

When are Discrete random variables X and Y independent?

A

P(X = x, Y = y) = P(X = x)P(Y = y) for all x, y ∈ R.
In other words, X and Y are independent if and only if the events {X = x} and {Y = y} are independent
for all choices of x and y. We can also write this as
pₓ,ᵧ (x, y) = pₓ(x)pᵧ(y) for all x, y ∈ R

78
Q

In the same way as we defined expectation for a single discrete random variable, so in the bivariate case
we can define expectation of any function of the random variables X and Y . Let h : R² → R. Then
h(X, Y ) is itself a random variable, and
E[h(X, Y )] =

A

E[h(X, Y )] = ₓΣᵧΣ h(x, y)P(X = x, Y = y)
= ₓΣᵧΣ h(x, y)pₓ,ᵧ (x, y)

provided the sum converges absolutely.

79
Q

Suppose X and Y are discrete random variables and a, b ∈ R are constants. Then
E[aX + bY ] =
Prove it

A

E[aX + bY ] = aE[X] + bE[Y ]
provided that both E [X] and E [Y ] exist.

Prove it pg28

80
Q

What does E[aX + bY ] = aE[X] + bE[Y ] tell us about expectation?

A

expectation is linear

81
Q

E[a₁X₁ + · · · + aₙXₙ] =

A

E[a₁X₁ + · · · + aₙXₙ] = a₁E[X₁] + · · · + aₙE[Xₙ]

82
Q

If X and Y are independent discrete random variables whose expectations exist, then
E[XY ] =
Prove it

A

E[XY] = E[X]E[Y ]

Proof pg28

83
Q

What is the covariance of X and Y?

A

cov (X, Y ) = E[(X − E [X])(Y − E [Y ])]

84
Q

What is cov(X,X) = ?

A

cov (X, X) = var (X)

85
Q

Does cov (X, Y ) = 0 imply that X and Y are independent?

A

NO!!!!

86
Q

Multivariate distributions:
pX₁,X₂,…,Xₙ(x₁, x₂, . . . , xₙ) =

A

pX₁,X₂,…,Xₙ(x₁, x₂, . . . , xₙ) = P(X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ)
for x₁, x₂, …, xₙ ∈ R

87
Q

A family {Xᵢ : i ∈ I} of discrete random variables are independent if ….

A

A family {Xᵢ : i ∈ I} of discrete random variables are independent if for all finite
sets J ⊆ I and all collections {Aᵢ : i ∈ J} of subsets of R,
P(ᵢ∈ⱼ∩{Xᵢ ∈ Aᵢ}) = ᵢ∈ⱼΠP(Xᵢ ∈ Aᵢ)

88
Q

Suppose that X1, X2, . . . are independent random variables which all have the same distribution, what do we call them?

A

Independent and identically distributed (i.i.d)

89
Q

A kth order linear recurrence relation (or difference equation) has the form….

A

ᵏΣⱼ₌₀ aⱼ uₙ₊ⱼ = f(n)
with a₀ ≠ 0 and aₖ ≠ 0, where a₀, …, aₖ are constants independent of n

A solution to such a difference equation is a sequence (uₙ)ₙ≥₀ satisfying the equation for all n ≥ 0.

90
Q

The general solution (uₙ)ₙ ≥ ₀ (i.e. if the boundary conditions are not specified) of ᵏΣⱼ₌₀ aⱼ uₙ₊ⱼ = f(n) can be written as …
Prove this

A

uₙ = vₙ +wₙ where (vₙ)ₙ ≥ ₀ is a particular solution to the equation and (wₙ)ₙ ≥ ₀ solves
the homogeneous equation ᵏΣⱼ₌₀ aⱼ wₙ₊ⱼ = 0
proof pg31

91
Q

How would you solve the second order linear difference equation:
uₙ₊₁ + auₙ + buₙ₋₁ = f(n) ?

A

Substitute wₙ = Aλⁿ in wₙ₊₁ + awₙ + bwₙ₋₁ = 0
then divide by Aλⁿ⁻¹ to get the quadratic: λ² + aλ + b = 0 (Aux Eqn)
General solution: wₙ = A₁λ₁ⁿ + A₂λ₂ⁿ,
or, if λ₁ = λ₂ = λ, then wₙ = (A + Bn)λⁿ
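
A small check of this recipe on a concrete homogeneous recurrence (Python sketch; the coefficients, roots and initial conditions below are chosen for illustration):

# w_{n+1} - 5 w_n + 6 w_{n-1} = 0, i.e. a = -5, b = 6;
# auxiliary equation lambda^2 - 5*lambda + 6 = 0 with roots 2 and 3.
a, b = -5, 6
lam1, lam2 = 2, 3

# Constants fitted to w_0 = 0, w_1 = 1: A1 + A2 = 0 and 2*A1 + 3*A2 = 1.
A1, A2 = -1, 1

w = [0, 1]
for n in range(1, 10):
    w.append(-a * w[n] - b * w[n - 1])   # w_{n+1} = -a*w_n - b*w_{n-1}

closed_form = [A1 * lam1 ** k + A2 * lam2 ** k for k in range(len(w))]
assert w == closed_form
print(w)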

92
Q

Consider a random walk on the integers Z, started from some n > 0,
which at each step increases by 1 with probability p, and decreases by 1 with probability q = 1 − p. Then
the probability uₙ that the walk ever hits 0 is given by…..
Prove it

A

uₙ = { (q/p)ⁿ if p > q
     { 1       if p ≤ q

Proof pg 38
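
A Monte Carlo check of the p > q case (Python sketch; the parameters are arbitrary, and paths are truncated after a large number of steps, which introduces a small bias):

import random

p, n, runs, max_steps = 0.6, 3, 10_000, 1_000
q = 1 - p

hits = 0
for _ in range(runs):
    pos = n
    for _ in range(max_steps):
        pos += 1 if random.random() < p else -1
        if pos == 0:
            hits += 1
            break
print(hits / runs, "vs", (q / p) ** n)  # roughly 0.30 for these parameters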

93
Q

Let X be a non-negative integer-valued random variable. Let
S := { s ∈ R : ∞Σₖ₌₀ |s|ᵏ P(X = k) < ∞ }
Then the probability generating function (p.g.f.) of X is Gₓ : S → R defined by ….

A

Gₓ(s) = E[sˣ] = ∞Σₖ₌₀ sᵏP(X=k)

94
Q

pₓ(k) = pₖ = …

A

pₓ(k) = pₖ = P(X=k)

95
Q

Is the distribution of X uniquely determined by its probability generating function, Gₓ?

A

Yes

96
Q

What is the probability generating function of the Bernoulli distribution?

A

Gₓ(s) = ₖΣpₖsᵏ = qs⁰ + ps¹ = q + ps

for all s ∈ R

97
Q

What is the probability generating function of the Binomial distribution?

A

Gₓ(s) = ⁿΣₖ₌₀ sᵏ ⁿCₖ pᵏ (1-p)ⁿ⁻ᵏ = ⁿΣₖ₌₀ ⁿCₖ (ps)ᵏ (1-p)ⁿ⁻ᵏ = (1 - p + ps)ⁿ
by the binomial theorem. This is valid for all s ∈ R

98
Q

What is the probability generating function of the Poisson distribution?

A

Gₓ(s) = ∞Σₖ₌₀ sᵏ λᵏe^-λ/k! = e^-λ ∞Σₖ₌₀ (sλ)ᵏ/k! = e^λ(s-1)

for all s ∈ R

99
Q

What is the probability generating function of the Geometric distribution with parameter p?

A

Gₓ(s) = ps/(1-(1-p)s)

provided that |s| < 1/(1−p)

100
Q

If X and Y are independent, then Gₓ₊ᵧ(s) = …

A

Gₓ₊ᵧ(s) = Gₓ(s)Gᵧ(s)

101
Q

Prove that Gₓ₊ᵧ(s) = Gₓ(s)Gᵧ(s) if X and Y are independent

A

Gₓ₊ᵧ(s) = E[sˣ⁺ʸ] = E[sˣsʸ]
Since X and Y are independent, sˣ and sʸ are independent.
So this equals E[sˣ]E[sʸ] = Gₓ(s)Gᵧ(s)

102
Q

Suppose that X₁, X₂, …, Xₙ are independent Ber(p) random variables and let Y = X₁ + … + Xₙ. How is Y distributed?

A

Y ∼ Bin(n, p)

103
Q

Prove that Y ∼ Bin(n, p), if Y = X₁ + … + Xₙ and X₁, X₂, …, Xₙ are independent Ber(p) random variables

A

Gᵧ(s) = E[sʸ] = E[s^(X₁ + … + Xₙ)] = E[s^X₁] … E[s^Xₙ] = (1 - p + ps)ⁿ
As Y has the same p.g.f. as a Bin(n, p) random variable, we deduce that Y ∼ Bin(n, p).

104
Q

Suppose that X₁, X₂, …, Xₙ are independent random variables such that Xᵢ ∼ Po(λᵢ)
Then ⁿΣᵢ₌₁ Xᵢ ∼ ….
In particular, what happens when λᵢ = λ for all 1 ≤ i ≤ n

Prove all of this

A

ⁿΣᵢ₌₁ Xᵢ ∼ Po(ⁿΣᵢ₌₁ λᵢ)

λᵢ = λ for all 1 ≤ i ≤ n:
ⁿΣᵢ₌₁ Xᵢ ∼ Po(nλ)

Proof pg41

105
Q

Show that G’ₓ(1) = E[X]

A
G'ₓ(s) = d/ds E[sˣ] = d/ds ∞Σₖ₌₀ sᵏ P(X=k) = ∞Σₖ₌₀ d/ds sᵏ P(X=k) = ∞Σₖ₌₀ ksᵏ⁻¹P(X=k) = E[Xsˣ⁻¹]
G'ₓ(1) = E[X]
106
Q

G’’ₓ(1) = …

A

G’’ₓ(1) = E[X(X − 1)] = E[X²] − E[X]

107
Q

Write the variance of X in terms of Gₓ(1) and its derivatives

A

var(X) = G’’ₓ(1) + G’ₓ(1) − (G’ₓ(1))²
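
A numerical check of this identity for X ∼ Po(λ), whose p.g.f. Gₓ(s) = e^λ(s-1) appeared above; the derivatives are approximated by finite differences, and the variance should come out as λ (a sketch, with λ and the step size h chosen arbitrarily):

import math

lam = 2.5

def G(s):
    # p.g.f. of Po(lam)
    return math.exp(lam * (s - 1))

h = 1e-4
G1 = (G(1 + h) - G(1 - h)) / (2 * h)             # approximates G'(1) = lam
G2 = (G(1 + h) - 2 * G(1) + G(1 - h)) / h ** 2   # approximates G''(1) = lam^2
print(G2 + G1 - G1 ** 2, "vs", lam)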

108
Q

dᵏ/dsᵏ Gₓ(s) |ₛ₌₁ = …

A

dᵏ/dsᵏ Gₓ(s) |ₛ₌₁ = E[X(X-1) … (X - k + 1)]

109
Q

Let X₁, X₂, . . . be i.i.d. non-negative integer-valued random variables with p.g.f. Gₓ(s).
Let N be another non-negative integer-valued random variable, independent of X₁, X₂, . . . and with p.g.f.
Gₙ(s). Then the p.g.f. of ᵢ₌₁Σᴺ Xᵢ is ……

Prove it

A

The pgf of ᵢ₌₁Σᴺ Xᵢ is Gₙ(Gₓ(s))

Note that the sum ᵢ₌₁Σᴺ Xᵢ has a random number of terms. We interpret it as 0 if N = 0.

Proof pg 44

110
Q

Suppose that X₁, X₂, … are independent and identically distributed Ber(p) random variables and that N ∼ Po(λ), independently of X₁, X₂, … Then ᵢ₌₁Σᴺ Xᵢ ∼

A

ᵢ₌₁Σᴺ Xᵢ ∼ Po(λp)

111
Q

Prove that:
Suppose that X₁, X₂, … are independent and identically distributed Ber(p) random variables and that N ∼ Po(λ), independently of X₁, X₂, … Then ᵢ₌₁Σᴺ Xᵢ ∼ Po(λp)

A

Gₓ(s) = 1 - p + ps and Gₙ(s) = exp(λ(s − 1)) and so
E[s^( ᵢ₌₁Σᴺ Xᵢ)] = Gₙ(Gₓ(s)) = exp(λ(1 - p + ps - 1)) = exp(λp(s-1))
Since this is the p.g.f. of Po(λp) and p.g.f.’s uniquely determine distributions, the result follows
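
A Monte Carlo check of this thinning property (sketch assuming numpy; parameters arbitrary). It compares the sample mean and variance of the random sum with λp, since a Po(λp) variable has mean = variance = λp:

import numpy as np

rng = np.random.default_rng(1)
lam, p, runs = 4.0, 0.3, 200_000

N = rng.poisson(lam, runs)
S = rng.binomial(N, p)   # given N, the sum of N independent Ber(p)'s is Bin(N, p)
print(S.mean(), S.var(), "vs", lam * p)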

112
Q

What is the offspring distribution?

A

Suppose we have a population (say of bacteria). Each individual in the population lives a unit time and,
just before dying, gives birth to a random number of children in the next generation. This number of
children has probability mass function p(i), i ≥ 0, called the offspring distribution

113
Q

Let Xₙ be the size of the population in generation n, so that X₀ = 1. Let Cᵢ⁽ⁿ⁾ be the number of children
of the ith individual in generation n ≥ 0, so that we may write Xₙ₊₁ = …

A

Xₙ₊₁ = C₁⁽ⁿ⁾ + C₂⁽ⁿ⁾ + … + Cₓₙ⁽ⁿ⁾

We interpret this sum as 0 if Xₙ = 0
Note that C₁⁽ⁿ⁾, C₂⁽ⁿ⁾, …. are independent and identically distributed.

114
Q

Let Xₙ be the size of the population in generation n, so that X₀ = 1. Let Cᵢ⁽ⁿ⁾ be the number of children
of the ith individual in generation n ≥ 0, so that we may write Xₙ₊₁ = C₁⁽ⁿ⁾ + C₂⁽ⁿ⁾ + … + Cₓₙ⁽ⁿ⁾
What is G(s)? and Gₙ(s)

A
G(s) = ∞Σᵢ₌₀ p(i)sᶦ
Gₙ(s) = E[sˣⁿ]  (That's X subscript n)
115
Q

For n ≥ 0
Gₙ₊₁(s) = …
Prove it

A

Gₙ₊₁(s) = Gₙ(G(s)) = G(G(…G(s)…)) = G(Gₙ(s)), where G is composed with itself n+1 times

Proof pg 45

116
Q

Suppose that the mean number of children of a single individual is µ i.e. ∞Σᵢ₌₁ ip(i) = µ
E[Xₙ] = ….
Prove it

A

E[Xₙ] = µⁿ

Proof pg 46
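
A simulation of E[Xₙ] = µⁿ for a branching process with Poisson(µ) offspring (a sketch assuming numpy; the offspring law and parameters are just one convenient choice, using the fact from an earlier card that a sum of independent Poissons is Poisson):

import numpy as np

rng = np.random.default_rng(2)
mu, n_gen, runs = 1.3, 6, 50_000

X = np.ones(runs, dtype=np.int64)     # X_0 = 1 in every simulated population
for _ in range(n_gen):
    # total offspring of X independent Poisson(mu) individuals is Poisson(mu * X)
    X = rng.poisson(mu * X)
print(X.mean(), "vs", mu ** n_gen)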

117
Q

Branching processes, what is the probability that the population dies out?

A

P(population dies out) = P(∞∪ₙ₌₀ {Xₙ = 0}) ≥ P (X₁ = 0) = p(0) > 0

118
Q
Extinction Probability (non-examinable)
pg 47-48
A

Extinction Probability (non-examinable)

119
Q

A random variable X defined on a probability space (Ω, F, P) is a function X : [ ] such that {ω : [ ]} ∈ F for each x ∈ R.

A

A random variable X defined on a probability space (Ω, F, P) is a function X : Ω → R
such that {ω : X(ω) ≤ x} ∈ F for each x ∈ R.

120
Q

What is the cumulative distribution function of a random variable X?

A

is the function
Fₓ : R → [0, 1] defined by
Fₓ(x) = P (X ≤ x)

121
Q

Continuous distributions
The cdf = Fₓ(x)
Is Fₓ decreasing?
Prove

A

No, it’s non-decreasing

Proof pg 51

122
Q

Continuous distributions
The cdf = Fₓ(x)
P (a < X ≤ b) = ???
Prove

A

P (a < X ≤ b) = Fₓ(b) − Fₓ(a) for a < b

Proof pg 51

123
Q

Continuous distributions
The cdf = Fₓ(x)
As x → −∞, Fₓ(x) → ???
Prove

A

x → −∞, Fₓ(x) → 0

Proof pg 51/52

124
Q

Continuous distributions
The cdf = Fₓ(x)
As x → ∞, Fₓ(x) → ???
Prove

A

x → ∞, Fₓ(x) → 1

Proof pg 51/52

125
Q
Continuous distributions 
Any function satisfying:
Fₓ is non-decreasing
P (a < X ≤ b) = Fₓ(b) − Fₓ(a) for a < b
Fₓ(x) → 0 as x → −∞
Fₓ(x) → 1 as x → ∞
and [ ]
is the
cumulative distribution function of some random variable defined on some probability space
A

Right Continuity

126
Q
A continuous random variable X is a random variable whose c.d.f. satisfies
Fₓ(x) = P[ ] = ∫ [ ] 
where fₓ : R → R is a function such that
a) fₓ(u) [ ] 0 for all u ∈ R
b) −∞ ∫ ∞  fₓ(u)  du  =
A

Fₓ(x) = P (X ≤ x) = −∞∫ˣ fₓ(u) du
(the bounds on the integral are −∞ to x)
where fₓ : R → R is a function such that

a) fₓ(u) ≥ 0 for all u ∈ R
b) −∞ ∫ ∞ fₓ(u) du = 1

127
Q

Continuous distributions

What is fₓ called?

A

fₓ is called the probability density function (p.d.f.) of X or, sometimes, just its density.

128
Q

The Fundamental Theorem of Calculus tells us that Fₓ of the form given in the definition is differentiable with dFₓ(x)/dx = [ ]

A

dFₓ(x)/dx = fₓ(x)

at any point x such that fₓ(x) is continuous.

129
Q

Is fₓ(x) a probability??

A

No!!!!!

Therefore it can exceed 1

130
Q

If X is a continuous random variable with p.d.f fₓ then
P(X=x) = [ ]
P(a ≤ X ≤ b) = [ ]

A

P(X=x) = 0 for all x ∈ R

P(a ≤ X ≤ b) = ₐ∫ᵇ fₓ(x) dx

131
Q

What is the p.d.f. of the Uniform distribution?

A

fₓ(x) = { 1/(b−a) for a ≤ x ≤ b
        { 0        otherwise

132
Q

What’s the notation for X being uniformly distributed?

A

X ∼ U[a, b]

133
Q

What is the p.d.f. of the exponential distribution?

A

fₓ(x) = λe^(-λx), x ≥ 0

134
Q

What is the p.d.f. of the gamma distribution?

A

α > 0 and λ > 0
fₓ(x) = ((λ^α)/Γ(α)) x^(α-1)e^(-λx), x ≥ 0

Here, Γ(α) is the so-called gamma function, which is defined by
Γ(α) = ∞∫₀ u^(α-1)e⁻ᵘ du for α > 0
For most values of α this integral does not have a closed form. However, for a strictly
positive integer n, we have Γ(n) = (n − 1)!.

135
Q

What is the p.d.f. of the normal (or Gaussian) distribution?

A

µ ∈ R and σ² > 0
fₓ(x) = 1/√(2πσ²) exp(−(x − µ)²/(2σ²)), x ∈ R

136
Q

What’s the notation for when X is gamma distributed?

A

X ∼ Gamma(α, λ)

137
Q

What’s the notation for X is distributed normally?

A

X ∼ N(µ, σ²)

138
Q

What’s the notation for X is distributed normally?

A

X ∼ N(µ, σ²)

139
Q

What is the standard normal distribution?

A

N(0, 1)

140
Q

P (x ≤ X ≤ x + δ) ≈ [ ]

A

P (x ≤ X ≤ x + δ) ≈ fₓ(x) δ for small δ > 0

141
Q

P (nδ ≤ X ≤ (n + 1)δ) ≈ [ ]

A

P (nδ ≤ X ≤ (n + 1)δ) ≈ fₓ(nδ)δ for small δ > 0

142
Q

Let X be a continuous random variable with probability density function fₓ.
The expectation or mean of X is defined to be …

A

E [X] = −∞ ∫ ∞ xfₓ(x) dx

whenever −∞ ∫ ∞ |x|fₓ(x) dx < ∞

143
Q

Let X be a continuous random variable with probability density function fₓ
and let h be a function from R to R. Then
E [h(X)] = ???

A

E [h(X)] = −∞ ∫ ∞ h(x)fₓ(x) dx

whenever −∞ ∫ ∞ |h(x)|fₓ(x) dx < ∞

144
Q

Suppose X is a continuous random variable with p.d.f. fₓ.
Then if a, b ∈ R then
E [aX + b] = ???
and var (aX + b)

Prove it

A

E [aX + b] = aE [X] + b
var (aX + b) = a²var (X)

Proof pg 58

145
Q

Does E[1/X] = 1/E[X]?

A

No!!!!

146
Q

Suppose that X is a continuous random variable with density fₓ and that h : R → R
is a differentiable function which is strictly increasing.
Then Y = h(X) is a
continuous random variable with p.d.f.
fᵧ(y) =
Prove

A

fᵧ(y) = fₓ(h⁻¹(y))d/dy h⁻¹(y)

where h⁻¹ is the inverse function of h

Proof pg60

147
Q

joint cumulative distribution function, Fₓ,ᵧ : R² → [0, 1],
given by
Fₓ,ᵧ (x, y) =

A

Fₓ,ᵧ (x, y) = P (X ≤ x, Y ≤ y)

148
Q

joint cumulative distribution

Is Fₓ,ᵧ non-decreasing?

A

Yes

149
Q

joint cumulative distribution

What does Fₓ,ᵧ(x, y) equal as x and y → ∞?

A

Fₓ,ᵧ(x, y) = 1

150
Q

joint cumulative distribution

What does Fₓ,ᵧ(x, y) equal as x and y → −∞?

A

Fₓ,ᵧ(x, y) = 0

151
Q

Let X and Y be random variables such that
Fₓ,ᵧ(x, y) = −∞∫ʸ −∞∫ˣ fₓ,ᵧ(u, v) dudv
for some function fₓ,ᵧ : R² → R such that
a) fₓ,ᵧ(u, v) [ ] 0 for all u, v ∈ R
b) −∞∫∞ −∞∫∞ fₓ,ᵧ(u, v) dudv = [ ]

A

a) fₓ,ᵧ(u, v) ≥ 0 for all u, v ∈ R

b) −∞∫∞ −∞∫∞ fₓ,ᵧ(u, v) dudv = 1

152
Q

If X and Y are jointly continuous, what is fₓ,ᵧ ??

A

their joint density function.

153
Q

What is fₓ,ᵧ in terms of Fₓ,ᵧ(x,y)?

A

fₓ,ᵧ(x, y) = ∂²/∂x∂y Fₓ,ᵧ(x,y)

154
Q

For a single continuous random variable X, it turns out that the probability that it lies in some nice set
A ⊆ R can be obtained by integrating its density over A
P (X ∈ A) = ???

A

P (X ∈ A) = ₐ∫ fₓ(x) dx

155
Q

For a pair of jointly continuous random variables X and Y,
for nice sets B ⊆ R² we obtain the probability that the pair (X, Y ) lies in B by integrating
the joint density over the set B

P ((X, Y ) ∈ B) = ??

A

P ((X, Y ) ∈ B) = ∫∫₍ₓ,ᵧ₎∈ᵦ fₓ,ᵧ(x, y) dxdy

156
Q

For a pair of jointly continuous random variables X and Y , we have
P (a < X ≤ b, c < Y ≤ d) = …
Prove

A

P (a < X ≤ b, c < Y ≤ d) = 𝒸∫ᵈ ₐ∫ᵇ fₓ,ᵧ(x, y) dxdy
for a < b and c < d

Proof pg62

157
Q

Suppose X and Y are jointly continuous with joint density fₓ,ᵧ. Then X is a continuous random variable with density
fₓ(x) =

A

-∞∫∞ fₓ,ᵧ(x, y) dy

158
Q

Suppose X and Y are jointly continuous with joint density fₓ,ᵧ. Then Y is a continuous random variable with density
fᵧ(y) =
Prove

A

-∞∫∞ fₓ,ᵧ(x, y) dx

Proof pg 63

159
Q

The one-dimensional densities fₓ and fᵧ of the joint distribution with density fₓ,ᵧ are called what?

A

Marginal densities (marginal distributions)

160
Q

When are Jointly continuous random variables X and Y with joint density fₓ,ᵧ independent?

A

fₓ,ᵧ(x, y) = fₓ(x) fᵧ(y)

for all x, y ∈ R

161
Q

jointly continuous random variables X₁, X₂, . . . , Xₙ with joint density
fₓ₁,ₓ₂,…,ₓₙ are independent if…

A

fₓ₁,ₓ₂,…,ₓₙ(x₁, x₂, . . . , xₙ) = fₓ₁(x₁)fₓ₂(x₂) … fₓₙ(xₙ)
for all x₁, x₂, . . . , xₙ∈ R

162
Q

if X and Y are independent then it follows easily that Fₓ,ᵧ (x, y) = …

A

Fₓ,ᵧ (x, y) = Fₓ(x)Fᵧ(y)

for all x, y ∈ R.

163
Q

Write E [h(X, Y )] in terms of a double integral

A

E [h(X, Y )] = -∞∫∞ -∞∫∞ h(x, y) fₓ,ᵧ(x, y) dxdy

164
Q

What is cov(X, Y)?

A

cov (X, Y ) = E [(X − E [X])(Y − E [Y ])] = E [XY ] − E [X] E [Y ]

165
Q

Let X₁, X₂, . . . , Xₙ denote i.i.d. random variables. Then these random variables are
said to constitute a [ ] from the distribution

A

random sample of size n

166
Q

What is the sample mean defined to be?

A

X̄ₙ = (1/n) ᵢ₌₁Σⁿ Xᵢ

167
Q

What is var(X+Y)?? For random variables X and Y

A

var (X + Y ) = var (X) + var (Y ) + 2cov (X, Y )

168
Q

What is var(ᵢ₌₁Σⁿ Xᵢ)?? For random variables X₁, …, Xₙ

A

var(ᵢ₌₁Σⁿ Xᵢ) = ᵢ₌₁Σⁿ var(Xᵢ) + ᵢ≠ⱼΣ cov(Xᵢ, Xⱼ)

= ᵢ₌₁Σⁿ var(Xᵢ) + 2 ᵢ<ⱼΣ cov(Xᵢ, Xⱼ)

169
Q

Suppose that X₁, X₂, . . . , Xₙ form a random sample from a distribution with mean µ
and variance σ². Then the expectation and variance of the sample mean are …
Prove it

A

E[X̄ₙ] = µ and var(X̄ₙ) = σ²/n

Proof pg 67
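
A quick simulation check of both formulas (sketch assuming numpy; the normal distribution and the parameters are an arbitrary choice of sampling distribution):

import numpy as np

rng = np.random.default_rng(3)
n, mu, sigma, runs = 25, 2.0, 3.0, 100_000

samples = rng.normal(mu, sigma, size=(runs, n))
xbar = samples.mean(axis=1)                 # one sample mean per run
print(xbar.mean(), "vs", mu)
print(xbar.var(), "vs", sigma ** 2 / n)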

170
Q

Let X₁, X₂, . . . , Xₙ be a random sample from a Bernoulli distribution with parameter p.
What do E[Xᵢ], var(Xᵢ), E[X̄ₙ] and var(X̄ₙ) equal??

A
E[Xᵢ] = p and var(Xᵢ) = p(1-p) for all 1 ≤ i ≤ n.
Hence E[X̄ₙ] = p and var(X̄ₙ) = p(1-p)/n
171
Q

Suppose that A is an event with probability P (A) and write p = P (A). Let X be the indicator function
of the event A i.e. the random variable defined by
X(ω) = 1ₐ(ω) = {1 if ω ∈ A
{0 if ω ∉ A
Then X ∼ [ ] and E[X] = [ ]

A

X ∼ Ber(p) and E [X] = p

172
Q

State the weak law of large numbers ….

Prove it

A

Suppose that X₁, X₂, . . . are independent and identically distributed random variables with mean µ. Then for any fixed ε > 0,
P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| > ε) → 0 as n → ∞

Proof pg 68

173
Q

Weak law of large numbers:
P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| ≤ ε)→???
As n → ∞

A

P(|1/n ᵢ₌₁Σⁿ Xᵢ − µ| ≤ ε)→1

174
Q

What is Markov’s inequality?

Prove it

A

Suppose that Y is a non-negative random variable whose expectation exists. Then
P(Y ≥ t) ≤ E[Y]/t for all t > 0.

Proof pg68
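
A numerical illustration of the inequality for a non-negative random variable (sketch assuming numpy; the exponential distribution and the values of t are arbitrary):

import numpy as np

rng = np.random.default_rng(4)
Y = rng.exponential(1.0, 100_000)   # non-negative, with E[Y] close to 1

for t in (1.0, 2.0, 5.0):
    print(t, (Y >= t).mean(), "<=", Y.mean() / t)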

175
Q

What is Chebyshev’s inequality?

Prove it

A

Suppose that Z is a random variable with a finite variance. Then for any t > 0,
P (|Z − E [Z]| ≥ t) ≤ var (Z)/t²

Proof: Note that P (|Z − E [Z]| ≥ t) = P((Z − E [Z])² ≥ t²)
and then apply Markov’s inequality to the
non-negative random variable Y = (Z − E [Z])²