A permutation is an arrangement of objects in a definite order.
Total number of permutations of N objects = N! (N factorial)
Where
N! = 1 * 2 * 3 *...* (N-1) * N
0! = 1
If some of the N objects are alike, say N₁ objects are alike, N₂ objects are alike, …, Nₖ objects are alike, with ΣNᵢ = N, then:
The total number of permutations of these N objects = N! / (N₁!N₂!...Nₖ!)
If only r of the N objects are taken in each permutation:
The total number of permutations of r objects of N objects = N! / (N - r)!
Its notation is ɴPᵣ, where ɴPɴ is the full permutation, i.e. N!
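As a quick sanity check, these formulas can be computed directly in Python (a minimal sketch using only the standard library; math.perm requires Python 3.8+, and the "MISSISSIPPI" count is an illustrative example, not from the text above):

```python
import math

# Full permutations of N distinct objects: N!
print(math.factorial(4))                            # 24

# Permutations of r objects taken from N: N! / (N - r)!
print(math.perm(5, 2))                              # 20
print(math.factorial(5) // math.factorial(5 - 2))   # 20, same value

# Permutations with repeated objects, e.g. the letters of "MISSISSIPPI"
# (M: 1, I: 4, S: 4, P: 2): 11! / (1! * 4! * 4! * 2!)
print(math.factorial(11)
      // (math.factorial(4) * math.factorial(4) * math.factorial(2)))  # 34650
```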
Combination represents number of ways of selecting r objects from N objects, irrespective of order. In contrast, ɴPᵣ selects r objects from N objects and the order matters.
The total number of combinations of r distinct objects chosen from N objects = ɴPᵣ / ᵣPᵣ
= N! / (r!(N-r)!)
Its notation is ɴCᵣ, read as “N choose r”.
The number of combinations is also known as a binomial coefficient, and besides ɴCᵣ it is often written with N stacked over r inside parentheses.
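A matching sketch for combinations, again assuming Python 3.8+ for math.comb:

```python
import math

# Combinations: N! / (r! * (N - r)!)
print(math.comb(5, 2))                      # 10

# Same value via nPr / rPr
print(math.perm(5, 2) // math.perm(2, 2))   # 20 // 2 = 10
```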
Many experiments share a common element: their outcomes can be classified into one of two events, one labeled “success” and the other “failure”. A Bernoulli trial is a single repetition of such an experiment, involving only these 2 outcomes.
We are often interested in the result of independent, repeated Bernoulli trials, i.e. the number of successes in repeated trials.
A binomial distribution gives us the probabilities associated with independent, repeated Bernoulli trials.
The probability of getting r successes in N independent trials with each having p success probability:
p(X = r; N, p) = number of ways event can occur * p(one occurrence)
= ɴCᵣ * pʳ * (1 - p)⁽ᴺ⁻ʳ⁾
More formally, in sampling a stationary Bernoulli process with the probability of success equal to p, the probability of observing exactly r successes in N independent trials is:
p(X = r; N, p) = ɴCᵣ * pʳ * (1 - p)⁽ᴺ⁻ʳ⁾
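A minimal sketch of this PMF in Python; the function name binomial_pmf is an illustrative choice:

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli(p) trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

print(binomial_pmf(6, 11, 0.5))   # ~0.2256, reused in the example further below
```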
Another way of defining a binomial random variable: X = X₁ + X₂ + … + Xɴ, the sum of N independent Bernoulli random variables Xᵢ, each equal to 1 with probability p and 0 with probability 1 - p. The mean and variance then follow directly:
Mean:
E(Xᵢ) = Σxᵢp(xᵢ)
= 0 * (1 - p) + 1 * p
= p
E(X) = E(X₁ + X₂ + … + Xɴ)
= E(X₁) + E(X₂) + … + E(Xɴ)
= Np
Variance:
Since each Xᵢ is either 0 or 1, Xᵢ² = Xᵢ, so E(Xᵢ²) = E(Xᵢ) = p.
V(Xᵢ) = E(Xᵢ²) - [E(Xᵢ)]²
= p - p²
= p(1 - p)
= pq, where q = 1 - p
V(X) = V(X₁) + V(X₂) + … + V(Xɴ) (by independence)
= Npq
For small p and small N, the binomial distribution is what we call skewed right. That is, the bulk of the probability falls in the smaller numbers, and the distribution tails off to the right.
For large p and small N, the binomial distribution is what we call skewed left. That is, the bulk of the probability falls in the larger numbers and the distribution tails off to the left.
For p = 0.5, the binomial distribution is what we call symmetric, whether N is large or small. That is, the distribution is without skewness.
Even when p ≠ 0.5, the binomial distribution approaches symmetry as N becomes large.
In a family of 11 children, what is the probability that there will be more boys than girls?
Solve this problem WITHOUT using the complements rule.
Solution:
p(boy) = 0.5
N = 11
p(more boys than girls) = p(6; N, p(boy)) + p(7; N, p(boy)) + … + p(11; N, p(boy))
= 0.2256 + 0.1611 + 0.0806 + 0.0269 + 0.0054 + 0.0005
= 0.5
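The same sum can be computed directly, reusing the binomial_pmf sketch from above:

```python
# p(more boys than girls) = p(X >= 6) for X ~ Binomial(11, 0.5)
p_more_boys = sum(binomial_pmf(r, 11, 0.5) for r in range(6, 12))
print(round(p_more_boys, 4))   # 0.5
```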
The normal distribution is symmetric and bell shaped. It describes data that clusters around a mean with symmetric spread.
It is continuous for all values of X between -∞ and ∞, so each conceivable interval of real numbers has a probability greater than 0:
-∞ < X < ∞
Two parameters, µ and σ. Note that the normal distribution is actually a family of distributions, since µ and σ determine the shape of the distribution.
Probability density function (PDF):
𝑓(x) = (1 / (σ√(2π))) * e^(-(x - µ)² / (2σ²))
The notation N(µ, σ²) means normally distributed with mean µ and variance σ². If we say X ~ N(µ, σ²), we mean that X is distributed N(µ, σ²).
About 2⁄3 of cases fall within 1 standard deviation of the mean, that is:
p(µ - σ ≤ X ≤ µ + σ) = 0.6826
About 95% of cases fall within 2 standard deviations of the mean, that is
p(µ - 2σ ≤ X ≤ µ + 2σ) = 0.9544
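These coverage probabilities can be checked numerically; the sketch below builds the standard normal CDF from math.erf so no third-party library is needed (phi is an illustrative name):

```python
import math

def phi(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(phi(1) - phi(-1))   # ~0.6827, within 1 standard deviation
print(phi(2) - phi(-2))   # ~0.9545, within 2 standard deviations
```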
Usage in time series:
Working with the PDF is tedious. The trick is to convert an arbitrary normal distribution with µ and σ into the standardized normal distribution N(0, 1), i.e. µ = 0 and σ = 1, via Z = (X - µ) / σ.
The N(0, 1) lookup table and how it works can be found here.
Define the CDF as 𝐹(x) = p(X ≤ x):
Rule #1
p(Z ≤ a) = 𝐹(a) when a is positive
= 1 - 𝐹(-a) when a is negative
Due to the symmetry of the curve, when 𝐹(a) > 0.5, a > 0, and
when 𝐹(a) < 0.5, a < 0.
Rule #2
p(Z ≥ a) = 1 - 𝐹(a) when a is positive
= 𝐹(-a) when a is negative
Rule #3
p(a ≤ Z ≤ b) = 𝐹(b) - 𝐹(a)
Rule #4
Assume a positive a:
p(-a ≤ Z ≤ a) = 𝐹(a) - 𝐹(-a)
= 𝐹(a) - (1 - 𝐹(a))
= 2𝐹(a) - 1
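A quick numeric check of the four rules, reusing the phi sketch above (a = 0.5 is an arbitrary choice):

```python
a = 0.5
print(phi(a))            # Rule #1: p(Z <= 0.5)                  ~0.6915
print(1 - phi(a))        # Rule #2: p(Z >= 0.5)                  ~0.3085
print(phi(a) - phi(-a))  # Rule #3 with a = -0.5, b = 0.5        ~0.3829
print(2 * phi(a) - 1)    # Rule #4: p(-0.5 <= Z <= 0.5)          ~0.3829
```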
Below are some examples of how to use standardized scores to address various questions.
The top 5% of applicants (as measured by GRE scores) will receive scholarships.
If GRE ~ N(500, 100²), what is the GRE score to qualify for a scholarship?
Solution:
Let X = GRE, want to find x such that p(X ≥ x) = 0.05
Let Z = (X - 500) / 100 ~ N(0, 1)
For p(Z ≥ z) = 0.05, z ≈ 1.65
x = (z * 100) + 500 = 665
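The cutoff can also be computed with an inverse CDF, assuming scipy is available; norm.ppf returns the unrounded 1.6449 rather than the table's 1.65:

```python
from scipy.stats import norm

z = norm.ppf(0.95)     # ~1.6449
print(500 + 100 * z)   # ~664.5, so a score of about 665 qualifies
```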
Family income ~ N(25000, 10000²).
If the poverty level is $10,000, what percentage of the population lives in poverty?
Solution:
Let X = family income, want to find p(X ≤ 10000).
Let Z = (X - 25000) / 10000 ~ N(0, 1)
z = (10000 - 25000) / 10000 = -1.5
p(Z ≤ -1.5) = 1 - p(Z ≤ 1.5)
= 1 - 0.9332
= 0.0668
A new tax law is expected to benefit “middle income” families, those with incomes between
$20,000 and $30,000. If Family income ~ N(25000, 10000²), what percentage of the population
will benefit from the law?
Solution:
Let X = family income, want to find p(20000 ≤ X ≤ 30000)
Let Z = (X - 25000) / 10000 ~ N(0, 1)
z₀ = (20000 - 25000) / 10000 = -0.5
z₁ = (30000 - 25000) / 10000 = 0.5
p(20000 ≤ X ≤ 30000) = p(-0.5 ≤ Z ≤ 0.5)
= 2𝐹(0.5) - 1
= 2(0.6915) - 1
= 0.383
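Both income questions reduce to evaluations of the standard normal CDF after standardizing, again reusing the phi sketch:

```python
# Poverty example: p(X <= 10000) for X ~ N(25000, 10000^2)
print(phi((10000 - 25000) / 10000))   # ~0.0668

# Middle-income example: p(20000 <= X <= 30000)
print(phi(0.5) - phi(-0.5))           # ~0.3829
```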
For a large enough N, a binomial variable X is approximately distributed N(Np, Npq), so the normal distribution can be used to approximate the binomial distribution.
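A quick numeric check of this approximation, reusing the binomial_pmf and phi sketches above; the ±0.5 continuity correction is a standard refinement not discussed in these notes:

```python
# Exact p(45 <= X <= 55) for X ~ Binomial(100, 0.5): mean Np = 50, sd = sqrt(Npq) = 5
exact = sum(binomial_pmf(r, 100, 0.5) for r in range(45, 56))
# Normal approximation with the usual +/-0.5 continuity correction
approx = phi((55.5 - 50) / 5) - phi((44.5 - 50) / 5)
print(round(exact, 4), round(approx, 4))   # both close to 0.73
```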
The Poisson distribution models the number of times an event happens in a fixed interval of time or space when events occur independently at a constant average rate. An example of a Poisson random variable is the number of typos on a printed page, used in the examples below.
If X is a Poisson random variable with parameter λ, then the probability mass function (PMF) is:
𝑓(x) = p(X = x) = e^-λ * λˣ / x!, for x = 0, 1, 2, …
Verify ∑𝑓(x) = 1:
The Taylor series for e^λ is ∑(λˣ / x!) over x = 0, 1, 2, … Now
∑𝑓(x) = ∑(e^-λ * λˣ / x!)
= e^-λ * ∑(λˣ / x!)
= e^-λ * e^λ
= 1
Mean and variance of a Poisson random variable are both λ.
There are theoretically an infinite number of possible Poisson distributions. Any specific Poisson distribution depends on the parameter λ.
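A minimal sketch of the PMF, with a numeric check that it sums to (almost) 1 and has mean λ when the infinite sum is truncated; poisson_pmf is an illustrative name:

```python
import math

def poisson_pmf(x, lam):
    """p(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 3
probs = [poisson_pmf(x, lam) for x in range(100)]   # truncate the infinite sum
print(sum(probs))                                   # ~1.0
print(sum(x * p for x, p in enumerate(probs)))      # ~3.0, i.e. the mean is lam
```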
Let X denote the number of events in a given continuous interval. X follows an approximate Poisson process with parameter λ > 0 if: (1) the numbers of events in non-overlapping sub-intervals are independent; (2) the probability of exactly one event in a sufficiently short sub-interval of length h is approximately λh; (3) the probability of two or more events in a sufficiently short sub-interval is essentially zero.
Let X equal the number of typos on a printed page with a mean of 3 typos per page.
What is the probability that a randomly selected page has at least 1 typo on it?
Solution:
p(X ≥ 1) = 1 - p(X = 0)
= 1 - e⁻³ * 3⁰ / 0!
= 1 - e⁻³
= 0.9502
What is the probability that a randomly selected page has at most 1 typo on it?
Solution:
p(X ≤ 1) = p(X = 0) + p(X = 1)
= e⁻³ * 3⁰ / 0! + e⁻³ * 3¹ / 1!
= e⁻³ + 3e⁻³
= 0.1992
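Both page-typo probabilities, computed with the poisson_pmf sketch above:

```python
print(1 - poisson_pmf(0, 3))                   # at least 1 typo: ~0.9502
print(poisson_pmf(0, 3) + poisson_pmf(1, 3))   # at most 1 typo:  ~0.1991
```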
The Poisson distribution can be viewed as the limit of the binomial distribution. Suppose X ~ Binomial(N, λ/N), where N is very large and λ/N is very small. The PMF of X can then be approximated by the PMF of a Poisson(λ): for a fixed value k,
p(X = k) = ɴCₖ * (λ/N)ᵏ * (1 - λ/N)⁽ᴺ⁻ᵏ⁾ → e^-λ * λᵏ / k! as N → ∞
Here N is the binomial distribution parameter, λ is the Poisson distribution parameter, λ/N is the binomial success probability p, and k is the fixed value of the Poisson random variable.
An intuitive understanding: as N becomes larger, the Poisson interval is divided into N smaller sub-intervals. When the sub-intervals become sufficiently small, at most one event happens in each, so an event occurrence in a sub-interval can be regarded as a “success” in a binomial trial. Then the following 2 probabilities are equivalent: the probability of observing k events in the whole interval (Poisson) and the probability of k successes in N trials with p = λ/N (binomial).
This is useful because the Poisson PMF is much easier to compute than the binomial PMF.
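A numeric illustration of the limit, reusing the binomial_pmf and poisson_pmf sketches; λ = 3 and k = 2 are arbitrary choices:

```python
lam, k = 3, 2
for n in (10, 100, 1000, 10000):
    print(n, binomial_pmf(k, n, lam / n))   # approaches the Poisson value below
print(poisson_pmf(k, lam))                  # ~0.2240
```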