The mean is just the average value. It tells you where the center of your data is.
For n numbers: x₁, x₂, …, xₙ, the mean μ is:
μ = (1/n) * Σxᵢ
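The formula translates directly to code; a minimal Python sketch on a made-up sample:

```python
# A direct translation of the mean formula: mu = (1/n) * sum(x_i).
import statistics

data = [2.0, 4.0, 6.0, 8.0]   # made-up sample
mu = sum(data) / len(data)

assert mu == statistics.mean(data)
print(mu)  # 5.0
```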
The expected value of a random variable is its probability-weighted mean, i.e. E(X) = µ.
Discrete variable: the expected value of a discrete random variable, X, is found by multiplying each X value by its probability and then summing over all values of the random variable.
E(X) = Σxᵢp(xᵢ) = µₓ
Continuous variable: for a continuous variable X ranging over all the real numbers, the expectation is defined by integration.
E(X) = ∫₋∞^∞ x𝑓(x)𝑑x = µₓ
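A quick sketch of the discrete case, assuming a fair six-sided die as the example distribution:

```python
# E(X) = sum of x_i * p(x_i), assuming a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

ev = sum(x * p for x, p in zip(values, probs))
print(ev)  # ≈ 3.5
```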
The variance tells you how spread out the data is — how far from the mean the values tend to be.
The variance of a random variable X is defined as the expected squared deviation of the values of this random variable about their mean.
V(X) = E((X - μ)²) = E(X²) - μ² = σ²
In the discrete case, this is equivalent to:
V(X) = Σ(xᵢ - μ)²p(xᵢ) = σ²
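A sketch checking that the two forms of the variance agree, reusing the fair six-sided die as a made-up example:

```python
# Two equivalent forms of the variance, on a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mu = sum(x * p for x, p in zip(values, probs))                        # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in zip(values, probs))       # E((X - mu)^2)
var_short = sum(x * x * p for x, p in zip(values, probs)) - mu ** 2   # E(X^2) - mu^2

assert abs(var_def - var_short) < 1e-9
print(var_def)  # ≈ 2.9167 (= 35/12)
```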
A sample is a subset of the whole population. The sample mean is:
x̄ = (1/n) * Σxᵢ where x₁, …, xₙ are the n sampled values
Sample variance:
s² = (1/(n-1)) * Σ(xᵢ - x̄)²
= (1/(n-1)) * Σ(xᵢ² - 2xᵢx̄ + x̄²) [expand the square]
= (1/(n-1)) * (Σxᵢ² - 2x̄Σxᵢ + nx̄²)
= (1/(n-1)) * (Σxᵢ² - 2nx̄² + nx̄²) [Σxᵢ = nx̄]
= (1/(n-1)) * (Σxᵢ² - nx̄²) where x₁, …, xₙ are the n sampled values
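As a sanity check, a small Python sketch confirming that the two forms of the sample variance agree (and match the standard library) on a made-up sample:

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up sample
n = len(sample)
xbar = sum(sample) / n

s2_def = sum((x - xbar) ** 2 for x in sample) / (n - 1)            # definition
s2_short = (sum(x * x for x in sample) - n * xbar ** 2) / (n - 1)  # shortcut form

assert abs(s2_def - s2_short) < 1e-9
assert abs(s2_def - statistics.variance(sample)) < 1e-9
print(s2_def)  # ≈ 4.571 (= 32/7)
```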
We can then derive the mean and variance of the mean estimator X̄:
E(X̄) = E((X₁ + X₂ + ... + Xₙ) / n)
= (E(X₁) + E(X₂) + ... + E(Xₙ)) / n
= μ
V(X̄) = V((X₁ + X₂ + ... + Xₙ) / n)
= V(X₁ + X₂ + ... + Xₙ) / n²
= (V(X₁) + V(X₂) + ... + V(Xₙ)) / n² [Xᵢ are independent, check rule#15 below]
= σ² / n
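A small simulation sketch of these two results, assuming normally distributed data with mean 10 and standard deviation 2 (made-up parameters for the demo):

```python
import random

random.seed(0)
n, trials = 25, 20000
mu, sigma = 10.0, 2.0   # assumed population parameters

# Draw many samples of size n and record each sample mean.
means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

m = sum(means) / trials
v = sum((x - m) ** 2 for x in means) / trials
print(m)  # ≈ mu = 10
print(v)  # ≈ sigma^2 / n = 4/25 = 0.16
```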
The proof of why the sample variance has n - 1 in the denominator is in the appendix.
The standard deviation is just the square root of the variance. It is in the same units as the original data, which makes it easier to interpret.
StdDev(X) = sqrt(V(X)) = σₓ
Covariance measures how two variables move together.
For two variables X and Y:
Cov(X, Y) = E((X - E(X)) * (Y - E(Y)))
Intuition: a positive covariance means the variables tend to move in the same direction; a negative covariance means they tend to move in opposite directions; a covariance near zero means no consistent linear co-movement.
Correlation is a scaled version of covariance: it removes the units and normalizes the value to the range between -1 and +1.
+1: perfect positive linear relationship
-1: perfect negative linear relationship
0: no linear relationship
Corr(X, Y) = Cov(X, Y) / (StdDev(X) * StdDev(Y))
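A sketch computing covariance and correlation from the definitions, on a made-up perfectly linear pair:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]   # y = 2x: perfect positive linear relationship

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)

corr = cov / (sx * sy)
print(cov)   # 4.0
print(corr)  # ≈ +1
```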
Autocorrelation (also called serial correlation) measures how a time series is correlated with a lagged version of itself. In simple terms, does today’s value depend on past values? If yes, autocorrelation exists.
For time series Xₜ, lag k:
Autocorrelation at lag k = Corr(Xₜ, Xₜ₋ₖ)
Interpretation
Autocorrelation Value | Meaning |
---|---|
Close to +1 | Strong positive correlation, values repeat trend |
Close to -1 | Strong negative correlation, values alternate up/down |
Close to 0 | No correlation, no predictable pattern |
Autocorrelation Plot (ACF Plot): an ACF plot shows the autocorrelation of the series at each lag k; bars that extend beyond the confidence band indicate statistically significant correlation at that lag.
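A sketch of the common sample-ACF estimator (which divides by the full-series variance), on a made-up alternating series:

```python
def autocorr(series, k):
    """Sample autocorrelation at lag k: Corr(X_t, X_{t-k})."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t - k] - m) for t in range(k, n))
    return num / denom

# An alternating series flips sign every step: strong negative lag-1 autocorrelation.
alt = [1.0, -1.0] * 50
print(autocorr(alt, 1))  # -0.99
```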
Assume the following: a and b are any given constants; X and Y are random variables.
# | Rule | Note |
---|---|---|
1 | E(X) = Σxᵢp(xᵢ) = µₓ | discrete variable |
2 | E(g(X)) = Σg(xᵢ)p(xᵢ) = µ_g(X) | discrete variable, g(X) is some function of X |
3 | E(a) = a | the expectation of a constant is the constant |
4 | E(aX) = a * E(X) | multiplying every value by a constant multiplies the expectation by that constant (e.g. doubling every value doubles the expectation) |
5 | E(a ± X) = a ± E(X) | adding a constant to every value shifts the expectation by that constant (e.g. adding 7 increases the expectation by 7) |
6 | E(a ± bX) = a ± bE(X) | |
7 | E((a ± X) * b) = (a ± E(X)) * b | |
8 | E(X + Y) = E(X) + E(Y) | |
9 | If X and Y are independent: E(XY) = E(X)E(Y) | |
10 | V(X) = E((X - μ)²) = E(X²) - E(X)² = E(X²) - μ² = σₓ² | |
11 | V(a) = 0 | a constant does not vary |
12 | V(a ± X) = V(X) | adding a constant to a variable does not change its variance |
13 | V(a ± bX) = b² * V(X) | |
14 | V(X ± Y) = V(X) + V(Y) ± 2COV(X,Y) | |
15 | If X and Y are independent, V(X ± Y) = V(X) + V(Y) | |
16 | Cov(X,Y) = E((X - E(X)) * (Y - E(Y))) = E(XY) - E(X)E(Y) | |
17 | If X and Y are independent, Cov(X,Y) = 0 | Cov(X,Y) = 0 does not necessarily mean X and Y are independent |
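A few of these rules can be spot-checked by simulation; a sketch for rule#13 and rule#17 with made-up constants and independent uniform draws:

```python
import random

random.seed(1)
N = 100000
X = [random.uniform(0.0, 1.0) for _ in range(N)]
Y = [random.uniform(0.0, 1.0) for _ in range(N)]   # independent of X

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# rule#13: V(a + bX) = b^2 * V(X), with made-up constants a=3, b=-2
a, b = 3.0, -2.0
Z = [a + b * x for x in X]
assert abs(var(Z) - b * b * var(X)) < 1e-6

# rule#17: X, Y independent => Cov(X, Y) ≈ 0
mx, my = mean(X), mean(Y)
cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / N
print(cov)  # ≈ 0
```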
Prove V(X) = E((X - μ)²) = E(X²) - μ²
Solution:
V(X) = E((X - μ)²)
= E(X² - 2Xμ + μ²) [expand]
= E(X²) - E(2Xμ) + E(μ²) [rule#8]
= E(X²) - 2μE(X) + μ² [rule#4, rule#3]
= E(X²) - 2μ² + μ²
= E(X²) - μ²
Prove V(aX) = a² * V(X)
Solution:
Let Y = aX
V(Y) = E(Y²) - E(Y)² [rule#10]
= E(a²X²) - E(aX)²
= a²E(X²) - a²E(X)² [rule#4]
= a²(E(X²) - E(X)²)
= a²V(X)
Let Z = (X - µₓ)/σₓ, find E(Z) and V(Z)
Solution:
E(Z) = E((X - µₓ) / σₓ)
= (E(X) - µₓ) / σₓ [rule#7,rule#3]
= 0
V(Z) = V((X - µₓ) / σₓ)
= V(X) / σₓ² [rule#12, rule#13]
= 1
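A numeric sketch of standardization, assuming a fair six-sided die as the distribution of X:

```python
import math

# Z = (X - mu)/sigma, assuming a fair six-sided die as the distribution of X.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mu = sum(x * p for x, p in zip(values, probs))
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

z_vals = [(x - mu) / sigma for x in values]
ez = sum(z * p for z, p in zip(z_vals, probs))
vz = sum(z * z * p for z, p in zip(z_vals, probs)) - ez ** 2

print(ez)  # ≈ 0
print(vz)  # ≈ 1
```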
Find the mean and variance of the number of heads obtained in 3 tosses of a fair coin.
Solution:
Let Xᵢ = 1 if the ith coin toss comes up heads, 0 otherwise.
So Xᵢ² = Xᵢ (0² = 0, 1² = 1).
E(X₁) = E(X₂) = E(X₃) = 0.5
E(X₁ + X₂ + X₃) = E(X₁) + E(X₂) + E(X₃) = 1.5 [rule#8]
V(X₁) = V(X₂) = V(X₃) = E(Xᵢ²) - E(Xᵢ)² = 0.5 - 0.25 = 0.25 [rule#10; E(Xᵢ²) = E(Xᵢ) since Xᵢ² = Xᵢ]
V(X₁ + X₂ + X₃) = V(X₁) + V(X₂) + V(X₃) = 0.75 [rule#15]
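The same result by exhaustive enumeration of the 8 equally likely outcomes:

```python
from itertools import product

# All 2^3 = 8 equally likely outcomes; 1 = heads, 0 = tails.
outcomes = list(product([0, 1], repeat=3))
heads = [sum(o) for o in outcomes]

n = len(outcomes)
mean_heads = sum(heads) / n
var_heads = sum((h - mean_heads) ** 2 for h in heads) / n

print(mean_heads, var_heads)  # 1.5 0.75
```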
Concept | Meaning |
---|---|
Mean | Average — where the data centers |
Variance | How spread out the data is |
Std Dev | Square root of variance — easier to interpret as “typical deviation” |
Covariance | Measures how two variables move together (units depend on X and Y) |
Correlation | Scaled covariance, unit-free, ranges [-1, +1], easy to interpret |
Autocorrelation | How the current value relates to past values; used to detect trends and mean reversion, and for model building |
Why important in time series / trading? Positive autocorrelation suggests momentum (trends persist), negative autocorrelation suggests mean reversion, and the autocorrelation structure guides model building (e.g. choosing lags for autoregressive models).
Appendix: why does the sample variance have n - 1 in the denominator?
The reason we use n - 1 rather than n is that it makes the sample variance an unbiased estimator of the population variance σ².
The sample variance for a particular sample is:
s² = (1/(n-1)) * (Σxᵢ² - nx̄²)
Now define the corresponding estimator S² by replacing each observed value with its random variable:
S² = (1/(n-1)) * (ΣXᵢ² - nX̄²)
Each Xᵢ is the random variable whose observed value in the sample is xᵢ.
X̄ = (X₁ + X₂ + ... + Xₙ) / n is the estimator of the population mean; its observed value is x̄.
Now need to prove S² is an unbiased estimator of the population variance σ², i.e. E(S²) = σ²
Proof:
(1) From population variance definition σ² = E(X²) - μ², we have E(X²) = σ² + μ²
(2) E(Xᵢ²) = E(X²) = σ² + μ²
(3) E(X̄²) = V(X̄) + E(X̄)²
= σ² / n + μ² [using E(X̄) = μ and V(X̄) = σ²/n shown earlier]
then:
E(S²) = E((1/(n-1)) * (ΣXᵢ² - nX̄²))
= (1/(n-1)) * E(ΣXᵢ² - nX̄²)
= (1/(n-1)) * (ΣE(Xᵢ²) - nE(X̄²))
= (1/(n-1)) * (n(σ² + μ²) - n(σ² / n + μ²))
= (1/(n-1)) * (nσ² + nμ² - σ² - nμ²)
= (1/(n-1)) * (n - 1) * σ²
= σ²
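A simulation sketch showing the bias directly: averaging over many samples, dividing by n underestimates σ² while dividing by n - 1 does not (standard normal draws assumed, so σ² = 1):

```python
import random

random.seed(2)
n, trials = 5, 100000

# Average the two candidate estimators over many samples of standard normals.
biased_sum, unbiased_sum = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / n          # divide by n
    unbiased_sum += ss / (n - 1)  # divide by n - 1

print(biased_sum / trials)    # ≈ (n-1)/n * σ² = 0.8 (biased low)
print(unbiased_sum / trials)  # ≈ σ² = 1.0 (unbiased)
```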