3 Expectations

Mean

The mean is just the average value. It tells you where the center of your data is.

For n numbers: x₁, x₂, …, xₙ, the mean μ is:

μ = (1/n) * Σxᵢ

The expected value of a random variable is its probability-weighted average, i.e. E(X) = µ. For equally likely outcomes this reduces to the arithmetic mean.
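The formula above can be checked directly in Python (a minimal illustrative sketch with made-up data):

```python
# Mean of a small dataset from the definition μ = (1/n) * Σxᵢ.
xs = [2.0, 4.0, 6.0, 8.0]
mean = sum(xs) / len(xs)
print(mean)  # 5.0
```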

Variance

The variance tells you how spread out the data is — how far from the mean the values tend to be.

The variance of a random variable X is defined as the expected squared deviation of the values of this random variable about their mean.

V(X) = E((X - μ)²) = E(X²) - μ² = σ²

In the discrete case, this is equivalent to:

V(X) =  Σ(xᵢ - μ)²p(xᵢ) = σ²
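Both forms of the variance can be evaluated for a concrete discrete variable — here a fair six-sided die, used purely as an illustration:

```python
# Variance of a fair die, two ways: Σ(xᵢ - μ)²p(xᵢ) and E(X²) - μ².
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6  # each face equally likely
mu = sum(x * p for x in values)                         # E(X) = 3.5
var = sum((x - mu) ** 2 * p for x in values)            # Σ(xᵢ - μ)² p(xᵢ)
var_shortcut = sum(x * x * p for x in values) - mu ** 2  # E(X²) - μ²
print(mu, var)  # 3.5 and σ² = 35/12 ≈ 2.9167
```

Both expressions agree, as the identity V(X) = E(X²) - μ² requires.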

Sample Mean and Variance

A sample is a subset of the whole population. The sample mean is:

x̄ = (1/n) * Σxᵢ    where x₁, …, xₙ are the sampled values

Sample variance:

s² = (1/(n-1)) * Σ(xᵢ - x̄)²
   = (1/(n-1)) * Σ(xᵢ² - 2xᵢx̄ + x̄²)
   = (1/(n-1)) * (Σxᵢ² - 2x̄Σxᵢ + nx̄²)
   = (1/(n-1)) * (Σxᵢ² - 2nx̄² + nx̄²)    [since Σxᵢ = nx̄]
   = (1/(n-1)) * (Σxᵢ² - nx̄²)           where x₁, …, xₙ are the sampled values
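The definition and the shortcut formula at the end of the derivation should give identical results; a quick numeric check on made-up data:

```python
# Sample variance two ways: the definition and the Σxᵢ² - nx̄² shortcut.
xs = [3.0, 7.0, 7.0, 19.0]
n = len(xs)
xbar = sum(xs) / n  # sample mean = 9.0
s2_def = sum((x - xbar) ** 2 for x in xs) / (n - 1)
s2_short = (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)
print(s2_def, s2_short)  # 48.0 48.0
```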

Notation Clarifications

Treating the sample mean as a random variable X̄ (the mean estimator), we can derive its mean and variance:

E(X̄) = E((X₁ + X₂ + ... + Xₙ) / n)
     = (E(X₁) + E(X₂) + ... + E(Xₙ)) / n
     = nμ / n
     = μ

V(X̄) = V((X₁ + X₂ + ... + Xₙ) / n)
     = V(X₁ + X₂ + ... + Xₙ) / n²
     = (V(X₁) + V(X₂) + ... + V(Xₙ)) / n²     [Xᵢ are independent, check rule#15 below]
     = nσ² / n²
     = σ² / n
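The result V(X̄) = σ²/n can be checked by simulation — here with die rolls (σ² = 35/12), an illustrative setup, where the observed spread of many sample means should be close to σ²/n:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
# Simulate many samples of size n and look at the variance of their means.
n, trials = 25, 20000
means = []
for _ in range(trials):
    sample = [random.randint(1, 6) for _ in range(n)]
    means.append(sum(sample) / n)
m = sum(means) / trials
v = sum((x - m) ** 2 for x in means) / trials
print(v, (35 / 12) / n)  # v should be close to σ²/n ≈ 0.1167
```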

The proof of why the sample variance has n - 1 in the denominator is in the appendix.

Standard Deviation

The standard deviation is the square root of the variance. It is in the same units as the original data, which makes it easier to interpret.

StdDev(X) = sqrt(V(X)) = σₓ

Covariance

Covariance measures how two variables move together.

For two variables X and Y:

Cov(X, Y) = E((X - E(X)) * (Y - E(Y)))

Intuition: a positive covariance means X and Y tend to move in the same direction; a negative covariance means they tend to move in opposite directions; a covariance near zero means no consistent linear relationship. Its magnitude depends on the units of X and Y, which makes it hard to compare across datasets.
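A small sketch of the sample (plug-in) version of this formula, on made-up paired data:

```python
# Sample covariance: average of (xᵢ - x̄)(yᵢ - ȳ) over the paired data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # ys move up whenever xs do
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
print(cov)  # 2.5 — positive, since the two series move together
```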

Correlation

Correlation is a scaled version of covariance: it removes the units and is normalized to lie between -1 and +1.

Corr(X, Y) = Cov(X, Y) / (σₓσᵧ)
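Dividing the covariance by the two standard deviations gives the correlation; in this sketch the series are exactly inversely related, so the result sits at the -1 end of the range:

```python
import math

# Correlation = covariance scaled by both standard deviations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [8.0, 6.0, 4.0, 2.0]  # perfectly negatively related to xs
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / n)
corr = cov / (sx * sy)
print(corr)  # ≈ -1.0
```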

Autocorrelation

Autocorrelation (also called serial correlation) measures how a time series is correlated with a lagged version of itself. In simple terms, does today’s value depend on past values? If yes, autocorrelation exists.

For time series Xₜ, lag k:

Autocorrelation at lag k = Corr(Xₜ, Xₜ₋ₖ)
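A sketch of a common sample estimator for this quantity (deviations from the overall mean, normalized by the overall sum of squares — one of several conventions):

```python
# Lag-k sample autocorrelation of a series.
def autocorr(series, k):
    n = len(series)
    mu = sum(series) / n
    num = sum((series[t] - mu) * (series[t - k] - mu) for t in range(k, n))
    den = sum((x - mu) ** 2 for x in series)
    return num / den

alternating = [1.0, -1.0] * 10  # values flip sign every step
print(autocorr(alternating, 1))  # -0.95: strong negative lag-1 autocorrelation
```

An alternating series like this one gives a lag-1 autocorrelation near -1, matching the interpretation table below.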

Interpretation

Autocorrelation value   Meaning
Close to +1             Strong positive correlation; values repeat the trend
Close to -1             Strong negative correlation; values alternate up/down
Close to 0              No correlation; no predictable pattern

Autocorrelation Plot (ACF Plot)

An ACF plot shows the autocorrelation at each lag k, which makes patterns such as trends or seasonality easy to spot at a glance.

Expectation Rules

Assume the following: a and b are any given constants, and X and Y are random variables.

#   Rule                                                    Note
1   E(X) = Σxᵢp(xᵢ) = µₓ                                    discrete variable
2   E(g(X)) = Σg(xᵢ)p(xᵢ) = µ_g(X)                          discrete variable; g(X) is some function of X
3   E(a) = a                                                the expectation of a constant is the constant
4   E(aX) = a * E(X)                                        multiplying every value by a scales the expectation by a
5   E(a ± X) = a ± E(X)                                     adding 7 to every case increases the expectation by 7
6   E(a ± bX) = a ± bE(X)
7   E((a ± X) * b) = (a ± E(X)) * b
8   E(X + Y) = E(X) + E(Y)
9   E(XY) = E(X)E(Y)                                        if X and Y are independent
10  V(X) = E((X - μ)²) = E(X²) - E(X)² = E(X²) - μ² = σₓ²
11  V(a) = 0                                                a constant does not vary
12  V(a ± X) = V(X)                                         adding a constant to a variable does not change its variance
13  V(a ± bX) = b² * V(X)
14  V(X ± Y) = V(X) + V(Y) ± 2Cov(X, Y)
15  V(X ± Y) = V(X) + V(Y)                                  if X and Y are independent
16  Cov(X, Y) = E((X - E(X)) * (Y - E(Y))) = E(XY) - E(X)E(Y)
17  Cov(X, Y) = 0                                           if X and Y are independent; Cov(X, Y) = 0 does not imply independence
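A couple of these rules can be spot-checked numerically (a sketch using sample statistics, for which the same identities hold exactly up to floating-point error):

```python
import random

random.seed(1)
a, b = 3.0, 2.0
xs = [random.gauss(0, 1) for _ in range(100000)]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# Rule 4: E(aX) = a * E(X)
assert abs(mean([a * x for x in xs]) - a * mean(xs)) < 1e-6
# Rule 13: V(a + bX) = b² * V(X)
assert abs(var([a + b * x for x in xs]) - b ** 2 * var(xs)) < 1e-6
print("rules 4 and 13 hold on this sample")
```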

Exercises

Exercise 1

Prove V(X) = E((X - μ)²) = E(X²) - μ²

Solution:
V(X) = E((X - μ)²)
     = E(X² - 2Xμ + μ²)           [expand]
     = E(X²) - E(2Xμ) + E(μ²)     [rule#8]
     = E(X²) - 2μE(X) + μ²        [rule#4, rule#3]
     = E(X²) - 2μ² + μ²
     = E(X²) - μ²

Exercise 2

Prove V(aX) = a² * V(X)

Solution:
Let Y = aX
V(Y) = E(Y²) - E(Y)²              [rule#10]
     = E(a²X²) - E(aX)²
     = a²E(X²) - a²E(X)²          [rule#4]
     = a²(E(X²) - E(X)²)
     = a²V(X)

Exercise 3

Let Z = (X - µₓ)/σₓ, find E(Z) and V(Z)

Solution:
E(Z) = E((X - µₓ) / σₓ)
     = (E(X) - µₓ) / σₓ           [rule#7,rule#3]
     = 0
V(Z) = V((X - µₓ) / σₓ)
     = V(X - µₓ) / σₓ²            [rule#13]
     = V(X) / σₓ²                 [rule#12]
     = 1
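The same standardization applied to data (using sample statistics in place of µₓ and σₓ — a sketch) yields z-scores with mean 0 and variance 1:

```python
import math
import random

random.seed(2)
xs = [random.gauss(10, 3) for _ in range(1000)]
mu = sum(xs) / len(xs)
sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

zs = [(x - mu) / sigma for x in xs]  # Z = (X - µₓ) / σₓ
z_mean = sum(zs) / len(zs)
z_var = sum((z - z_mean) ** 2 for z in zs) / len(zs)
print(z_mean, z_var)  # mean ≈ 0, variance ≈ 1
```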

Exercise 4

Find the mean and variance of the number of heads obtained in 3 fair coin tosses.

Solution:
Let Xᵢ = 1 if the ith coin toss comes up heads, 0 otherwise.
So Xᵢ² = Xᵢ (0² = 0, 1² = 1).

E(X₁) = E(X₂) = E(X₃) = 0.5
E(X₁ + X₂ + X₃) = E(X₁) + E(X₂) + E(X₃) = 1.5                [rule#8]
V(X₁) = V(X₂) = V(X₃) = E(X²) - E(X)² = 0.5 - 0.25 = 0.25    [rule#10]
V(X₁ + X₂ + X₃) = V(X₁) + V(X₂) + V(X₃) = 0.75               [rule#15]
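The same answer can be obtained exactly by enumerating all 8 equally likely outcomes:

```python
from itertools import product

# All 2³ = 8 outcomes of 3 fair coin tosses; 1 = heads, 0 = tails.
outcomes = list(product([0, 1], repeat=3))
heads = [sum(o) for o in outcomes]  # number of heads per outcome
p = 1 / len(outcomes)               # each outcome has probability 1/8
mu = sum(h * p for h in heads)
var = sum((h - mu) ** 2 * p for h in heads)
print(mu, var)  # 1.5 0.75
```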

Summary

Concept          Meaning
Mean             Average; where the data centers
Variance         How spread out the data is
Std Dev          Square root of variance; easier to interpret as "typical deviation"
Covariance       How two variables move together (units depend on X and Y)
Correlation      Scaled covariance; unit-free; ranges [-1, +1]; easy to interpret
Autocorrelation  How the current value relates to past values; used to detect trends, mean reversion, and for model building

Why is this important in time series / trading? Returns that show autocorrelation are partly predictable: positive autocorrelation suggests momentum, negative autocorrelation suggests mean reversion, and the autocorrelation structure guides which models are appropriate.

Appendix

Why Does Sample Variance Have n - 1 in the Denominator?

The reason we use n-1 rather than n is so that the sample variance will be an unbiased estimator of the population variance σ², i.e. E(S²) = σ².

The sample variance for a particular sample is:
    s² = (1/(n-1)) * (Σxᵢ² - nx̄²)
We define the corresponding estimator S² by replacing the observed values with random variables:
    S² = (1/(n-1)) * (ΣXᵢ² - nX̄²)
Here X₁, …, Xₙ are independent random variables, each distributed as the population X, and the observed xᵢ are their realizations.
X̄ is the mean estimator, and the observed x̄ is its realization.

We now need to prove that S² is an unbiased estimator of the population variance σ², i.e. E(S²) = σ².

Proof:
(1) From population variance definition σ² = E(X²) - μ², we have E(X²) = σ² + μ²
(2) E(Xᵢ²) = E(X²) = σ² + μ²
(3) E(X̄²) = V(X̄) + E(X̄)²
          = σ² / n + μ²            [check Notation Clarifications section]

then:
E(S²) = E((1/(n-1)) * (ΣXᵢ² - nX̄²))
      = (1/(n-1)) * E(ΣXᵢ² - nX̄²)
      = (1/(n-1)) * (ΣE(Xᵢ²) - nE(X̄²))
      = (1/(n-1)) * (n(σ² + μ²) - n(σ² / n + μ²))
      = (1/(n-1)) * (nσ² + nμ² - σ² - nμ²)
      = (1/(n-1)) * (n - 1)σ²
      = σ²
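The unbiasedness can also be seen empirically — averaging many sample variances (a sketch using die rolls, where σ² = 35/12) shows the n-1 version centering on σ² while the n version falls short:

```python
import random

random.seed(3)
# Compare the n and n-1 denominators over many small samples.
n, trials = 5, 50000
s2_n, s2_n1 = [], []
for _ in range(trials):
    xs = [random.randint(1, 6) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_n.append(ss / n)        # biased: underestimates σ²
    s2_n1.append(ss / (n - 1))  # unbiased
true_var = 35 / 12  # variance of a fair die
print(sum(s2_n) / trials, sum(s2_n1) / trials, true_var)
# the n-1 average lands near σ²; the n average near (n-1)/n * σ²
```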