A compact note on expected value, variance, standard deviation, LOTUS, linearity, and the basic proof patterns used throughout probability theory.
Expectation and Variance
Expectation and variance are two fundamental concepts in probability theory that describe the central tendency and spread of a random variable’s distribution.
Expected Value (Mean)
The expected value, also known as the mean or expectation, represents the average value of a random variable over many trials.
When we apply a function to a random variable, we obtain a new random variable. Computing the expectation of this new random variable is a fundamental problem in probability theory.
Law of the Unconscious Statistician (LOTUS)
The core principle for computing expectations of functions of random variables is the Law of the Unconscious Statistician (LOTUS). This law states that to compute E[g(X)], we don’t need to first find the distribution of g(X). Instead, we can work directly with the original distribution of X.
Computation Formula
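For a function $g:\mathbb{R}\to\mathbb{R}$ and a random variable $X$, the expectation of $g(X)$ is:
Discrete case: $E[g(X)]=\sum_x g(x)\,p_X(x)$
Continuous case: $E[g(X)]=\int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$
Monotonicity: If $g(x)\le h(x)$ for all $x$, then $E[g(X)]\le E[h(X)]$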
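As a minimal sketch of what LOTUS saves us, the snippet below computes $E[g(X)]$ in two ways for a small discrete example: directly from the pmf of $X$, and by first building the pmf of $Y=g(X)$. The distribution (uniform on five integers), the choice $g(x)=x^2$, and the helper names are assumptions made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: X is uniform on {-2, -1, 0, 1, 2} and g(x) = x^2.
pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}


def g(x):
    return x * x


def expectation_lotus(pmf, func):
    """E[func(X)] computed directly from the pmf of X (LOTUS)."""
    return sum(func(x) * p for x, p in pmf.items())


def pushforward_pmf(pmf, func):
    """pmf of Y = func(X), built by collecting probability mass per output value."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf.items():
        pmf_y[func(x)] += p
    return dict(pmf_y)


lotus_value = expectation_lotus(pmf_x, g)

# The long way: find the distribution of Y = g(X) first, then average over it.
pmf_y = pushforward_pmf(pmf_x, g)
direct_value = sum(y * p for y, p in pmf_y.items())

print(lotus_value, direct_value)  # both routes give the same number
```

Both routes agree, but the LOTUS route never needs the distribution of $g(X)$, which is the point of the law.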
Application Examples
Example: Expectation of the Square Function
For any random variable $X$, computing $E[X^2]$:
Discrete case: $E[X^2]=\sum_x x^2\,p_X(x)$
Continuous case: $E[X^2]=\int_{-\infty}^{\infty} x^2\,f_X(x)\,dx$
Combined with the mean, this second moment gives the standard variance formula: $V(X)=E[X^2]-(E[X])^2$
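To make the variance formula concrete, here is a small check on a fair six-sided die (an example chosen for illustration, not one from the text). It computes the second moment via LOTUS and compares the shortcut $E[X^2]-(E[X])^2$ with the definition $E[(X-\mu)^2]$, using exact fractions.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in faces)                 # E[X]
second_moment = sum(x**2 * p for x in faces)     # E[X^2] via LOTUS

variance_shortcut = second_moment - mean**2      # E[X^2] - (E[X])^2
variance_definition = sum((x - mean) ** 2 * p for x in faces)  # E[(X - mu)^2]

print(mean, second_moment, variance_shortcut, variance_definition)
```

The two variance values coincide, as the identity requires.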
Example: Expectation of the Exponential Function
For any random variable $X$, computing $E[e^{tX}]$:
Discrete case: $E[e^{tX}]=\sum_x e^{tx}\,p_X(x)$
Continuous case: $E[e^{tX}]=\int_{-\infty}^{\infty} e^{tx}\,f_X(x)\,dx$
This is the definition of the moment generating function, which has wide applications in probability theory.
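One of those applications is recovering moments: the derivatives of the moment generating function at $t=0$ equal $E[X]$, $E[X^2]$, and so on. The sketch below illustrates this numerically for a Bernoulli variable with finite differences; the value $p=0.3$, the step size `h`, and the function name `mgf` are illustrative assumptions.

```python
import math


def mgf(pmf, t):
    """Moment generating function E[exp(t X)] of a discrete pmf, computed via LOTUS."""
    return sum(math.exp(t * x) * prob for x, prob in pmf.items())


p = 0.3                      # illustrative success probability
bernoulli = {0: 1 - p, 1: p}
h = 1e-4                     # finite-difference step (a tuning assumption)

# Central differences approximate M'(0) = E[X] and M''(0) = E[X^2].
first_moment = (mgf(bernoulli, h) - mgf(bernoulli, -h)) / (2 * h)
second_moment = (mgf(bernoulli, h) - 2 * mgf(bernoulli, 0.0) + mgf(bernoulli, -h)) / h**2

variance = second_moment - first_moment**2
print(first_moment, second_moment, variance)
```

For a Bernoulli variable both moments equal $p$, so the estimated variance should be close to $p(1-p)$.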
Numerical Estimation Methods
When functions are complex or distributions are non-standard, closed-form solutions may be difficult to obtain. In such cases, we can approximate the moments of $f(X)$ with a Taylor series expansion around the mean (an approach often called the delta method).
Taylor Series Approximation Method
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, the expectation and variance of $f(X)$ can be approximated using Taylor expansion.
Proof: Approximation Derivation for Expectation
Perform a second-order Taylor expansion of $f(X)$ around $\mu$:
$f(X)=f(\mu)+f'(\mu)(X-\mu)+\frac{f''(\mu)}{2}(X-\mu)^2+R_2$
where $R_2$ is the remainder term.
Take the expectation of both sides:
$E[f(X)]=E[f(\mu)]+E[f'(\mu)(X-\mu)]+E\left[\frac{f''(\mu)}{2}(X-\mu)^2\right]+E[R_2]$
Since $f(\mu)$, $f'(\mu)$, and $f''(\mu)$ are constants:
$E[f(X)]=f(\mu)+f'(\mu)E[X-\mu]+\frac{f''(\mu)}{2}E[(X-\mu)^2]+E[R_2]$
Using $E[X-\mu]=0$ and $E[(X-\mu)^2]=\sigma^2$, and ignoring the higher-order remainder term:
$E[f(X)]\approx f(\mu)+\frac{f''(\mu)}{2}\sigma^2$
Proof: Approximation Derivation for Variance
Use a first-order Taylor expansion (usually sufficient for variance calculation):
$f(X)\approx f(\mu)+f'(\mu)(X-\mu)$
Since $f(\mu)$ is constant, it doesn't affect the variance:
$V[f(X)]\approx V[f'(\mu)(X-\mu)]$
Constant factors come out of the variance squared:
$V[f(X)]\approx [f'(\mu)]^2\,V[X-\mu]$
Since $V[X-\mu]=V[X]=\sigma^2$:
$V[f(X)]\approx [f'(\mu)]^2\sigma^2$
Summary Formulas:
$E[f(X)]\approx f(\mu)+\frac{f''(\mu)}{2}\sigma^2$
$V[f(X)]\approx (f'(\mu))^2\sigma^2$
Approximation Accuracy Notes
The expectation approximation uses a second-order expansion, so it includes the curvature correction $\frac{f''(\mu)}{2}\sigma^2$ on top of $f(\mu)$
The variance approximation uses a first-order expansion; for strongly nonlinear functions, higher-order terms may be needed
When $f$ is a linear function, both approximations are exact
The more concentrated the distribution of $X$ (smaller $\sigma^2$), the better the approximation
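The accuracy notes above can be checked on a case where exact answers are known. The sketch below assumes $f(x)=e^x$ and $X\sim\text{Normal}(\mu,\sigma^2)$, so that $f(X)$ is lognormal with mean $e^{\mu+\sigma^2/2}$ and variance $(e^{\sigma^2}-1)e^{2\mu+\sigma^2}$. The choice of $f$, the parameter values, and the helper name `taylor_moments` are illustrative assumptions, not part of the derivation itself.

```python
import math


def taylor_moments(f, d1, d2, mu, sigma2):
    """Taylor approximations: E[f(X)] ~ f(mu) + f''(mu)/2 * sigma^2, V[f(X)] ~ f'(mu)^2 * sigma^2."""
    mean_approx = f(mu) + 0.5 * d2(mu) * sigma2
    var_approx = d1(mu) ** 2 * sigma2
    return mean_approx, var_approx


mu = 1.0  # illustrative mean
results = {}
for sigma2 in (0.01, 0.1, 1.0):
    # For f = exp, the first and second derivatives are exp as well.
    mean_approx, var_approx = taylor_moments(math.exp, math.exp, math.exp, mu, sigma2)

    # Exact lognormal moments, used only as a reference for comparison.
    mean_exact = math.exp(mu + sigma2 / 2)
    var_exact = (math.exp(sigma2) - 1) * math.exp(2 * mu + sigma2)

    results[sigma2] = (mean_approx, mean_exact, var_approx, var_exact)
    print(f"sigma^2={sigma2}: mean {mean_approx:.4f} vs {mean_exact:.4f}, "
          f"variance {var_approx:.4f} vs {var_exact:.4f}")
```

In this example the relative error of both approximations grows as $\sigma^2$ increases, and the first-order variance approximation degrades faster than the second-order mean approximation.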
Covariance and Correlation
When working with multiple random variables, we often want to measure their relationship.
Covariance
$\operatorname{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]=E[XY]-E[X]E[Y]$
Correlation Coefficient
$\rho_{X,Y}=\frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}$
Properties:
$-1\le\rho_{X,Y}\le 1$
$\rho=1$: Perfect positive linear relationship
$\rho=-1$: Perfect negative linear relationship
$\rho=0$: No linear relationship (but there may be a non-linear relationship)
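The last property is easy to miss, so here is a small sketch with two hypothetical joint distributions: one where $Y=X^2$ (fully dependent, yet uncorrelated) and one where $Y=2X+1$ (perfectly linear). The distributions and the helper name `cov_and_corr` are assumptions chosen for illustration.

```python
import math
from fractions import Fraction


def cov_and_corr(joint):
    """Covariance and correlation from a joint pmf given as {(x, y): probability}."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    var_x = sum(x * x * p for (x, y), p in joint.items()) - ex**2
    var_y = sum(y * y * p for (x, y), p in joint.items()) - ey**2
    cov = exy - ex * ey                                  # E[XY] - E[X]E[Y]
    rho = float(cov) / math.sqrt(float(var_x * var_y))   # Cov / (sigma_X * sigma_Y)
    return cov, rho


third = Fraction(1, 3)
# X is uniform on {-1, 0, 1} in both cases.
joint_quadratic = {(x, x * x): third for x in (-1, 0, 1)}   # Y = X^2
joint_linear = {(x, 2 * x + 1): third for x in (-1, 0, 1)}  # Y = 2X + 1

cov_q, rho_q = cov_and_corr(joint_quadratic)
cov_l, rho_l = cov_and_corr(joint_linear)
print(cov_q, rho_q)
print(cov_l, rho_l)
```

In the quadratic case the covariance is exactly zero even though $Y$ is a deterministic function of $X$, which shows that zero correlation does not imply independence.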
Common Distributions and Their Moments
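| Distribution | Expected Value | Variance |
| --- | --- | --- |
| Bernoulli($p$) | $p$ | $p(1-p)$ |
| Binomial($n,p$) | $np$ | $np(1-p)$ |
| Poisson($\lambda$) | $\lambda$ | $\lambda$ |
| Uniform($a,b$) | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Normal($\mu,\sigma^2$) | $\mu$ | $\sigma^2$ |
| Exponential($\lambda$) | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |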
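Entries in the table can be sanity-checked numerically. The sketch below does so for the Binomial and Poisson rows by summing over their pmfs; the parameter values and the truncation of the Poisson support at 60 terms are illustrative assumptions.

```python
import math


def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as (value, probability) pairs."""
    mean = sum(k * prob for k, prob in pmf)
    variance = sum((k - mean) ** 2 * prob for k, prob in pmf)
    return mean, variance


n, p, lam = 10, 0.3, 4.0  # illustrative parameter values

binomial_pmf = [(k, math.comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)]

# The Poisson support is infinite; for lam = 4 the mass beyond k = 59 is negligible.
poisson_pmf = [(k, math.exp(-lam) * lam**k / math.factorial(k)) for k in range(60)]

binom_mean, binom_var = mean_and_variance(binomial_pmf)
pois_mean, pois_var = mean_and_variance(poisson_pmf)

print(binom_mean, n * p, binom_var, n * p * (1 - p))
print(pois_mean, lam, pois_var, lam)
```

Up to floating-point error, the computed values match $np$, $np(1-p)$, and $\lambda$ from the table.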
Important Theorems
Theorem: Law of Large Numbers
For i.i.d. random variables $X_1,X_2,\ldots,X_n$ with mean $\mu$:
$\frac{1}{n}\sum_{i=1}^{n}X_i\xrightarrow{P}\mu$ as $n\to\infty$
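A quick simulation can illustrate the statement. The sketch below draws Uniform(0, 1) samples (mean $0.5$) and records the sample mean at a few sample sizes; the distribution, the seed, and the checkpoints are arbitrary choices for illustration, and a simulation only suggests the convergence rather than proving it.

```python
import random

rng = random.Random(0)   # fixed seed so the run is reproducible
mu = 0.5                 # mean of Uniform(0, 1)

checkpoints = (10, 1_000, 100_000)
sample_means = {}
running_sum = 0.0
for n in range(1, max(checkpoints) + 1):
    running_sum += rng.random()
    if n in checkpoints:
        sample_means[n] = running_sum / n

for n, mean in sample_means.items():
    print(f"n={n}: sample mean={mean:.5f}, |error|={abs(mean - mu):.5f}")
```

At the largest sample size the sample mean typically sits within a few thousandths of $\mu$.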
Theorem: Central Limit Theorem
For i.i.d. random variables with mean $\mu$ and variance $\sigma^2$:
$\frac{\sum_{i=1}^{n}X_i-n\mu}{\sigma\sqrt{n}}\xrightarrow{D}N(0,1)$ as $n\to\infty$
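The standardization in the theorem can be tried out in a simulation. The sketch below sums exponential samples (a skewed distribution, chosen as an assumption to make the effect visible), standardizes each sum by $n\mu$ and $\sigma\sqrt{n}$, and checks how often the result falls within $\pm 1.96$, which would be about 95% under a standard normal. The rate, sample size, repetition count, and seed are illustrative.

```python
import math
import random

rng = random.Random(1)      # fixed seed so the run is reproducible
lam = 2.0                   # Exponential(lam): mean 1/lam, standard deviation 1/lam
mu, sigma = 1 / lam, 1 / lam
n, reps = 200, 2000         # summands per sum, number of repeated sums


def standardized_sum():
    """(sum of n draws - n*mu) / (sigma * sqrt(n)), as in the theorem."""
    total = sum(rng.expovariate(lam) for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))


z = [standardized_sum() for _ in range(reps)]

z_mean = sum(z) / reps
z_var = sum((v - z_mean) ** 2 for v in z) / reps
coverage = sum(abs(v) <= 1.96 for v in z) / reps

print(f"mean={z_mean:.3f}, variance={z_var:.3f}, P(|Z|<=1.96)~{coverage:.3f}")
```

The standardized sums should have mean near 0, variance near 1, and coverage near 0.95, even though the individual summands are far from normal.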
Expectation with Multiple Random Variables
When working with functions of multiple random variables, we need to understand how to compute their expectations.
Expectation of Functions of Multiple Variables
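For a function $g(X,Y)$ of two random variables, the expectation is computed using the joint distribution:
Discrete case: $E[g(X,Y)]=\sum_x\sum_y g(x,y)\,p_{X,Y}(x,y)$
Continuous case: $E[g(X,Y)]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{X,Y}(x,y)\,dx\,dy$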
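As a small illustration of averaging a function over a joint distribution, the sketch below uses two independent fair dice (an example assumed here, with the joint pmf $p_{X,Y}(x,y)=1/36$) and evaluates $E[g(X,Y)]$ for a sum, a product, and a maximum by summing $g(x,y)\,p_{X,Y}(x,y)$ over all pairs.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two independent fair dice, so every pair has probability 1/36.
joint_pmf = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}


def expectation(joint, g):
    """E[g(X, Y)] as the sum of g(x, y) * p(x, y) over the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_sum = expectation(joint_pmf, lambda x, y: x + y)    # linearity: E[X] + E[Y]
e_prod = expectation(joint_pmf, lambda x, y: x * y)   # equals E[X]E[Y] under independence
e_max = expectation(joint_pmf, max)                   # no shortcut; needs the joint pmf

print(e_sum, e_prod, e_max)
```

The sum and product could be obtained from the marginals alone here, but the maximum shows why the general formula works with the joint distribution.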