Expectation and Variance

Expectation and variance are two fundamental concepts in probability theory that describe the central tendency and spread of a random variable’s distribution.

Expected Value (Mean)

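The expected value, also known as the mean or expectation, is the probability-weighted average of the values a random variable can take. Intuitively, it is the long-run average of the observed values over many independent trials.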

For Discrete Random Variables

$\mathbb{E}[X] = \mu_X = \sum_{x} x \cdot p_X(x)$

where:

$p_X(x)$ is the probability mass function (PMF)
The sum is taken over all possible values of $X$

Properties:

Linearity and Homogeneity: $\mathbb{E}[aX + b] = a\mathbb{E}[X] + b$
For any two random variables (independent or not): $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$
For independent random variables: $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$

Proofs of Key Properties

Proof: Linearity

For $a, b \in \mathbb{R}$:

$$
\begin{aligned} \mathbb{E}[aX + b] &= \sum_{x} (ax + b) \cdot p_X(x) \\ &= a\sum_{x} x \cdot p_X(x) + b\sum_{x} p_X(x) \\ &= a\mathbb{E}[X] + b \end{aligned}
$$

The last step uses $\sum_{x} p_X(x) = 1$.

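Proof: Additivity

$$
\begin{aligned} \mathbb{E}[X + Y] &= \sum_{x}\sum_{y} (x + y) \cdot p_{X,Y}(x,y) \\ &= \sum_{x}\sum_{y} x \cdot p_{X,Y}(x,y) + \sum_{x}\sum_{y} y \cdot p_{X,Y}(x,y) \\ &= \sum_{x} x \cdot p_X(x) + \sum_{y} y \cdot p_Y(y) \\ &= \mathbb{E}[X] + \mathbb{E}[Y] \end{aligned}
$$

The third line uses the marginals $p_X(x) = \sum_{y} p_{X,Y}(x,y)$ and $p_Y(y) = \sum_{x} p_{X,Y}(x,y)$; no independence is required.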

Proof: Product for Independent Variables

If $X$ and $Y$ are independent, then $p_{X,Y}(x,y) = p_X(x)p_Y(y)$, so:

$$
\begin{aligned} \mathbb{E}[XY] &= \sum_{x}\sum_{y} xy \cdot p_{X,Y}(x,y) \\ &= \sum_{x}\sum_{y} xy \cdot p_X(x)p_Y(y) \\ &= \left(\sum_{x} x p_X(x)\right)\left(\sum_{y} y p_Y(y)\right) \\ &= \mathbb{E}[X]\mathbb{E}[Y] \end{aligned}
$$

For Continuous Random Variables

$\mathbb{E}[X] = \mu_X = \int_{-\infty}^{\infty} x \cdot f_X(x)dx$

where:

$f_X(x)$ is the probability density function (PDF)

Variance

Variance measures how widely the values of a random variable spread around its mean: it is the expected squared deviation from the mean.

Definition: Variance

$\mathbb{V}(X) = \sigma_X^2 = \mathbb{E}[(X - \mu_X)^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$

For Discrete Random Variables

$\mathbb{V}(X) = \sum_{x} (x - \mu_X)^2 \cdot p_X(x)$

For Continuous Random Variables

$\mathbb{V}(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f_X(x)dx$

Standard Deviation

The standard deviation is the square root of the variance: $\sigma_X = \sqrt{\mathbb{V}(X)}$

Properties of Variance

$\mathbb{V}(X) \geq 0$
$\mathbb{V}(a) = 0$ for any constant $a$
$\mathbb{V}(aX) = a^2 \mathbb{V}(X)$
$\mathbb{V}(X + a) = \mathbb{V}(X)$
For independent random variables: $\mathbb{V}(X + Y) = \mathbb{V}(X) + \mathbb{V}(Y)$

Proofs of Variance Properties

Proof: Scaling

For $a \in \mathbb{R}$:

$$
\begin{aligned} \mathbb{V}(aX) &= \mathbb{E}[(aX - \mathbb{E}[aX])^2] \\ &= \mathbb{E}[(aX - a\mathbb{E}[X])^2] \\ &= \mathbb{E}[a^2(X - \mathbb{E}[X])^2] \\ &= a^2\mathbb{E}[(X - \mathbb{E}[X])^2] \\ &= a^2\mathbb{V}(X) \end{aligned}
$$

Proof: Shift Invariance

$$
\begin{aligned} \mathbb{V}(X + a) &= \mathbb{E}[(X + a - \mathbb{E}[X + a])^2] \\ &= \mathbb{E}[(X + a - \mathbb{E}[X] - a)^2] \\ &= \mathbb{E}[(X - \mathbb{E}[X])^2] \\ &= \mathbb{V}(X) \end{aligned}
$$

Proof: Additivity for Independent Variables

If $X$ and $Y$ are independent, then $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$, so by linearity:

$$
\begin{aligned} \mathbb{V}(X + Y) &= \mathbb{E}[(X + Y)^2] - (\mathbb{E}[X + Y])^2 \\ &= \mathbb{E}[X^2 + 2XY + Y^2] - (\mathbb{E}[X] + \mathbb{E}[Y])^2 \\ &= \mathbb{E}[X^2] + 2\mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[Y^2] - (\mathbb{E}[X])^2 - 2\mathbb{E}[X]\mathbb{E}[Y] - (\mathbb{E}[Y])^2 \\ &= \left(\mathbb{E}[X^2] - (\mathbb{E}[X])^2\right) + \left(\mathbb{E}[Y^2] - (\mathbb{E}[Y])^2\right) \\ &= \mathbb{V}(X) + \mathbb{V}(Y) \end{aligned}
$$

Examples

Example: Discrete Case (Die Roll)

For a fair six-sided die:

PMF: $p_X(x) = \frac{1}{6}$ for $x \in \{1, 2, 3, 4, 5, 6\}$

Expected Value: $\mathbb{E}[X] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5$

Variance:

$\mathbb{E}[X^2] = \sum_{x=1}^{6} x^2 \cdot \frac{1}{6} = \frac{1+4+9+16+25+36}{6} = \frac{91}{6}$

$\mathbb{V}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \frac{91}{6} - (3.5)^2 = \frac{91}{6} - \frac{49}{4} = \frac{182 - 147}{12} = \frac{35}{12} \approx 2.92$
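The die computation can be checked with exact rational arithmetic. This is a minimal Python sketch using the standard-library `fractions` module; the variable names are illustrative. It also confirms that the definition $\mathbb{E}[(X - \mu_X)^2]$ and the shortcut $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ agree.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                        # E[X]
second_moment = sum(x**2 * p for x, p in pmf.items())            # E[X^2]
var_shortcut = second_moment - mean**2                           # E[X^2] - (E[X])^2
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]

print(mean, second_moment, var_shortcut, var_definition)  # 7/2 91/6 35/12 35/12
```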

Example: Continuous Case (Normal Distribution)

For $X \sim N(\mu, \sigma^2)$:

PDF: $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Expected Value: $\mathbb{E}[X] = \mu$

Variance: $\mathbb{V}(X) = \sigma^2$

Example: Continuous Case (Uniform Distribution)

For $X \sim U(a, b)$:

PDF: $f_X(x) = \frac{1}{b-a}$ for $a \leq x \leq b$

Expected Value: $\mathbb{E}[X] = \int_a^b x \cdot \frac{1}{b-a} dx = \frac{a+b}{2}$

Variance: $\mathbb{V}(X) = \int_a^b \left(x - \frac{a+b}{2}\right)^2 \cdot \frac{1}{b-a} dx = \frac{(b-a)^2}{12}$
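As a sanity check on the uniform formulas, the two integrals can be approximated with a simple midpoint rule. The sketch below assumes the example endpoints $a = 2$, $b = 5$ (an arbitrary choice) and a hypothetical helper `midpoint_integral`; it is a numerical illustration, not a proof.

```python
def midpoint_integral(g, lo, hi, n=100_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0  # example endpoints (arbitrary choice)

def pdf(x):
    """Density of U(a, b) on [a, b]."""
    return 1.0 / (b - a)

mean = midpoint_integral(lambda x: x * pdf(x), a, b)
var = midpoint_integral(lambda x: (x - mean) ** 2 * pdf(x), a, b)

print(mean, (a + b) / 2)        # both approximately 3.5
print(var, (b - a) ** 2 / 12)   # both approximately 0.75
```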

Expectation of Functions of Random Variables

When we apply a function to a random variable, we obtain a new random variable. Computing the expectation of this new random variable is a fundamental problem in probability theory.

Law of the Unconscious Statistician (LOTUS)

The core principle for computing expectations of functions of random variables is the Law of the Unconscious Statistician (LOTUS). This law states that to compute $\mathbb{E}[g(X)]$, we don’t need to first find the distribution of $g(X)$. Instead, we can work directly with the original distribution of $X$.

Computation Formula

For a function $g: \mathbb{R} \to \mathbb{R}$ and random variable $X$, the expectation of $g(X)$ is:

$$
\mathbb{E}[g(X)] = \begin{cases} \sum_{x} g(x) \cdot p_X(x) & \text{(discrete)} \\ \int_{-\infty}^{\infty} g(x) \cdot f_X(x) dx & \text{(continuous)} \end{cases}
$$
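LOTUS can be seen concretely on the fair die: computing $\mathbb{E}[g(X)]$ directly from $p_X$ gives the same answer as first building the PMF of $Y = g(X)$. The function $g(x) = (x-3)^2$ is an arbitrary illustrative choice, and the sketch uses only the Python standard library.

```python
from collections import defaultdict
from fractions import Fraction

pmf_x = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def g(x):
    """Illustrative function of X (hypothetical choice)."""
    return (x - 3) ** 2

# LOTUS: weight g(x) by p_X(x); no need for the distribution of g(X)
lotus = sum(g(x) * p for x, p in pmf_x.items())

# The long way: first build the PMF of Y = g(X), then take E[Y]
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p
direct = sum(y * p for y, p in pmf_y.items())

print(lotus, direct)  # 19/6 19/6
```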

Important Properties

Linearity: $\mathbb{E}[a \cdot g(X) + b \cdot h(X)] = a\mathbb{E}[g(X)] + b\mathbb{E}[h(X)]$
Monotonicity: If $g(x) \leq h(x)$ for all $x$, then $\mathbb{E}[g(X)] \leq \mathbb{E}[h(X)]$

Application Examples

Example: Expectation of Square Function

For any random variable $X$, computing $\mathbb{E}[X^2]$:

Discrete case: $\mathbb{E}[X^2] = \sum_{x} x^2 \cdot p_X(x)$
Continuous case: $\mathbb{E}[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f_X(x) dx$

Combined with $\mathbb{E}[X]$, this gives the computational formula for variance: $\mathbb{V}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$

Example: Expectation of Exponential Function

For any random variable $X$, computing $\mathbb{E}[e^{tX}]$:

Discrete case: $\mathbb{E}[e^{tX}] = \sum_{x} e^{tx} \cdot p_X(x)$
Continuous case: $\mathbb{E}[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} \cdot f_X(x) dx$

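Viewed as a function of $t$, this expectation is the moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$, defined for those $t$ where it is finite. When it is finite in a neighborhood of $t = 0$, its derivatives there give the moments: $M_X^{(k)}(0) = \mathbb{E}[X^k]$.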
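For the fair die, the MGF is a finite sum, and its derivatives at $t = 0$ recover the moments computed in the die example. This sketch approximates those derivatives with central finite differences; the step size `h` is an assumed value chosen for illustration, and finite differences are only one of several ways to extract moments.

```python
import math

pmf = {x: 1 / 6 for x in range(1, 7)}  # fair die

def mgf(t):
    """M_X(t) = E[e^{tX}], computed with LOTUS."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Central differences at t = 0 approximate M'(0) = E[X] and M''(0) = E[X^2]
h = 1e-4
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2

print(first_moment)   # close to 3.5
print(second_moment)  # close to 91/6, about 15.1667
```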

Numerical Estimation Methods

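When $f$ is complicated or the distribution of $X$ is non-standard, exact values of $\mathbb{E}[f(X)]$ and $\mathbb{V}[f(X)]$ may be difficult to obtain analytically. In such cases, a Taylor-series approximation around the mean (often called the delta method) gives simple estimates that depend only on $\mu$ and $\sigma^2$.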

Taylor Series Approximation Method

For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, and a function $f$ that is twice differentiable near $\mu$, the expectation and variance of $f(X)$ can be approximated using a Taylor expansion of $f$ around $\mu$.

Proof: Approximation Derivation for Expectation

Perform a second-order Taylor expansion of $f(X)$ around $\mu$: $f(X) = f(\mu) + f'(\mu)(X-\mu) + \frac{f''(\mu)}{2}(X-\mu)^2 + R_2$, where $R_2$ is the remainder term.
Take the expectation of both sides: $\mathbb{E}[f(X)] = \mathbb{E}[f(\mu)] + \mathbb{E}[f'(\mu)(X-\mu)] + \mathbb{E}\left[\frac{f''(\mu)}{2}(X-\mu)^2\right] + \mathbb{E}[R_2]$
Since $f(\mu)$, $f'(\mu)$, and $f''(\mu)$ are constants: $\mathbb{E}[f(X)] = f(\mu) + f'(\mu)\mathbb{E}[X-\mu] + \frac{f''(\mu)}{2}\mathbb{E}[(X-\mu)^2] + \mathbb{E}[R_2]$
Using $\mathbb{E}[X-\mu] = 0$ and $\mathbb{E}[(X-\mu)^2] = \sigma^2$, and neglecting the remainder $\mathbb{E}[R_2]$: $\mathbb{E}[f(X)] \approx f(\mu) + \frac{f''(\mu)}{2}\sigma^2$

Proof: Approximation Derivation for Variance

Use a first-order Taylor expansion (often adequate for the variance when $\sigma$ is small): $f(X) \approx f(\mu) + f'(\mu)(X-\mu)$
Since $f(\mu)$ is a constant, it does not affect the variance: $\mathbb{V}[f(X)] \approx \mathbb{V}[f'(\mu)(X-\mu)]$
A constant factor comes out of the variance squared: $\mathbb{V}[f(X)] \approx [f'(\mu)]^2 \mathbb{V}[X-\mu]$
Since $\mathbb{V}[X-\mu] = \mathbb{V}[X] = \sigma^2$: $\mathbb{V}[f(X)] \approx [f'(\mu)]^2 \sigma^2$

Summary Formulas:

$$
\begin{aligned} \mathbb{E}\left[f(X)\right] &\approx f(\mu) + f''(\mu)\frac{\sigma^2}{2} \\ \mathbb{V}\left[f(X)\right] &\approx \left(f'(\mu)\right)^2\sigma^2 \end{aligned}
$$

Approximation Accuracy Notes

The expectation approximation uses a second-order expansion, so it captures the leading effect of curvature ($f''$) that a first-order expansion would miss
The variance approximation uses only a first-order expansion; for strongly nonlinear functions, higher-order terms may be needed
When $f$ is linear, both approximations are exact; the expectation approximation is also exact when $f$ is quadratic, because the remainder $R_2$ is then zero
The more concentrated the distribution of $X$ (smaller $\sigma^2$), the better the approximation
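The summary formulas can be compared against a case with known exact answers. The sketch below assumes $X \sim N(\mu, \sigma^2)$ and $f(x) = e^x$, for which $f(X)$ is lognormal with $\mathbb{E}[e^X] = e^{\mu + \sigma^2/2}$ and $\mathbb{V}(e^X) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$. The helper name `taylor_moments` is hypothetical. As $\sigma$ shrinks, both relative errors should shrink, consistent with the accuracy notes.

```python
import math

def taylor_moments(f, df, d2f, mu, sigma2):
    """Approximate E[f(X)] (second order) and V[f(X)] (first order) around mu."""
    mean_approx = f(mu) + d2f(mu) * sigma2 / 2
    var_approx = df(mu) ** 2 * sigma2
    return mean_approx, var_approx

# Assumed test case: X ~ N(mu, sigma^2) and f = exp (so f' = f'' = exp),
# where the exact lognormal moments are known in closed form.
mu = 0.0
rel_errors = []
for sigma in (1.0, 0.5, 0.1):
    s2 = sigma ** 2
    mean_approx, var_approx = taylor_moments(math.exp, math.exp, math.exp, mu, s2)
    mean_exact = math.exp(mu + s2 / 2)
    var_exact = (math.exp(s2) - 1) * math.exp(2 * mu + s2)
    rel_errors.append((abs(mean_approx - mean_exact) / mean_exact,
                       abs(var_approx - var_exact) / var_exact))
    print(f"sigma={sigma}: E approx {mean_approx:.5f} vs exact {mean_exact:.5f}; "
          f"V approx {var_approx:.5f} vs exact {var_exact:.5f}")
```

The variance approximation is visibly rougher than the expectation approximation at large $\sigma$, which matches its first-order derivation.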

Covariance and Correlation

When working with multiple random variables, we often want to measure their relationship.

Covariance

$\text{Cov}(X,Y) = \mathbb{E}[(X - \mu_X)(Y - \mu_Y)] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$

Correlation Coefficient

$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$ (defined when $\sigma_X > 0$ and $\sigma_Y > 0$)

Properties:

$-1 \leq \rho_{X,Y} \leq 1$
$\rho = 1$: Perfect positive linear relationship
$\rho = -1$: Perfect negative linear relationship
$\rho = 0$: No linear relationship (the variables may still be dependent through a nonlinear relationship)
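The $\rho = 0$ case can be illustrated with a variable that is completely determined by another yet uncorrelated with it. In this sketch (an illustrative example, with hypothetical helpers `expect` and `corr`), $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $\text{Cov}(X,Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\mathbb{E}[X^2] = 0$, while an exact decreasing linear function of $X$ gives $\rho = -1$.

```python
import math
from fractions import Fraction

# Joint PMF of (X, Y): X uniform on {-1, 0, 1} and Y = X**2, so Y depends on X
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(g):
    """E[g(X, Y)] computed from the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def corr(u, v):
    """Correlation of U = u(X, Y) and V = v(X, Y): Cov(U, V) / (sigma_U * sigma_V)."""
    mu_u, mu_v = expect(u), expect(v)
    cov = expect(lambda x, y: (u(x, y) - mu_u) * (v(x, y) - mu_v))
    sd_u = math.sqrt(expect(lambda x, y: (u(x, y) - mu_u) ** 2))
    sd_v = math.sqrt(expect(lambda x, y: (v(x, y) - mu_v) ** 2))
    return cov / (sd_u * sd_v)

rho_xy = corr(lambda x, y: x, lambda x, y: y)            # nonlinear dependence
rho_lin = corr(lambda x, y: x, lambda x, y: -3 * x + 2)  # exact decreasing line
print(rho_xy, rho_lin)  # 0.0 and (up to rounding) -1.0
```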

Common Distributions and Their Moments

Distribution	Expected Value	Variance
Bernoulli(p)	$p$	$p(1-p)$
Binomial(n,p)	$np$	$np(1-p)$
Poisson(λ)	$\lambda$	$\lambda$
Uniform(a,b)	$\frac{a+b}{2}$	$\frac{(b-a)^2}{12}$
Normal(μ,σ²)	$\mu$	$\sigma^2$
Exponential(λ), rate λ	$\frac{1}{\lambda}$	$\frac{1}{\lambda^2}$
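A few table entries can be checked by computing moments directly from the PMFs. This sketch assumes $n = 10$, $p = 0.3$ for the binomial (exact rational arithmetic) and $\lambda = 3$ for the Poisson, whose infinite support is truncated at $k < 100$, where the remaining tail is negligible in floating point. The helper `moments` is hypothetical.

```python
import math
from fractions import Fraction

def moments(pmf):
    """Return (mean, variance) of a discrete distribution given as {value: prob}."""
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var

# Binomial(n, p) with exact arithmetic
n, p = 10, Fraction(3, 10)
binom = {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
print(moments(binom), (n * p, n * p * (1 - p)))

# Poisson(lam): truncate the infinite support where the tail is negligible
lam = 3.0
poisson = {k: math.exp(-lam) * lam**k / math.factorial(k) for k in range(100)}
print(moments(poisson), (lam, lam))
```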

Important Theorems

Theorem: Law of Large Numbers (Weak Form)

For i.i.d. random variables $X_1, X_2, \ldots, X_n$ with finite mean $\mu$: $\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \mu \text{ as } n \to \infty$
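The law of large numbers can be watched in a simulation of fair die rolls, whose mean is $3.5$ from the die example. The sketch below uses a fixed seed (an arbitrary choice) and records the running sample mean at a few checkpoints; a single simulation illustrates the theorem but does not prove it.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
running_total = 0
checkpoints = {}
for n in range(1, 100_001):
    running_total += rng.randint(1, 6)  # one fair die roll
    if n in (10, 100, 1_000, 10_000, 100_000):
        checkpoints[n] = running_total / n

for n, avg in checkpoints.items():
    print(n, avg)  # sample means should settle near E[X] = 3.5
```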

Theorem: Central Limit Theorem

For i.i.d. random variables $X_1, X_2, \ldots$ with mean $\mu$ and finite variance $0 < \sigma^2 < \infty$: $\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0,1) \text{ as } n \to \infty$
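A similar simulation illustrates the central limit theorem. This sketch assumes summands drawn from $U(0, 1)$, which have $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$ by the uniform formulas, with $n = 30$ terms per sum and an arbitrary seed; it checks that roughly 95% of standardized sums fall in $[-1.96, 1.96]$, as they would for $N(0,1)$.

```python
import math
import random

rng = random.Random(1)  # fixed seed (arbitrary)
n, trials = 30, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and standard deviation of U(0, 1)

def standardized_sum():
    """(S_n - n*mu) / (sigma * sqrt(n)) for one sample of n uniforms."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

z = [standardized_sum() for _ in range(trials)]
coverage = sum(abs(v) <= 1.96 for v in z) / trials
print(coverage)  # should be near 0.95, the N(0, 1) probability of |Z| <= 1.96
```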

Expectation with Multiple Random Variables

When working with functions of multiple random variables, we need to understand how to compute their expectations.

Expectation of Functions of Multiple Variables

For a function $g(X,Y)$ of two random variables, the expectation is computed using the joint distribution:

$$
\mathbb{E}[g(X,Y)] = \begin{cases} \sum_{x}\sum_{y} g(x,y) \cdot p_{X,Y}(x,y) & \text{(discrete)} \\ \iint_{\mathbb{R}^2} g(x,y) \cdot f_{X,Y}(x,y) dx dy & \text{(continuous)} \end{cases}
$$

Key Properties

From this definition, we derive important properties:

Linearity: $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$ (always holds)
Products: $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$ (holds when $X$ and $Y$ are independent; more generally, it holds exactly when $X$ and $Y$ are uncorrelated, i.e. $\text{Cov}(X,Y) = 0$)
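A small dependent example shows the difference between the two properties. In this sketch, $X$ is a fair 0/1 coin and $Y = X$ (an illustrative choice; `expect` is a hypothetical helper): additivity still holds, but $\mathbb{E}[XY] = \mathbb{E}[X^2] = \frac{1}{2}$ while $\mathbb{E}[X]\mathbb{E}[Y] = \frac{1}{4}$.

```python
from fractions import Fraction

# Dependent example: X is a fair coin (0 or 1) and Y = X
joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p_{X,Y}(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
print(expect(lambda x, y: x + y), ex + ey)  # equal: linearity needs no independence
print(expect(lambda x, y: x * y), ex * ey)  # 1/2 vs 1/4: the product rule fails here
```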

Computing Expectations from Joint Distributions

Geometric Interpretation for Continuous Case

For a joint probability density function $f(x,y)$, computing $\mathbb{E}[X]$ involves integrating over the entire plane:

$\mathbb{E}[X] = \iint_{\mathbb{R}^2} x \cdot f(x,y) dx dy$

Geometrically, $\mathbb{E}[X]$ is the $x$-coordinate of the center of mass of a unit mass spread over the plane with density $f(x,y)$.

The computation can be done in two equivalent ways:

Direct integration: Integrate $x \cdot f(x,y)$ over the entire plane
Using marginal density: First find $f_X(x) = \int_{-\infty}^{\infty} f(x,y) dy$, then compute $\mathbb{E}[X] = \int_{-\infty}^{\infty} x \cdot f_X(x) dx$

The second approach works because (exchanging the order of integration is justified when $\mathbb{E}[|X|] < \infty$): $\mathbb{E}[X] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \cdot f(x,y) dy dx = \int_{-\infty}^{\infty} x \left(\int_{-\infty}^{\infty} f(x,y) dy\right) dx = \int_{-\infty}^{\infty} x \cdot f_X(x) dx$
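The two routes can be compared numerically. The sketch below assumes the example density $f(x,y) = x + y$ on the unit square (zero elsewhere), whose marginal is $f_X(x) = x + \frac{1}{2}$, so $\mathbb{E}[X] = \int_0^1 x\left(x + \frac{1}{2}\right)dx = \frac{7}{12}$. Both integrals are approximated on a midpoint grid whose size is an arbitrary choice.

```python
def f(x, y):
    """Assumed example joint density on the unit square: f(x, y) = x + y."""
    return x + y

n = 400                                   # grid cells per axis (midpoint rule)
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]  # cell midpoints in [0, 1]

# Way 1: integrate x * f(x, y) directly over the whole square
direct = sum(x * f(x, y) for x in grid for y in grid) * h * h

# Way 2: marginal f_X(x) = integral of f(x, y) dy, then E[X] = integral of x * f_X(x) dx
def f_x(x):
    return sum(f(x, y) for y in grid) * h

via_marginal = sum(x * f_x(x) for x in grid) * h

print(direct, via_marginal, 7 / 12)  # the three values closely agree
```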

Connection to Discrete Case

Similarly, for discrete random variables: $\mathbb{E}[X] = \sum_{x}\sum_{y} x \cdot p_{X,Y}(x,y) = \sum_{x} x \left(\sum_{y} p_{X,Y}(x,y)\right) = \sum_{x} x \cdot p_X(x)$

This shows that whether we work with joint distributions directly or first compute marginal distributions, we arrive at the same expectation.

Conditional Expectation

The conditional expectation of $Y$ given $X = x$ is:

$$
\mathbb{E}[Y|X = x] = \begin{cases} \sum_{y} y \cdot p_{Y|X}(y|x) & \text{(discrete)} \\ \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y|x) dy & \text{(continuous)} \end{cases}
$$

This leads to the law of total expectation: $\mathbb{E}[Y] = \mathbb{E}[\mathbb{E}[Y|X]]$
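The law of total expectation can be verified on a small joint PMF. The probabilities below are made-up illustrative values that sum to $1$; the sketch computes $p_X$, then $\mathbb{E}[Y \mid X = x]$ for each $x$, and finally averages the conditional means with weights $p_X(x)$.

```python
from collections import defaultdict
from fractions import Fraction

# Illustrative joint PMF p_{X,Y}(x, y); the six probabilities sum to 1
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(4, 10),
}

# Marginal p_X(x) = sum over y of p_{X,Y}(x, y)
p_x = defaultdict(Fraction)
for (x, _), p in joint.items():
    p_x[x] += p

# Conditional mean E[Y | X = x] = sum over y of y * p_{X,Y}(x, y) / p_X(x)
cond_mean = defaultdict(Fraction)
for (x, y), p in joint.items():
    cond_mean[x] += y * p / p_x[x]

e_y = sum(y * p for (_, y), p in joint.items())       # E[Y] from the joint PMF
tower = sum(cond_mean[x] * p_x[x] for x in p_x)       # E[E[Y | X]]
print(e_y, tower)  # 13/10 13/10
```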
