Probability Distributions | Xingyu Yang

Probability distributions are mathematical functions that describe the likelihood of different outcomes for a random variable. They provide a complete description of the probability structure of random phenomena and are fundamental to statistical analysis and machine learning.

Overview

Probability distributions can be classified based on the nature of the random variable: discrete (countable outcomes), continuous (uncountable outcomes within intervals), or mixed (combinations). Each distribution is characterized by its support (possible values), probability function (PMF for discrete, PDF for continuous), cumulative distribution function, parameters, and moments.

Definition: Probability Distribution

A probability distribution is a function or rule that assigns probabilities to the outcomes of a random experiment or, more generally, to the events in a sample space. Let $X$ be a random variable, then the probability distribution of $X$ is defined by its probability mass function (PMF) for discrete variables or probability density function (PDF) for continuous variables.

For a discrete random variable, the distribution is specified by the PMF $f(x) = P(X = x)$. For a continuous random variable, $P(X = x) = 0$ for every $x$, so the distribution is specified instead by the PDF $f(x)$, which yields probabilities through integration: $P(a \leq X \leq b) = \int_{a}^{b} f(x) dx$. In both cases the cumulative distribution function (CDF) is $F(x) = P(X \leq x)$. The PMF and PDF must satisfy non-negativity, $f(x) \geq 0$ for all $x$, and normalization:

For discrete variables: $\sum_{x} P(X = x) = 1$
For continuous variables: $\int_{-\infty}^{\infty} f(x) dx = 1$
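
The two conditions can be checked numerically. The sketch below is illustrative only: the fair die, the Uniform(0, 2) density, and the helper names `is_valid_pmf` and `integrate_midpoint` are assumptions introduced here, not part of the definitions above.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if `pmf` (dict: value -> probability) is non-negative and sums to 1."""
    probs = list(pmf.values())
    return all(p >= 0 for p in probs) and math.isclose(sum(probs), 1.0, abs_tol=tol)

def integrate_midpoint(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Hypothetical examples: a fair six-sided die and the Uniform(0, 2) density.
fair_die = {k: 1 / 6 for k in range(1, 7)}
uniform_pdf = lambda x: 0.5

pmf_ok = is_valid_pmf(fair_die)
pdf_total = integrate_midpoint(uniform_pdf, 0.0, 2.0)
print(pmf_ok, pdf_total)
```

A table of probabilities that sums to more than 1, such as `{0: 0.7, 1: 0.4}`, fails the same check.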

Discrete Probability Distributions

Bernoulli Distribution

Models a single trial with two possible outcomes (success/failure)

Parameters: $p$ (probability of success), where $0 \leq p \leq 1$

Support: $x \in \{0, 1\}$

PMF: $P(X = x) = p^x(1-p)^{1-x}$

Moment Calculations:

For the expected value:

\begin{aligned} \mathbb{E}[X] &= \sum_{x=0}^{1} x \cdot P(X = x) \\ &= 0 \cdot (1-p) + 1 \cdot p \\ &= p \end{aligned}

For the second moment:

\begin{aligned} \mathbb{E}[X^2] &= \sum_{x=0}^{1} x^2 \cdot P(X = x) \\ &= 0^2 \cdot (1-p) + 1^2 \cdot p \\ &= p \end{aligned}

Therefore, the variance is:

\begin{aligned} \mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\ &= p - p^2 \\ &= p(1-p) \end{aligned}
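
As a quick sanity check, the same sums can be evaluated directly over the support $\{0, 1\}$. This is a minimal sketch; the function names and the value $p = 0.3$ are chosen here for illustration.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)

def bernoulli_moments(p):
    """Return (mean, variance) by summing over the support {0, 1}."""
    mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
    second = sum(x**2 * bernoulli_pmf(x, p) for x in (0, 1))
    return mean, second - mean**2

p = 0.3  # illustrative value
mean, var = bernoulli_moments(p)
print(mean, var)  # compare with p and p * (1 - p)
```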

Applications: Coin flips, binary outcomes, indicator variables

Binomial Distribution

Models the number of successes in $n$ independent Bernoulli trials

Parameters: $n$ (number of trials, a positive integer), $p$ (success probability), where $0 \leq p \leq 1$

Support: $x \in \{0, 1, 2, ..., n\}$

PMF: $P(X = x) = \binom{n}{x} p^x(1-p)^{n-x}$

Moment Calculations:

The expected value can be derived using the linearity of expectation. Since $X = \sum_{i=1}^{n} X_i$ where $X_i \sim \text{Bernoulli}(p)$ :

\begin{aligned} \mathbb{E}[X] &= \mathbb{E}\left[\sum_{i=1}^{n} X_i\right] \\ &= \sum_{i=1}^{n} \mathbb{E}[X_i] \\ &= \sum_{i=1}^{n} p \\ &= np \end{aligned}

For the variance, since the $X_i$ are independent:

\begin{aligned} \mathbb{V}(X) &= \mathbb{V}\left(\sum_{i=1}^{n} X_i\right) \\ &= \sum_{i=1}^{n} \mathbb{V}(X_i) \\ &= \sum_{i=1}^{n} p(1-p) \\ &= np(1-p) \end{aligned}

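Alternatively, we can compute the expected value directly, using the identity $x\binom{n}{x} = n\binom{n-1}{x-1}$ and the fact that the remaining sum is the total probability of a Binomial( $n-1$ , $p$ ) distribution, which equals 1: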

\begin{aligned} \mathbb{E}[X] &= \sum_{x=0}^{n} x \binom{n}{x} p^x(1-p)^{n-x} \\ &= np\sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1}(1-p)^{n-x} \\ &= np \end{aligned}
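
The closed forms $np$ and $np(1-p)$ can also be confirmed by brute-force summation over the PMF. The sketch below uses `math.comb` from the Python standard library; the values $n = 12$ and $p = 0.25$ are arbitrary illustrations.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_moments(n, p):
    """Return (mean, variance) by summing over the support {0, ..., n}."""
    support = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in support)
    second = sum(x * x * binomial_pmf(x, n, p) for x in support)
    return mean, second - mean**2

n, p = 12, 0.25  # illustrative values
mean, var = binomial_moments(n, p)
print(mean, n * p)            # both should be close to 3.0
print(var, n * p * (1 - p))   # both should be close to 2.25
```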

Applications: Quality control, survey sampling, clinical trials

Hypergeometric Distribution

Models the number of successes in $n$ draws without replacement from a finite population

Parameters: $N$ (population size), $K$ (number of success states), $n$ (number of draws)

Support: $x \in \{\max(0, n-(N-K)), \ldots, \min(n, K)\}$

PMF: $P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$

Moment Calculations:

For the expected value, we use indicator variables. Let $I_j = 1$ if the $j$ -th draw is a success, $0$ otherwise. Then $X = \sum_{j=1}^{n} I_j$ .

The probability that any particular draw is a success is $P(I_j = 1) = \frac{K}{N}$ , so:

\begin{aligned} \mathbb{E}[X] &= \mathbb{E}\left[\sum_{j=1}^{n} I_j\right] \\ &= \sum_{j=1}^{n} \mathbb{E}[I_j] \\ &= \sum_{j=1}^{n} \frac{K}{N} \\ &= n\frac{K}{N} \end{aligned}

For the variance, we need to account for the dependence between draws:

\begin{aligned} \mathbb{V}(X) &= \mathbb{V}\left(\sum_{j=1}^{n} I_j\right) \\ &= \sum_{j=1}^{n} \mathbb{V}(I_j) + 2\sum_{j < k} \text{Cov}(I_j, I_k) \end{aligned}

Since $\mathbb{V}(I_j) = \frac{K}{N}(1-\frac{K}{N})$ and $\text{Cov}(I_j, I_k) = -\frac{K(N-K)}{N^2(N-1)}$ for $j \neq k$ :

\begin{aligned} \mathbb{V}(X) &= n\frac{K}{N}\left(1-\frac{K}{N}\right) + n(n-1)\left(-\frac{K(N-K)}{N^2(N-1)}\right) \\ &= n\frac{K}{N}\frac{N-K}{N} - n(n-1)\frac{K(N-K)}{N^2(N-1)} \\ &= n\frac{K(N-K)}{N^2}\left(1 - \frac{n-1}{N-1}\right) \\ &= n\frac{K(N-K)}{N^2}\left(\frac{N-n}{N-1}\right) \end{aligned}
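
Because the covariance bookkeeping is easy to get wrong, it is worth comparing the final formula against moments computed directly from the PMF. This is a sketch with arbitrarily chosen values $N = 50$, $K = 12$, $n = 10$.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) = C(K, x) C(N - K, n - x) / C(N, n)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def hypergeom_moments(N, K, n):
    """Return (mean, variance) by summing over the support."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    support = range(lo, hi + 1)
    mean = sum(x * hypergeom_pmf(x, N, K, n) for x in support)
    second = sum(x * x * hypergeom_pmf(x, N, K, n) for x in support)
    return mean, second - mean**2

N, K, n = 50, 12, 10  # illustrative values
mean, var = hypergeom_moments(N, K, n)

# Closed forms from the derivation above.
mean_closed = n * K / N
var_closed = n * K * (N - K) / N**2 * (N - n) / (N - 1)
print(mean, mean_closed)
print(var, var_closed)
```

The factor $\frac{N-n}{N-1}$ is what separates this variance from the Binomial variance with $p = K/N$; it shrinks toward 0 as the sample exhausts the population.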

Applications: Sampling without replacement, quality control, ecological studies

Poisson Distribution

Models the number of events occurring in a fixed interval

Parameters: $\lambda$ (rate parameter), where $\lambda > 0$

Support: $x \in \{0, 1, 2, ...\}$

PMF: $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$

Moment Calculations:

For the expected value:

\begin{aligned} \mathbb{E}[X] &= \sum_{x=0}^{\infty} x \cdot \frac{e^{-\lambda}\lambda^x}{x!} \\ &= e^{-\lambda}\sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!} \\ &= e^{-\lambda}\lambda\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} \end{aligned}

Let $k = x-1$ :

\begin{aligned} \mathbb{E}[X] &= e^{-\lambda}\lambda\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} \\ &= e^{-\lambda}\lambda e^{\lambda} \\ &= \lambda \end{aligned}

For the second moment:

\begin{aligned} \mathbb{E}[X^2] &= \sum_{x=0}^{\infty} x^2 \cdot \frac{e^{-\lambda}\lambda^x}{x!} \\ &= e^{-\lambda}\sum_{x=1}^{\infty} x \cdot \frac{\lambda^x}{(x-1)!} \end{aligned}

Let $k = x-1$ :

\begin{aligned} \mathbb{E}[X^2] &= e^{-\lambda}\sum_{k=0}^{\infty} (k+1) \cdot \frac{\lambda^{k+1}}{k!} \\ &= e^{-\lambda}\lambda\sum_{k=0}^{\infty} (k+1) \cdot \frac{\lambda^k}{k!} \\ &= e^{-\lambda}\lambda\left(\sum_{k=0}^{\infty} k \cdot \frac{\lambda^k}{k!} + \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\right) \\ &= e^{-\lambda}\lambda(\lambda e^{\lambda} + e^{\lambda}) \\ &= \lambda(\lambda + 1) \end{aligned}

Therefore:

\begin{aligned} \mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\ &= \lambda(\lambda + 1) - \lambda^2 \\ &= \lambda \end{aligned}

Properties: The Poisson( $\lambda$ ) distribution is the limit of Binomial( $n$ , $p$ ) as $n \to \infty$ and $p \to 0$ with the product $np = \lambda$ held fixed.
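
Both the moment results and the Binomial limit can be illustrated numerically. In the sketch below the infinite sums are truncated at $x = 99$ (an assumption that is harmless for $\lambda = 3$ because the tail is negligible), and `max_pmf_gap` is a helper name introduced here.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^{-lam} lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    """Binomial PMF, defined as 0 outside the support {0, ..., n}."""
    if x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

lam = 3.0  # illustrative rate

# Truncated moment sums: the tail beyond x = 99 is negligible for lam = 3.
xs = range(100)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
second = sum(x * x * poisson_pmf(x, lam) for x in xs)
var = second - mean**2

def max_pmf_gap(n, lam, upto=20):
    """Largest |Binomial(n, lam/n) - Poisson(lam)| PMF difference for x = 0..upto."""
    p = lam / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(upto + 1))

gaps = [max_pmf_gap(n, lam) for n in (10, 100, 1000)]
print(mean, var)
print(gaps)  # the gap shrinks as n grows
```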

Applications: Call centers, traffic flow, radioactive decay, rare events

Continuous Probability Distributions

Normal (Gaussian) Distribution

The most important continuous distribution in statistics

Parameters: $\mu$ (mean), $\sigma^2$ (variance), where $\sigma^2 > 0$

Support: $x \in (-\infty, \infty)$

PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Moment Calculations:

For the standard normal distribution $Z \sim N(0,1)$ :

The expected value is:

\begin{aligned} \mathbb{E}[Z] &= \int_{-\infty}^{\infty} z \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2} dz \\ &= 0 \end{aligned}

This follows because the integrand is an odd function and the integral converges.

For the variance:

\begin{aligned} \mathbb{E}[Z^2] &= \int_{-\infty}^{\infty} z^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2} dz \end{aligned}

Using integration by parts with $u = z$ , $dv = z e^{-z^2/2} dz$ :

\begin{aligned} \mathbb{E}[Z^2] &= \frac{1}{\sqrt{2\pi}} \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2} dz \\ &= 0 + 1 \\ &= 1 \end{aligned}

Therefore, $\mathbb{V}(Z) = \mathbb{E}[Z^2] - (\mathbb{E}[Z])^2 = 1 - 0 = 1$ .

For the general normal distribution $X = \mu + \sigma Z$ :

\begin{aligned} \mathbb{E}[X] &= \mathbb{E}[\mu + \sigma Z] \\ &= \mu + \sigma \mathbb{E}[Z] \\ &= \mu \end{aligned}

\begin{aligned} \mathbb{V}(X) &= \mathbb{V}(\mu + \sigma Z) \\ &= \sigma^2 \mathbb{V}(Z) \\ &= \sigma^2 \end{aligned}
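
These integrals can be approximated numerically as a check. The sketch below applies a simple midpoint rule to the PDF on the window $[\mu - 10\sigma, \mu + 10\sigma]$, outside of which the density is negligible; the values $\mu = 1.5$, $\sigma = 2$ and the helper names are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def integrate_midpoint(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

mu, sigma = 1.5, 2.0  # illustrative values
a, b = mu - 10 * sigma, mu + 10 * sigma  # tails beyond this window are negligible

total = integrate_midpoint(lambda x: normal_pdf(x, mu, sigma), a, b)
mean = integrate_midpoint(lambda x: x * normal_pdf(x, mu, sigma), a, b)
second = integrate_midpoint(lambda x: x * x * normal_pdf(x, mu, sigma), a, b)
var = second - mean**2
print(total, mean, var)  # compare with 1, mu, sigma**2
```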

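Properties: The Central Limit Theorem states that the standardized sum of a large number of independent, identically distributed random variables with finite variance is approximately standard normal. Linear combinations of independent normal variables are normal.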

Additivity Property: If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent, then: $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$

Proof: Additivity

Let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ be independent normal random variables.

We can write $X = \mu_1 + \sigma_1 Z_1$ and $Y = \mu_2 + \sigma_2 Z_2$ , where $Z_1, Z_2 \sim N(0,1)$ are independent standard normal variables.

Then: $X + Y = (\mu_1 + \mu_2) + \sigma_1 Z_1 + \sigma_2 Z_2$

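Since $Z_1$ and $Z_2$ are independent normal variables, the linear combination $\sigma_1 Z_1 + \sigma_2 Z_2$ is also normally distributed (this closure property is exactly what the moment generating function argument below establishes), with: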

Mean: $\mathbb{E}[\sigma_1 Z_1 + \sigma_2 Z_2] = \sigma_1 \cdot 0 + \sigma_2 \cdot 0 = 0$
Variance: $\mathbb{V}(\sigma_1 Z_1 + \sigma_2 Z_2) = \sigma_1^2 \cdot 1 + \sigma_2^2 \cdot 1 = \sigma_1^2 + \sigma_2^2$

Therefore: $\sigma_1 Z_1 + \sigma_2 Z_2 \sim N(0, \sigma_1^2 + \sigma_2^2)$

And: $X + Y = (\mu_1 + \mu_2) + (\sigma_1 Z_1 + \sigma_2 Z_2) \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$

Proof: Using Moment Generating Functions

The MGF of $X \sim N(\mu, \sigma^2)$ is: $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$

For independent $X$ and $Y$ : $M_{X+Y}(t) = M_X(t) \cdot M_Y(t) = e^{\mu_1 t + \frac{1}{2}\sigma_1^2 t^2} \cdot e^{\mu_2 t + \frac{1}{2}\sigma_2^2 t^2} = e^{(\mu_1 + \mu_2)t + \frac{1}{2}(\sigma_1^2 + \sigma_2^2)t^2}$

This is the MGF of $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$ , proving the result.
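
A simulation offers an informal check of the additivity property. The sketch below draws independent normal samples with `random.gauss` and compares the sample mean and variance of the sums with $\mu_1 + \mu_2$ and $\sigma_1^2 + \sigma_2^2$. It also checks that roughly 68.27% of the sums fall within one standard deviation of the mean, as expected for a normal distribution. The parameter values, seed, and sample size are arbitrary choices, and agreement is only up to sampling noise.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
mu1, sigma1 = 1.0, 2.0   # illustrative parameters for X
mu2, sigma2 = -3.0, 1.5  # illustrative parameters for Y
n = 100_000

# Note: random.gauss takes the standard deviation, not the variance.
sums = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]

sample_mean = sum(sums) / n
sample_var = sum((x - sample_mean) ** 2 for x in sums) / n

target_mean = mu1 + mu2
target_var = sigma1**2 + sigma2**2
target_sd = target_var**0.5

# Fraction of sums within one standard deviation of the target mean.
within_one_sd = sum(abs(x - target_mean) <= target_sd for x in sums) / n
print(sample_mean, target_mean)
print(sample_var, target_var)
print(within_one_sd)
```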

Applications: Natural phenomena, measurement errors, statistical inference

Exponential Distribution

Models time between events in a Poisson process

Parameters: $\lambda$ (rate parameter), where $\lambda > 0$

Support: $x \in [0, \infty)$

PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$

Moment Calculations:

For the expected value:

\begin{aligned} \mathbb{E}[X] &= \int_{0}^{\infty} x \lambda e^{-\lambda x} dx \end{aligned}

Using integration by parts with $u = x$ , $dv = \lambda e^{-\lambda x} dx$ :

\begin{aligned} \mathbb{E}[X] &= \left[ -x e^{-\lambda x} \right]_{0}^{\infty} + \int_{0}^{\infty} e^{-\lambda x} dx \\ &= 0 + \left[ -\frac{1}{\lambda} e^{-\lambda x} \right]_{0}^{\infty} \\ &= \frac{1}{\lambda} \end{aligned}

For the second moment:

\begin{aligned} \mathbb{E}[X^2] &= \int_{0}^{\infty} x^2 \lambda e^{-\lambda x} dx \end{aligned}

Using integration by parts with $u = x^2$ , $dv = \lambda e^{-\lambda x} dx$ :

\begin{aligned} \mathbb{E}[X^2] &= \left[ -x^2 e^{-\lambda x} \right]_{0}^{\infty} + \int_{0}^{\infty} 2x e^{-\lambda x} dx \\ &= 0 + \frac{2}{\lambda} \int_{0}^{\infty} x \lambda e^{-\lambda x} dx \\ &= \frac{2}{\lambda} \cdot \frac{1}{\lambda} \\ &= \frac{2}{\lambda^2} \end{aligned}

Therefore:

\begin{aligned} \mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\ &= \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 \\ &= \frac{1}{\lambda^2} \end{aligned}

Properties: Memoryless property: $P(X > s+t \mid X > s) = P(X > t)$ for all $s, t \geq 0$
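
The moments and the memoryless property can both be illustrated by simulation. The sketch below samples with `random.expovariate` from the Python standard library; the rate $\lambda = 2$, the offsets $s = 0.4$ and $t = 0.3$, and the seed are arbitrary choices. The exact tail probability uses $P(X > x) = e^{-\lambda x}$, which follows from integrating the PDF.

```python
import random
from math import exp

rng = random.Random(42)  # fixed seed so the run is reproducible
lam = 2.0                # illustrative rate
n = 200_000
samples = [rng.expovariate(lam) for _ in range(n)]

sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n

# Memoryless property: P(X > s + t | X > s) should match P(X > t).
s, t = 0.4, 0.3
survivors = [x for x in samples if x > s]
cond_prob = sum(x > s + t for x in survivors) / len(survivors)
uncond_prob = sum(x > t for x in samples) / n
exact = exp(-lam * t)  # P(X > t) from integrating the PDF

print(sample_mean, 1 / lam)
print(sample_var, 1 / lam**2)
print(cond_prob, uncond_prob, exact)
```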

Applications: Reliability engineering, queuing theory, survival analysis

Gamma Distribution (optional)

Generalizes exponential distribution, models waiting times

Parameters: $\alpha$ (shape), $\beta$ (rate), both $> 0$

Support: $x \in [0, \infty)$

PDF: $f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ for $x \geq 0$

Moment Calculations:

The moment generating function is:

\begin{aligned} M_X(t) &= \mathbb{E}[e^{tX}] \\ &= \int_{0}^{\infty} e^{tx} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} dx \\ &= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^{\infty} x^{\alpha-1} e^{-(\beta-t)x} dx \\ &= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha)}{(\beta-t)^\alpha} \\ &= \left(\frac{\beta}{\beta-t}\right)^\alpha \text{ for } t < \beta \end{aligned}

Using the MGF to find moments:

\begin{aligned} \mathbb{E}[X] &= M_X'(0) \\ &= \alpha \beta^{\alpha} (\beta-t)^{-\alpha-1} \Big|_{t=0} \\ &= \alpha \beta^{\alpha} \beta^{-\alpha-1} \\ &= \frac{\alpha}{\beta} \end{aligned}

\begin{aligned} \mathbb{E}[X^2] &= M_X''(0) \\ &= \alpha(\alpha+1)\beta^{\alpha} (\beta-t)^{-\alpha-2} \Big|_{t=0} \\ &= \frac{\alpha(\alpha+1)}{\beta^2} \end{aligned}

Therefore:

\begin{aligned} \mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\ &= \frac{\alpha(\alpha+1)}{\beta^2} - \frac{\alpha^2}{\beta^2} \\ &= \frac{\alpha}{\beta^2} \end{aligned}
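
The step from the MGF to the moments can be checked by differentiating $M_X(t)$ numerically at $t = 0$. The sketch below uses central finite differences with a step size `h` chosen here for illustration; the values $\alpha = 2.5$ and $\beta = 2$ are arbitrary, and a non-integer shape is used deliberately.

```python
def gamma_mgf(t, alpha, beta):
    """M_X(t) = (beta / (beta - t))^alpha, valid for t < beta."""
    return (beta / (beta - t)) ** alpha

alpha, beta = 2.5, 2.0  # illustrative shape and rate
h = 1e-3                # finite-difference step (an assumption of this sketch)

m_plus = gamma_mgf(h, alpha, beta)
m_zero = gamma_mgf(0.0, alpha, beta)
m_minus = gamma_mgf(-h, alpha, beta)

# Central differences approximate M'(0) and M''(0).
mean = (m_plus - m_minus) / (2 * h)
second = (m_plus - 2 * m_zero + m_minus) / h**2
var = second - mean**2

print(mean, alpha / beta)
print(second, alpha * (alpha + 1) / beta**2)
print(var, alpha / beta**2)
```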

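Properties: When $\alpha$ is a positive integer, a Gamma( $\alpha$ , $\beta$ ) variable is the sum of $\alpha$ independent Exponential( $\beta$ ) variables. Setting $\alpha = 1$ recovers the Exponential( $\beta$ ) distribution.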

Applications: Bayesian statistics, rainfall modeling, insurance

Logistic Distribution (optional)

Models growth curves and binary choice models

Parameters: $\mu$ (location), $s$ (scale), where $s > 0$

Support: $x \in (-\infty, \infty)$

PDF: $f(x) = \frac{e^{-(x-\mu)/s}}{s(1+e^{-(x-\mu)/s})^2}$

Moment Calculations:

The cumulative distribution function is: $F(x) = \frac{1}{1+e^{-(x-\mu)/s}}$

For the standard logistic distribution where $\mu = 0$ and $s = 1$ : $f(x) = \frac{e^{-x}}{(1+e^{-x})^2}$

The expected value can be found using symmetry:

\begin{aligned} \mathbb{E}[X] &= \int_{-\infty}^{\infty} x \cdot \frac{e^{-x}}{(1+e^{-x})^2} dx \end{aligned}

Let $u = -x$ , then:

\begin{aligned} \mathbb{E}[X] &= \int_{\infty}^{-\infty} (-u) \cdot \frac{e^{u}}{(1+e^{u})^2} (-du) \\ &= \int_{-\infty}^{\infty} (-u) \cdot \frac{e^{u}}{(1+e^{u})^2} du \end{aligned}

Using the identity $\frac{e^{u}}{(1+e^{u})^2} = \frac{e^{-u}}{(1+e^{-u})^2}$ :

\begin{aligned} \mathbb{E}[X] &= -\int_{-\infty}^{\infty} u \cdot \frac{e^{-u}}{(1+e^{-u})^2} du \\ &= -\mathbb{E}[X] \end{aligned}

Therefore, $\mathbb{E}[X] = 0$ .

For the variance:

\begin{aligned} \mathbb{E}[X^2] &= \int_{-\infty}^{\infty} x^2 \cdot \frac{e^{-x}}{(1+e^{-x})^2} dx \end{aligned}

Using the substitution $u = \frac{1}{1+e^{-x}}$ , which gives $x = \ln\left(\frac{u}{1-u}\right)$ and $dx = \frac{du}{u(1-u)}$ :

\begin{aligned} \mathbb{E}[X^2] &= \int_{0}^{1} \left[\ln\left(\frac{u}{1-u}\right)\right]^2 du \end{aligned}

This integral evaluates to $\frac{\pi^2}{3}$ , so $\mathbb{V}(X) = \frac{\pi^2}{3}$ .

For the general logistic distribution $X = \mu + sZ$ where $Z \sim \text{Logistic}(0,1)$ :

\begin{aligned} \mathbb{E}[X] &= \mu + s\mathbb{E}[Z] \\ &= \mu \end{aligned}

\begin{aligned} \mathbb{V}(X) &= s^2\mathbb{V}(Z) \\ &= \frac{s^2\pi^2}{3} \end{aligned}
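
The value $\frac{\pi^2}{3}$ is stated above without evaluating the integral, so a numerical check is reassuring. The sketch below applies a midpoint rule to $\int_0^1 [\ln(u/(1-u))]^2 du$. The integrand has integrable logarithmic singularities at both endpoints, which the midpoint rule never evaluates, so the result is only accurate to a few decimal places. The scale $s = 1.5$ is an arbitrary illustration.

```python
from math import log, pi

def integrate_midpoint(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Second moment of the standard logistic after the substitution u = F(x).
second_moment = integrate_midpoint(lambda u: log(u / (1 - u)) ** 2, 0.0, 1.0)

s = 1.5  # illustrative scale parameter
var_general = s**2 * second_moment  # V(mu + sZ) = s^2 V(Z); the mean of Z is 0

print(second_moment, pi**2 / 3)
print(var_general, s**2 * pi**2 / 3)
```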

Properties: Similar in shape to the normal distribution but with heavier tails. The difference of two independent Gumbel random variables with a common scale parameter follows a logistic distribution.

Applications: Logistic regression, choice modeling, growth curves

For more details on random variables and their properties, see Random Variable.

For expectation and variance calculations, see Expectation and Variance.
