Probability distributions are mathematical functions that describe the likelihood of different outcomes for a random variable. They provide a complete description of the probability structure of random phenomena and are fundamental to statistical analysis and machine learning.
Overview
Probability distributions can be classified based on the nature of the random variable: discrete (countable outcomes), continuous (uncountable outcomes within intervals), or mixed (combinations). Each distribution is characterized by its support (possible values), probability function (PMF for discrete, PDF for continuous), cumulative distribution function, parameters, and moments.
Definition: Probability Distribution
A probability distribution is a function or rule that assigns probabilities to the outcomes of a random experiment or, more generally, to the events in a sample space.
Let $X$ be a random variable. The probability distribution of $X$ is defined by its probability mass function (PMF) for discrete variables or its probability density function (PDF) for continuous variables.
For a discrete random variable $X$, the distribution is specified by the PMF $f$:
$$P(X = x) = f(x)$$
For any random variable, discrete or continuous, the cumulative distribution function (CDF) $F$ is
$$P(X \leq x) = F(x)$$
For a continuous variable with PDF $f$, the CDF is $F(x) = \int_{-\infty}^{x} f(t)\,dt$, and $P(X = x) = 0$ at every single point $x$.
The PMF and PDF must satisfy the properties of non-negativity and normalization:
For discrete variables: $P(X = x) \geq 0$ for all $x$, and $\sum_{x} P(X = x) = 1$
For continuous variables: $f(x) \geq 0$ for all $x$, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$
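The two conditions for a discrete distribution can be checked mechanically. The sketch below is only an illustration: the helper name `is_valid_pmf` and the example distributions are assumptions made here, and exact `Fraction` arithmetic is used so the normalization test is not affected by floating-point rounding.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check non-negativity and normalization for a finite PMF {outcome: probability}."""
    non_negative = all(prob >= 0 for prob in pmf.values())
    normalized = sum(pmf.values()) == 1
    return non_negative and normalized

# A fair six-sided die (illustrative example).
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(is_valid_pmf(die))  # True

# Probabilities summing to 9/10 fail the normalization condition.
broken = {0: Fraction(1, 2), 1: Fraction(2, 5)}
print(is_valid_pmf(broken))  # False
```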
Discrete Probability Distributions
Bernoulli Distribution
Models a single trial with two possible outcomes (success/failure)
Parameters : $p$ (probability of success), where $0 \leq p \leq 1$
Support : $x \in \{0, 1\}$
PMF : $P(X = x) = p^x(1-p)^{1-x}$
Moment Calculations :
For the expected value:
$$
\begin{aligned}
\mathbb{E}[X] &= \sum_{x=0}^{1} x \cdot P(X = x) \\
&= 0 \cdot (1-p) + 1 \cdot p \\
&= p
\end{aligned}
$$
For the second moment:
$$
\begin{aligned}
\mathbb{E}[X^2] &= \sum_{x=0}^{1} x^2 \cdot P(X = x) \\
&= 0^2 \cdot (1-p) + 1^2 \cdot p \\
&= p
\end{aligned}
$$
Therefore, the variance is:
$$
\begin{aligned}
\mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\
&= p - p^2 \\
&= p(1-p)
\end{aligned}
$$
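The Bernoulli moment calculations above can be reproduced by summing over the two-point support. This is a minimal sketch with exact fractions; the function names and the value $p = 3/10$ are illustrative assumptions.

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)

def bernoulli_moments(p):
    """Return (E[X], E[X^2], V(X)) by summing over the support {0, 1}."""
    mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
    second = sum(x**2 * bernoulli_pmf(x, p) for x in (0, 1))
    return mean, second, second - mean**2

p = Fraction(3, 10)  # illustrative value
mean, second, variance = bernoulli_moments(p)
print(mean, second, variance)  # 3/10 3/10 21/100
```

The first two values equal $p$ and the third equals $p(1-p)$, matching the derivation.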
Applications : Coin flips, binary outcomes, indicator variables
Binomial Distribution
Models the number of successes in $n$ independent Bernoulli trials
Parameters : $n$ (number of trials), $p$ (success probability)
Support : $x \in \{0, 1, 2, \ldots, n\}$
PMF : $P(X = x) = \binom{n}{x} p^x(1-p)^{n-x}$
Moment Calculations :
The expected value can be derived using the linearity of expectation. Since $X = \sum_{i=1}^{n} X_i$ where $X_i \sim \text{Bernoulli}(p)$:
$$
\begin{aligned}
\mathbb{E}[X] &= \mathbb{E}\left[\sum_{i=1}^{n} X_i\right] \\
&= \sum_{i=1}^{n} \mathbb{E}[X_i] \\
&= \sum_{i=1}^{n} p \\
&= np
\end{aligned}
$$
For the variance, since the $X_i$ are independent:
$$
\begin{aligned}
\mathbb{V}(X) &= \mathbb{V}\left(\sum_{i=1}^{n} X_i\right) \\
&= \sum_{i=1}^{n} \mathbb{V}(X_i) \\
&= \sum_{i=1}^{n} p(1-p) \\
&= np(1-p)
\end{aligned}
$$
Alternatively, we can compute directly:
$$
\begin{aligned}
\mathbb{E}[X] &= \sum_{x=0}^{n} x \binom{n}{x} p^x(1-p)^{n-x} \\
&= np\sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1}(1-p)^{n-x} \\
&= np
\end{aligned}
$$
The second line uses the identity $x\binom{n}{x} = n\binom{n-1}{x-1}$, and the last step uses the binomial theorem: the remaining sum equals $(p + (1-p))^{n-1} = 1$.
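The direct computation can also be carried out numerically by summing over the whole support. The sketch below uses exact fractions and `math.comb`; the function names and the parameters $n = 10$, $p = 1/4$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_moments(n, p):
    """Return (total probability, E[X], V(X)) by direct summation over 0..n."""
    support = range(n + 1)
    total = sum(binomial_pmf(x, n, p) for x in support)
    mean = sum(x * binomial_pmf(x, n, p) for x in support)
    second = sum(x**2 * binomial_pmf(x, n, p) for x in support)
    return total, mean, second - mean**2

n, p = 10, Fraction(1, 4)  # illustrative values
total, mean, variance = binomial_moments(n, p)
print(total, mean, variance)  # 1 5/2 15/8
```

Here $np = 5/2$ and $np(1-p) = 15/8$, in agreement with both derivations.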
Applications : Quality control, survey sampling, clinical trials
Hypergeometric Distribution
Models the number of successes in $n$ draws without replacement from a finite population
Parameters : $N$ (population size), $K$ (number of success states), $n$ (number of draws)
Support : $x \in \{\max(0, n-(N-K)), \ldots, \min(n, K)\}$
PMF : $P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$
Moment Calculations :
For the expected value, we use indicator variables. Let $I_j = 1$ if the $j$-th draw is a success, $0$ otherwise. Then $X = \sum_{j=1}^{n} I_j$.
The probability that any particular draw is a success is $P(I_j = 1) = \frac{K}{N}$, so:
$$
\begin{aligned}
\mathbb{E}[X] &= \mathbb{E}\left[\sum_{j=1}^{n} I_j\right] \\
&= \sum_{j=1}^{n} \mathbb{E}[I_j] \\
&= \sum_{j=1}^{n} \frac{K}{N} \\
&= n\frac{K}{N}
\end{aligned}
$$
For the variance, we need to account for the dependence between draws:
$$
\begin{aligned}
\mathbb{V}(X) &= \mathbb{V}\left(\sum_{j=1}^{n} I_j\right) \\
&= \sum_{j=1}^{n} \mathbb{V}(I_j) + 2\sum_{j < k} \text{Cov}(I_j, I_k)
\end{aligned}
$$
Since $\mathbb{V}(I_j) = \frac{K}{N}\left(1-\frac{K}{N}\right)$ and $\text{Cov}(I_j, I_k) = -\frac{K(N-K)}{N^2(N-1)}$ for $j \neq k$:
$$
\begin{aligned}
\mathbb{V}(X) &= n\frac{K}{N}\left(1-\frac{K}{N}\right) + n(n-1)\left(-\frac{K(N-K)}{N^2(N-1)}\right) \\
&= n\frac{K}{N}\frac{N-K}{N} - n(n-1)\frac{K(N-K)}{N^2(N-1)} \\
&= n\frac{K(N-K)}{N^2}\left(1 - \frac{n-1}{N-1}\right) \\
&= n\frac{K(N-K)}{N^2}\left(\frac{N-n}{N-1}\right)
\end{aligned}
$$
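As a sanity check on the closed forms above, the mean and variance can be computed directly from the PMF and compared with $n\frac{K}{N}$ and $n\frac{K(N-K)}{N^2}\frac{N-n}{N-1}$. This is a sketch with exact fractions; the function names and the values $N = 20$, $K = 7$, $n = 5$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) = C(K, x) C(N - K, n - x) / C(N, n)."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

def hypergeom_moments(N, K, n):
    """Return (E[X], V(X)) by summing over the support of the distribution."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    support = range(lo, hi + 1)
    mean = sum(x * hypergeom_pmf(x, N, K, n) for x in support)
    second = sum(x**2 * hypergeom_pmf(x, N, K, n) for x in support)
    return mean, second - mean**2

N, K, n = 20, 7, 5  # illustrative values
mean, variance = hypergeom_moments(N, K, n)

# Closed forms from the derivation above.
closed_mean = Fraction(n * K, N)
closed_var = Fraction(n * K * (N - K), N**2) * Fraction(N - n, N - 1)

print(mean, variance)  # 7/4 273/304
print(mean == closed_mean, variance == closed_var)  # True True
```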
Applications : Sampling without replacement, quality control, ecological studies
Poisson Distribution
Models the number of events occurring in a fixed interval
Parameters : $\lambda$ (rate parameter), where $\lambda > 0$
Support : $x \in \{0, 1, 2, \ldots\}$
PMF : $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
Moment Calculations :
For the expected value:
$$
\begin{aligned}
\mathbb{E}[X] &= \sum_{x=0}^{\infty} x \cdot \frac{e^{-\lambda}\lambda^x}{x!} \\
&= e^{-\lambda}\sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!} \\
&= e^{-\lambda}\lambda\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}
\end{aligned}
$$
Let $k = x-1$:
$$
\begin{aligned}
\mathbb{E}[X] &= e^{-\lambda}\lambda\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} \\
&= e^{-\lambda}\lambda e^{\lambda} \\
&= \lambda
\end{aligned}
$$
For the second moment:
$$
\begin{aligned}
\mathbb{E}[X^2] &= \sum_{x=0}^{\infty} x^2 \cdot \frac{e^{-\lambda}\lambda^x}{x!} \\
&= e^{-\lambda}\sum_{x=1}^{\infty} x \cdot \frac{\lambda^x}{(x-1)!}
\end{aligned}
$$
Let $k = x-1$:
$$
\begin{aligned}
\mathbb{E}[X^2] &= e^{-\lambda}\sum_{k=0}^{\infty} (k+1) \cdot \frac{\lambda^{k+1}}{k!} \\
&= e^{-\lambda}\lambda\sum_{k=0}^{\infty} (k+1) \cdot \frac{\lambda^k}{k!} \\
&= e^{-\lambda}\lambda\left(\sum_{k=0}^{\infty} k \cdot \frac{\lambda^k}{k!} + \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\right) \\
&= e^{-\lambda}\lambda(\lambda e^{\lambda} + e^{\lambda}) \\
&= \lambda(\lambda + 1)
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
\mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\
&= \lambda(\lambda + 1) - \lambda^2 \\
&= \lambda
\end{aligned}
$$
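The infinite sums above converge quickly, so truncating them gives a simple numerical check that the mean and variance both equal $\lambda$. This is a sketch only: the function name, the truncation point of 200 terms, and the value $\lambda = 3.5$ are assumptions chosen for illustration.

```python
from math import exp

def poisson_moments(lam, terms=200):
    """Approximate E[X] and V(X) by truncating the series after `terms` terms."""
    prob = exp(-lam)  # P(X = 0)
    mean = 0.0
    second = 0.0
    for x in range(terms):
        mean += x * prob
        second += x**2 * prob
        prob *= lam / (x + 1)  # P(X = x + 1) obtained from P(X = x)
    return mean, second - mean**2

mean, variance = poisson_moments(3.5)
print(round(mean, 6), round(variance, 6))  # 3.5 3.5
```

Updating the probability recursively avoids computing large factorials and powers separately.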
Properties : The Poisson distribution is the limit of Binomial($n$, $p$) as $n \to \infty$ and $p \to 0$ with $np = \lambda$ held fixed.
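The limiting relationship can be observed numerically by fixing $\lambda$, setting $p = \lambda / n$, and comparing the two PMFs as $n$ grows. This is a rough sketch: the function names, the choice $\lambda = 2$, and the comparison range $x = 0, \ldots, 20$ are assumptions made for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

def max_pmf_gap(n, lam, upto=20):
    """Largest |Binomial(n, lam/n) - Poisson(lam)| PMF difference over x = 0..upto.

    Assumes n >= upto so every x lies in the binomial support.
    """
    p = lam / n
    return max(
        abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(upto + 1)
    )

lam = 2.0
for n in (20, 200, 2000):
    # The gap should shrink roughly in proportion to 1/n.
    print(n, max_pmf_gap(n, lam))
```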
Applications : Call centers, traffic flow, radioactive decay, rare events
Continuous Probability Distributions
Normal (Gaussian) Distribution
One of the most widely used continuous distributions in statistics, largely because of the Central Limit Theorem
Parameters : $\mu$ (mean), $\sigma^2$ (variance)
Support : $x \in (-\infty, \infty)$
PDF : $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Moment Calculations :
For the standard normal distribution $Z \sim N(0,1)$:
The expected value is:
$$
\begin{aligned}
\mathbb{E}[Z] &= \int_{-\infty}^{\infty} z \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz \\
&= 0
\end{aligned}
$$
This follows because the integrand is an odd function and the integral converges.
For the variance:
$$
\begin{aligned}
\mathbb{E}[Z^2] &= \int_{-\infty}^{\infty} z^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz
\end{aligned}
$$
Using integration by parts with $u = z$, $dv = z e^{-z^2/2}\,dz$:
$$
\begin{aligned}
\mathbb{E}[Z^2] &= \frac{1}{\sqrt{2\pi}} \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2/2}\,dz \\
&= 0 + 1 \\
&= 1
\end{aligned}
$$
Therefore, $\mathbb{V}(Z) = \mathbb{E}[Z^2] - (\mathbb{E}[Z])^2 = 1 - 0 = 1$.
For the general normal distribution $X = \mu + \sigma Z$:
$$
\begin{aligned}
\mathbb{E}[X] &= \mathbb{E}[\mu + \sigma Z] \\
&= \mu + \sigma \mathbb{E}[Z] \\
&= \mu
\end{aligned}
$$
$$
\begin{aligned}
\mathbb{V}(X) &= \mathbb{V}(\mu + \sigma Z) \\
&= \sigma^2 \mathbb{V}(Z) \\
&= \sigma^2
\end{aligned}
$$
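The results $\mathbb{E}[X] = \mu$ and $\mathbb{V}(X) = \sigma^2$ can be checked by integrating the PDF numerically. The sketch below uses a plain midpoint rule over $\mu \pm 10\sigma$; the function names, the integration range, the step count, and the values $\mu = 1.5$, $\sigma = 2$ are assumptions chosen for illustration, not a recommended quadrature method.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def normal_moments(mu, sigma, half_width=10.0, steps=20000):
    """Midpoint-rule estimates of total mass, E[X], and V(X) over mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    mass = mean = second = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        w = normal_pdf(x, mu, sigma) * h  # probability mass of this small cell
        mass += w
        mean += x * w
        second += x * x * w
    return mass, mean, second - mean**2

mass, mean, variance = normal_moments(mu=1.5, sigma=2.0)
print(round(mass, 6), round(mean, 6), round(variance, 6))  # 1.0 1.5 4.0
```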
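Properties : The Central Limit Theorem states that the standardized sum of many independent, identically distributed random variables with finite variance converges in distribution to a normal distribution. Linear combinations of independent normal variables are normal.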
Additivity Property : If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent, then:
$$X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$$
Proof: Additivity
Let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ be independent normal random variables.
We can write $X = \mu_1 + \sigma_1 Z_1$ and $Y = \mu_2 + \sigma_2 Z_2$, where $Z_1, Z_2 \sim N(0,1)$ are independent standard normal variables.
Then:
$$X + Y = (\mu_1 + \mu_2) + \sigma_1 Z_1 + \sigma_2 Z_2$$
Since $Z_1$ and $Z_2$ are independent, the linear combination $\sigma_1 Z_1 + \sigma_2 Z_2$ is also normally distributed (the moment generating function argument below justifies this step), with:
Mean: $\mathbb{E}[\sigma_1 Z_1 + \sigma_2 Z_2] = \sigma_1 \cdot 0 + \sigma_2 \cdot 0 = 0$
Variance: $\mathbb{V}(\sigma_1 Z_1 + \sigma_2 Z_2) = \sigma_1^2 \cdot 1 + \sigma_2^2 \cdot 1 = \sigma_1^2 + \sigma_2^2$
Therefore:
$$\sigma_1 Z_1 + \sigma_2 Z_2 \sim N(0, \sigma_1^2 + \sigma_2^2)$$
And:
$$X + Y = (\mu_1 + \mu_2) + (\sigma_1 Z_1 + \sigma_2 Z_2) \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$$
Proof Using Moment Generating Functions
The MGF of $X \sim N(\mu, \sigma^2)$ is:
$$M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$$
For independent $X$ and $Y$:
$$
\begin{aligned}
M_{X+Y}(t) &= M_X(t) \cdot M_Y(t) \\
&= e^{\mu_1 t + \frac{1}{2}\sigma_1^2 t^2} \cdot e^{\mu_2 t + \frac{1}{2}\sigma_2^2 t^2} \\
&= e^{(\mu_1 + \mu_2)t + \frac{1}{2}(\sigma_1^2 + \sigma_2^2)t^2}
\end{aligned}
$$
This is the MGF of $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$. Because an MGF that is finite in a neighborhood of $t = 0$ determines the distribution uniquely, this proves the result.
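A quick simulation offers an informal check of the additivity property. The sketch below draws independent normal pairs with the standard library and compares the sample mean and variance of the sums with $\mu_1 + \mu_2$ and $\sigma_1^2 + \sigma_2^2$. The function name, parameter values, sample size, and seed are illustrative assumptions, and the check covers only the first two moments, not normality itself.

```python
import random
from statistics import fmean, pvariance

def simulate_sum(mu1, sigma1, mu2, sigma2, n=100_000, seed=0):
    """Draw n independent pairs (X, Y) and return the sample mean and variance of X + Y."""
    rng = random.Random(seed)
    sums = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]
    return fmean(sums), pvariance(sums)

mean, variance = simulate_sum(mu1=1.0, sigma1=2.0, mu2=-3.0, sigma2=3.0)
# Theory predicts mean 1 + (-3) = -2 and variance 2^2 + 3^2 = 13;
# the printed values should be close to these, up to sampling noise.
print(mean, variance)
```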
Applications : Natural phenomena, measurement errors, statistical inference
Exponential Distribution
Models time between events in a Poisson process
Parameters : $\lambda$ (rate parameter), where $\lambda > 0$
Support : $x \in [0, \infty)$
PDF : $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$
Moment Calculations :
For the expected value:
$$
\begin{aligned}
\mathbb{E}[X] &= \int_{0}^{\infty} x \lambda e^{-\lambda x}\,dx
\end{aligned}
$$
Using integration by parts with $u = x$, $dv = \lambda e^{-\lambda x}\,dx$:
$$
\begin{aligned}
\mathbb{E}[X] &= \left[ -x e^{-\lambda x} \right]_{0}^{\infty} + \int_{0}^{\infty} e^{-\lambda x}\,dx \\
&= 0 + \left[ -\frac{1}{\lambda} e^{-\lambda x} \right]_{0}^{\infty} \\
&= \frac{1}{\lambda}
\end{aligned}
$$
For the second moment:
$$
\begin{aligned}
\mathbb{E}[X^2] &= \int_{0}^{\infty} x^2 \lambda e^{-\lambda x}\,dx
\end{aligned}
$$
Using integration by parts with $u = x^2$, $dv = \lambda e^{-\lambda x}\,dx$:
$$
\begin{aligned}
\mathbb{E}[X^2] &= \left[ -x^2 e^{-\lambda x} \right]_{0}^{\infty} + \int_{0}^{\infty} 2x e^{-\lambda x}\,dx \\
&= 0 + \frac{2}{\lambda} \int_{0}^{\infty} x \lambda e^{-\lambda x}\,dx \\
&= \frac{2}{\lambda} \cdot \frac{1}{\lambda} \\
&= \frac{2}{\lambda^2}
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
\mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\
&= \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 \\
&= \frac{1}{\lambda^2}
\end{aligned}
$$
Properties : Memoryless property: $P(X > s+t \mid X > s) = P(X > t)$ for all $s, t \geq 0$
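The memoryless property follows from the survival function $P(X > x) = e^{-\lambda x}$, obtained by integrating the PDF from $x$ to $\infty$. A minimal numerical check is sketched below; the function names and the values of $\lambda$, $s$, and $t$ are illustrative assumptions.

```python
from math import exp, isclose

def survival(x, lam):
    """P(X > x) = exp(-lam * x) for X ~ Exponential(lam) and x >= 0."""
    return exp(-lam * x)

def conditional_survival(s, t, lam):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t, lam) / survival(s, lam)

lam, s, t = 0.5, 2.0, 3.0  # illustrative values
print(isclose(conditional_survival(s, t, lam), survival(t, lam)))  # True
```

The ratio simplifies algebraically to $e^{-\lambda(s+t)} / e^{-\lambda s} = e^{-\lambda t}$, which is why the elapsed time $s$ drops out.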
Applications : Reliability engineering, queuing theory, survival analysis
Gamma Distribution (optional)
Generalizes exponential distribution, models waiting times
Parameters : $\alpha$ (shape), $\beta$ (rate), both $> 0$
Support : $x \in [0, \infty)$
PDF : $f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ for $x \geq 0$
Moment Calculations :
The moment generating function is:
$$
\begin{aligned}
M_X(t) &= \mathbb{E}[e^{tX}] \\
&= \int_{0}^{\infty} e^{tx} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\,dx \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_{0}^{\infty} x^{\alpha-1} e^{-(\beta-t)x}\,dx \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha)}{(\beta-t)^\alpha} \\
&= \left(\frac{\beta}{\beta-t}\right)^\alpha \text{ for } t < \beta
\end{aligned}
$$
Using the MGF to find moments:
$$
\begin{aligned}
\mathbb{E}[X] &= M_X'(0) \\
&= \alpha \beta^{\alpha} (\beta-t)^{-\alpha-1} \Big|_{t=0} \\
&= \alpha \beta^{\alpha} \beta^{-\alpha-1} \\
&= \frac{\alpha}{\beta}
\end{aligned}
$$
$$
\begin{aligned}
\mathbb{E}[X^2] &= M_X''(0) \\
&= \alpha(\alpha+1)\beta^{\alpha} (\beta-t)^{-\alpha-2} \Big|_{t=0} \\
&= \frac{\alpha(\alpha+1)}{\beta^2}
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
\mathbb{V}(X) &= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \\
&= \frac{\alpha(\alpha+1)}{\beta^2} - \frac{\alpha^2}{\beta^2} \\
&= \frac{\alpha}{\beta^2}
\end{aligned}
$$
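The step from the MGF to the moments can be checked by differentiating $M_X(t)$ numerically at $t = 0$. The sketch below uses central finite differences; the function names, the step size $h = 10^{-4}$, and the parameters $\alpha = 3$, $\beta = 2$ are assumptions chosen for illustration.

```python
def gamma_mgf(t, alpha, beta):
    """M_X(t) = (beta / (beta - t))^alpha, valid only for t < beta."""
    if t >= beta:
        raise ValueError("the MGF is only defined for t < beta")
    return (beta / (beta - t)) ** alpha

def moments_from_mgf(alpha, beta, h=1e-4):
    """Approximate E[X] = M'(0) and V(X) = M''(0) - M'(0)^2 by central differences."""
    m_plus = gamma_mgf(h, alpha, beta)
    m_zero = gamma_mgf(0.0, alpha, beta)
    m_minus = gamma_mgf(-h, alpha, beta)
    mean = (m_plus - m_minus) / (2 * h)
    second = (m_plus - 2 * m_zero + m_minus) / h**2
    return mean, second - mean**2

alpha, beta = 3.0, 2.0  # illustrative values
mean, variance = moments_from_mgf(alpha, beta)
# Expected to be close to alpha / beta = 1.5 and alpha / beta**2 = 0.75.
print(mean, variance)
```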
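Properties : When $\alpha$ is a positive integer, a Gamma($\alpha$, $\beta$) variable has the same distribution as the sum of $\alpha$ independent Exponential($\beta$) variables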
Applications : Bayesian statistics, rainfall modeling, insurance
Logistic Distribution (optional)
Models growth curves and binary choice models
Parameters : $\mu$ (location), $s$ (scale), where $s > 0$
Support : $x \in (-\infty, \infty)$
PDF : $f(x) = \frac{e^{-(x-\mu)/s}}{s(1+e^{-(x-\mu)/s})^2}$
Moment Calculations :
The cumulative distribution function is:
$$F(x) = \frac{1}{1+e^{-(x-\mu)/s}}$$
For the standard logistic distribution where $\mu = 0$ and $s = 1$:
$$f(x) = \frac{e^{-x}}{(1+e^{-x})^2}$$
The expected value can be found using symmetry:
$$
\begin{aligned}
\mathbb{E}[X] &= \int_{-\infty}^{\infty} x \cdot \frac{e^{-x}}{(1+e^{-x})^2}\,dx
\end{aligned}
$$
Let $u = -x$, then:
$$
\begin{aligned}
\mathbb{E}[X] &= \int_{\infty}^{-\infty} (-u) \cdot \frac{e^{u}}{(1+e^{u})^2}\,(-du) \\
&= \int_{-\infty}^{\infty} (-u) \cdot \frac{e^{u}}{(1+e^{u})^2}\,du
\end{aligned}
$$
Using the identity $\frac{e^{u}}{(1+e^{u})^2} = \frac{e^{-u}}{(1+e^{-u})^2}$:
$$
\begin{aligned}
\mathbb{E}[X] &= -\int_{-\infty}^{\infty} u \cdot \frac{e^{-u}}{(1+e^{-u})^2}\,du \\
&= -\mathbb{E}[X]
\end{aligned}
$$
Therefore, $\mathbb{E}[X] = 0$.
For the variance:
$$
\begin{aligned}
\mathbb{E}[X^2] &= \int_{-\infty}^{\infty} x^2 \cdot \frac{e^{-x}}{(1+e^{-x})^2}\,dx
\end{aligned}
$$
Using the substitution $u = \frac{1}{1+e^{-x}}$, which gives $x = \ln\left(\frac{u}{1-u}\right)$ and $dx = \frac{du}{u(1-u)}$:
$$
\begin{aligned}
\mathbb{E}[X^2] &= \int_{0}^{1} \left[\ln\left(\frac{u}{1-u}\right)\right]^2 du
\end{aligned}
$$
This integral evaluates to $\frac{\pi^2}{3}$, so $\mathbb{V}(X) = \frac{\pi^2}{3}$.
For the general logistic distribution $X = \mu + sZ$ where $Z \sim \text{Logistic}(0,1)$:
$$
\begin{aligned}
\mathbb{E}[X] &= \mu + s\mathbb{E}[Z] \\
&= \mu
\end{aligned}
$$
$$
\begin{aligned}
\mathbb{V}(X) &= s^2\mathbb{V}(Z) \\
&= \frac{s^2\pi^2}{3}
\end{aligned}
$$
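The value $\frac{\pi^2}{3}$ for the standard logistic second moment is stated without proof, so a numerical check of the integral over $(0, 1)$ may be reassuring. The sketch below uses a midpoint rule, which never evaluates the integrand at the singular endpoints. The function name and the step count are assumptions, and the estimate is only approximate because the integrand grows without bound near $u = 0$ and $u = 1$.

```python
from math import log, pi

def logistic_second_moment(steps=200_000):
    """Midpoint-rule estimate of the integral of ln(u / (1 - u))^2 over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoints avoid the endpoints u = 0 and u = 1
        total += log(u / (1.0 - u)) ** 2
    return total * h

estimate = logistic_second_moment()
# The two printed values should agree to a few decimal places.
print(estimate, pi**2 / 3)
```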
Properties : Similar shape to the normal distribution but with heavier tails. The difference of two independent Gumbel-distributed random variables with the same scale parameter follows a logistic distribution.
Applications : Logistic regression, choice modeling, growth curves
For more details on random variables and their properties, see Random Variable.
For expectation and variance calculations, see Expectation and Variance.