Definition of Random Variable

A random variable is the object that lets probability theory talk about data numerically. It turns outcomes in a sample space into numbers, so that probabilities can be described with distributions, expectations, variances, and other statistical tools.

Definition: Random Variable

A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. There are two main types of random variables: discrete and continuous.

Discrete Random Variables: These are random variables that can take on a finite or countably infinite set of values. For example, the number of heads in 10 coin flips is a discrete random variable.
Continuous Random Variables: These are random variables that can take on any value in an interval of real numbers, which is an uncountably infinite set. For example, the time it takes for a computer to solve a problem is a continuous random variable.

Formally, a random variable is a measurable function that maps outcomes of a random process to real numbers. This mapping allows us to assign probabilities to different outcomes and analyze them statistically using existing mathematical tools.

Let $\Omega$ be the sample space of a random process, and let $X: \Omega \to \mathbb{R}$ be a random variable. The function $X$ assigns a real number to each outcome in $\Omega$. The probability distribution of a random variable describes how the probabilities are distributed over the possible values of the random variable.
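To make the mapping $X: \Omega \to \mathbb{R}$ concrete, here is a minimal sketch. It assumes a sample space of two fair coin flips and takes $X$ to be the number of heads; the names `omega`, `X`, and `distribution` are illustrative choices, not part of the definition.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: outcomes such as ("H", "T").
omega = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: the number of heads in an outcome."""
    return outcome.count("H")

# Assumption: every outcome is equally likely, so each has probability 1/4.
p_outcome = Fraction(1, len(omega))

# The distribution of X collects the probability of all outcomes
# that X maps to the same real number.
distribution = {}
for outcome in omega:
    value = X(outcome)
    distribution[value] = distribution.get(value, Fraction(0)) + p_outcome

print(distribution)
```

The sample space holds four outcomes, but $X$ takes only three values, because two different outcomes map to the value 1.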

Probability Functions

For discrete random variables, we use the Probability Mass Function (PMF): $p_X(x) = P(X = x)$

Properties:

$p_X(x) \geq 0$ for all $x$
$\sum_{x} p_X(x) = 1$
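As an illustration, the two PMF properties can be checked for the number of heads in 10 fair coin flips, the discrete example mentioned earlier. This is a sketch: the helper name `pmf_heads` is hypothetical, and the flips are assumed fair and independent, which gives the binomial formula used below.

```python
from fractions import Fraction
from math import comb

N_FLIPS = 10

def pmf_heads(k, n=N_FLIPS):
    """P(X = k) where X is the number of heads in n fair, independent flips."""
    if not 0 <= k <= n:
        return Fraction(0)
    # comb(n, k) of the 2**n equally likely flip sequences contain k heads.
    return Fraction(comb(n, k), 2 ** n)

pmf = {k: pmf_heads(k) for k in range(N_FLIPS + 1)}

# Property 1: every probability is non-negative.
print(all(p >= 0 for p in pmf.values()))
# Property 2: the probabilities sum to exactly 1.
print(sum(pmf.values()) == 1)
# The most likely value is 5 heads.
print(pmf[5])  # 63/256
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.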

For continuous random variables, we use the Probability Density Function (PDF): $f_X(x) \text{ where } P(a \leq X \leq b) = \int_a^b f_X(x)dx$

Properties:

$f_X(x) \geq 0$ for all $x$
$\int_{-\infty}^{\infty} f_X(x)dx = 1$
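The sketch below shows how a PDF turns into probabilities through integration. It assumes, purely for illustration, an exponential density $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ with a hypothetical rate $\lambda = 0.5$, and uses a simple midpoint rule rather than any particular library routine.

```python
import math

RATE = 0.5  # hypothetical rate parameter of the assumed exponential density

def pdf(x):
    """Assumed density: RATE * exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# P(1 <= X <= 3) as the area under the density between 1 and 3.
prob_1_to_3 = integrate(pdf, 1.0, 3.0)

# Closed form for this density, used only as a cross-check.
exact = math.exp(-RATE * 1.0) - math.exp(-RATE * 3.0)

# Total area: the tail beyond 60 is about exp(-30), which is negligible.
total = integrate(pdf, 0.0, 60.0)

print(prob_1_to_3, exact, total)
```

Note that `pdf(x)` itself is not a probability. Only areas under the curve are.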

Cumulative Distribution Function (CDF)

The CDF is defined for both discrete and continuous random variables: $F_X(x) = P(X \leq x)$

For discrete: $F_X(x) = \sum_{t \leq x} p_X(t)$

For continuous: $F_X(x) = \int_{-\infty}^{x} f_X(t)dt$
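For the discrete case, the CDF is a running sum of the PMF. A minimal sketch, assuming the PMF of the number of heads in 3 fair coin flips (the names `values`, `probs`, and `cdf` are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

# Assumed PMF: number of heads in 3 fair coin flips.
values = [0, 1, 2, 3]
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Running totals: cumulative[i] = P(X <= values[i]).
cumulative = list(accumulate(probs))

def cdf(x):
    """F_X(x) = P(X <= x), the sum of p_X(t) over all t <= x."""
    result = Fraction(0)
    for v, c in zip(values, cumulative):
        if v <= x:
            result = c
    return result

print(cdf(-1), cdf(1), cdf(1.5), cdf(3))
```

The function is constant between possible values and jumps at each of them, which is the step shape typical of a discrete CDF.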

In summary, random variables are functions that map the outcomes of a random process (the elements of its sample space) to real numbers, allowing us to analyze and quantify the behavior of random phenomena.

A more rigorous definition is possible by introducing measure theory and probability spaces. For optional further reading, see: Random Variable - stackexchange.

For more details on expectation and variance calculations, see Expectation and Variance.

Some Examples

Below are some examples of random variables in different contexts: discrete, continuous, and mixed.

Discrete Random Variable

Example: Rolling a Die

Consider a simple example of rolling a fair six-sided die. The sample space $\Omega$ consists of the outcomes $\{1, 2, 3, 4, 5, 6\}$. We can define a random variable $X$ that maps each outcome to its value. For example, if we roll a die and get a 3, then $X(\omega) = 3$. The probability distribution of this random variable is uniform, meaning each outcome has an equal probability of $\frac{1}{6}$.

PMF: $p_X(x) = \frac{1}{6}$ for $x \in \{1, 2, 3, 4, 5, 6\}$
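With the PMF in hand, the probability of any event about the roll is the sum of $p_X(x)$ over the values in that event. A short sketch (the helper `prob` and the two example events are illustrative choices):

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """P(X in event): add p_X(x) over every value x for which event(x) is true."""
    return sum((p for x, p in pmf.items() if event(x)), Fraction(0))

p_even = prob(lambda x: x % 2 == 0)   # faces 2, 4, 6
p_at_least_5 = prob(lambda x: x >= 5) # faces 5, 6

print(p_even, p_at_least_5)  # 1/2 1/3
```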

For detailed calculations of expected value and variance, see Expectation and Variance.

Continuous Random Variable

Example: Rainfall Measurement

Consider a continuous random variable that represents the amount of rainfall in a city over a month. The sample space $\Omega$ could be the set of all non-negative real numbers, representing the amount of rainfall in millimeters. We can define a random variable $Y$ that maps each outcome to the amount of rainfall. For example, if we measure 50 mm of rainfall in a month, then $Y(\omega) = 50$. The probability distribution of this random variable could be modeled approximately by a normal distribution, where the mean represents the average rainfall and the standard deviation represents the variability in rainfall.

Suppose the rainfall follows a normal distribution with mean $\mu = 100$ mm and standard deviation $\sigma = 30$ mm. The PDF is: $f_Y(y) = \frac{1}{30\sqrt{2\pi}} e^{-\frac{(y-100)^2}{2 \cdot 30^2}}$

Probability of specific ranges:

$P(70 \leq Y \leq 130) = P(\mu-\sigma \leq Y \leq \mu+\sigma) \approx 0.6827$ (68.27%)
$P(40 \leq Y \leq 160) = P(\mu-2\sigma \leq Y \leq \mu+2\sigma) \approx 0.9545$ (95.45%)
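These two probabilities can be reproduced from the normal CDF, which the standard library's error function expresses as $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The sketch below uses the stated $\mu = 100$ and $\sigma = 30$; the function name `normal_cdf` is an illustrative choice.

```python
import math

MU, SIGMA = 100.0, 30.0  # mean and standard deviation in mm, as stated above

def normal_cdf(y, mu=MU, sigma=SIGMA):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

# P(a <= Y <= b) = F(b) - F(a) for a continuous random variable.
p_one_sigma = normal_cdf(130) - normal_cdf(70)
p_two_sigma = normal_cdf(160) - normal_cdf(40)

# Probability the normal model assigns to (impossible) negative rainfall.
p_negative = normal_cdf(0)

print(round(p_one_sigma, 4), round(p_two_sigma, 4))  # 0.6827 0.9545
print(p_negative)
```

The last value is small but not zero, which shows that the normal model is only an approximation for a quantity that cannot be negative.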

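CDF: $F_Y(y) = \int_{-\infty}^{y} f_Y(t)dt$. Because rainfall cannot be negative, the normal model is only an approximation: it assigns a small probability (about 0.0004) to $Y < 0$. A normal distribution truncated at 0 would instead integrate over $[0, y]$ and rescale the density by $1/(1 - F_Y(0))$, where $F_Y$ is the untruncated CDF, so that the total probability is still 1.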

Mixed Random Variable

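Mixed random variables have a distribution with both a discrete part and a continuous part: some individual values carry positive probability, while the remaining probability is spread over an interval by a density. A closely related situation, used in the example below, is a random outcome that combines a discrete quantity (the number of customers arriving at a store in a day, a non-negative integer) with continuous quantities (their arrival times, which can be any real number in the day).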

Example: Customer Arrivals

Consider a random quantity $Z$ that records both the number of customers arriving at a store in a day and their arrival times. Each outcome in the sample space $\Omega$ consists of a non-negative integer (the number of customers) together with one non-negative real number per customer (the arrival times). For example, if 5 customers arrive at the store at different times throughout the day, we can represent this as $Z(\omega) = (5, t_1, t_2, t_3, t_4, t_5)$, where $t_i$ is the arrival time of the $i$-th customer. Strictly speaking, $Z$ takes values in tuples rather than in $\mathbb{R}$, so it is a random vector of varying length rather than a single real-valued random variable. Its probability distribution could combine a discrete distribution for the number of customers with a continuous distribution for the arrival times.
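One way to picture such an outcome is to simulate it. The sketch below rests on modeling assumptions that the text does not fix: the customer count is taken to be Poisson with a hypothetical mean of 5, and arrival times are taken to be uniform over a 24-hour day. The Poisson draw uses Knuth's multiplication method, since Python's standard library has no built-in Poisson sampler.

```python
import math
import random

MEAN_CUSTOMERS = 5.0  # assumed average number of customers per day
DAY_HOURS = 24.0      # arrival times measured in hours since midnight

def sample_poisson(mean, rng):
    """Draw a Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    # Keep multiplying uniforms until the product drops to the threshold.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_day(rng):
    """Return one outcome (n, t_1, ..., t_n) with arrival times sorted."""
    n = sample_poisson(MEAN_CUSTOMERS, rng)                          # discrete part
    times = sorted(rng.uniform(0.0, DAY_HOURS) for _ in range(n))    # continuous part
    return (n, *times)

rng = random.Random(0)  # fixed seed so the run is repeatable
z = sample_day(rng)
print(z)
```

The first entry of `z` is an integer count and the remaining entries are real-valued times, mirroring the tuple $(5, t_1, \ldots, t_5)$ in the example.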

Comparison: Discrete vs Continuous Random Variables

Aspect	Discrete Random Variables	Continuous Random Variables
Values	Countable (finite or infinite)	Uncountable (interval)
Probability Function	PMF: $p_X(x) = P(X = x)$	PDF: $f_X(x)$ where $P(a \leq X \leq b) = \int_a^b f_X(x)dx$
Individual Points	$P(X = x) > 0$ for each possible value $x$	$P(X = x) = 0$ for any specific $x$
CDF	Step function	Continuous function
Examples	Coin flips, dice rolls, counts	Time, distance, temperature
Expected Value	$E[X] = \sum_{x} x \cdot p_X(x)$	$E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)dx$

For information about covariance and correlation between random variables, see Expectation and Variance.

Joint Random Variables

When working with multiple random variables simultaneously, we need to understand their joint behavior and relationships.

Definition: Joint Random Variables

Joint random variables describe the behavior of two or more random variables defined on the same probability space. For two random variables $X$ and $Y$, their joint distribution specifies the probability of $X$ taking value $x$ and $Y$ taking value $y$ simultaneously.

Joint Probability Functions

For discrete random variables, we use the Joint Probability Mass Function: $p_{X,Y}(x,y) = P(X = x, Y = y)$

Properties:

$p_{X,Y}(x,y) \geq 0$ for all $x,y$
$\sum_{x}\sum_{y} p_{X,Y}(x,y) = 1$

For continuous random variables, we use the Joint Probability Density Function: $f_{X,Y}(x,y) \text{ where } P(a \leq X \leq b, c \leq Y \leq d) = \int_a^b \int_c^d f_{X,Y}(x,y) dy dx$

Properties:

$f_{X,Y}(x,y) \geq 0$ for all $x,y$
$\iint_{\mathbb{R}^2} f_{X,Y}(x,y) dx dy = 1$

Marginal Distributions

The marginal distribution of one variable can be obtained from the joint distribution:

For discrete:

$p_X(x) = \sum_{y} p_{X,Y}(x,y)$
$p_Y(y) = \sum_{x} p_{X,Y}(x,y)$

For continuous:

$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) dy$
$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) dx$
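The discrete marginalization formulas amount to summing a joint table along one axis. A small sketch with a made-up joint PMF (the table values are hypothetical and chosen only so that they sum to 1):

```python
from fractions import Fraction as F

# Hypothetical joint PMF p_{X,Y}(x, y) with X in {0, 1} and Y in {0, 1, 2}.
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 4), (0, 2): F(1, 8),
    (1, 0): F(1, 4), (1, 1): F(1, 8), (1, 2): F(1, 8),
}

x_values = sorted({x for x, _ in joint})
y_values = sorted({y for _, y in joint})

# p_X(x): sum the joint PMF over all y.
p_x = {x: sum(joint[(x, y)] for y in y_values) for x in x_values}
# p_Y(y): sum the joint PMF over all x.
p_y = {y: sum(joint[(x, y)] for x in x_values) for y in y_values}

print(p_x)
print(p_y)
```

Each marginal is itself a valid PMF: its values are non-negative and sum to 1.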

Independence

Random variables $X$ and $Y$ are independent if, for all $x$ and $y$:

$p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y)$ (discrete)
$f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ (continuous)

This means the joint distribution factors into the product of marginal distributions.
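The factorization has to hold for every pair of values, so a single failing pair is enough to rule out independence. The sketch below checks the discrete condition on two hypothetical joint PMFs: one where $Y$ is a copy of a fair coin $X$, and one built from two separate fair coins. The function names are illustrative.

```python
from fractions import Fraction

def marginals(joint):
    """Return (p_X, p_Y) computed from a joint PMF given as {(x, y): prob}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, Fraction(0)) + p
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_x, p_y

def is_independent(joint):
    """True if p_{X,Y}(x, y) == p_X(x) * p_Y(y) for every pair (x, y)."""
    p_x, p_y = marginals(joint)
    return all(
        joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
        for x in p_x
        for y in p_y
    )

# Y is an exact copy of a fair coin X: the pairs (0, 1) and (1, 0) never occur.
copied_coin = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
# Two separate fair coins: all four pairs are equally likely.
two_coins = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

print(is_independent(copied_coin))  # False
print(is_independent(two_coins))    # True
```

Both tables have the same marginals, each a fair coin, so the marginals alone do not determine whether two variables are independent.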

Example: Two Dice

Consider rolling two fair six-sided dice. Let $X$ be the outcome of the first die and $Y$ be the outcome of the second die.

Joint PMF: $p_{X,Y}(x,y) = \frac{1}{36}$ for $x,y \in \{1, 2, 3, 4, 5, 6\}$

Marginal PMFs:

$p_X(x) = \sum_{y=1}^{6} p_{X,Y}(x,y) = \frac{1}{6}$
$p_Y(y) = \sum_{x=1}^{6} p_{X,Y}(x,y) = \frac{1}{6}$

Since $p_{X,Y}(x,y) = \frac{1}{36} = \frac{1}{6} \cdot \frac{1}{6} = p_X(x) \cdot p_Y(y)$ for every pair $(x, y)$, the dice rolls are independent.
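The same conclusion can be checked empirically. The following sketch simulates pairs of rolls with Python's pseudo-random generator and compares the observed joint frequencies with the product of the observed marginals. The number of rolls and the seed are arbitrary choices, and a simulation only approximates the exact values.

```python
import random
from collections import Counter

N_ROLLS = 180_000       # arbitrary sample size
rng = random.Random(0)  # fixed seed so the run is repeatable
faces = range(1, 7)

# Count how often each pair (x, y) appears.
pair_counts = Counter()
for _ in range(N_ROLLS):
    x = rng.randint(1, 6)  # first die
    y = rng.randint(1, 6)  # second die, rolled separately
    pair_counts[(x, y)] += 1

# Empirical joint PMF and its marginals.
joint_hat = {pair: count / N_ROLLS for pair, count in pair_counts.items()}
p_x_hat = {x: sum(joint_hat.get((x, y), 0.0) for y in faces) for x in faces}
p_y_hat = {y: sum(joint_hat.get((x, y), 0.0) for x in faces) for y in faces}

# Largest deviation between the joint frequency and the product of marginals.
max_gap = max(
    abs(joint_hat.get((x, y), 0.0) - p_x_hat[x] * p_y_hat[y])
    for x in faces
    for y in faces
)
print(max_gap)
```

A `max_gap` close to zero is what independence predicts; it shrinks further as the number of simulated rolls grows.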

Example: Height and Weight

Consider the relationship between height $H$ and weight $W$ of adults. These are typically not independent.

The joint PDF $f_{H,W}(h,w)$ describes how height and weight are distributed together in the population.

The marginal density $f_H(h) = \int_{0}^{\infty} f_{H,W}(h,w) dw$ gives the distribution of heights regardless of weight.
The marginal density $f_W(w) = \int_{0}^{\infty} f_{H,W}(h,w) dh$ gives the distribution of weights regardless of height.

Since height and weight are correlated, they cannot be independent, so $f_{H,W}(h,w) \neq f_H(h) \cdot f_W(w)$ for at least some values of $(h, w)$.

For more details on computing expectations with joint random variables, see Expectation and Variance.
