
Probability Theory


Reference: Wu Hao's (吴昊) lecture notes


Probability Space


A probability space is a triplet \((\Omega, \mathcal{F}, \mathbb{P})\) consisting of:

  • Sample space \(\Omega\): A non-empty set.

  • \(\sigma\)-field (or \(\sigma\)-algebra) \(\mathcal{F}\): A collection of events (subsets of \(\Omega\)).

  • Probability measure \(\mathbb{P}\): A function from \(\mathcal{F}\) to \([0,1]\).


Let \(\mathcal{F}\) be a non-empty collection of subsets of \(\Omega\).

  • It is a field if
\[ A \in \mathcal{F} \Longrightarrow A^c \in \mathcal{F} \]
\[ A_1, A_2 \in \mathcal{F} \Longrightarrow A_1 \cup A_2 \in \mathcal{F} \]
  • It is a monotone class if
\[ A_j\in \mathcal{F}, A_j \subset A_{j+1}, 1\le j<\infty \Longrightarrow \cup_j A_j \in \mathcal{F} \]
\[ A_j\in \mathcal{F}, A_j \supset A_{j+1}, 1\le j<\infty \Longrightarrow \cap_j A_j \in \mathcal{F} \]
  • It is a \(\sigma\)-field if
\[ A \in \mathcal{F} \Longrightarrow A^c \in \mathcal{F} \]
\[ A_j \in \mathcal{F}, 1\le j < \infty \Longrightarrow \cup_j A_j \in \mathcal{F} \]

Lemma: A field is a \(\sigma\)-field iff it is a monotone class.

Given any collection of subsets \(\mathcal{C}\), the \(\sigma\)-field generated by \(\mathcal{C}\), denoted by \(\sigma(\mathcal{C})\), is the intersection of all \(\sigma\)-fields containing \(\mathcal{C}\). The monotone class generated by \(\mathcal{C}\) is defined in the same way, as the intersection of all monotone classes containing \(\mathcal{C}\).

Lemma: If \(\mathcal{A}\) is a field, the \(\sigma\)-field generated by \(\mathcal{A}\) is equal to the monotone class generated by \(\mathcal{A}\).
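On a finite sample space the generated \(\sigma\)-field can be computed directly, since closure under complements and pairwise unions is already enough there. The following is a minimal sketch under that finiteness assumption; the function name `generate_sigma_field` and the sample space \(\{1,2,3,4\}\) are illustrative choices, not part of the notes.

```python
from itertools import combinations

def generate_sigma_field(omega, collection):
    """Smallest family containing `collection` that is closed under
    complement and union, on a FINITE sample space `omega`."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Close under complements.
        for a in current:
            comp = omega - a
            if comp not in family:
                family.add(comp)
                changed = True
        # Close under pairwise unions (sufficient on a finite space).
        for a, b in combinations(current, 2):
            u = a | b
            if u not in family:
                family.add(u)
                changed = True
    return family

# sigma({1}, {2}) on {1, 2, 3, 4}: the atoms are {1}, {2}, {3, 4}.
sigma = generate_sigma_field({1, 2, 3, 4}, [{1}, {2}])
print(sorted(sorted(s) for s in sigma))
```

The result contains \(\{3,4\}\) but not \(\{3\}\): the generators cannot separate the outcomes 3 and 4.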


A probability measure \(\mathbb{P}\) satisfies the following axioms:

  1. \(\mathbb{P}[E] \ge 0\) for all \(E \in \mathcal{F}\).

  2. \(\mathbb{P}[\Omega] = 1\).

  3. If \(\{E_j\}\) is a countable collection of pairwise disjoint sets in \(\mathcal{F}\), then \(\mathbb{P}[\cup_j E_j] = \sum_j \mathbb{P}[E_j]\).

Important Consequences:

  • \(\mathbb{P}[E^c] = 1 - \mathbb{P}[E]\).

  • \(\mathbb{P}[E \cup F] + \mathbb{P}[E \cap F] = \mathbb{P}[E] + \mathbb{P}[F]\).

  • Continuity: If \(E_n \uparrow E\) or \(E_n \downarrow E\), then \(\mathbb{P}[E_n] \to \mathbb{P}[E]\).

  • \(\mathbb{P}[\cup_j E_j] \le \sum_j \mathbb{P}[E_j]\).
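These consequences can be checked by hand on a small example. The sketch below assumes a fair six-sided die with the uniform measure; the events `E` and `F` are arbitrary illustrative choices.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die (assumed example)

def prob(event):
    """Uniform probability measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

E = frozenset({2, 4, 6})   # even outcome
F = frozenset({4, 5, 6})   # outcome at least 4

complement_ok = prob(OMEGA - E) == 1 - prob(E)
incl_excl_ok = prob(E | F) + prob(E & F) == prob(E) + prob(F)
union_bound_ok = prob(E | F) <= prob(E) + prob(F)
print(complement_ok, incl_excl_ok, union_bound_ok)
```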

Theorem (Carathéodory's Extension Theorem): Let \(\mathcal{F}_0\) be a field and \(\mathcal{F} = \sigma(\mathcal{F}_0)\). If \(\mu\) is a probability measure on \(\mathcal{F}_0\), then there exists a unique probability measure on \(\mathcal{F}\) that extends \(\mu\).


  • Let \(\mathcal{C}\) be the collection of intervals \((a,b]\) with \(a<b\).

  • Let \(\mathcal{B}_0\) be the field generated by \(\mathcal{C}\): it consists of finite unions of disjoint sets of the form \((a,b]\), \((-\infty, b]\), \((a, \infty)\).

  • Let \(\mathcal{B} = \sigma(\mathcal{C})\) be the Borel \(\sigma\)-field on \(\mathbb{R}\).

A function \(F: \mathbb{R} \to [0,1]\) is a distribution function if it is increasing and right-continuous with \(F(-\infty) = 0\) and \(F(\infty) = 1\).

Proposition: Each probability measure \(\mu\) on \(\mathbb{R}\) uniquely determines a distribution function \(F\) via \(F(x) = \mu((-\infty, x])\). Conversely, each distribution function \(F\) uniquely determines a probability measure \(\mu\) on \(\mathbb{R}\) satisfying the above relation.
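As an illustration of the correspondence, the measure of a half-open interval is recovered from the distribution function by \(\mu((a,b]) = F(b) - F(a)\). The sketch below assumes an exponential law with rate 1 purely as an example; `mu_interval` is a hypothetical helper name.

```python
import math

def F(x, rate=1.0):
    """Distribution function of an exponential law (assumed example)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def mu_interval(a, b):
    """Measure of the half-open interval (a, b] determined by F."""
    return F(b) - F(a)

# Additivity over adjacent intervals: (0, 1] and (1, 3] make up (0, 3].
lhs = mu_interval(0.0, 3.0)
rhs = mu_interval(0.0, 1.0) + mu_interval(1.0, 3.0)
print(lhs, rhs)
```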

A probability space is complete if every subset of a set \(N \in \mathcal{F}\) with \(\mathbb{P}[N]=0\) also belongs to \(\mathcal{F}\).

Theorem: Given any probability space \((\Omega, \mathcal{F}, \mathbb{P})\), there exists a complete probability space \((\Omega, \overline{\mathcal{F}}, \overline{\mathbb{P}})\) such that \(\mathcal{F} \subset \overline{\mathcal{F}}\) and \(\overline{\mathbb{P}}|_{\mathcal{F}} = \mathbb{P}\). The smallest such extension is unique and is called the completion of \((\Omega, \mathcal{F}, \mathbb{P})\).


Random Variables and Distributions

A real-valued random variable is a function \(X: \Omega \to \mathbb{R}\) such that \(X^{-1}(B) \in \mathcal{F},\forall B \in \mathcal{B}\).

Equivalently, a random variable is a measurable function from \((\Omega, \mathcal{F})\) to \((\mathbb{R}, \mathcal{B})\).

Each random variable \(X\) induces a probability measure \(\mu\) on \((\mathbb{R}, \mathcal{B})\) by the following correspondence:

\[ \mu[B] = \mathbb{P}[X^{-1}(B)] = \mathbb{P}[X \in B],\quad \forall B \in \mathcal{B} \]

The measure \(\mu\) is called the law (or the distribution) of \(X\), denoted by \(\mathcal{L}(X)\); its associated distribution function is called the distribution function of \(X\), denoted by \(F_X\).

Corollary: If two probability measures on \(\mathbb{R}\) agree on all intervals of the form \((a,b]\) with \(a<b\), then they agree on \(\mathcal{B}\).


The distribution function \(F\) of \(X\) is given by

\[ F(x) = \mathbb{P}[X \le x] ,\quad \forall x \in \mathbb{R} \]

If the distribution function \(F\) is absolutely continuous, there exists a Lebesgue-integrable function \(f\) such that

\[ F(b) - F(a) = \int_a^b f(x)dx,\quad \forall a<b \]

The function \(f\) equals the derivative of \(F\) almost everywhere and is called the density function of \(X\).
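The relation between a distribution function and its density can be checked numerically. The sketch below assumes the standard normal law as an example and uses a simple midpoint rule; the helper names are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density, the derivative of normal_cdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint_integral(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

a, b = -1.0, 2.0
increment = normal_cdf(b) - normal_cdf(a)        # F(b) - F(a)
integral = midpoint_integral(normal_pdf, a, b)   # integral of f over [a, b]
print(increment, integral)
```

The two printed numbers should agree up to the quadrature error of the midpoint rule.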


A function \(F:I\subset \mathbb{R}\to \mathbb{R}\) is absolutely continuous if, for every \(\epsilon>0\), there exists a \(\delta>0\) such that whenever a finite sequence of pairwise disjoint intervals \((x_k,y_k)\) of \(I\) satisfies \(\sum_k (y_k-x_k)<\delta\), then

\[ \sum_k |F(y_k)-F(x_k)|<\epsilon \]

Lemma: The following conditions on a function \(F\) defined on a compact interval \(I=[a,b]\) are equivalent:

  1. \(F\) is absolutely continuous.

  2. \(F\) has derivative \(F'\) almost everywhere, the derivative is Lebesgue integrable, and

\[ F(x) = F(a) + \int_a^x F'(y)dy,\quad \forall x\in [a,b] \]
  3. There exists a Lebesgue integrable function \(g\) on \([a,b]\) such that
\[ F(x)= F(a) + \int_a^x g(y)dy,\quad \forall x\in [a,b] \]

For functions on a compact interval, the following inclusions hold:

\[ \left\{\text{continuously differentiable}\right\}\subset \left\{\text{Lipschitz continuous}\right\}\subset \left\{\text{absolutely continuous}\right\}\subset \left\{\text{differentiable a.e.}\right\} \]

Lemma: If \(\left\{X_j,j\ge 1\right\}\) is a sequence of random variables, then

\[ \inf_j X_j = \lim_{n\to\infty} \inf_{1\le j\le n} X_j,\quad \sup_j X_j = \lim_{n\to\infty} \sup_{1\le j\le n} X_j \]

are random variables.


Let \(X,Y\) be two random variables on \((\Omega, \mathcal{F}, \mathbb{P})\). Then \((X,Y)\) is a random vector.

  • \((X,Y): (\Omega, \mathcal{F})\to (\mathbb{R}^2, \mathcal{B}^2)\) is measurable.

  • \((X,Y)\) induces a probability measure on \(\mathcal{B}^2\):

\[ \nu[A]= \mathbb{P}[(X,Y)\in A],\quad \forall A\in \mathcal{B}^2 \]
  • If \(f:\mathbb{R}^2\to \mathbb{R}\) is measurable, then \(f(X,Y)\) is a random variable.

Expectation is essentially integration over the probability space \((\Omega, \mathcal{F}, \mathbb{P})\).

  • \(\mathbb{E}[X]\) is finite if and only if \(\mathbb{E}[|X|]\) is finite.

  • \(\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]\).

  • If \(X \ge 0\) a.s., then \(\mathbb{E}[X] \ge 0\).

  • If \(X \le Y\) a.s., then \(\mathbb{E}[X] \le \mathbb{E}[Y]\).


  • Dominated Convergence Theorem (DCT): If \(\lim_n X_n = X\) a.s. and \(|X_n| \le Y\) a.s., where \(\mathbb{E}[Y] < \infty\), then
\[ \lim_{n \to \infty} \mathbb{E}[X_n] = \mathbb{E}[X] \]
  • Monotone Convergence Theorem (MCT): If \(X_n \ge 0\) and \(X_n \uparrow X\) a.s., then
\[ \lim_{n \to \infty} \mathbb{E}[X_n] = \mathbb{E}[X] \]
  • Fatou's Lemma: If \(X_n \ge 0\) a.s., then
\[ \mathbb{E}[\liminf_{n \to \infty} X_n] \le \liminf_{n \to \infty} \mathbb{E}[X_n] \]
  • If \(\Lambda_n \in \mathcal{F}\) are pairwise disjoint with \(\cup_n \Lambda_n = \Omega\) and \(\mathbb{E}[X]\) is finite, then \(\mathbb{E}[X]= \sum_n \mathbb{E}[X\mathbf{1}_{\Lambda_n}]\).

Useful Lemma: For any random variable \(X\):

\[ \sum_{n=1}^\infty \mathbb{P}[|X| \ge n] \le \mathbb{E}[|X|] \le 1 + \sum_{n=1}^\infty \mathbb{P}[|X| \ge n] \]

Corollary: If \(X\) takes only non-negative integer values, we have

\[ \mathbb{E}[X] = \sum_{n=1}^\infty \mathbb{P}[X \ge n] \]
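The tail-sum formula is easy to verify exactly for a non-negative integer-valued variable with finitely many values. The sketch below assumes a fair die as the example law and uses exact rational arithmetic.

```python
from fractions import Fraction

# Law of a fair die: a non-negative integer-valued random variable.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# E[X] computed directly from the law.
mean = sum(k * p for k, p in pmf.items())

def tail(n):
    """P[X >= n]."""
    return sum(p for k, p in pmf.items() if k >= n)

# Sum of the tail probabilities; terms with n > 6 vanish.
tail_sum = sum(tail(n) for n in range(1, max(pmf) + 1))
print(mean, tail_sum)
```

Both quantities equal \(7/2\), since \(\sum_{n=1}^{6} \mathbb{P}[X \ge n] = (6+5+4+3+2+1)/6\).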

Theorem: Suppose \(X\) is a random variable on \((\Omega, \mathcal{F}, \mathbb{P})\) inducing the probability space \((\mathbb{R}, \mathcal{B}, \mu)\). For any Borel measurable function \(f\), we have

\[ \mathbb{E}[f(X)] = \int_{\mathbb{R}} f(x)\mu[dx] \]

Theorem: Suppose \((X,Y)\) is a random vector on \((\Omega, \mathcal{F}, \mathbb{P})\) inducing the probability space \((\mathbb{R}^2, \mathcal{B}^2, \nu)\). For any Borel measurable function \(f\), we have

\[ \mathbb{E}[f(X,Y)] = \int_{\mathbb{R}^2} f(x,y)\nu[dx,dy] \]

provided that either side exists.
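The change-of-variables identity says the expectation can be computed either on the sample space or against the induced law. The sketch below checks this on an assumed finite example: two fair dice, \(X\) their sum, and \(f(x) = x^2\).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of two fair dice, uniform probability.
omega = list(product(range(1, 7), repeat=2))
p_omega = Fraction(1, len(omega))

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

def f(x):
    """Borel function applied to X."""
    return x * x

# Left side: integrate f(X) over the sample space.
lhs = sum(f(X(w)) * p_omega for w in omega)

# Right side: build the law mu of X, then integrate f against mu.
mu = defaultdict(Fraction)
for w in omega:
    mu[X(w)] += p_omega
rhs = sum(f(x) * m for x, m in mu.items())
print(lhs, rhs)
```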


For \(p \in (0, \infty)\), define

\[ L^p(\Omega, \mathcal{F}, \mathbb{P}) = \{X\text{ random variable on }(\Omega, \mathcal{F}, \mathbb{P}) : \mathbb{E}[|X|^p] < \infty\} \]
  • When \(p \ge 1\), \(L^p\) is a Banach space with the norm \(\|X\|_p := \mathbb{E}[|X|^p]^{1/p}\).
  • When \(1 \le p < q\), \(L^q \subset L^p\).
  • When \(p=2\), \(L^2\) is a Hilbert space with the inner product \(\langle X, Y \rangle = \mathbb{E}[XY]\).

For \(X\in L^2(\Omega, \mathcal{F}, \mathbb{P})\), we define its variance and its standard deviation by

\[ \text{var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2],\quad \sigma(X) = \sqrt{\text{var}(X)} \]

For \(X,Y\in L^2(\Omega, \mathcal{F}, \mathbb{P})\), we define their covariance by

\[ \text{cov}(X,Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] \]

We say that \(X\) and \(Y\) are uncorrelated if \(\text{cov}(X,Y) = 0\). In this case, we have

\[ \text{var}(X+Y) = \text{var}(X) + \text{var}(Y) \]
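The additivity of variance for uncorrelated variables, and its failure for correlated ones, can be seen on a small example. The sketch below assumes two fair dice; `expect`, `cov`, and `var` are illustrative helpers implementing the definitions above.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice (assumed example)
p = Fraction(1, len(omega))

def expect(g):
    """E[g] on the finite sample space."""
    return sum(g(w) * p for w in omega)

def cov(g, h):
    """cov(g, h) = E[gh] - E[g]E[h]."""
    return expect(lambda w: g(w) * h(w)) - expect(g) * expect(h)

def var(g):
    return cov(g, g)

X = lambda w: w[0]           # first die
Y = lambda w: w[1]           # second die
S = lambda w: w[0] + w[1]    # their sum

print(cov(X, Y), var(S), var(X) + var(Y))
# X and S are correlated, so additivity fails for X + S.
print(cov(X, S), var(lambda w: X(w) + S(w)), var(X) + var(S))
```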

Chebyshev Inequality: If \(\varphi\) is a strictly positive and increasing function on \([0, \infty)\), then for each \(x > 0\):

\[ \mathbb{P}[|X| \ge x] \le \frac{\mathbb{E} [\varphi(|X|)]}{\varphi(x)} \]
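A quick numerical check of the bound, as a sketch: the law (a fair die) and the choice \(\varphi(t) = e^t\) are assumptions made only for illustration.

```python
import math
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die (assumed example)

def phi(t):
    """A strictly positive, increasing function on [0, inf)."""
    return math.exp(t)

# E[phi(|X|)]
expected_phi = sum(phi(abs(k)) * float(p) for k, p in pmf.items())

def tail(x):
    """P[|X| >= x]."""
    return float(sum(p for k, p in pmf.items() if abs(k) >= x))

# For each x, store (left side, right side) of the inequality.
bounds = {x: (tail(x), expected_phi / phi(x)) for x in (1, 3, 5, 6)}
for x, (lhs, rhs) in bounds.items():
    print(x, lhs, rhs)
```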

Hölder Inequality: For \(1 < p < \infty\) and \(1/p + 1/q = 1\):

\[ \mathbb{E}[|XY|] \le \mathbb{E}[|X|^p]^{1/p}\mathbb{E}[|Y|^q]^{1/q} \]

Minkowski Inequality: For \(1 \le p < \infty\):

\[ \|X + Y\|_p \le \|X\|_p + \|Y\|_p \]

Jensen's Inequality: If \(\varphi\) is a convex function on \(\mathbb{R}\) and both \(X\) and \(\varphi(X)\) are integrable, then:

\[ \varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)] \]
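The Hölder, Minkowski, and Jensen inequalities can be sanity-checked on a finite probability space. The sketch below assumes two fair dice, the exponents \(p = 3\), \(q = 3/2\), and \(\varphi = \exp\); all names are illustrative.

```python
import math
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice (assumed example)
weight = 1.0 / len(omega)

def expect(g):
    """E[g] on the finite sample space."""
    return sum(g(w) * weight for w in omega)

def norm(g, r):
    """L^r norm: E[|g|^r]^(1/r)."""
    return expect(lambda w: abs(g(w)) ** r) ** (1.0 / r)

X = lambda w: w[0] - 3.5     # centered first die
Y = lambda w: w[0] * w[1]    # product of the dice

p_exp, q_exp = 3.0, 1.5      # conjugate exponents: 1/3 + 1/1.5 = 1

# Each pair is (left side, right side) of the corresponding inequality.
holder = (expect(lambda w: abs(X(w) * Y(w))), norm(X, p_exp) * norm(Y, q_exp))
minkowski = (norm(lambda w: X(w) + Y(w), p_exp), norm(X, p_exp) + norm(Y, p_exp))
jensen = (math.exp(expect(X)), expect(lambda w: math.exp(X(w))))
print(holder, minkowski, jensen)
```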

Random variables \(\{X_j\}_{j=1}^n\) are independent if

\[ \mathbb{P}\left[\bigcap_{j=1}^n \{X_j \in B_j\}\right] = \prod_{j=1}^n \mathbb{P}[X_j \in B_j],\quad \forall B_j \in \mathcal{B} \]
  • If \(\mu_j = \mathcal{L}(X_j)\) and \(\mu = \mathcal{L}((X_1, \dots, X_n))\), independence means
\[ \mu[B_1 \times \cdots \times B_n] = \prod_{j=1}^n \mu_j[B_j]. \]
  • If \(F_j\) is the distribution function of \(X_j\) and \(F\) is the joint distribution function of \((X_1, \dots, X_n)\), independence means
\[ F(x_1, \dots, x_n) = \prod_{j=1}^n F_j(x_j) \]

Proposition: If \(\left\{X_j\right\}\) are independent random variables and \(\left\{f_j\right\}\) are Borel measurable functions, then \(\left\{f_j(X_j)\right\}\) are also independent.

Proposition: If \(X\) and \(Y\) are independent random variables with finite expectations, then

\[ \mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y] \]

Corollary: If \(\{X_j\}_{j=1}^n\) are independent random variables with finite expectations, then

\[ \mathbb{E}\left[\prod_{j=1}^n X_j\right] = \prod_{j=1}^n \mathbb{E}[X_j] \]
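For discrete laws, independence means the joint law is the product of the marginals, and the product rule for expectations follows. The sketch below assumes two small marginal laws, builds their product measure, and checks the factorization on every rectangle \(B_1 \times B_2\).

```python
from fractions import Fraction
from itertools import chain, combinations, product

mu_x = {0: Fraction(2, 3), 1: Fraction(1, 3)}      # assumed law of X
mu_y = {k: Fraction(1, 6) for k in range(1, 7)}    # assumed law of Y

# Joint law of (X, Y) under independence: the product measure.
joint = {
    (x, y): px * py
    for (x, px), (y, py) in product(mu_x.items(), mu_y.items())
}

def subsets(values):
    """All subsets of a finite set of values, as tuples."""
    values = list(values)
    return chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )

def measure(law, event):
    """Measure of a finite event under a discrete law."""
    return sum(law[v] for v in event)

# Check mu[B1 x B2] = mu_x[B1] * mu_y[B2] on every rectangle.
factorizes = all(
    measure(joint, product(b1, b2)) == measure(mu_x, b1) * measure(mu_y, b2)
    for b1 in subsets(mu_x)
    for b2 in subsets(mu_y)
)

e_xy = sum(x * y * pr for (x, y), pr in joint.items())
e_x = sum(x * pr for x, pr in mu_x.items())
e_y = sum(y * pr for y, pr in mu_y.items())
print(factorizes, e_xy, e_x * e_y)
```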

A collection of sets \(\mathcal{A}\) is a \(\pi\)-system if it is closed under finite intersections.

Theorem: Suppose \(\{\mathcal{A}_j\}_{j=1}^n\) are independent and each \(\mathcal{A}_j\) is a \(\pi\)-system, then \(\{\sigma(\mathcal{A}_j)\}_{j=1}^n\) are independent.

Wald's Equation: Let \(\{X_n\}\) be i.i.d. with finite mean. For \(k\ge 1\), let

\[ \mathcal{F}_k=\sigma(X_1, \dots, X_k) \]

Suppose \(N\) is a random variable taking positive integer values such that

\[ \left\{N \le k\right\} \in \mathcal{F}_k, \forall k \]

and \(\mathbb{E}[N] < \infty\). Then we have

\[ \mathbb{E}\left[\sum_{j=1}^N X_j\right] = \mathbb{E}[N]\mathbb{E}[X_1] \]
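Wald's equation can be illustrated by simulation. The sketch below assumes a fair die rolled until the first 6 appears, so that \(N\) is geometric with \(\mathbb{E}[N] = 6\) and \(\mathbb{E}[X_1] = 3.5\); the identity then predicts an expected total of 21. The seed and the number of trials are arbitrary choices.

```python
import random

def roll_until_six(rng):
    """Roll a fair die until the first 6; return (N, X_1 + ... + X_N)."""
    n, total = 0, 0
    while True:
        x = rng.randint(1, 6)
        n += 1
        total += x
        if x == 6:  # the decision to stop at k uses only X_1, ..., X_k
            return n, total

rng = random.Random(0)   # fixed seed for reproducibility
trials = 100_000
samples = [roll_until_six(rng) for _ in range(trials)]
mean_n = sum(n for n, _ in samples) / trials
mean_sum = sum(s for _, s in samples) / trials
print(mean_n, mean_sum, mean_n * 3.5)
```

The Monte Carlo averages should be close to 6 and 21 respectively, up to sampling error.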

The convolution of two distribution functions \(F_1\) and \(F_2\) is

\[ F(x) = \int F_1(x-y)F_2[dy], \quad \forall x \]

This is still a distribution function and we denote it by \(F = F_1 * F_2\). The corresponding measure is denoted by \(\mu = \mu_1 * \mu_2\).

Lemma: If \(X\) and \(Y\) are independent random variables, then

\[ \mathbb{P}[X + Y \le z] = (F_X * F_Y)(z) \]
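For discrete laws the convolution integral reduces to a finite sum over the atoms of the second law. The sketch below assumes a fair die for \(X\) and a fair coin with values 0 and 1 for \(Y\), and compares the convolution with the distribution function of \(X+Y\) computed from the product measure.

```python
from fractions import Fraction
from itertools import product

mu_1 = {k: Fraction(1, 6) for k in range(1, 7)}   # assumed law of X
mu_2 = {0: Fraction(1, 2), 1: Fraction(1, 2)}     # assumed law of Y

def cdf(law, x):
    """Distribution function of a discrete law."""
    return sum(p for v, p in law.items() if v <= x)

def convolved_cdf(z):
    """(F_1 * F_2)(z): sum over atoms y of F_1(z - y) * mu_2[{y}]."""
    return sum(cdf(mu_1, z - y) * p for y, p in mu_2.items())

def direct_cdf(z):
    """P[X + Y <= z], computed from the product measure."""
    return sum(
        p1 * p2
        for (x, p1), (y, p2) in product(mu_1.items(), mu_2.items())
        if x + y <= z
    )

print([(z, convolved_cdf(z)) for z in range(0, 8)])
```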

Lemma: If \(X\) and \(Y\) are independent and have density functions \(p_X\) and \(p_Y\), then \(X+Y\) has the density function

\[ p_{X+Y}(z) = \int p_X(z-y)\mu_Y[dy] = \int p_X(z-y)p_Y(y)dy. \]

Example: Suppose \(\{X_j\}_{j=1}^n\) are independent and \(X_j \sim \mathcal{N}(m_j, \sigma_j^2)\). Then

\[ \sum_{j=1}^n X_j \sim \mathcal{N}\left(\sum_{j=1}^n m_j, \sum_{j=1}^n \sigma_j^2\right) \]

Cramér's Theorem: If \(\{X_j\}_{j=1}^n\) are independent real-valued random variables such that \(\sum_{j=1}^n X_j\) has a normal distribution, then all of \(\left\{X_j\right\}_{j=1}^n\) must have normal distributions as well.

Example: Suppose \(\{X_j\}_{j=1}^n\) are independent and \(X_j \sim \text{Poisson}(\lambda_j)\). Then

\[ \sum_{j=1}^n X_j \sim \text{Poisson}\left(\sum_{j=1}^n \lambda_j\right) \]
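The Poisson case can be checked numerically for two summands by convolving the probability mass functions. The sketch below assumes rates 1.5 and 2.5 and compares the first 30 values of the convolution with the \(\text{Poisson}(4)\) mass function.

```python
import math

def poisson_pmf(lam, k):
    """P[X = k] for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def convolve_pmf(lam_1, lam_2, k):
    """P[X_1 + X_2 = k] for independent Poisson variables, by direct summation."""
    return sum(
        poisson_pmf(lam_1, i) * poisson_pmf(lam_2, k - i) for i in range(k + 1)
    )

lam_1, lam_2 = 1.5, 2.5   # assumed rates
max_gap = max(
    abs(convolve_pmf(lam_1, lam_2, k) - poisson_pmf(lam_1 + lam_2, k))
    for k in range(30)
)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding error.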

Raikov's Theorem: If \(\{X_j\}_{j=1}^n\) are independent non-negative random variables such that \(\sum_{j=1}^n X_j\) has a Poisson distribution, then all of \(\{X_j\}_{j=1}^n\) must have Poisson distributions as well.
