Probability Theory
Reference: lecture notes by Wu Hao (吴昊)
Probability Space
A probability space is a triplet \((\Omega, \mathcal{F}, \mathbb{P})\) consisting of:
- Sample space \(\Omega\): A non-empty set.
- \(\sigma\)-field (or \(\sigma\)-algebra) \(\mathcal{F}\): A collection of events (subsets of \(\Omega\)).
- Probability measure \(\mathbb{P}\): A function from \(\mathcal{F}\) to \([0,1]\).
Let \(\mathcal{F}\) be a non-empty collection of subsets of \(\Omega\).
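- It is a field if it is closed under complements and finite unions: \(E \in \mathcal{F} \Rightarrow E^c \in \mathcal{F}\), and \(E_1, E_2 \in \mathcal{F} \Rightarrow E_1 \cup E_2 \in \mathcal{F}\).
- It is a monotone class if it is closed under monotone limits: if \(E_j \in \mathcal{F}\) and \(E_j \subset E_{j+1}\) for all \(j\), then \(\cup_j E_j \in \mathcal{F}\); if \(E_j \in \mathcal{F}\) and \(E_j \supset E_{j+1}\) for all \(j\), then \(\cap_j E_j \in \mathcal{F}\).
- It is a \(\sigma\)-field if it is closed under complements and countable unions: \(E \in \mathcal{F} \Rightarrow E^c \in \mathcal{F}\), and \(E_j \in \mathcal{F}\) for all \(j \ge 1 \Rightarrow \cup_j E_j \in \mathcal{F}\).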
Lemma: A field is a \(\sigma\)-field iff it is a monotone class.
Given any collection of subsets \(\mathcal{C}\), the \(\sigma\)-field generated by \(\mathcal{C}\), denoted by \(\sigma(\mathcal{C})\), is the intersection of all \(\sigma\)-fields containing \(\mathcal{C}\). The monotone class generated by \(\mathcal{C}\) is defined in the same way, as the intersection of all monotone classes containing \(\mathcal{C}\).
Lemma: If \(\mathcal{A}\) is a field, the \(\sigma\)-field generated by \(\mathcal{A}\) is equal to the monotone class generated by \(\mathcal{A}\).
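On a finite sample space the generated \(\sigma\)-field can be computed directly, because countable unions reduce to finite ones. The following is a minimal sketch under that assumption; the function name `generate_sigma_field` and the example sets are chosen here for illustration and are not part of the notes.

```python
from itertools import combinations

def generate_sigma_field(omega, collection):
    """Return sigma(collection) on a finite sample space omega.

    On a finite space it is enough to close the collection under
    complements and pairwise unions until nothing new appears.
    """
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        # Close under complements.
        for a in current:
            comp = omega - a
            if comp not in sets:
                sets.add(comp)
                changed = True
        # Close under pairwise (hence finite) unions.
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sets:
                sets.add(union)
                changed = True
    return sets

omega = {1, 2, 3, 4}
sigma = generate_sigma_field(omega, [{1}, {1, 2}])
```

Here the generating sets \(\{1\}\) and \(\{1,2\}\) split \(\Omega\) into the atoms \(\{1\}\), \(\{2\}\), \(\{3,4\}\), so `sigma` consists of all unions of these atoms; the points 3 and 4 are never separated.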
A probability measure \(\mathbb{P}\) satisfies the following axioms:
- \(\mathbb{P}[E] \ge 0\) for all \(E \in \mathcal{F}\).
- \(\mathbb{P}[\Omega] = 1\).
- If \(\{E_j\}\) is a countable collection of pairwise disjoint sets in \(\mathcal{F}\), then \(\mathbb{P}[\cup_j E_j] = \sum_j \mathbb{P}[E_j]\).
Important Consequences:
- \(\mathbb{P}[E^c] = 1 - \mathbb{P}[E]\).
- \(\mathbb{P}[E \cup F] + \mathbb{P}[E \cap F] = \mathbb{P}[E] + \mathbb{P}[F]\).
- Continuity: If \(E_n \uparrow E\) or \(E_n \downarrow E\), then \(\mathbb{P}[E_n] \to \mathbb{P}[E]\).
- Countable subadditivity: \(\mathbb{P}[\cup_j E_j] \le \sum_j \mathbb{P}[E_j]\).
Theorem (Carathéodory's Extension Theorem): Let \(\mathcal{F}_0\) be a field and \(\mathcal{F} = \sigma(\mathcal{F}_0)\). If \(\mu\) is a probability measure on \(\mathcal{F}_0\), then there exists a unique probability measure on \(\mathcal{F}\) that extends \(\mu\).
- Let \(\mathcal{C}\) be the collection of intervals \((a,b]\) with \(a<b\).
- Let \(\mathcal{B}_0\) be the field generated by \(\mathcal{C}\): its elements are the finite disjoint unions of sets of the form \((a,b]\), \((-\infty, b]\), and \((a, \infty)\).
- Let \(\mathcal{B} = \sigma(\mathcal{C})\) be the Borel \(\sigma\)-field on \(\mathbb{R}\).
A function \(F: \mathbb{R} \to [0,1]\) is a distribution function if it is increasing and right-continuous with \(F(-\infty) = 0\) and \(F(\infty) = 1\).
Proposition: Each probability measure \(\mu\) on \(\mathbb{R}\) uniquely determines a distribution function \(F\) via \(F(x) = \mu((-\infty, x])\). Conversely, each distribution function \(F\) uniquely determines a probability measure \(\mu\) on \(\mathbb{R}\) satisfying the above relation.
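The correspondence between \(F\) and \(\mu\) is easiest to see on half-open intervals, where \(\mu((a,b]) = F(b) - F(a)\), and on finite disjoint unions of such intervals, where the measure adds up. The sketch below illustrates this for the exponential distribution function, which is an assumed example; the helper names are hypothetical.

```python
import math

def exp_cdf(x, rate=1.0):
    """Distribution function of the exponential law with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def interval_measure(F, a, b):
    """mu((a, b]) = F(b) - F(a) for a < b."""
    return F(b) - F(a)

def disjoint_union_measure(F, intervals):
    """Measure of a finite disjoint union of half-open intervals (a, b]."""
    return sum(interval_measure(F, a, b) for a, b in intervals)

p_01 = interval_measure(exp_cdf, 0.0, 1.0)   # mu((0, 1])
p_12 = interval_measure(exp_cdf, 1.0, 2.0)   # mu((1, 2])
p_02 = interval_measure(exp_cdf, 0.0, 2.0)   # mu((0, 2])
p_union = disjoint_union_measure(exp_cdf, [(0.0, 1.0), (1.0, 2.0)])
```

Finite additivity shows up as `p_01 + p_12` agreeing with `p_02` up to rounding. Extending from such unions to all Borel sets is what the extension theorem provides.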
A probability space is complete if every subset of a set \(N \in \mathcal{F}\) with \(\mathbb{P}[N]=0\) also belongs to \(\mathcal{F}\).
Theorem: Given any probability space \((\Omega, \mathcal{F}, \mathbb{P})\), there exists a complete probability space \((\Omega, \overline{\mathcal{F}}, \overline{\mathbb{P}})\) such that \(\mathcal{F} \subset \overline{\mathcal{F}}\) and \(\overline{\mathbb{P}}|_{\mathcal{F}} = \mathbb{P}\). The smallest such extension, called the completion, is unique.
Random Variables and Distributions
A real-valued random variable is a function \(X: \Omega \to \mathbb{R}\) such that \(X^{-1}(B) \in \mathcal{F},\forall B \in \mathcal{B}\).
Equivalently, a random variable is a measurable function from \((\Omega, \mathcal{F})\) to \((\mathbb{R}, \mathcal{B})\).
Each random variable \(X\) induces a probability measure \(\mu\) on \((\mathbb{R}, \mathcal{B})\) by the following correspondence:
\[ \mu(B) = \mathbb{P}[X^{-1}(B)] = \mathbb{P}[X \in B], \quad B \in \mathcal{B}. \]
The measure \(\mu\) is called the law (or the distribution) of \(X\), denoted by \(\mathcal{L}(X)\); its associated distribution function is called the distribution function of \(X\), denoted by \(F_X\).
Corollary If two probability measures on \(\mathbb{R}\) agree on all intervals of the form \((a,b]\) with \(a<b\), then they agree on \(\mathcal{B}\).
The distribution function \(F\) of \(X\) is given by
\[ F(x) = \mu((-\infty, x]) = \mathbb{P}[X \le x]. \]
If the distribution function \(F\) is absolutely continuous, there exists a Lebesgue-integrable function \(f\) such that
\[ F(x) = \int_{-\infty}^{x} f(t)\, dt. \]
The function \(f\) equals the derivative of \(F\) almost everywhere and is called the density function of \(X\).
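As a numerical illustration, the sketch below takes the standard normal distribution (an assumed example) and checks both directions of the relation between \(F\) and \(f\): a central difference of \(F\) approximates \(f\), and a midpoint-rule integral of \(f\) approximates an increment of \(F\). The helper names and step sizes are illustrative choices.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def central_difference(F, x, h=1e-5):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

def midpoint_integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (k + 0.5) * width) for k in range(n))

# F' should match f, and the integral of f should match F(b) - F(a).
derivative_gap = abs(central_difference(normal_cdf, 0.7) - normal_pdf(0.7))
integral_gap = abs(
    midpoint_integral(normal_pdf, -1.0, 2.0)
    - (normal_cdf(2.0) - normal_cdf(-1.0))
)
```

Both gaps should be small, limited only by discretization and rounding error.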
A function \(F:I\subset \mathbb{R}\to \mathbb{R}\) is absolutely continuous if, for every \(\epsilon>0\), there exists a \(\delta>0\) such that whenever a finite sequence of pairwise disjoint subintervals \((x_k,y_k)\) of \(I\) satisfies \(\sum_k (y_k-x_k)<\delta\), then
\[ \sum_k |F(y_k) - F(x_k)| < \epsilon. \]
Lemma: The following conditions on \(F\) on a compact interval \(I=[a,b]\) are equivalent:
- \(F\) is absolutely continuous.
- \(F\) has derivative \(F'\) almost everywhere, the derivative is Lebesgue integrable, and \(F(x) = F(a) + \int_a^x F'(t)\, dt\) for all \(x \in [a,b]\).
- There exists a Lebesgue integrable function \(g\) on \([a,b]\) such that \(F(x) = F(a) + \int_a^x g(t)\, dt\) for all \(x \in [a,b]\).
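Lemma: If \(\left\{X_j,j\ge 1\right\}\) is a sequence of random variables, then
\[ \inf_j X_j, \quad \sup_j X_j, \quad \liminf_j X_j, \quad \limsup_j X_j \]
are random variables (possibly taking the values \(\pm\infty\)).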
Let \(X,Y\) be two random variables on \((\Omega, \mathcal{F}, \mathbb{P})\). Then \((X,Y)\) is a random vector.
- \((X,Y): (\Omega, \mathcal{F})\to (\mathbb{R}^2, \mathcal{B}^2)\) is measurable.
- \((X,Y)\) induces a probability measure \(\nu\) on \(\mathcal{B}^2\): \(\nu(A) = \mathbb{P}[(X,Y) \in A]\) for \(A \in \mathcal{B}^2\).
- If \(f:\mathbb{R}^2\to \mathbb{R}\) is measurable, then \(f(X,Y)\) is a random variable.
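The expectation of \(X\) is its integral over the probability space \((\Omega, \mathcal{F}, \mathbb{P})\): \(\mathbb{E}[X] = \int_\Omega X(\omega)\, \mathbb{P}(d\omega)\), whenever the integral exists. It has the following properties: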
- \(\mathbb{E}[X]\) is finite if and only if \(\mathbb{E}[|X|]\) is finite.
- \(\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]\).
- If \(X \ge 0\) a.s., then \(\mathbb{E}[X] \ge 0\).
- If \(X \le Y\) a.s., then \(\mathbb{E}[X] \le \mathbb{E}[Y]\).
- Dominated Convergence Theorem (DCT): If \(\lim_n X_n = X\) a.s. and \(|X_n| \le Y\) a.s., where \(\mathbb{E}[Y] < \infty\), then \(\lim_n \mathbb{E}[X_n] = \mathbb{E}[X]\).
- Monotone Convergence Theorem (MCT): If \(X_n \ge 0\) and \(X_n \uparrow X\) a.s., then \(\lim_n \mathbb{E}[X_n] = \mathbb{E}[X]\).
- Fatou's Lemma: If \(X_n \ge 0\) a.s., then \(\mathbb{E}[\liminf_n X_n] \le \liminf_n \mathbb{E}[X_n]\).
- If \(X\) is integrable and the sets \(\Lambda_n \in \mathcal{F}\) are disjoint with \(\cup_n \Lambda_n = \Omega\), then \(\mathbb{E}[X]= \sum_n \mathbb{E}[X\mathbf{1}_{\Lambda_n}]\).
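Useful Lemma: For any random variable \(X\):
\[ \sum_{n=1}^{\infty} \mathbb{P}[|X| \ge n] \le \mathbb{E}[|X|] \le 1 + \sum_{n=1}^{\infty} \mathbb{P}[|X| \ge n]. \]
In particular, \(\mathbb{E}[|X|] < \infty\) if and only if the series converges.
Corollary: If \(X\) takes only integer values, we have
\[ \mathbb{E}[|X|] = \sum_{n=1}^{\infty} \mathbb{P}[|X| \ge n]. \]
In particular, \(\mathbb{E}[X] = \sum_{n=1}^{\infty} \mathbb{P}[X \ge n]\) when \(X\) is also non-negative.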
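For a non-negative integer-valued \(X\), the tail-sum identity \(\mathbb{E}[X] = \sum_{n \ge 1} \mathbb{P}[X \ge n]\) can be checked exactly on a small example. The sketch below uses a hypothetical probability mass function chosen only for illustration, with exact rational arithmetic so that the two sides can be compared for equality.

```python
from fractions import Fraction

# Hypothetical pmf of a non-negative integer-valued X (illustrative values).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(1, 4), 5: Fraction(1, 4)}

def expectation(pmf):
    """E[X] computed directly from the pmf."""
    return sum(value * prob for value, prob in pmf.items())

def tail_sum(pmf):
    """Sum over n >= 1 of P[X >= n]; terms vanish beyond the largest value."""
    top = max(pmf)
    return sum(
        sum(prob for value, prob in pmf.items() if value >= n)
        for n in range(1, top + 1)
    )

mean = expectation(pmf)
tails = tail_sum(pmf)
```

Both `mean` and `tails` evaluate to the same fraction, since each value \(k\) contributes its probability to exactly \(k\) of the tail terms.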
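Theorem: Suppose \(X\) is a random variable on \((\Omega, \mathcal{F}, \mathbb{P})\) inducing the probability space \((\mathbb{R}, \mathcal{B}, \mu)\). For any Borel measurable function \(f\) on \(\mathbb{R}\), we have
\[ \mathbb{E}[f(X)] = \int_\Omega f(X(\omega))\, \mathbb{P}(d\omega) = \int_{\mathbb{R}} f(x)\, \mu(dx), \]
provided that either side exists.
Theorem: Suppose \((X,Y)\) is a random vector on \((\Omega, \mathcal{F}, \mathbb{P})\) inducing the probability space \((\mathbb{R}^2, \mathcal{B}^2, \nu)\). For any Borel measurable function \(f\) on \(\mathbb{R}^2\), we have
\[ \mathbb{E}[f(X,Y)] = \int_\Omega f(X(\omega), Y(\omega))\, \mathbb{P}(d\omega) = \iint_{\mathbb{R}^2} f(x,y)\, \nu(dx, dy), \]
provided that either side exists.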
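The change-of-variables statement says that \(\mathbb{E}[f(X)]\) can be computed either by integrating over \(\Omega\) or by integrating \(f\) against the law of \(X\). A minimal sketch on an assumed toy space (two fair coin tosses, with \(X\) the number of heads) makes the two computations explicit; all names below are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed toy probability space: two fair coin tosses.
omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def law(X, prob):
    """Induced measure mu on the values of X: mu({x}) = P[X = x]."""
    mu = defaultdict(Fraction)
    for w, p in prob.items():
        mu[X(w)] += p
    return dict(mu)

def f(x):
    """A Borel function of the value of X."""
    return (x - 1) ** 2

mu = law(X, prob)
lhs = sum(f(X(w)) * p for w, p in prob.items())  # integral over Omega
rhs = sum(f(x) * m for x, m in mu.items())       # integral against the law mu
```

The sum over \(\Omega\) has four terms and the sum against \(\mu\) has three, yet they agree because \(\mu\) simply groups outcomes by the value of \(X\).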
For \(p \in (0, \infty)\), define
\[ L^p = L^p(\Omega, \mathcal{F}, \mathbb{P}) := \left\{X : \mathbb{E}[|X|^p] < \infty\right\}. \]
- When \(p \ge 1\), \(L^p\) is a Banach space with the norm \(\|X\|_p := \mathbb{E}[|X|^p]^{1/p}\).
- When \(1 \le p < q\), \(L^q \subset L^p\).
- When \(p=2\), \(L^2\) is a Hilbert space with the inner product \(\langle X, Y \rangle = \mathbb{E}[XY]\).
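For \(X\in L^2(\Omega, \mathcal{F}, \mathbb{P})\), we define its variance and its standard deviation by
\[ \text{var}(X) = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \mathbb{E}[X]^2, \qquad \sigma(X) = \sqrt{\text{var}(X)}. \]
For \(X,Y\in L^2(\Omega, \mathcal{F}, \mathbb{P})\), we define their covariance by
\[ \text{cov}(X,Y) = \mathbb{E}\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]. \]
We say that \(X\) and \(Y\) are uncorrelated if \(\text{cov}(X,Y) = 0\). In this case, we have
\[ \text{var}(X+Y) = \text{var}(X) + \text{var}(Y). \]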
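Chebyshev Inequality: If \(\varphi\) is a strictly positive and increasing function on \([0, \infty)\), then for each \(x > 0\):
\[ \mathbb{P}[|X| \ge x] \le \frac{\mathbb{E}[\varphi(|X|)]}{\varphi(x)}. \]
Hölder Inequality: For \(1 < p < \infty\) and \(1/p + 1/q = 1\):
\[ \mathbb{E}[|XY|] \le \mathbb{E}[|X|^p]^{1/p}\, \mathbb{E}[|Y|^q]^{1/q}. \]
Minkowski Inequality: For \(1 \le p < \infty\):
\[ \mathbb{E}[|X+Y|^p]^{1/p} \le \mathbb{E}[|X|^p]^{1/p} + \mathbb{E}[|Y|^p]^{1/p}. \]
Jensen's Inequality: If \(\varphi\) is a convex function on \(\mathbb{R}\), and \(X\) and \(\varphi(X)\) are integrable:
\[ \varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]. \]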
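The four inequalities can be sanity-checked numerically on a small discrete distribution. The sketch below uses a hypothetical joint probability mass function for \((X, Y)\), takes \(\varphi(u) = u^2\) for Chebyshev and \(\varphi = \exp\) for Jensen, and stores each inequality as a pair (left side, right side); every choice here is an assumption made for illustration.

```python
import math

# Hypothetical joint pmf of (X, Y) on four points, chosen only for illustration.
joint = {(-1.0, 2.0): 0.1, (0.5, -1.0): 0.4, (2.0, 0.5): 0.3, (3.0, 3.0): 0.2}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def norm(g, p):
    """L^p norm of the random variable g(X, Y)."""
    return expect(lambda x, y: abs(g(x, y)) ** p) ** (1.0 / p)

p, q = 3.0, 1.5      # conjugate exponents: 1/3 + 2/3 = 1
threshold = 2.0

# Chebyshev with phi(u) = u^2: P[|X| >= t] <= E[X^2] / t^2.
chebyshev = (expect(lambda x, y: float(abs(x) >= threshold)),
             expect(lambda x, y: x * x) / threshold ** 2)
# Hoelder: E|XY| <= ||X||_p ||Y||_q.
holder = (expect(lambda x, y: abs(x * y)),
          norm(lambda x, y: x, p) * norm(lambda x, y: y, q))
# Minkowski: ||X + Y||_p <= ||X||_p + ||Y||_p.
minkowski = (norm(lambda x, y: x + y, p),
             norm(lambda x, y: x, p) + norm(lambda x, y: y, p))
# Jensen with the convex function exp: exp(E[X]) <= E[exp(X)].
jensen = (math.exp(expect(lambda x, y: x)),
          expect(lambda x, y: math.exp(x)))
```

In each pair the first entry should not exceed the second. A check of this kind does not prove anything, but it is a quick way to catch a misremembered exponent or a reversed direction.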
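Random variables \(\{X_j\}_{j=1}^n\) are independent if, for all Borel sets \(B_1, \dots, B_n \in \mathcal{B}\),
\[ \mathbb{P}\left[\cap_{j=1}^n \{X_j \in B_j\}\right] = \prod_{j=1}^n \mathbb{P}[X_j \in B_j]. \]
- If \(\mu_j = \mathcal{L}(X_j)\) and \(\mu = \mathcal{L}((X_1, \dots, X_n))\), independence means \(\mu(B_1 \times \dots \times B_n) = \prod_{j=1}^n \mu_j(B_j)\) for all \(B_j \in \mathcal{B}\), that is, \(\mu = \mu_1 \times \dots \times \mu_n\).
- If \(F_j\) is the distribution function of \(X_j\) and \(F\) is the joint distribution function of \((X_1, \dots, X_n)\), independence means \(F(x_1, \dots, x_n) = \prod_{j=1}^n F_j(x_j)\) for all \((x_1, \dots, x_n) \in \mathbb{R}^n\).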
Proposition: If \(\left\{X_j\right\}\) are independent random variables and \(\left\{f_j\right\}\) are Borel measurable functions, then \(\left\{f_j(X_j)\right\}\) are also independent.
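Proposition: If \(X\) and \(Y\) are independent random variables with finite expectations, then
\[ \mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]. \]
Corollary: If \(\{X_j\}_{j=1}^n\) are independent random variables with finite expectations, then
\[ \mathbb{E}\left[\prod_{j=1}^n X_j\right] = \prod_{j=1}^n \mathbb{E}[X_j]. \]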
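The product rule \(\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]\) for independent variables can be verified exactly on two fair dice, an assumed example in which the joint law is the product measure. For contrast, the sketch also computes \(\mathbb{E}[X \cdot X]\), where the two factors are fully dependent and the rule fails.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Product measure on pairs of faces: two independent fair dice (assumed example).
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def expect(g):
    """E[g(X, Y)] under the joint law."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)     # independent factors

# Dependent pair for contrast: both factors are X, so this is E[X^2].
e_xx = expect(lambda x, y: x * x)
```

Here `e_xy` equals `e_x * e_y`, while `e_xx` is strictly larger than `e_x * e_x`; the difference is the variance of a single die.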
A collection of sets \(\mathcal{A}\) is a \(\pi\)-system if it is closed under finite intersections: \(A, B \in \mathcal{A} \Rightarrow A \cap B \in \mathcal{A}\).
Theorem: Suppose \(\{\mathcal{A}_j\}_{j=1}^n\) are independent and each \(\mathcal{A}_j\) is a \(\pi\)-system, then \(\{\sigma(\mathcal{A}_j)\}_{j=1}^n\) are independent.
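Wald's Equation: Let \(\{X_n\}\) be i.i.d. with finite mean. For \(k\ge 1\), let
\[ S_k = \sum_{j=1}^k X_j, \qquad \mathcal{F}_k = \sigma(X_1, \dots, X_k). \]
Suppose \(N\) is a random variable taking positive integer values such that
\[ \{N \le k\} \in \mathcal{F}_k \quad \text{for all } k \ge 1, \]
and \(\mathbb{E}[N] < \infty\). Then we have
\[ \mathbb{E}[S_N] = \mathbb{E}[X_1]\,\mathbb{E}[N]. \]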
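Wald's equation, \(\mathbb{E}[X_1 + \dots + X_N] = \mathbb{E}[X_1]\,\mathbb{E}[N]\), can be illustrated by simulation. The sketch below assumes a specific example that is not in the notes: \(X_n\) are fair die rolls and \(N\) is the index of the first 6, so that whether \(N \le k\) is decided by the first \(k\) rolls. In this example \(\mathbb{E}[X_1] = 3.5\) and \(\mathbb{E}[N] = 6\), so the predicted mean of the stopped sum is 21.

```python
import random

def simulate_stopped_sum(rng):
    """Roll a fair die until the first 6; return (N, S_N)."""
    n, total = 0, 0
    while True:
        roll = rng.randint(1, 6)
        n += 1
        total += roll
        if roll == 6:          # stopping decision uses only rolls seen so far
            return n, total

rng = random.Random(0)         # fixed seed so the run is reproducible
trials = 100_000
samples = [simulate_stopped_sum(rng) for _ in range(trials)]

mean_n = sum(n for n, _ in samples) / trials   # estimate of E[N]
mean_s = sum(s for _, s in samples) / trials   # estimate of E[S_N]
wald_prediction = 3.5 * 6.0                    # E[X_1] * E[N]
```

Note that \(N\) is not independent of the rolls here (the last roll is always a 6), which is exactly the situation the theorem is designed to cover. The empirical means should be close to 6 and 21, up to Monte Carlo error.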
The convolution of two distribution functions \(F_1\) and \(F_2\) is
\[ F(x) = \int_{-\infty}^{\infty} F_1(x - y)\, dF_2(y). \]
This is still a distribution function and we denote it by \(F = F_1 * F_2\). The corresponding measure is denoted by \(\mu = \mu_1 * \mu_2\).
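Lemma: If \(X\) and \(Y\) are independent random variables, then
\[ \mathcal{L}(X+Y) = \mathcal{L}(X) * \mathcal{L}(Y). \]
Lemma: If \(X\) and \(Y\) are independent and have density functions \(f_X\) and \(f_Y\), then \(X+Y\) has the density function
\[ f_{X+Y}(x) = \int_{-\infty}^{\infty} f_X(x - y)\, f_Y(y)\, dy. \]
Example: Suppose \(\{X_j\}_{j=1}^n\) are independent and \(X_j \sim \mathcal{N}(m_j, \sigma_j^2)\). Then
\[ \sum_{j=1}^n X_j \sim \mathcal{N}\left(\sum_{j=1}^n m_j, \sum_{j=1}^n \sigma_j^2\right). \]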
Cramér's Theorem: If \(\{X_j\}_{j=1}^n\) are independent real-valued random variables such that \(\sum_{j=1}^n X_j\) has a normal distribution, then all of \(\left\{X_j\right\}_{j=1}^n\) must have normal distributions as well.
Example: Suppose \(\{X_j\}_{j=1}^n\) are independent and \(X_j \sim \text{Poisson}(\lambda_j)\). Then
\[ \sum_{j=1}^n X_j \sim \text{Poisson}\left(\sum_{j=1}^n \lambda_j\right). \]
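For integer-valued variables the convolution takes a discrete form: if \(X\) and \(Y\) are independent and non-negative, then \(\mathbb{P}[X+Y=k] = \sum_{j=0}^{k} \mathbb{P}[X=j]\,\mathbb{P}[Y=k-j]\). The sketch below uses this to check numerically that the sum of two independent Poisson variables is Poisson with the summed rate; the rates and helper names are illustrative assumptions.

```python
import math

def poisson_pmf(lam, k):
    """P[X = k] for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def convolve_pmf(p1, p2, k):
    """P[X + Y = k] for independent non-negative integer-valued X and Y."""
    return sum(p1(j) * p2(k - j) for j in range(k + 1))

lam1, lam2 = 1.5, 2.5   # illustrative rates

# Largest discrepancy between the convolution and Poisson(lam1 + lam2).
max_gap = max(
    abs(
        convolve_pmf(lambda j: poisson_pmf(lam1, j),
                     lambda j: poisson_pmf(lam2, j), k)
        - poisson_pmf(lam1 + lam2, k)
    )
    for k in range(30)
)
```

The agreement is exact in theory, by the binomial theorem applied to \((\lambda_1 + \lambda_2)^k\), so `max_gap` reflects only floating-point rounding.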
Raikov's Theorem: If \(\{X_j\}_{j=1}^n\) are independent non-negative random variables such that \(\sum_{j=1}^n X_j\) has a Poisson distribution, then all of \(\{X_j\}_{j=1}^n\) must have Poisson distributions as well.