Statistical Inference
Reference book: Statistical Inference
Probability Theory
Sample space The set \(S\) of all possible outcomes of a particular experiment is called the sample space.
Event An event is any collection of possible outcomes of an experiment, that is, any subset of \(S\) (including \(S\) itself).
Sigma algebra A collection of subsets of \(S\) is called a sigma algebra (or Borel field), denoted by \(\mathcal{B}\), if it satisfies the following three properties:
- \(\emptyset \in \mathcal{B}\)
- If \(A \in \mathcal{B}\), then \(A^c \in \mathcal{B}\)
- If \(A_1, A_2, \ldots \in \mathcal{B}\), then \(\bigcup_{i=1}^{\infty} A_i \in \mathcal{B}\)
Probability function Given a sample space \(S\) and an associated sigma algebra \(\mathcal{B}\), a probability function is a function \(P\) with domain \(\mathcal{B}\) that satisfies
- \(P(A)\ge 0\) for all \(A \in \mathcal{B}\)
- \(P(S) = 1\)
- If \(A_1, A_2, \ldots \in \mathcal{B}\) are pairwise disjoint, then \(P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)\)
Theorem 1.2.6 Let \(S=\left\{s_1,\cdots,s_n\right\}\) be a finite set. Let \(\mathcal{B}\) be any sigma algebra of subsets of \(S\). Let \(p_1,\cdots,p_n\) be nonnegative numbers that sum to 1. For any \(A\in \mathcal{B}\), define \(P(A)\) by
\[P(A)=\sum_{\left\{i:\, s_i\in A\right\}} p_i.\]
Then \(P\) is a probability function on \(\mathcal{B}\). This remains true if \(S\) is a countable set.
Theorem 1.2.8 If \(P\) is a probability function and \(A\) is any set in \(\mathcal{B}\), then
- \(P(\emptyset) = 0\)
- \(P(A)\le 1\)
- \(P(A^c) = 1 - P(A)\)
Theorem 1.2.9 If \(P\) is a probability function and \(A\) and \(B\) are any sets in \(\mathcal{B}\), then
- \(P(B\cap A^c) = P(B) - P(A\cap B)\)
- \(P(A\cup B) = P(A) + P(B) - P(A\cap B)\)
- If \(A\subset B\), then \(P(A)\le P(B)\)
Theorem 1.2.10 If \(P\) is a probability function, then
- \(P(A)=\sum_{i=1}^{\infty} P(A\cap C_i)\) for any partition \(C_1,C_2,\cdots\)
- \(P(\bigcup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} P(A_i)\) for any sequence of sets \(A_1,A_2,\cdots\)
If \(A\) and \(B\) are events in \(S\), and \(P(B)>0\), then the conditional probability of \(A\) given \(B\), written \(P(A|B)\), is
\[P(A|B)=\frac{P(A\cap B)}{P(B)}.\]
Bayes' Rule Let \(A_1,A_2,\cdots\) be a partition of the sample space, and let \(B\) be any set. Then, for each \(i=1,2,\cdots\),
\[P(A_i|B)=\frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{\infty}P(B|A_j)P(A_j)}.\]
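A minimal Python sketch of Bayes' Rule for a finite partition, assuming a hypothetical two-set partition (a condition present or absent) with made-up prevalence and test accuracies; the helper name `bayes_posterior` is illustrative only.

```python
def bayes_posterior(priors, likelihoods):
    """Return [P(A_1 | B), ..., P(A_k | B)] for a finite partition A_1, ..., A_k.

    priors[i] is P(A_i) (nonnegative, summing to 1) and likelihoods[i] is P(B | A_i).
    """
    # Denominator: P(B) = sum_j P(B | A_j) P(A_j), by the law of total probability.
    p_b = sum(prior * lik for prior, lik in zip(priors, likelihoods))
    return [prior * lik / p_b for prior, lik in zip(priors, likelihoods)]


# Hypothetical numbers: A_1 = "has the condition", A_2 = "does not"; B = "test is positive".
priors = [0.01, 0.99]
likelihoods = [0.95, 0.05]  # P(B | A_1), P(B | A_2)
posterior = bayes_posterior(priors, likelihoods)
print(posterior[0])  # about 0.161: most positives come from the much larger set A_2
```

Even with an accurate test, the posterior probability stays small because the prior \(P(A_1)\) is small.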
Two events, \(A\) and \(B\), are statistically independent if
\[P(A\cap B)=P(A)P(B).\]
Theorem 1.3.9 If \(A\) and \(B\) are independent events, then the following pairs are also independent:
- \(A\) and \(B^c\)
- \(A^c\) and \(B\)
- \(A^c\) and \(B^c\)
A collection of events \(A_1,\cdots, A_n\) are mutually independent if for any subcollection \(A_{i_1},\cdots, A_{i_k}\), we have
\[P\left(\bigcap_{j=1}^{k}A_{i_j}\right)=\prod_{j=1}^{k}P(A_{i_j}).\]
A random variable is a function from a sample space \(S\) into the real numbers.
The cumulative distribution function or cdf of a random variable \(X\), denoted by \(F_X(x)\), is defined by
\[F_X(x)=P_X(X\le x),\quad\text{for all } x.\]
Theorem 1.5.3 The function \(F(x)\) is a cdf iff the following three conditions hold:
- \(\lim_{x\to -\infty} F(x) = 0\) and \(\lim_{x\to \infty} F(x) = 1\)
- \(F(x)\) is nondecreasing
- \(F(x)\) is right continuous, that is, \(\lim_{x\to x_0^+} F(x) = F(x_0)\) for all \(x_0\).
A random variable \(X\) is continuous if \(F_X(x)\) is a continuous function of \(x\). A random variable \(X\) is discrete if \(F_X(x)\) is a step function of \(x\).
The random variables \(X\) and \(Y\) are identically distributed if, for every set \(A\in \mathcal{B}\), we have \(P(X\in A) = P(Y\in A)\).
Theorem 1.5.10 The following two statements are equivalent:
- The random variables \(X\) and \(Y\) are identically distributed.
- \(F_X(x) = F_Y(x)\) for all \(x\).
The probability mass function or pmf of a discrete random variable \(X\) is given by
\[f_X(x)=P(X=x)\quad\text{for all } x.\]
The probability density function or pdf, \(f_X(x)\), of a continuous random variable \(X\) is the function that satisfies
\[F_X(x)=\int_{-\infty}^{x} f_X(t)\,dt\quad\text{for all } x.\]
The notation "\(X\) has a distribution given by \(F_X(x)\)" is abbreviated symbolically by \(X\sim F_X(x)\). We can similarly write \(X\sim f_X(x)\) or, if \(X\) and \(Y\) have the same distribution, \(X\sim Y\).
Theorem 1.6.5 A function \(f_X(x)\) is a pdf (or pmf) of a random variable \(X\) iff
- \(f_X(x)\ge 0\) for all \(x\)
- \(\sum_{x} f_X(x) = 1\) (pmf) or \(\int_{-\infty}^{\infty} f_X(x)\, dx = 1\) (pdf)
Transformation and expectation
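Transformation Let \(X\) be a random variable with cdf \(F_X(x)\) and pdf (or pmf) \(f_X(x)\). Let \(Y = g(X)\), where \(g\) is a function mapping the original sample space of \(X\), \(\mathcal{X}\), to a new sample space, \(\mathcal{Y}\), the sample space of random variable \(Y\). Writing \(g^{-1}(A)=\left\{x\in\mathcal{X}: g(x)\in A\right\}\) for the preimage of \(A\) (so \(g\) need not be one-to-one), we can write for any set \(A\subset \mathcal{Y}\),
\[P(Y\in A)=P(g(X)\in A)=P\left(\left\{x\in\mathcal{X}: g(x)\in A\right\}\right)=P\left(X\in g^{-1}(A)\right).\]
If \(X\) is discrete, the pmf of \(Y\) is
\[f_Y(y)=P(Y=y)=\sum_{x\in g^{-1}(y)} f_X(x)\quad\text{for } y\in\mathcal{Y}.\]
If \(g(x)\) is an increasing function, we can write
\[\left\{x\in\mathcal{X}: g(x)\le y\right\}=\left\{x\in\mathcal{X}: x\le g^{-1}(y)\right\},\]
and if \(g(x)\) is a decreasing function, we can write
\[\left\{x\in\mathcal{X}: g(x)\le y\right\}=\left\{x\in\mathcal{X}: x\ge g^{-1}(y)\right\}.\]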
Theorem 2.1.3 Let \(X\) have cdf \(F_X(x)\), let \(Y=g(X)\), and let \(\mathcal{X}\) and \(\mathcal{Y}\) be defined as \(\mathcal{X} = \left\{x: f_X(x)>0\right\}\) and \(\mathcal{Y} = \left\{y:y=g(x)\text{ for some }x\in \mathcal{X}\right\}\).
- If \(g\) is an increasing function on \(\mathcal{X}\), \(F_Y(y) = F_X(g^{-1}(y))\) for \(y\in \mathcal{Y}\)
- If \(g\) is a decreasing function on \(\mathcal{X}\) and \(X\) is a continuous random variable, \(F_Y(y) = 1 - F_X(g^{-1}(y))\) for \(y\in \mathcal{Y}\)
Theorem 2.1.5 Let \(X\) have pdf \(f_X(x)\) and let \(Y=g(X)\), where \(g\) is a monotone function. Let \(\mathcal{X}\) and \(\mathcal{Y}\) be defined as above. Suppose that \(f_X(x)\) is continuous on \(\mathcal{X}\) and that \(g^{-1}(y)\) has a continuous derivative on \(\mathcal{Y}\). Then the pdf of \(Y\) is given by
\[f_Y(y)=\begin{cases} f_X\left(g^{-1}(y)\right)\left|\dfrac{d}{dy}g^{-1}(y)\right| & y\in\mathcal{Y}\\ 0 & \text{otherwise.}\end{cases}\]
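A small numerical check of Theorem 2.1.5, assuming \(X\sim\) uniform\((0,1)\) and \(Y=g(X)=-\log X\), a decreasing map with \(g^{-1}(y)=e^{-y}\). The helper `pdf_transform` is hypothetical and approximates \(\frac{d}{dy}g^{-1}(y)\) by a central difference instead of differentiating exactly; the transformed pdf should reproduce the exponential(1) pdf \(e^{-y}\).

```python
import math


def pdf_transform(f_x, g_inv, y, h=1e-6):
    """Theorem 2.1.5 for monotone g: f_Y(y) = f_X(g^{-1}(y)) |d/dy g^{-1}(y)|.

    The derivative of g^{-1} is approximated by a central difference with step h.
    """
    d_ginv = (g_inv(y + h) - g_inv(y - h)) / (2 * h)
    return f_x(g_inv(y)) * abs(d_ginv)


def f_x(x):
    """uniform(0, 1) pdf."""
    return 1.0 if 0 < x < 1 else 0.0


def g_inv(y):
    """Inverse of the decreasing map g(x) = -log(x) from (0, 1) onto (0, inf)."""
    return math.exp(-y)


for y in (0.5, 1.0, 2.0):
    print(y, pdf_transform(f_x, g_inv, y), math.exp(-y))  # last two columns agree
```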
Theorem 2.1.8 Let \(X\) have pdf \(f_X(x)\), let \(Y=g(X)\), and define the sample space \(\mathcal{X}\) as above. Suppose there exists a partition \(A_0,A_1,\cdots,A_k\) of \(\mathcal{X}\) such that \(P(X\in A_0)=0\) and \(f_X(x)\) is continuous on each \(A_i\). Further, suppose there exist functions \(g_1,\cdots,g_k\), defined on \(A_1,\cdots,A_k\), respectively, satisfying
- \(g(x)=g_i(x)\) for \(x\in A_i\)
- \(g_i\) is monotone on \(A_i\)
- the set \(\mathcal{Y} = \left\{y:y=g_i(x)\text{ for some }x\in A_i\right\}\) is the same for each \(i=1,\cdots,k\)
- \(g_i^{-1}(y)\) has a continuous derivative on \(\mathcal{Y}\) for each \(i=1,\cdots,k\)
Then
\[f_Y(y)=\begin{cases}\sum_{i=1}^{k} f_X\left(g_i^{-1}(y)\right)\left|\dfrac{d}{dy}g_i^{-1}(y)\right| & y\in\mathcal{Y}\\ 0 & \text{otherwise.}\end{cases}\]
Theorem 2.1.10 Let \(X\) have continuous cdf \(F_X(x)\) and define the random variable \(Y\) as \(Y=F_X(X)\). Then \(Y\) is uniformly distributed on \((0,1)\), that is, \(P(Y\le y) = y\) for \(0<y<1\).
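Theorem 2.1.10 can be illustrated by simulation. This sketch assumes \(X\sim\) exponential\((\beta)\) with \(\beta=2\), whose cdf is \(F_X(x)=1-e^{-x/\beta}\); note that `random.expovariate` takes the rate \(1/\beta\). The empirical proportion of values of \(Y=F_X(X)\) at or below \(y\) should be close to \(y\).

```python
import math
import random

random.seed(0)
beta = 2.0

# X ~ exponential(beta); random.expovariate takes the rate 1 / beta, not the mean.
xs = [random.expovariate(1 / beta) for _ in range(100_000)]
# Apply X's own continuous cdf F_X(x) = 1 - exp(-x / beta).
ys = [1 - math.exp(-x / beta) for x in xs]

# If Y = F_X(X) is uniform(0, 1), the fraction of ys at or below y is close to y.
for y in (0.1, 0.5, 0.9):
    frac = sum(v <= y for v in ys) / len(ys)
    print(y, round(frac, 3))
```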
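The expected value or mean of a random variable \(g(X)\), denoted by \(\mathrm{E}[g(X)]\), is
\[\mathrm{E}[g(X)]=\begin{cases}\int_{-\infty}^{\infty} g(x)f_X(x)\,dx & \text{if } X \text{ is continuous}\\ \sum_{x\in\mathcal{X}} g(x)f_X(x) & \text{if } X \text{ is discrete,}\end{cases}\]
provided that the integral or sum exists. If \(\mathrm{E}|g(X)|=\infty\), we say that \(\mathrm{E}[g(X)]\) does not exist.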
For each integer \(n\), the \(n\)th moment of \(X\), \(\mu'_n\), is
\[\mu'_n=\mathrm{E}[X^n].\]
The \(n\)th central moment of \(X\), \(\mu_n\), is
\[\mu_n=\mathrm{E}[(X-\mu)^n],\]
where \(\mu=\mu'_1=\mathrm{E}[X]\).
The variance of \(X\) is its second central moment, \(\mathrm{Var}(X)=\mathrm{E}[(X-\mu)^2]\). The positive square root of \(\mathrm{Var}(X)\) is the standard deviation of \(X\).
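Let \(X\) be a random variable with cdf \(F_X\). The moment generating function or mgf of \(X\), denoted by \(M_X(t)\), is
\[M_X(t)=\mathrm{E}\left[e^{tX}\right],\]
provided that the expectation exists for \(t\) in some neighborhood of 0, that is, there is an \(h>0\) such that \(\mathrm{E}[e^{tX}]\) exists for all \(-h<t<h\). Otherwise, we say that the mgf does not exist.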
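Theorem 2.3.7 If \(X\) has mgf \(M_X(t)\), then
\[\mathrm{E}[X^n]=M_X^{(n)}(0),\quad\text{where } M_X^{(n)}(0)=\left.\frac{d^n}{dt^n}M_X(t)\right|_{t=0}.\]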
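A numerical sketch of the moment formula \(\mathrm{E}X^n=M_X^{(n)}(0)\), assuming \(X\sim\) exponential\((\beta)\), whose mgf is \(M_X(t)=1/(1-\beta t)\) for \(t<1/\beta\). Finite differences of \(M_X\) at \(t=0\) should approximate \(\mathrm{E}X=\beta\) and \(\mathrm{E}X^2=2\beta^2\); the step size `h` is an arbitrary choice.

```python
def mgf_exponential(t, beta=2.0):
    """mgf of exponential(beta): M(t) = 1 / (1 - beta t), valid for t < 1 / beta."""
    return 1.0 / (1.0 - beta * t)


h = 1e-4
m0 = mgf_exponential(0.0)
# Central differences approximate M'(0) and M''(0).
first = (mgf_exponential(h) - mgf_exponential(-h)) / (2 * h)
second = (mgf_exponential(h) - 2 * m0 + mgf_exponential(-h)) / h**2

print(first)   # close to E X = beta = 2
print(second)  # close to E X^2 = 2 beta^2 = 8
```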
Theorem 2.3.11 Let \(F_X(x)\) and \(F_Y(y)\) be two cdfs all of whose moments exist.
- If \(X\) and \(Y\) have bounded support, then \(F_X(u)=F_Y(u)\) for all \(u\) iff \(\mathrm{E}[X^r] = \mathrm{E}[Y^r]\) for all \(r=0,1,2,\cdots\)
- If the moment generating functions exist and \(M_X(t)=M_Y(t)\) for all \(t\) in some neighborhood of 0, then \(F_X(u)=F_Y(u)\) for all \(u\).
Theorem 2.3.12 Suppose \(\left\{X_i,i=1,2,\cdots\right\}\) is a sequence of random variables, each with mgf \(M_{X_i}(t)\). Furthermore, suppose that
\[\lim_{i\to\infty}M_{X_i}(t)=M_X(t)\quad\text{for all } t \text{ in a neighborhood of } 0,\]
and \(M_X(t)\) is an mgf. Then there is a unique cdf \(F_X\) whose moments are determined by \(M_X(t)\), and for all \(x\) where \(F_X\) is continuous, we have
\[\lim_{i\to\infty}F_{X_i}(x)=F_X(x).\]
That is, convergence of mgfs to an mgf for \(|t|<h\) implies convergence of cdfs.
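Leibnitz's Rule If \(f(x,\theta)\), \(a(\theta)\), and \(b(\theta)\) are differentiable functions of \(\theta\), with \(-\infty<a(\theta),b(\theta)<\infty\), then
\[\frac{d}{d\theta}\int_{a(\theta)}^{b(\theta)} f(x,\theta)\,dx=f\left(b(\theta),\theta\right)\frac{d}{d\theta}b(\theta)-f\left(a(\theta),\theta\right)\frac{d}{d\theta}a(\theta)+\int_{a(\theta)}^{b(\theta)}\frac{\partial}{\partial\theta}f(x,\theta)\,dx.\]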
Theorem 2.4.2 Suppose the function \(h(x,y)\) is continuous at \(y_0\) for each \(x\), and there exists a function \(g(x)\) satisfying
- \(|h(x,y)|\le g(x)\) for all \(x\) and \(y\)
- \(\int_{-\infty}^{\infty} g(x)\, dx < \infty\)
Then
\[\lim_{y\to y_0}\int_{-\infty}^{\infty}h(x,y)\,dx=\int_{-\infty}^{\infty}\lim_{y\to y_0}h(x,y)\,dx.\]
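Theorem 2.4.3 Suppose \(f(x,\theta)\) is differentiable at \(\theta=\theta_0\), and there exist a function \(g(x,\theta_0)\) and a constant \(\delta_0>0\) such that
- \(\left|\frac{f(x,\theta_0+\delta)-f(x,\theta_0)}{\delta}\right| \le g(x,\theta_0)\) for all \(x\) and \(|\delta|<\delta_0\)
- \(\int_{-\infty}^{\infty} g(x,\theta_0)\, dx < \infty\)
Then
\[\left.\frac{d}{d\theta}\int_{-\infty}^{\infty}f(x,\theta)\,dx\right|_{\theta=\theta_0}=\int_{-\infty}^{\infty}\left[\left.\frac{\partial}{\partial\theta}f(x,\theta)\right|_{\theta=\theta_0}\right]dx.\]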
Common Families of Distributions
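discrete uniform(1,N) distribution
\[P(X=x|N)=\frac{1}{N},\quad x=1,2,\cdots,N.\]
hypergeometric distribution
\[P(X=x|N,M,K)=\frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}},\quad \max(0,M-(N-K))\le x\le\min(M,K).\]
binomial(\(n,p\)) distribution
\[P(X=x|n,p)=\binom{n}{x}p^x(1-p)^{n-x},\quad x=0,1,\cdots,n.\]
Poisson(\(\lambda\)) distribution
\[P(X=x|\lambda)=\frac{e^{-\lambda}\lambda^x}{x!},\quad x=0,1,\cdots.\]
negative binomial(\(r,p\)) distribution (\(X\) counts the failures before the \(r\)th success)
\[P(X=x|r,p)=\binom{r+x-1}{x}p^r(1-p)^x,\quad x=0,1,\cdots.\]
geometric(\(p\)) distribution
\[P(X=x|p)=p(1-p)^{x-1},\quad x=1,2,\cdots.\]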
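A quick sanity check that several of the discrete pmfs listed above sum to 1. The formulas in the comments are the standard forms assumed here (in particular, the negative binomial counts failures before the \(r\)th success), and infinite supports are truncated at arbitrary cutoffs where the remaining mass is negligible.

```python
import math


def binomial_pmf(x, n, p):
    # C(n, x) p^x (1 - p)^(n - x), x = 0, ..., n
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    # e^{-lam} lam^x / x!, x = 0, 1, ...
    return math.exp(-lam) * lam**x / math.factorial(x)


def geometric_pmf(x, p):
    # p (1 - p)^(x - 1), x = 1, 2, ...  (trial on which the first success occurs)
    return p * (1 - p) ** (x - 1)


def negbin_pmf(x, r, p):
    # C(r + x - 1, x) p^r (1 - p)^x, x = 0, 1, ...  (failures before the r-th success)
    return math.comb(r + x - 1, x) * p**r * (1 - p) ** x


# Infinite supports are truncated where the remaining tail mass is negligible.
checks = {
    "binomial(10, 0.3)": sum(binomial_pmf(x, 10, 0.3) for x in range(11)),
    "Poisson(4)": sum(poisson_pmf(x, 4.0) for x in range(100)),
    "geometric(0.2)": sum(geometric_pmf(x, 0.2) for x in range(1, 400)),
    "negative binomial(3, 0.4)": sum(negbin_pmf(x, 3, 0.4) for x in range(400)),
}
for name, total in checks.items():
    print(name, round(total, 10))  # each total is 1 up to truncation and rounding
```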
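Uniform(\(a,b\)) distribution
\[f(x|a,b)=\frac{1}{b-a},\quad a\le x\le b.\]
Gamma(\(\alpha,\beta\)) distribution
\[f(x|\alpha,\beta)=\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta},\quad 0<x<\infty,\ \alpha,\beta>0.\]
Normal(\(\mu,\sigma^2\)) distribution
\[f(x|\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)},\quad -\infty<x<\infty.\]
beta(\(\alpha,\beta\)) distribution
\[f(x|\alpha,\beta)=\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1,\ \alpha,\beta>0,\]
where
\[B(\alpha,\beta)=\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.\]
Cauchy(\(\theta,\sigma\)) distribution
\[f(x|\theta,\sigma)=\frac{1}{\pi\sigma}\frac{1}{1+\left(\frac{x-\theta}{\sigma}\right)^2},\quad -\infty<x<\infty.\]
Lognormal(\(\mu,\sigma^2\)) distribution
\[f(x|\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}\frac{1}{x}e^{-(\log x-\mu)^2/(2\sigma^2)},\quad 0<x<\infty.\]
Double exponential(\(\mu,\sigma\)) distribution
\[f(x|\mu,\sigma)=\frac{1}{2\sigma}e^{-|x-\mu|/\sigma},\quad -\infty<x<\infty.\]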
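exponential family A family of pdfs or pmfs is called an exponential family if it can be expressed as
\[f(x|\bm{\theta})=h(x)c(\bm{\theta})\exp\left(\sum_{i=1}^{k}w_i(\bm{\theta})t_i(x)\right),\]
where \(h(x)\ge 0\) and \(t_1(x),\cdots,t_k(x)\) are real-valued functions of \(x\) that do not depend on \(\bm{\theta}\), and \(c(\bm{\theta})\ge 0\) and \(w_1(\bm{\theta}),\cdots,w_k(\bm{\theta})\) are real-valued functions of \(\bm{\theta}\) that do not depend on \(x\).
exponential(\(\beta\)) distribution (the gamma\((1,\beta)\) special case)
\[f(x|\beta)=\frac{1}{\beta}e^{-x/\beta},\quad 0\le x<\infty,\ \beta>0.\]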
Theorem 3.4.2 If \(X\) has a pdf (or pmf) in the exponential family, then
- \(\mathrm{E}\left(\sum_{i=1}^{k} \dfrac{\partial w_i(\bm{\theta})}{\partial \theta_j}t_i(X)\right) = -\dfrac{\partial}{\partial \theta_j} \log c(\bm{\theta})\)
- \(\mathrm{Var}\left(\sum_{i=1}^{k} \dfrac{\partial w_i(\bm{\theta})}{\partial \theta_j}t_i(X)\right) = -\dfrac{\partial^2}{\partial \theta_j^2} \log c(\bm{\theta}) - \mathrm{E}\left(\sum_{i=1}^{k} \dfrac{\partial^2 w_i(\bm{\theta})}{\partial \theta_j^2}t_i(X)\right)\)
A curved exponential family is a family of densities of the form above for which the dimension of the vector \(\bm{\theta}\) is equal to \(d<k\). If \(d=k\), the family is a full exponential family.
Theorem 3.5.1 Let \(f(x)\) be any pdf and let \(\mu\) and \(\sigma>0\) be any given constants. Then the function
\[g(x|\mu,\sigma)=\frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right)\]
is a pdf.
Let \(f(x)\) be any pdf. Then the family of pdfs \(f(x-\mu)\), indexed by the parameter \(\mu\), \(-\infty<\mu<\infty\), is called the location family with standard pdf \(f(x)\) and \(\mu\) is called the location parameter of the family. For any \(\sigma>0\), the family of pdfs \(\frac{1}{\sigma} f\left(\frac{x}{\sigma}\right)\), indexed by the parameter \(\sigma\), is called the scale family with standard pdf \(f(x)\) and \(\sigma\) is called the scale parameter of the family. The family of pdfs \(\frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)\), indexed by the parameters \(\mu\) and \(\sigma\), is called the location-scale family with standard pdf \(f(x)\), where \(\mu\) is a location parameter and \(\sigma\) is a scale parameter.
Theorem 3.5.6 Let \(f(\cdot)\) be any pdf. Let \(\mu\) be any real number, and let \(\sigma\) be any positive number. Then \(X\) is a random variable with pdf \(\frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)\) iff there exists a random variable \(Z\) with pdf \(f(z)\) and \(X= \mu + \sigma Z\).
Theorem 3.5.7 Let \(Z\) be a random variable with pdf \(f(z)\). Suppose \(\mathrm{E}Z\) and \(\mathrm{Var}Z\) exist. If \(X\) is a random variable with pdf \(\frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)\), then
\[\mathrm{E}X=\sigma\mathrm{E}Z+\mu\quad\text{and}\quad\mathrm{Var}X=\sigma^2\mathrm{Var}Z.\]
In particular, if \(\mathrm{E}Z=0\) and \(\mathrm{Var}Z=1\), then \(\mathrm{E}X = \mu\) and \(\mathrm{Var}X = \sigma^2\).
Chebyshev Let \(X\) be a random variable and let \(g(x)\) be a nonnegative function. Then for any \(r>0\),
\[P\left(g(X)\ge r\right)\le\frac{\mathrm{E}g(X)}{r}.\]
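Chebyshev's inequality \(P(g(X)\ge r)\le \mathrm{E}g(X)/r\) with \(g(x)=(x-\mu)^2/\sigma^2\) and \(r=t^2\) gives \(P(|X-\mu|\ge t\sigma)\le 1/t^2\). The sketch below compares this bound with the exact tail probability for an exponential(1) variable (mean 1, variance 1), a distribution chosen only for illustration.

```python
import math


def exact_tail(t):
    """P(|X - 1| >= t) for X ~ exponential(1), which has mean 1 and variance 1."""
    upper = math.exp(-(1 + t))                          # P(X >= 1 + t)
    lower = 1 - math.exp(-(1 - t)) if t < 1 else 0.0    # P(X <= 1 - t), zero once 1 - t <= 0
    return upper + lower


for t in (1.5, 2.0, 3.0):
    bound = 1 / t**2  # Chebyshev with g(x) = (x - mu)^2 / sigma^2 and r = t^2
    print(t, round(exact_tail(t), 4), round(bound, 4))
```

The bound holds but is loose here (for \(t=2\) the exact tail is \(e^{-3}\approx 0.05\) versus the bound \(0.25\)), which is typical since Chebyshev uses only the first two moments.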
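Theorem 3.6.4 Let \(X_{\alpha,\beta}\) denote a gamma\((\alpha,\beta)\) random variable with pdf \(f(x|\alpha,\beta)\), where \(\alpha>1\). Then for any constants \(a\) and \(b\),
\[P(a<X_{\alpha,\beta}<b)=\beta\left(f(a|\alpha,\beta)-f(b|\alpha,\beta)\right)+P(a<X_{\alpha-1,\beta}<b).\]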
Stein's Lemma Let \(X\sim n(\theta,\sigma^2)\) and let \(g\) be a differentiable function satisfying \(\mathrm{E}|g'(X)|<\infty\). Then
\[\mathrm{E}\left[g(X)(X-\theta)\right]=\sigma^2\mathrm{E}g'(X).\]
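A deterministic check of Stein's Lemma, \(\mathrm{E}[g(X)(X-\theta)]=\sigma^2\mathrm{E}g'(X)\), assuming \(g(x)=\sin x\) (so \(g'(x)=\cos x\)) and arbitrary values \(\theta=0.7\), \(\sigma=1.3\). Both expectations are approximated by composite Simpson's rule on a truncated interval; `normal_expect` is a hypothetical helper, not a library function. For this \(g\), \(\sigma^2\mathrm{E}\cos X=\sigma^2\cos\theta\,e^{-\sigma^2/2}\) in closed form.

```python
import math


def normal_expect(func, theta, sigma, n=20_000, width=12.0):
    """Approximate E[func(X)] for X ~ n(theta, sigma^2) by composite Simpson's rule
    on [theta - width*sigma, theta + width*sigma]; n (number of intervals) must be even."""
    a, b = theta - width * sigma, theta + width * sigma
    step = (b - a) / n

    def integrand(x):
        z = (x - theta) / sigma
        return func(x) * math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * step)
    return total * step / 3


theta, sigma = 0.7, 1.3
g, g_prime = math.sin, math.cos  # g(x) = sin x is differentiable and E|g'(X)| <= 1

lhs = normal_expect(lambda x: g(x) * (x - theta), theta, sigma)  # E[g(X)(X - theta)]
rhs = sigma**2 * normal_expect(g_prime, theta, sigma)            # sigma^2 E g'(X)
print(lhs, rhs, sigma**2 * math.cos(theta) * math.exp(-sigma**2 / 2))
```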
Theorem 3.6.7 Let \(\chi_p^2\) denote a chi-square random variable with \(p\) degrees of freedom, which has pdf \(f(x|p)=\frac{1}{2^{p/2}\Gamma(p/2)} x^{p/2-1} e^{-x/2}, x>0\). Then for any function \(h(x)\),
\[\mathrm{E}h(\chi_p^2)=p\,\mathrm{E}\left(\frac{h(\chi_{p+2}^2)}{\chi_{p+2}^2}\right),\]
provided that the expectations exist.
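Theorem 3.6.8 Let \(g(x)\) be a function with \(-\infty<\mathrm{E}g(X)<\infty\) and \(-\infty<g(-1)<\infty\). Then:
- If \(X\sim \text{Poisson}(\lambda)\), then \(\mathrm{E}\left(\lambda g(X)\right)=\mathrm{E}\left(Xg(X-1)\right)\).
- If \(X\sim \text{negative binomial}(r,p)\), then \(\mathrm{E}\left((1-p)g(X)\right)=\mathrm{E}\left(\dfrac{X}{r+X-1}g(X-1)\right)\).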
Multiple Random Variables
An \(n\)-dimensional random vector is a function from a sample space \(S\) into \(\mathbb{R}^n\), \(n\)-dimensional Euclidean space.
Let \((X,Y)\) be a discrete bivariate random vector. Then the function \(f(x,y)\) from \(\mathbb{R}^2\) into \(\mathbb{R}\) defined by \(f(x,y)=P(X=x,Y=y)\) is called the joint probability mass function or joint pmf of \((X,Y)\).
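Theorem 4.1.6 Let \((X,Y)\) be a discrete bivariate random vector with joint pmf \(f_{X,Y}(x,y)\). Then the marginal pmfs of \(X\) and \(Y\), \(f_X(x)=P(X=x)\) and \(f_Y(y)=P(Y=y)\), are given by
\[f_X(x)=\sum_{y\in\mathbb{R}}f_{X,Y}(x,y)\quad\text{and}\quad f_Y(y)=\sum_{x\in\mathbb{R}}f_{X,Y}(x,y).\]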
A function \(f(x,y)\) from \(\mathbb{R}^2\) into \(\mathbb{R}\) is called a joint probability density function or joint pdf of the continuous bivariate random vector \((X,Y)\) if for every set \(A\subset \mathbb{R}^2\),
\[P\left((X,Y)\in A\right)=\iint_A f(x,y)\,dx\,dy.\]
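If \(g(x,y)\) is a real-valued function, the expected value of \(g(X,Y)\) is defined to be
\[\mathrm{E}g(X,Y)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x,y)f(x,y)\,dx\,dy,\]
and the marginal probability density functions of \(X\) and \(Y\) are defined to be
\[f_X(x)=\int_{-\infty}^{\infty}f(x,y)\,dy,\quad -\infty<x<\infty,\qquad f_Y(y)=\int_{-\infty}^{\infty}f(x,y)\,dx,\quad -\infty<y<\infty.\]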
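Let \((X,Y)\) be a discrete bivariate random vector with joint pmf \(f(x,y)\) and marginal pmfs \(f_X(x)\) and \(f_Y(y)\). For any \(x\) such that \(P(X=x)= f_X(x)>0\), the conditional pmf of \(Y\) given that \(X=x\) is the function of \(y\) denoted by \(f(y|x)\) and defined by
\[f(y|x)=P(Y=y|X=x)=\frac{f(x,y)}{f_X(x)}.\]
For any \(y\) such that \(P(Y=y)= f_Y(y)>0\), the conditional pmf of \(X\) given that \(Y=y\) is the function of \(x\) denoted by \(f(x|y)\) and defined by
\[f(x|y)=P(X=x|Y=y)=\frac{f(x,y)}{f_Y(y)}.\]
For continuous random variables, the conditional pdfs \(f(y|x)\) and \(f(x|y)\) are defined by the same ratios, with \(f(x,y)\), \(f_X(x)\), and \(f_Y(y)\) now denoting the joint and marginal pdfs.
If \(g(Y)\) is a function of \(Y\), then the conditional expected value of \(g(Y)\) given that \(X=x\) is denoted by \(\mathrm{E}[g(Y)|x]\) and given by
\[\mathrm{E}[g(Y)|x]=\sum_{y}g(y)f(y|x)\quad\text{(discrete)},\qquad \mathrm{E}[g(Y)|x]=\int_{-\infty}^{\infty}g(y)f(y|x)\,dy\quad\text{(continuous)}.\]
The variance of the probability distribution described by \(f(y|x)\) is called the conditional variance of \(Y\) given \(X=x\), and we have
\[\mathrm{Var}(Y|x)=\mathrm{E}\left(Y^2|x\right)-\left(\mathrm{E}(Y|x)\right)^2.\]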
Let \((X,Y)\) be a bivariate random vector with joint pdf or pmf \(f(x,y)\) and marginal pdfs or pmfs \(f_X(x)\) and \(f_Y(y)\). Then \(X\) and \(Y\) are called independent random variables if for all \(x\in \mathbb{R}\) and \(y\in \mathbb{R}\), we have
\[f(x,y)=f_X(x)f_Y(y).\]
Lemma 4.2.7 Let \((X,Y)\) be a bivariate random vector with joint pdf or pmf \(f(x,y)\) and marginal pdfs or pmfs \(f_X(x)\) and \(f_Y(y)\). Then \(X\) and \(Y\) are independent iff there exist functions \(g(x)\) and \(h(y)\) such that, for every \(x\in \mathbb{R}\) and \(y\in \mathbb{R}\),
\[f(x,y)=g(x)h(y).\]
Theorem 4.2.10 Let \(X\) and \(Y\) be independent random variables.
- For any \(A\subset \mathbb{R}\) and \(B\subset \mathbb{R}\), \(P(X\in A, Y\in B) = P(X\in A)P(Y\in B)\); that is, the events \(\left\{X\in A\right\}\) and \(\left\{Y\in B\right\}\) are independent events.
- Let \(g(x)\) be a function only of \(x\) and let \(h(y)\) be a function only of \(y\). Then \(\mathrm{E}\left(g(X)h(Y)\right)=\left(\mathrm{E}g(X)\right)\left(\mathrm{E}h(Y)\right)\).
- The moment generating function of \(Z=X+Y\) is given by \(M_Z(t)=M_X(t)M_Y(t)\).
For example, if \(X\sim n(\mu,\sigma^2)\) and \(Y\sim n(\gamma,\tau^2)\) are independent, then \(Z=X+Y\sim n(\mu+\gamma,\sigma^2+\tau^2)\). If \(X\sim \text{Poisson}(\theta)\) and \(Y\sim \text{Poisson}(\lambda)\) are independent, then \(Z=X+Y\sim \text{Poisson}(\theta+\lambda)\).
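The Poisson example can also be verified without mgfs by convolving the two pmfs, \(P(X+Y=z)=\sum_{x=0}^{z}P(X=x)P(Y=z-x)\) for independent \(X\) and \(Y\). The parameter values below are arbitrary.

```python
import math


def poisson_pmf(x, lam):
    """Poisson(lam) pmf e^{-lam} lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)


theta, lam = 1.5, 2.5
# P(X + Y = z) = sum_{x=0}^{z} P(X = x) P(Y = z - x) for independent X and Y.
for z in range(6):
    conv = sum(poisson_pmf(x, theta) * poisson_pmf(z - x, lam) for x in range(z + 1))
    print(z, round(conv, 6), round(poisson_pmf(z, theta + lam), 6))  # columns agree
```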
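Let \((X,Y)\) be a bivariate random vector with a known probability distribution. Now consider a new bivariate random vector \((U,V)\) defined by \(U=g_1(X,Y)\) and \(V=g_2(X,Y)\), where \(g_1(x,y)\) and \(g_2(x,y)\) are some specified functions. If \((X,Y)\) is continuous and the transformation is one-to-one, with inverse \(x=h_1(u,v)\), \(y=h_2(u,v)\), then the joint pdf of \((U,V)\) is given by
\[f_{U,V}(u,v)=f_{X,Y}\left(h_1(u,v),h_2(u,v)\right)|J|,\qquad J=\begin{vmatrix}\dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v}\\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v}\end{vmatrix},\]
on the image of the support of \((X,Y)\), and \(f_{U,V}(u,v)=0\) elsewhere.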
Theorem 4.3.5 Let \(X\) and \(Y\) be independent random variables. Let \(g(x)\) be a function only of \(x\) and let \(h(y)\) be a function only of \(y\). Then the random variables \(U=g(X)\) and \(V=h(Y)\) are independent.
Theorem 4.4.3 If \(X\) and \(Y\) are any two random variables, then
\[\mathrm{E}X=\mathrm{E}\left(\mathrm{E}(X|Y)\right),\]
provided that the expectations exist.
A random variable \(X\) is said to have a mixture distribution if the distribution of \(X\) depends on a quantity that also has a distribution.
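Conditional variance identity: For any two random variables \(X\) and \(Y\),
\[\mathrm{Var}X=\mathrm{E}\left(\mathrm{Var}(X|Y)\right)+\mathrm{Var}\left(\mathrm{E}(X|Y)\right),\]
provided that the expectations exist.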
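To illustrate the conditional variance identity, this sketch assumes a binomial-Poisson hierarchy chosen for the example: \(Y\sim\) Poisson\((\lambda)\) and \(X|Y\sim\) binomial\((Y,p)\), so \(\mathrm{E}(X|Y)=pY\) and \(\mathrm{Var}(X|Y)=Yp(1-p)\). It compares the identity with \(\mathrm{Var}X\) computed directly from the joint pmf, with the Poisson support truncated at an arbitrary cutoff.

```python
import math

lam, p = 4.0, 0.3
Y_MAX = 80  # truncation point for the Poisson support; the remaining tail mass is negligible


def poisson_pmf(y, lam):
    return math.exp(-lam) * lam**y / math.factorial(y)


def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


# Direct computation from the joint pmf f(x, y) = P(X = x | Y = y) P(Y = y).
ex = ex2 = 0.0
for y in range(Y_MAX + 1):
    py = poisson_pmf(y, lam)
    for x in range(y + 1):
        w = binomial_pmf(x, y, p) * py
        ex += x * w
        ex2 += x * x * w
var_direct = ex2 - ex**2

# Identity: Var X = E(Var(X|Y)) + Var(E(X|Y)) = E(Y) p (1 - p) + p^2 Var(Y).
var_identity = lam * p * (1 - p) + p**2 * lam
print(ex, var_direct, var_identity)  # all three are close to lam * p = 1.2
```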
We use notation \(\mu_X=\mathrm{E}X\), \(\sigma_X^2=\mathrm{Var}X\).
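The covariance of \(X\) and \(Y\) is the number defined by
\[\mathrm{Cov}(X,Y)=\mathrm{E}\left((X-\mu_X)(Y-\mu_Y)\right).\]
The correlation of \(X\) and \(Y\) is the number defined by
\[\rho_{XY}=\frac{\mathrm{Cov}(X,Y)}{\sigma_X\sigma_Y}.\]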
The value \(\rho_{XY}\) is also called the correlation coefficient.
Theorem 4.5.3 For any random variables \(X\) and \(Y\),
\[\mathrm{Cov}(X,Y)=\mathrm{E}XY-\mu_X\mu_Y.\]
Theorem 4.5.5 If \(X\) and \(Y\) are independent, then \(\mathrm{Cov}(X,Y)=0\) and \(\rho_{XY}=0\).
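Theorem 4.5.6 If \(X\) and \(Y\) are random variables and \(a\) and \(b\) are constants, then
\[\mathrm{Var}(aX+bY)=a^2\mathrm{Var}X+b^2\mathrm{Var}Y+2ab\,\mathrm{Cov}(X,Y).\]
If \(X\) and \(Y\) are independent, then
\[\mathrm{Var}(aX+bY)=a^2\mathrm{Var}X+b^2\mathrm{Var}Y.\]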
Theorem 4.5.7 For any random variables \(X\) and \(Y\),
- \(-1\le \rho_{XY} \le 1\)
- \(|\rho_{XY}|=1\) iff there exist constants \(a\) and \(b\) such that \(P(Y=aX+b)=1\). If \(\rho_{XY}=1\), then \(a>0\); if \(\rho_{XY}=-1\), then \(a<0\).
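Let \(-\infty<\mu_X,\mu_Y<\infty\), \(0<\sigma_X\), \(0<\sigma_Y\), and \(-1<\rho<1\). The bivariate normal pdf is given by
\[f(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right)\]
for \(-\infty<x<\infty\) and \(-\infty<y<\infty\). It has the following properties:
- The marginal distribution of \(X\) is \(n(\mu_X,\sigma_X^2)\).
- The marginal distribution of \(Y\) is \(n(\mu_Y,\sigma_Y^2)\).
- The correlation between \(X\) and \(Y\) is \(\rho_{XY}=\rho\).
- For any constants \(a\) and \(b\), the distribution of \(aX+bY\) is \(n(a\mu_X+b\mu_Y,a^2\sigma_X^2+b^2\sigma_Y^2+2ab\rho\sigma_X\sigma_Y)\).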
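The random vector \(\bm{X}=(X_1,\cdots,X_n)\) has a sample space that is a subset of \(\mathbb{R}^n\). If \((X_1,\cdots,X_n)\) is a discrete random vector, then the joint pmf of \((X_1,\cdots,X_n)\) is the function defined by \(f(\bm{x})=f(x_1,\cdots,x_n)=P(X_1=x_1,\cdots,X_n=x_n)\). Then for any \(A\subset \mathbb{R}^n\),
\[P(\bm{X}\in A)=\sum_{\bm{x}\in A}f(\bm{x}).\]
If \((X_1,\cdots,X_n)\) is a continuous random vector, then the joint pdf of \((X_1,\cdots,X_n)\) is the function \(f(\bm{x})=f(x_1,\cdots,x_n)\) that satisfies, for any \(A\subset \mathbb{R}^n\),
\[P(\bm{X}\in A)=\int\cdots\int_A f(\bm{x})\,d\bm{x}=\int\cdots\int_A f(x_1,\cdots,x_n)\,dx_1\cdots dx_n.\]
Let \(g(\bm{x})=g(x_1,\cdots,x_n)\) be a real-valued function defined on the sample space of \(\bm{X}\). Then \(g(\bm{X})\) is a random variable and the expected value of \(g(\bm{X})\) is
\[\mathrm{E}g(\bm{X})=\sum_{\bm{x}\in\mathbb{R}^n}g(\bm{x})f(\bm{x})\quad\text{(discrete)},\qquad \mathrm{E}g(\bm{X})=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}g(\bm{x})f(\bm{x})\,d\bm{x}\quad\text{(continuous)}.\]
The marginal distribution of \((X_1,\cdots,X_k)\) is given by the pdf or pmf
\[f(x_1,\cdots,x_k)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_1,\cdots,x_n)\,dx_{k+1}\cdots dx_n\]
or
\[f(x_1,\cdots,x_k)=\sum_{(x_{k+1},\cdots,x_n)\in\mathbb{R}^{n-k}}f(x_1,\cdots,x_n).\]
The conditional pdf or pmf of \((X_{k+1},\cdots,X_n)\) given \((X_1,\cdots,X_k)=(x_1,\cdots,x_k)\) is defined by
\[f(x_{k+1},\cdots,x_n|x_1,\cdots,x_k)=\frac{f(x_1,\cdots,x_n)}{f(x_1,\cdots,x_k)},\]
provided that \(f(x_1,\cdots,x_k)>0\).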
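Let \(n\) and \(m\) be positive integers and let \(p_1,\cdots,p_n\) be numbers satisfying \(0\le p_i\le 1\) and \(\sum_{i=1}^{n} p_i=1\). Then the random vector \((X_1,\cdots,X_n)\) has a multinomial distribution with \(m\) trials and cell probabilities \(p_1,\cdots,p_n\) if the joint pmf of \((X_1,\cdots,X_n)\) is
\[f(x_1,\cdots,x_n)=\frac{m!}{x_1!\cdots x_n!}p_1^{x_1}\cdots p_n^{x_n}=m!\prod_{i=1}^{n}\frac{p_i^{x_i}}{x_i!}\]
on the set of \((x_1,\cdots,x_n)\) such that each \(x_i\) is a nonnegative integer and \(\sum_{i=1}^{n}x_i=m\).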
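Let \(\bm{X}_1,\cdots,\bm{X}_n\) be random vectors with joint pdf or pmf \(f(\bm{x}_1,\cdots,\bm{x}_n)\). Let \(f_{\bm{X}_i}(\bm{x}_i)\) denote the marginal pdf or pmf of \(\bm{X}_i\). Then \(\bm{X}_1,\cdots,\bm{X}_n\) are called mutually independent random vectors if for every \((\bm{x}_1,\cdots,\bm{x}_n)\),
\[f(\bm{x}_1,\cdots,\bm{x}_n)=f_{\bm{X}_1}(\bm{x}_1)\cdots f_{\bm{X}_n}(\bm{x}_n)=\prod_{i=1}^{n}f_{\bm{X}_i}(\bm{x}_i).\]
If the \(\bm{X}_i\) are all one-dimensional, then \(X_1,\cdots,X_n\) are called mutually independent random variables. Then, for functions \(g_1,\cdots,g_n\) such that each \(g_i(x_i)\) is a function only of \(x_i\),
\[\mathrm{E}\left(g_1(X_1)\cdots g_n(X_n)\right)=\left(\mathrm{E}g_1(X_1)\right)\cdots\left(\mathrm{E}g_n(X_n)\right).\]
For \(Z=X_1+\cdots+X_n\) with \(X_1,\cdots,X_n\) mutually independent, the mgf of \(Z\) is given by
\[M_Z(t)=M_{X_1}(t)\cdots M_{X_n}(t).\]
Theorem 4.6.11 Let \(\bm{X}_1,\cdots,\bm{X}_n\) be random vectors. Then \(\bm{X}_1,\cdots,\bm{X}_n\) are mutually independent iff there exist functions \(g_i(\bm{x}_i)\) such that the joint pdf or pmf of \((\bm{X}_1,\cdots,\bm{X}_n)\) can be written as
\[f(\bm{x}_1,\cdots,\bm{x}_n)=g_1(\bm{x}_1)\cdots g_n(\bm{x}_n).\]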
Theorem 4.6.12 Let \(\bm{X}_1,\cdots,\bm{X}_n\) be mutually independent random vectors. Let \(g_i(\bm{x}_i)\) be a function only of \(\bm{x}_i\). Then the random variables \(U_i=g_i(\bm{X}_i)\) are mutually independent.
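Let \((X_1,\cdots,X_n)\) be a random vector with pdf \(f_{\bm{X}}(x_1,\cdots,x_n)\). Consider \(U_i=g_i(X_1,\cdots,X_n)\), \(i=1,\cdots,n\). If the transformation is one-to-one, with inverse \(x_i=h_i(u_1,\cdots,u_n)\), we have the following representation of the joint pdf of \((U_1,\cdots,U_n)\):
\[f_{\bm{U}}(u_1,\cdots,u_n)=f_{\bm{X}}\left(h_1(u_1,\cdots,u_n),\cdots,h_n(u_1,\cdots,u_n)\right)|J|,\]
where \(J\) is the determinant of the \(n\times n\) matrix whose \((i,j)\) entry is \(\partial x_i/\partial u_j\).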
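Young For \(a,b,p,q>0\) such that \(\frac{1}{p}+\frac{1}{q}=1\), we have
\[\frac{1}{p}a^p+\frac{1}{q}b^q\ge ab,\]
with equality iff \(a^p=b^q\).
Hölder Let \(X\) and \(Y\) be random variables and let \(p,q>0\) such that \(\frac{1}{p}+\frac{1}{q}=1\). Then
\[|\mathrm{E}XY|\le\mathrm{E}|XY|\le\left(\mathrm{E}|X|^p\right)^{1/p}\left(\mathrm{E}|Y|^q\right)^{1/q}.\]
Minkowski Let \(X\) and \(Y\) be random variables and let \(1\le p < \infty\). Then
\[\left[\mathrm{E}|X+Y|^p\right]^{1/p}\le\left[\mathrm{E}|X|^p\right]^{1/p}+\left[\mathrm{E}|Y|^p\right]^{1/p}.\]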
A function \(g(x)\) is convex if \(g(\lambda x+(1-\lambda)y)\le \lambda g(x) + (1-\lambda)g(y)\) for all \(x,y\) and \(0\le \lambda \le 1\). A function \(g(x)\) is concave if \(-g(x)\) is convex.
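Jensen For any random variable \(X\) and any convex function \(g(x)\), we have
\[\mathrm{E}g(X)\ge g(\mathrm{E}X).\]
Covariance Inequality Let \(X\) be any random variable and \(g(x)\) and \(h(x)\) any functions such that \(\mathrm{E}g(X), \mathrm{E}h(X), \mathrm{E}(g(X)h(X))\) exist. Then
- If \(g(x)\) is a nondecreasing function and \(h(x)\) is a nonincreasing function, then \(\mathrm{E}\left(g(X)h(X)\right)\le\left(\mathrm{E}g(X)\right)\left(\mathrm{E}h(X)\right)\).
- If \(g(x)\) and \(h(x)\) are either both nondecreasing or both nonincreasing, then \(\mathrm{E}\left(g(X)h(X)\right)\ge\left(\mathrm{E}g(X)\right)\left(\mathrm{E}h(X)\right)\).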
Properties of a Random Sample
The random variables \(X_1,\cdots,X_n\) are called a random sample of size \(n\) from the population \(f(x)\) if \(X_1,\cdots,X_n\) are mutually independent and the marginal pdf or pmf of each \(X_i\) is \(f(x)\). Alternatively, \(X_1,\cdots,X_n\) are called independent and identically distributed random variables with pdf or pmf \(f(x)\).
Let \(X_1,\cdots,X_n\) be a random sample of size \(n\) from a population and let \(T(x_1,\cdots,x_n)\) be a real-valued or vector-valued function whose domain includes the sample space of \((X_1,\cdots,X_n)\). Then the random variable or random vector \(Y=T(X_1,\cdots,X_n)\) is called a statistic. The probability distribution of \(Y\) is called the sampling distribution of \(Y\).
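The sample mean is the arithmetic average of the values in a random sample. It is usually denoted by
\[\bar{X}=\frac{X_1+\cdots+X_n}{n}=\frac{1}{n}\sum_{i=1}^{n}X_i.\]
The sample variance is the statistic defined by
\[S^2=\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2.\]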
The sample standard deviation is the statistic defined by \(S=\sqrt{S^2}\).
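Computing \(\bar{X}\), \(S^2\), and \(S\) for a small made-up data set. Python's `statistics.variance` and `statistics.stdev` use the same \(n-1\) divisor as \(S^2\) (unlike `statistics.pvariance`, which divides by \(n\)), so the two computations should agree up to rounding.

```python
import math
import statistics

data = [2.1, 3.4, 1.9, 4.8, 3.3, 2.7]  # made-up observations
n = len(data)

x_bar = sum(data) / n                               # sample mean
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # sample variance, divisor n - 1
s = math.sqrt(s2)                                   # sample standard deviation

print(x_bar, s2, s)
print(statistics.mean(data), statistics.variance(data), statistics.stdev(data))
```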
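Lemma 5.2.5 Let \(X_1,\cdots,X_n\) be a random sample from a population and let \(g(x)\) be a function such that \(\mathrm{E}g(X_1)\) and \(\mathrm{Var}g(X_1)\) exist. Then
\[\mathrm{E}\left(\sum_{i=1}^{n}g(X_i)\right)=n\left(\mathrm{E}g(X_1)\right)\quad\text{and}\quad\mathrm{Var}\left(\sum_{i=1}^{n}g(X_i)\right)=n\left(\mathrm{Var}\,g(X_1)\right).\]
Theorem 5.2.9 If \(X\) and \(Y\) are independent continuous random variables with pdfs \(f_X(x)\) and \(f_Y(y)\), then the pdf of \(Z=X+Y\) is given by
\[f_Z(z)=\int_{-\infty}^{\infty}f_X(w)f_Y(z-w)\,dw.\]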
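A numerical sketch of the convolution formula \(f_Z(z)=\int f_X(w)f_Y(z-w)\,dw\) from Theorem 5.2.9, assuming \(X\) and \(Y\) are independent uniform\((0,1)\) variables, for which \(f_Z(z)=\min(z,2-z)\) on \([0,2]\). The hypothetical helper `conv_pdf` uses a simple midpoint rule over the support of \(f_X\), so results are accurate to roughly the grid spacing.

```python
def conv_pdf(f_x, f_y, z, lo, hi, n=10_000):
    """Midpoint-rule approximation of f_Z(z) = integral of f_X(w) f_Y(z - w) dw,
    where [lo, hi] contains the support of f_X."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        w = lo + (i + 0.5) * step
        total += f_x(w) * f_y(z - w)
    return total * step


def unif(x):
    """uniform(0, 1) pdf."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# Z = X + Y for independent uniform(0, 1) variables has the triangular pdf min(z, 2 - z).
for z in (0.25, 0.5, 1.0, 1.5):
    print(z, round(conv_pdf(unif, unif, z, 0.0, 1.0), 3), min(z, 2 - z))
```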
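Theorem 5.2.11 Suppose \(X_1,\cdots,X_n\) is a random sample from a pdf or pmf \(f(x|\theta)\), where
\[f(x|\theta)=h(x)c(\theta)\exp\left(\sum_{i=1}^{k}w_i(\theta)t_i(x)\right)\]
is a member of an exponential family. Define statistics \(T_1,\cdots,T_k\) by
\[T_i(X_1,\cdots,X_n)=\sum_{j=1}^{n}t_i(X_j),\quad i=1,\cdots,k.\]
If the set \(\left\{(w_1(\theta),\cdots,w_k(\theta)):\theta\in \Theta\right\}\) contains an open subset of \(\mathbb{R}^k\), then the distribution of \((T_1,\cdots,T_k)\) is an exponential family of the form
\[f_T(u_1,\cdots,u_k|\theta)=H(u_1,\cdots,u_k)\left[c(\theta)\right]^n\exp\left(\sum_{i=1}^{k}w_i(\theta)u_i\right).\]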
Theorem 5.3.1 Let \(X_1,\cdots,X_n\) be a random sample from a \(n(\mu,\sigma^2)\) distribution, and let \(\bar{X}=(1/n)\sum_{i=1}^{n} X_i\) and \(S^2=[1/(n-1)]\sum_{i=1}^{n} (X_i-\bar{X})^2\). Then
- \(\bar{X}\) and \(S^2\) are independent random variables.
- \(\bar{X}\) has a \(n(\mu,\sigma^2/n)\) distribution.
- \((n-1)S^2/\sigma^2\) has a chi squared distribution with \(n-1\) degrees of freedom.
If \(X_1,\cdots,X_n\) are a random sample from a \(n(\mu,\sigma^2)\) population, we know that the quantity
\[\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\]
has a \(n(0,1)\) distribution. If \(\sigma^2\) is unknown, we can replace \(\sigma\) by \(S\), the sample standard deviation. Then consider the quantity
\[T=\frac{\bar{X}-\mu}{S/\sqrt{n}}=\frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}}.\]
The numerator is a \(n(0,1)\) random variable and the denominator is \(\sqrt{\chi^2_{n-1}/(n-1)}\), independent of the numerator. Thus, the distribution of \(T\) can be found by solving the simplified problem of finding the distribution of \(U/\sqrt{V/p}\), where \(U\sim n(0,1)\) and \(V\sim \chi^2_p\), and \(U\) and \(V\) are independent. This gives us Student's \(t\) distribution. Equivalently, a random variable \(T\) has a Student's \(t\) distribution with \(p\) degrees of freedom, and we write \(T\sim t_p\), if it has pdf
\[f_T(t)=\frac{\Gamma\left(\frac{p+1}{2}\right)}{\Gamma\left(\frac{p}{2}\right)}\frac{1}{(p\pi)^{1/2}}\frac{1}{\left(1+t^2/p\right)^{(p+1)/2}},\quad -\infty<t<\infty.\]
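The \(t_p\) pdf can be evaluated directly from its gamma-function form. This sketch uses `math.lgamma` to avoid overflow for large \(p\) and checks two known facts: \(t_1\) is the standard Cauchy distribution, so \(f_T(0)=1/\pi\), and \(t_p\) approaches \(n(0,1)\) as \(p\) grows. The evaluation points are arbitrary.

```python
import math


def t_pdf(t, p):
    """Student's t pdf with p degrees of freedom; lgamma avoids overflow for large p."""
    log_const = math.lgamma((p + 1) / 2) - math.lgamma(p / 2) - 0.5 * math.log(p * math.pi)
    return math.exp(log_const) * (1 + t * t / p) ** (-(p + 1) / 2)


def normal_pdf(t):
    """n(0, 1) pdf."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


print(t_pdf(0.0, 1), 1 / math.pi)  # t_1 is the standard Cauchy distribution
for p in (1, 5, 30, 1000):
    print(p, round(t_pdf(1.0, p), 5), round(normal_pdf(1.0), 5))  # t_p approaches n(0, 1)
```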
Let \(X_1,\cdots,X_n\) be a random sample from a \(n(\mu_X,\sigma_X^2)\) population, and let \(Y_1,\cdots,Y_m\) be a random sample from an independent \(n(\mu_Y,\sigma_Y^2)\) population. The random variable \(F=\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2}\) has Snedecor's \(F\) distribution with \(n-1\) and \(m-1\) degrees of freedom. Equivalently, the random variable \(F\) has the \(F\) distribution with \(p\) and \(q\) degrees of freedom if it has pdf
\[f_F(x)=\frac{\Gamma\left(\frac{p+q}{2}\right)}{\Gamma\left(\frac{p}{2}\right)\Gamma\left(\frac{q}{2}\right)}\left(\frac{p}{q}\right)^{p/2}\frac{x^{(p/2)-1}}{\left[1+(p/q)x\right]^{(p+q)/2}},\quad 0<x<\infty.\]
Theorem 5.3.8
- If \(X\sim F_{p,q}\), then \(1/X\sim F_{q,p}\).
- If \(X\sim t_q\), then \(X^2\sim F_{1,q}\).
- If \(X\sim F_{p,q}\), then \((p/q)X/(1+(p/q)X)\sim \text{beta}(p/2,q/2)\).
The order statistics of a random sample \(X_1,\cdots,X_n\) are the sample values placed in ascending order. They are denoted by \(X_{(1)},\cdots,X_{(n)}\).
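The order statistics are random variables that satisfy \(X_{(1)}\le \cdots \le X_{(n)}\). In particular,
\[X_{(1)}=\min_{1\le i\le n}X_i,\qquad X_{(n)}=\max_{1\le i\le n}X_i.\]
The sample range is the statistic defined by \(R=X_{(n)}-X_{(1)}\). The sample median is the statistic defined by
\[M=\begin{cases}X_{((n+1)/2)} & \text{if } n \text{ is odd}\\ \left(X_{(n/2)}+X_{(n/2+1)}\right)/2 & \text{if } n \text{ is even.}\end{cases}\]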
The notation \(\left\{b\right\}\) is defined to be the number \(b\) rounded to the nearest integer in the usual way. More precisely, if \(i\) is an integer and \(i-.5\le b<i+.5\), then \(\left\{b\right\}=i\).
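Theorem 5.4.3 Let \(X_1,\cdots,X_n\) be a random sample from a discrete distribution with pmf \(f_X(x_i)=p_i\), where \(x_1<x_2<\cdots\) are the possible values of \(X\) in ascending order. Define
\[P_0=0,\quad P_1=p_1,\quad P_2=p_1+p_2,\quad\cdots,\quad P_i=p_1+p_2+\cdots+p_i,\quad\cdots.\]
Let \(X_{(1)},\cdots,X_{(n)}\) denote the order statistics from the sample. Then
\[P\left(X_{(j)}\le x_i\right)=\sum_{k=j}^{n}\binom{n}{k}P_i^k(1-P_i)^{n-k}\]
and
\[P\left(X_{(j)}=x_i\right)=\sum_{k=j}^{n}\binom{n}{k}\left[P_i^k(1-P_i)^{n-k}-P_{i-1}^k(1-P_{i-1})^{n-k}\right].\]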
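Theorem 5.4.4 Let \(X_{(1)},\cdots,X_{(n)}\) denote the order statistics of a random sample, \(X_1,\cdots,X_n\), from a continuous population with cdf \(F_X(x)\) and pdf \(f_X(x)\). Then the pdf of \(X_{(j)}\) is
\[f_{X_{(j)}}(x)=\frac{n!}{(j-1)!(n-j)!}f_X(x)\left[F_X(x)\right]^{j-1}\left[1-F_X(x)\right]^{n-j}.\]
Theorem 5.4.6 Let \(X_{(1)},\cdots,X_{(n)}\) denote the order statistics of a random sample, \(X_1,\cdots,X_n\), from a continuous population with cdf \(F_X(x)\) and pdf \(f_X(x)\). Then the joint pdf of \(X_{(i)}\) and \(X_{(j)}\), where \(1\le i<j\le n\), is
\[f_{X_{(i)},X_{(j)}}(u,v)=\frac{n!}{(i-1)!(j-1-i)!(n-j)!}f_X(u)f_X(v)\left[F_X(u)\right]^{i-1}\left[F_X(v)-F_X(u)\right]^{j-1-i}\left[1-F_X(v)\right]^{n-j}\]
for \(-\infty<u<v<\infty\).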
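For a uniform\((0,1)\) population, \(f_X(x)=1\) and \(F_X(x)=x\), so the pdf of \(X_{(j)}\) in Theorem 5.4.4 is the beta\((j,n-j+1)\) pdf. The sketch below checks this numerically and compares a simulated mean of \(X_{(j)}\) with the beta mean \(j/(n+1)\); the choices \(n=7\), \(j=3\), and the seed are arbitrary, and `order_stat_pdf` is a hypothetical helper.

```python
import math
import random


def order_stat_pdf(x, j, n, f, F):
    """pdf of the j-th order statistic X_(j) from an iid sample of size n with pdf f and cdf F."""
    c = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    return c * f(x) * F(x) ** (j - 1) * (1 - F(x)) ** (n - j)


def beta_pdf(x, a, b):
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)


def f(x):
    return 1.0  # uniform(0, 1) pdf on (0, 1)


def F(x):
    return x    # uniform(0, 1) cdf on (0, 1)


n, j = 7, 3
for x in (0.1, 0.4, 0.8):
    print(x, order_stat_pdf(x, j, n, f, F), beta_pdf(x, j, n - j + 1))

# Monte Carlo check of E X_(j) = j / (n + 1), the beta(j, n - j + 1) mean.
random.seed(1)
draws = [sorted(random.random() for _ in range(n))[j - 1] for _ in range(50_000)]
print(sum(draws) / len(draws), j / (n + 1))
```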
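A sequence of random variables \(X_1,X_2,\cdots\) converges in probability to a random variable \(X\) if, for every \(\epsilon>0\),
\[\lim_{n\to\infty}P\left(|X_n-X|\ge\epsilon\right)=0,\]
or equivalently
\[\lim_{n\to\infty}P\left(|X_n-X|<\epsilon\right)=1.\]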
Weak Law of Large Numbers Let \(X_1,X_2,\cdots\) be iid random variables with \(\mathrm{E}X_i=\mu\) and \(\mathrm{Var}X_i=\sigma^2<\infty\). Define \(\bar{X}_n=(1/n)\sum_{i=1}^{n} X_i\). Then, for every \(\epsilon>0\),
\[\lim_{n\to\infty}P\left(|\bar{X}_n-\mu|<\epsilon\right)=1;\]
that is, \(\bar{X}_n\) converges in probability to \(\mu\).
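For iid Bernoulli\((p)\) observations, \(n\bar{X}_n\) is binomial\((n,p)\), so \(P(|\bar{X}_n-p|\ge\epsilon)\) can be computed exactly instead of simulated. This sketch (arbitrary \(p=0.3\), \(\epsilon=0.05\)) shows the probability shrinking with \(n\) and staying below the Chebyshev bound \(\sigma^2/(n\epsilon^2)\) that is commonly used to prove the Weak Law; `prob_far` is a hypothetical helper.

```python
import math


def binom_pmf(k, n, p):
    # Computed on the log scale so that large n does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def prob_far(n, p, eps):
    """Exact P(|Xbar_n - p| >= eps) when X_1, ..., X_n are iid Bernoulli(p)."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k / n - p) >= eps)


p, eps = 0.3, 0.05
for n in (10, 100, 1000, 5000):
    # Chebyshev bound: Var(Xbar_n) / eps^2 = p (1 - p) / (n eps^2)
    print(n, round(prob_far(n, p, eps), 6), round(p * (1 - p) / (n * eps**2), 6))
```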
Theorem 5.5.4 Suppose that \(X_1,X_2,\cdots\) converges in probability to \(X\) and \(h\) is a continuous function. Then \(h(X_1),h(X_2),\cdots\) converges in probability to \(h(X)\).
A sequence of random variables, \(X_1,X_2,\cdots\), converges almost surely to a random variable \(X\) if, for every \(\epsilon>0\),
\[P\left(\lim_{n\to\infty}|X_n-X|<\epsilon\right)=1.\]
Strong Law of Large Numbers Let \(X_1,X_2,\cdots\) be iid random variables with \(\mathrm{E}X_i=\mu\) and \(\mathrm{Var}X_i=\sigma^2<\infty\). Define \(\bar{X}_n=(1/n)\sum_{i=1}^{n} X_i\). Then for every \(\epsilon>0\),
\[P\left(\lim_{n\to\infty}|\bar{X}_n-\mu|<\epsilon\right)=1;\]
that is, \(\bar{X}_n\) converges almost surely to \(\mu\).
A sequence of random variables \(X_1,X_2,\cdots\) converges in distribution to a random variable \(X\) if
\[\lim_{n\to\infty}F_{X_n}(x)=F_X(x)\]
at all points \(x\) where \(F_X(x)\) is continuous.
Theorem 5.5.12 If the sequence of random variables, \(X_1,X_2,\cdots\), converges in probability to a random variable \(X\), then it also converges in distribution to \(X\).
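Theorem 5.5.13 The sequence of random variables, \(X_1,X_2,\cdots\), converges in probability to a constant \(\mu\) iff the sequence also converges in distribution to \(\mu\). That is, the statement
\[P\left(|X_n-\mu|>\epsilon\right)\to 0\quad\text{for every }\epsilon>0\]
is equivalent to
\[P\left(X_n\le x\right)\to\begin{cases}0 & \text{if } x<\mu\\ 1 & \text{if } x>\mu.\end{cases}\]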
Central Limit Theorem Let \(X_1,X_2,\cdots\) be a sequence of iid random variables whose mgfs exist in a neighborhood of \(0\). Let \(\mu=\mathrm{E}X_i\) and \(\sigma^2=\mathrm{Var}X_i>0\). Define \(\bar{X}_n=(1/n)\sum_{i=1}^{n} X_i\). Let \(G_n(x)\) denote the cdf of \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\). Then, for every \(x\), \(-\infty<x<\infty\),
\[\lim_{n\to\infty}G_n(x)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy;\]
that is, \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\) has a limiting standard normal distribution.
Stronger form of the Central Limit Theorem Let \(X_1,X_2,\cdots\) be a sequence of iid random variables with \(\mathrm{E}X_i=\mu\) and \(0<\mathrm{Var}X_i=\sigma^2<\infty\). Define \(\bar{X}_n=(1/n)\sum_{i=1}^{n} X_i\). Let \(G_n(x)\) denote the cdf of \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\). Then, for every \(x\), \(-\infty<x<\infty\),
\[\lim_{n\to\infty}G_n(x)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy;\]
that is, \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\) has a limiting standard normal distribution.
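For iid Bernoulli\((p)\) observations, \(G_n(x)\) can be computed exactly from the binomial cdf, which makes the Central Limit Theorem visible without simulation. The sketch uses arbitrary values \(p=0.2\) and \(x=0.33\) (chosen so the cutoff does not land exactly on a lattice point); the gap between \(G_n(x)\) and \(\Phi(x)\) shrinks as \(n\) grows. The helper names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(S_n <= k) for S_n ~ binomial(n, p); pmf terms are computed on the log scale."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total


def G(n, p, x):
    """Exact cdf of sqrt(n)(Xbar_n - p)/sigma at x for iid Bernoulli(p), sigma^2 = p(1 - p)."""
    sigma = math.sqrt(p * (1 - p))
    # sqrt(n)(S_n/n - p)/sigma <= x  is equivalent to  S_n <= n p + x sigma sqrt(n)
    k = math.floor(n * p + x * sigma * math.sqrt(n))
    return binom_cdf(min(k, n), n, p) if k >= 0 else 0.0


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


p, x = 0.2, 0.33
for n in (10, 100, 1000, 10000):
    print(n, round(G(n, p, x), 4), round(Phi(x), 4))
```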
Slutsky's Theorem If \(X_n\to X\) in distribution and \(Y_n\to a\), a constant, in probability, then
- \(Y_nX_n\to aX\) in distribution.
- \(X_n+Y_n\to X+a\) in distribution.
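If a function \(g(x)\) has derivatives of order \(r\), that is, \(g^{(r)}(x)=\frac{d^r }{dx^r}g(x)\) exists, then for any constant \(a\), the Taylor polynomial of order \(r\) about \(a\) is defined by
\[T_r(x)=\sum_{i=0}^{r}\frac{g^{(i)}(a)}{i!}(x-a)^i.\]
Taylor If \(g^{(r)}(a)=\left.\frac{d^r}{dx^r}g(x)\right|_{x=a}\) exists, then
\[\lim_{x\to a}\frac{g(x)-T_r(x)}{(x-a)^r}=0.\]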
Delta Method Let \(Y_n\) be a sequence of random variables that satisfies \(\sqrt{n}(Y_n-\theta)\to n(0,\sigma^2)\) in distribution. For a given function \(g\) and a specific value of \(\theta\), suppose that \(g'(\theta)\) exists and is not \(0\). Then
\[\sqrt{n}\left[g(Y_n)-g(\theta)\right]\to n\left(0,\sigma^2[g'(\theta)]^2\right)\text{ in distribution.}\]
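A sketch of the Delta Method in a case where everything can be computed exactly: \(Y_n=\bar{X}_n\) for iid Bernoulli\((p)\) observations, so by the Central Limit Theorem \(\theta=p\) and \(\sigma^2=p(1-p)\), and \(g(y)=y^2\) with \(g'(\theta)=2p\neq0\). The exact variance of \(\sqrt{n}\,g(Y_n)\) for \(n=400\) (an arbitrary choice) should be close to the delta-method value \(\sigma^2[g'(\theta)]^2\).

```python
import math


def binom_pmf(k, n, p):
    """binomial(n, p) pmf, computed on the log scale."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))


def g(y):
    return y * y  # g(y) = y^2, so g'(p) = 2p != 0


p, n = 0.3, 400
sigma2 = p * (1 - p)  # variance of one Bernoulli(p) observation

# Exact Var[sqrt(n) g(Y_n)] where Y_n = Xbar_n = S_n / n and S_n ~ binomial(n, p).
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
mean_g = sum(g(k / n) * w for k, w in enumerate(probs))
var_exact = n * sum((g(k / n) - mean_g) ** 2 * w for k, w in enumerate(probs))

var_delta = sigma2 * (2 * p) ** 2  # delta-method variance sigma^2 [g'(theta)]^2
print(var_exact, var_delta)
```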