807 Supplement (11) (Martingale Theory and Stochastic Approximation Theory)
I. Preliminaries of Advanced Probability Theory
Definition ($\sigma$-algebra): Let $\mathcal{F}$ be a collection of subsets of the sample space $\Omega$. Suppose $\mathcal{F}$ satisfies:
(1) $\emptyset \in \mathcal{F}$;
(2) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$;
(3) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
Then $\mathcal{F}$ is called a $\sigma$-algebra, or $\sigma$-field, on $\Omega$.
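On a finite sample space the three axioms can be checked mechanically. The following is a minimal sketch under that assumption; the helper name `is_sigma_algebra` and the example families are illustrative and not part of the original notes. For a finite family, closure under pairwise unions already implies closure under countable unions, so only pairs are tested.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the three axioms for a family of subsets of a finite set omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    # (1) the empty set belongs to the family
    if frozenset() not in family:
        return False
    # (2) closed under complement
    if any(omega - a not in family for a in family):
        return False
    # (3) closed under union (pairwise unions suffice for a finite family)
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]
bad = [set(), {1}, omega]  # the complement {2, 3, 4} is missing

print(is_sigma_algebra(omega, good))
print(is_sigma_algebra(omega, bad))
```

The family `good` is the coarsest $\sigma$-algebra that can tell $\{1,2\}$ apart from $\{3,4\}$, while `bad` fails axiom (2).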
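The probability triple is the foundation of rigorous probability theory; it is also called a probability space or probability measure space. It consists of three elements.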
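- $\Omega$: a set called the sample space (or outcome space). Any element (or point) of $\Omega$, denoted $\omega$, is called an outcome. This set contains all possible outcomes of the random sampling process.
- $\mathcal{F}$: a set called the event space. It is a $\sigma$-algebra (or $\sigma$-field) on $\Omega$. An element of $\mathcal{F}$, denoted $A$, is called an event. An elementary event is a single outcome in the sample space; an event may be an elementary event or a combination of several elementary events.
- $\mathbb{P}$: a probability measure, that is, a mapping from $\mathcal{F}$ to $[0,1]$. Every $A\in \mathcal{F}$ is a set of points of $\Omega$, and $\mathbb{P}(A)$ is the measure of that set.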
$\mathbb{P}(A)=0$ means that $A$ is a set of measure zero (a null set). The empty set is a null set, but a null set need not be empty.
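Random variable: building on the probability triple, a random variable is defined as a mapping from the sample space to the real numbers, $X(\omega): \Omega \rightarrow \mathbb{R}$. Not every such mapping qualifies as a random variable, however. The formal definition is as follows: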
A function $X: \Omega \rightarrow \mathbb{R}$ is a random variable if
$$A=\{\omega \in \Omega \mid X(\omega) \leq x\} \in \mathcal{F} \qquad \forall x\in \mathbb{R}.$$
This definition says that $X$ is a random variable only if, for every $x\in\mathbb{R}$, the set $\{\omega \in \Omega \mid X(\omega)\leq x\}$ is an event in $\mathcal{F}$.
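The measurability condition can be illustrated on a finite sample space. This is a sketch only: `is_random_variable`, the event space `family`, and the two mappings are hypothetical examples, and `family` is assumed to already be a $\sigma$-algebra. On a finite $\Omega$ the set $\{X \leq x\}$ only changes at values that $X$ actually takes, so it is enough to test those thresholds.

```python
def is_random_variable(omega, family, X):
    """Check that {w in omega : X[w] <= x} lies in family for every real x.

    omega is a finite set, family an (assumed) sigma-algebra on it,
    and X a dict mapping each outcome to a real number.
    """
    family = {frozenset(a) for a in family}
    # Only thresholds equal to attained values can produce new level sets.
    thresholds = sorted(set(X[w] for w in omega))
    for x in thresholds:
        level_set = frozenset(w for w in omega if X[w] <= x)
        if level_set not in family:
            return False
    return True

omega = {1, 2, 3, 4}
family = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]

X_coarse = {1: 0.0, 2: 0.0, 3: 1.0, 4: 1.0}  # constant on {1, 2} and on {3, 4}
X_fine = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}    # separates every outcome

print(is_random_variable(omega, family, X_coarse))
print(is_random_variable(omega, family, X_fine))
```

`X_fine` fails because $\{\omega : X(\omega) \leq 0\} = \{1\}$ is not an event in this coarse $\mathcal{F}$; with the power set as event space, every mapping would pass.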
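II. Conditional Expectation of Random Variables
Conditional expectations appear frequently in the convergence analysis of random sequences. First consider three cases: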
- $\mathbb{E}_{X}[X\mid Y=5]$
- $\mathbb{E}_{X}[X\mid Y=y]$
- $\mathbb{E}_{X}[X\mid Y]$
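The first case is a constant: it no longer depends on the random values of $X$ or $Y$. The second is a function of the value $y$. The third, by the definition of a random variable, is itself a random variable that is a function of $Y$. The conditional expectation can also be written as $\mathbb{E}_{X\sim P(X\mid Y)}[X]$, that is, the expectation of $X$ when $X\sim P(X\mid Y)$. Conditional expectation has the following commonly used properties.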
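The difference between the three cases can be made concrete with a small discrete example. The joint pmf below is made up purely for illustration, and the names `joint`, `cond_exp_given`, `case1`, and `case3` are assumptions of this sketch rather than anything from the original notes.

```python
import numpy as np

# Hypothetical joint pmf P(X = x_i, Y = y_j): rows index x, columns index y.
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([5.0, 6.0])
joint = np.array([[0.10, 0.20],
                  [0.20, 0.10],
                  [0.10, 0.30]])

p_y = joint.sum(axis=0)      # marginal P(Y = y)
cond = joint / p_y           # P(X = x | Y = y); each column sums to 1
cond_exp = x_vals @ cond     # E[X | Y = y] for each value y

def cond_exp_given(y):
    """Case 2: the deterministic function y -> E[X | Y = y]."""
    idx = int(np.where(y_vals == y)[0][0])
    return float(cond_exp[idx])

# Case 1: plugging in the specific value y = 5 gives a single number.
case1 = cond_exp_given(5.0)

# Case 3: E[X | Y] is random because Y is random; sample Y, then apply the function.
rng = np.random.default_rng(0)
y_samples = rng.choice(y_vals, size=5, p=p_y)
case3 = np.array([cond_exp_given(y) for y in y_samples])

print(case1)
print(case3)
```

Each entry of `case3` is one realization of the random variable $\mathbb{E}_X[X\mid Y]$; it can only take the values that `cond_exp_given` returns.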
Theorem: Let $X, Y, Z$ be random variables. The following properties hold.
(a) $\mathbb{E}_{X}[X \mid Y]=a$ if $X=a$, where $a$ is a given number.
(b) $\mathbb{E}_{X,Z}[a X+b Z \mid Y]=a \mathbb{E}_{X,Z}[X \mid Y]+b \mathbb{E}_{X,Z}[Z \mid Y]$.
(c) $\mathbb{E}_{X}[X \mid Y]=\mathbb{E}_{X}[X]$ if $X, Y$ are independent.
(d) $\mathbb{E}_{X}[X f(Y) \mid Y]=f(Y) \mathbb{E}_{X}[X \mid Y]$.
(e) $\mathbb{E}_{Y}[f(Y) \mid Y]=f(Y)$.
(f) $\mathbb{E}_{X}[X \mid Y, f(Y)]=\mathbb{E}_{X}[X \mid Y]$.
(g) If $X \geq 0$, then $\mathbb{E}_{X}[X \mid Y] \geq 0$.
(h) If $X \geq Z$, then $\mathbb{E}_{X}[X \mid Y] \geq \mathbb{E}_{Z}[Z \mid Y]$.
Proof. Only some of the properties are proved here, in the discrete case; the others can be proved in the same way.
- (a) For any $y$, if $X=a$ then $\mathbb{E}_{X}[X\mid Y=y]$ equals the constant $a$, which proves the claim.
- (b) $$\begin{aligned} \mathbb{E}_{X,Z}[aX+bZ\mid Y] &=\sum_{X,Z}[aX+bZ]\cdot P(X,Z\mid Y) \\ &=\sum_{X,Z}a\cdot X\cdot P(X,Z\mid Y)+\sum_{X,Z}b\cdot Z\cdot P(X,Z\mid Y) \\ &=a \mathbb{E}_{X,Z}[X \mid Y]+b \mathbb{E}_{X,Z}[Z \mid Y]. \end{aligned}$$
- (c) Since $X,Y$ are independent, $P(X\mid Y)=P(X)$, so $\mathbb{E}_{X}[X\mid Y]=\sum_X X P(X)=\mathbb{E}_{X}[X]$.
- (d) $$\mathbb{E}_{X}[X f(Y) \mid Y]=\sum_X X f(Y) P(X \mid Y)=f(Y) \sum_X X P(X \mid Y)=f(Y) \mathbb{E}_{X}[X \mid Y].$$
- (e) Given $Y=y$, $f(Y)=f(y)$ is a constant, so $\mathbb{E}_{Y}[f(Y)\mid Y=y]=f(y)$ for every $y$, which proves the claim.
- (g) Since $X\geq0$ and $P(X\mid Y)\geq0$, every term of $\sum_X X P(X\mid Y)$ is nonnegative, which proves the claim.
- (h) Apply (g) to $X-Z\geq 0$ and use the linearity in (b).
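Properties (c) and (d) can be sanity-checked numerically on a small discrete example. This is a sketch with made-up probability tables; `joint`, `joint_indep`, and the function `f` are illustrative choices, not taken from the original notes.

```python
import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([5.0, 6.0])

# Hypothetical dependent joint pmf: rows index x, columns index y.
joint = np.array([[0.10, 0.20],
                  [0.20, 0.10],
                  [0.10, 0.30]])
cond = joint / joint.sum(axis=0)   # P(x | y)

def f(y):
    """Any deterministic function of Y works here."""
    return y ** 2

# Property (d): sum_x x f(y) P(x | y) equals f(y) * sum_x x P(x | y), for each y.
lhs_d = np.array([np.sum(x_vals * f(y) * cond[:, j]) for j, y in enumerate(y_vals)])
rhs_d = f(y_vals) * (x_vals @ cond)

# Property (c): build an independent joint pmf as an outer product of marginals.
p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.4, 0.6])
joint_indep = np.outer(p_x, p_y)
cond_indep = joint_indep / joint_indep.sum(axis=0)
cond_exp_indep = x_vals @ cond_indep   # E[X | Y = y] for each y
uncond_exp = x_vals @ p_x              # E[X]

print(np.allclose(lhs_d, rhs_d))
print(np.allclose(cond_exp_indep, uncond_exp))
```

In the independent case every column of `cond_indep` equals `p_x`, which is exactly the step $P(X\mid Y)=P(X)$ used in the proof of (c).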
III. Iterated Expectation
Theorem: Let $X, Y, Z$ be random variables. The following properties hold.
(a) $\mathbb{E}_{Y}[\mathbb{E}_{X}[X \mid Y]]=\mathbb{E}_{X}[X]$.
(b) $\mathbb{E}_{Y,Z}[\mathbb{E}_{X}[X \mid Y, Z]]=\mathbb{E}_{X}[X]$.
(c) $\mathbb{E}_{Y}[\mathbb{E}_{X}[X \mid Y] \mid Y]=\mathbb{E}_{X}[X \mid Y]$.
Proof.
- (a) Note that $\mathbb{E}_{X}[X\mid Y]$ is a function of $Y$; define $f(Y)=\mathbb{E}_{X}[X\mid Y]$. Then
$$\begin{aligned} \mathbb{E}_{Y}[\mathbb{E}_{X}[X \mid Y]]=\mathbb{E}_{Y}[f(Y)] & =\sum_y f(y) P(y) \\ & =\sum_y \mathbb{E}[X \mid Y=y] P(y) \\ & =\sum_y\left(\sum_x x P(x \mid y)\right) P(y) \\ & =\sum_x x \sum_y P(x \mid y) P(y) \\ & =\sum_x x \sum_y P(x, y) \\ & =\sum_x x P(x) \\ & =\mathbb{E}_{X}[X] . \end{aligned}$$
- (b) Write the pair $(Y,Z)$ as a single random variable $Q$ and apply (a).
- (c) Since $\mathbb{E}_{X}[X\mid Y]$ is a function of $Y$, this follows from property (e) of conditional expectation.
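Properties (a) and (b) can be verified numerically on a randomly generated three-variable pmf. This is only a sketch: the array shape, the seed, and the names `inner_y`, `inner_yz`, `lhs_a`, `lhs_b` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x_vals = np.array([0.0, 1.0, 2.0])

# Hypothetical joint pmf P(x, y, z): axis 0 is x, axis 1 is y, axis 2 is z.
joint = rng.random((3, 2, 4))
joint /= joint.sum()

# Direct computation of E[X] from the marginal of X.
direct = x_vals @ joint.sum(axis=(1, 2))

# (a) E_Y[ E[X | Y] ]: the inner expectation is a function of y.
p_y = joint.sum(axis=(0, 2))                     # P(y)
inner_y = (x_vals @ joint.sum(axis=2)) / p_y     # E[X | Y = y]
lhs_a = inner_y @ p_y

# (b) E_{Y,Z}[ E[X | Y, Z] ]: the inner expectation is a function of (y, z).
p_yz = joint.sum(axis=0)                                   # P(y, z)
inner_yz = np.einsum("i,ijk->jk", x_vals, joint) / p_yz    # E[X | Y = y, Z = z]
lhs_b = np.sum(inner_yz * p_yz)

print(direct, lhs_a, lhs_b)
```

All three printed numbers agree up to floating-point rounding, mirroring the chain of equalities in the proof of (a) and the reduction of (b) to (a).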
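IV. Definitions of Convergence for Random Sequences
A main reason for building probability theory on measure theory is that it allows the convergence of random sequences to be described rigorously.
Consider a random sequence $\left\{X_k\right\}=\left\{X_1, X_2, \ldots, X_k, \ldots\right\}$. Every element of this sequence is a random variable defined on the triple $(\Omega, \mathcal{F}, \mathbb{P})$.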
- Sure convergence:
Definition: $\left\{X_k\right\}$ converges surely (or everywhere or pointwise) to $X$ if
$$\lim _{k \rightarrow \infty} X_k(\omega)=X(\omega), \quad \text{for all } \omega \in \Omega.$$
It means that $\lim _{k \rightarrow \infty} X_k(\omega)=X(\omega)$ is valid for all points in $\Omega$. This definition can be equivalently stated as
$$A=\Omega \quad \text{where} \quad A=\left\{\omega \in \Omega: \lim _{k \rightarrow \infty} X_k(\omega)=X(\omega)\right\}.$$
- Almost sure convergence:
Definition: $\left\{X_k\right\}$ converges almost surely (or almost everywhere or with probability 1 or w.p.1) to $X$ if
$$\mathbb{P}(A)=1 \quad \text{where} \quad A=\left\{\omega \in \Omega: \lim _{k \rightarrow \infty} X_k(\omega)=X(\omega)\right\}.$$
It means that $\lim _{k \rightarrow \infty} X_k(\omega)=X(\omega)$ is valid for almost all points in $\Omega$. The points for which this limit is invalid form a set of zero measure. For simplicity, this is often written as
$$\mathbb{P}\left(\lim _{k \rightarrow \infty} X_k=X\right)=1.$$
Almost sure convergence can be denoted as $X_k \xrightarrow{\text{a.s.}} X$.
- Convergence in probability:
Definition: $\left\{X_k\right\}$ converges in probability to $X$ if for any $\epsilon>0$,
$$\lim _{k \rightarrow \infty} \mathbb{P}\left(A_k\right)=0 \quad \text{where} \quad A_k=\left\{\omega \in \Omega:\left|X_k(\omega)-X(\omega)\right|>\epsilon\right\}.$$
For simplicity, the equation can be written as
$$\lim _{k \rightarrow \infty} \mathbb{P}\left(\left|X_k-X\right|>\epsilon\right)=0.$$
- Convergence in mean ($L^r$ convergence):
Definition: $\left\{X_k\right\}$ converges in the $r$-th mean (or in the $L^r$ norm) to $X$ if
$$\lim _{k \rightarrow \infty} \mathbb{E}\left[\left|X_k-X\right|^r\right]=0.$$
The most frequently used cases are $r=1$ and $r=2$. Note that convergence in mean is not equivalent to $\lim _{k \rightarrow \infty} \mathbb{E}\left[X_k-X\right]=0$ or $\lim _{k \rightarrow \infty} \mathbb{E}\left[X_k\right]=\mathbb{E}[X]$. These weaker conditions only say that $\mathbb{E}\left[X_k\right]$ converges, while the variance may not.
- Convergence in distribution:
Definition: The cumulative distribution function of $X_k$ is defined as $\mathbb{P}\left(X_k \leq a\right)$ where $a \in \mathbb{R}$. Then, $\left\{X_k\right\}$ converges to $X$ in distribution if the cumulative distribution function converges:
$$\lim _{k \rightarrow \infty} \mathbb{P}\left(X_k \leq a\right)=\mathbb{P}(X \leq a), \quad \text{for all } a \in \mathbb{R} \text{ at which } \mathbb{P}(X \leq a) \text{ is continuous in } a.$$
A compact expression is
$$\lim _{k \rightarrow \infty} \mathbb{P}\left(A_k\right)=\mathbb{P}(A)$$
where
$$A_k \doteq\left\{\omega \in \Omega: X_k(\omega) \leq a\right\}, \quad A \doteq\{\omega \in \Omega: X(\omega) \leq a\}.$$
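The remark under convergence in mean, that $\mathbb{E}[X_k - X] \to 0$ alone is not enough, can be checked on a tiny example. The two sequences below are constructed purely for illustration (with $X = 0$), and the helper `moments` is a hypothetical name, not part of the original notes.

```python
import numpy as np

def moments(values, probs, r=1):
    """Return (E[X_k - X], E[|X_k - X|^r]) for X = 0 and a discrete X_k."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mean_diff = float(values @ probs)
    rth_mean = float((np.abs(values) ** r) @ probs)
    return mean_diff, rth_mean

for k in (1, 10, 100):
    # Sequence A: X_k = +1 or -1 with probability 1/2 each, for every k.
    mean_a, abs_a = moments([1.0, -1.0], [0.5, 0.5])
    # Sequence B: X_k = +1/k or -1/k with probability 1/2 each.
    mean_b, abs_b = moments([1.0 / k, -1.0 / k], [0.5, 0.5])
    print(k, mean_a, abs_a, mean_b, abs_b)
```

For sequence A the mean of $X_k - X$ is zero for every $k$, yet $\mathbb{E}[|X_k - X|]$ stays at 1, so A does not converge to 0 in mean. For sequence B, $\mathbb{E}[|X_k - X|] = 1/k$ shrinks to zero, so B does converge in mean.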
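Sure convergence requires convergence to $X(\omega)$ at every point of $\Omega$. Almost sure convergence allows some points of $\Omega$ not to converge, but the set of such points must have measure zero. Convergence in probability is a weaker condition than almost sure convergence: it constrains only the probability that $X_k$ deviates from $X$ by more than $\epsilon$, and places no requirement on individual points $\omega$. Convergence in distribution only requires the cumulative distribution functions to converge to that of $X$, and nothing more.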
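A standard textbook illustration of the gap between convergence in probability and almost sure convergence is the so-called typewriter sequence. It is not taken from the original notes, and the sketch below rests on stated assumptions: $\Omega=[0,1)$ with the uniform (Lebesgue) measure, $X=0$, and $X_k$ the indicator of the interval $[j/2^n,(j+1)/2^n)$ where $k=2^n+j$ with $0 \leq j < 2^n$. The function names are illustrative.

```python
def typewriter(k, omega):
    """X_k(omega): indicator of [j / 2**n, (j + 1) / 2**n), where k = 2**n + j."""
    n = k.bit_length() - 1   # largest n with 2**n <= k
    j = k - 2 ** n
    return 1 if j / 2 ** n <= omega < (j + 1) / 2 ** n else 0

def prob_deviation(k):
    """P(|X_k - 0| > eps) for any 0 < eps < 1: the length of the k-th interval."""
    n = k.bit_length() - 1
    return 1.0 / 2 ** n

omega = 0.3  # one fixed outcome
hits = [k for k in range(1, 2 ** 10) if typewriter(k, omega) == 1]

print(prob_deviation(2 ** 10 - 1))  # shrinks to 0 as k grows
print(len(hits), hits[:5])          # yet X_k(omega) = 1 once in every dyadic block
```

The deviation probability halves with every block, so $X_k \to 0$ in probability. For any fixed $\omega$, however, $X_k(\omega)=1$ exactly once in each block $2^n \leq k < 2^{n+1}$, so $X_k(\omega)$ does not converge and almost sure convergence fails.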