[SI152 Notes] Part 1: Equations and Optimization

SI152: Numerical Optimization

Lec 1.

Optimization

Three elements of an optimization problem: Objective, Variables, Constraints.

\[\textbf{Objective} : \min_{x\in\mathbb{R}^n} f(x) \\ \textbf{Variables} : x \\ \textbf{Constraints} : \begin{cases}c_i(x) = 0, &i\in\mathcal{E} \\c_i(x) \leq 0, &i\in\mathcal{I} \\x\in\Omega\end{cases} \]

Classification:

  • Linear Optimization v.s. Nonlinear Optimization
  • Constrained Optimization v.s. Unconstrained Optimization
  • Continuous Optimization v.s. Integer Optimization
  • Stochastic Optimization v.s. Deterministic Optimization
  • Convex Optimization v.s. Nonconvex Optimization
  • Single-objective Optimization v.s. Multi-objective Optimization
  • Bilevel Optimization v.s. Single-level Optimization
    • Bilevel Optimization: \(F(x, y(x))\) is the objective function of the upper level problem, which depends on the solution \(y(x)\) of the lower level problem.
  • Global Optimization v.s. Local Optimization

Equation \(\iff\) Optimization

Iterative algorithms: \(x_{k+1} = \mathcal{M} (x_k)\)

Generally, the sequence from iterative algorithms converges to an “optimal solution”.

  • Globally convergent algorithm v.s. Locally convergent algorithm

Convergence rates

Consider a sequence \(\{x_k\}\) that converges to \(x^*\).

Q-linear convergence: If there exists a constant \(c \in [0,1)\) and \(\hat{k}\geq 0\) such that

\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2 ~,~\forall k\geq \hat{k} \]

then \(\{x_k\}\) converges Q-linearly to \(x^*\).

  • i.e.

\[\limsup_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } < 1 \]

  • also called geometric convergence.

Q-superlinear convergence: If there exists a sequence \(\{c_k\} \to 0\) such that

\[|| x_{k+1} - x^* ||_2 \leq c_k || x_{k} - x^* ||_2 \]

then \(\{x_k\}\) converges Q-superlinearly to \(x^*\).

  • i.e.

\[\lim_{k\to\infty} \dfrac{|| x_{k+1} - x^* ||_2}{|| x_{k} - x^* ||_2 } = 0 \]

Q-quadratic convergence: If there exists a constant \(c \geq 0\) and \(\hat{k}\geq 0\) such that

\[|| x_{k+1} - x^* ||_2 \leq c || x_{k} - x^* ||_2^2 ~,~\forall k\geq \hat{k} \]

then \(\{x_k\}\) converges Q-quadratically to \(x^*\).

“R-” convergence: skipped. From now on we drop the “Q-” prefix.

Sublinear convergence (Arithmetic Convergence): If the sequence \(\{r_k\}\) converges to \(r^*\) in such a way that

\[|| r_{k+1} - r^* ||_2 \leq C \dfrac{|| r_{0} - r^* ||_2}{k^p} , k\geq 1, 0<p<\infty \]

where \(C\) is a fixed positive number, the sequence is said to converge arithmetically to \(r^*\) with order \(p\).

  • i.e.

\[\limsup_{k\to\infty} \dfrac{|| r_{k+1} - r^* ||_2}{|| r_{k} - r^* ||_2 } = 1 \]
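
To make these rates concrete, here is a minimal sketch (Python is my own choice; the notes contain no code) printing the error ratio \(\lVert x_{k+1}-x^*\rVert / \lVert x_k - x^*\rVert\) for three model sequences with \(x^* = 0\): the ratio stays at a constant below 1 for linear convergence, shrinks to 0 for quadratic, and tends to 1 for sublinear.

```python
# Error ratios |x_{k+1} - x*| / |x_k - x*| for model sequences with x* = 0:
#   linear:    x_k = 0.5^k       -> ratio = 0.5 for all k
#   quadratic: x_{k+1} = x_k^2   -> ratio = x_k -> 0
#   sublinear: x_k = 1/k         -> ratio = k/(k+1) -> 1
x_quad = 0.5
for k in range(1, 8):
    r_lin = 0.5                  # constant ratio: Q-linear
    r_quad = x_quad              # vanishing ratio: Q-quadratic (hence superlinear)
    r_sub = k / (k + 1)          # ratio -> 1: sublinear
    print(f"k={k}: linear {r_lin:.3f}, quadratic {r_quad:.3e}, sublinear {r_sub:.3f}")
    x_quad = x_quad**2
```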

Lec 2.

Linear equations

Solution methods:

  1. Direct methods: Gaussian elimination
  2. Iterative methods
  3. Conjugate gradient method

The Jacobi Iteration Method

For an \(n \times n\) linear system \(Ax=b\), let the solution be \(x^*\):

Let \(A=L+D+U\)

\[L=\begin{bmatrix} 0 & \cdots & \cdots & 0 \\ a_{21} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{bmatrix}, \quad D=\begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn} \end{bmatrix}, \quad U=\begin{bmatrix} 0 & a_{12} & \cdots & a_{1n} \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & a_{n-1,n} \\ 0 & \cdots & \cdots & 0 \end{bmatrix} \]

Transform \((D+L+U)x=b\) into:

\[x = D^{-1} b - D^{-1}(L+U)x \]

So the Jacobi iteration (matrix form) is:

\[x^{(k+1)} = D^{-1} b - D^{-1}(L+U)x^{(k)} \stackrel{def}{=} Mx^{(k)} + c \]

(Elementwise form):

\[x^{(k+1)}_i = \dfrac{1}{a_{ii}}\left(b_i - \sum_{j\neq i} a_{ij} x^{(k)}_{j} \right) \]

Convergence:

\[x^{(k+1)} - x^* = M(x^{(k)} - x^*) \implies ||x^{(k+1)} - x^*|| \leq ||M||\cdot||(x^{(k)} - x^*) || \]

So \(\lVert M \rVert < 1\) implies linear convergence, where \(\lVert M \rVert\) is the spectral norm of \(M\). (More precisely, the iteration converges from every starting point if and only if the spectral radius \(\rho(M) < 1\).)

Weighted Jacobi method:

\[\begin{aligned} x^{(k+1)} &= (1-\omega)x^{(k)} + \omega D^{-1}(b - (L+U)x^{(k)}) \\&= x^{(k)} + \omega D^{-1}\left(b - Ax^{(k)}\right) \end{aligned} \]

i.e., a damped step along the preconditioned residual \(D^{-1}(b - Ax^{(k)})\); \(\omega = 1\) recovers the plain Jacobi iteration.
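
A minimal sketch of the (weighted) Jacobi iteration, assuming Python/NumPy since the notes give no code; the helper name `jacobi` is mine, and \(\omega = 1\) gives the plain method:

```python
import numpy as np

def jacobi(A, b, x0, omega=1.0, tol=1e-10, max_iter=500):
    """(Weighted) Jacobi iteration; omega=1 gives the plain Jacobi method."""
    D = np.diag(A)                  # diagonal entries of A, as a vector
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x               # residual b - A x^(k)
        x_new = x + omega * r / D   # x^(k) + omega * D^{-1} (b - A x^(k))
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Example: a strictly diagonally dominant system, for which Jacobi converges.
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
print(jacobi(A, b, np.zeros(2)))    # ~ [1/6, 1/3]
```

Strict diagonal dominance is a standard sufficient condition here, since it gives \(\lVert M \rVert_\infty < 1\).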

Gauss-Seidel Method

(Matrix form):

\[x^{(k+1)} = (L+D)^{-1} (b - U x^{(k)}) \]

  • Often converges faster than Jacobi.
  • Convergence still depends on the spectral radius of the iteration matrix \(-(L+D)^{-1}U\), which is typically smaller than Jacobi’s.
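
A matching sketch of Gauss-Seidel under the same assumptions (Python/NumPy; the helper name `gauss_seidel` is mine); note how entry \(i\) is updated using the already-refreshed entries \(1,\dots,i-1\) of the current sweep:

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
    """Gauss-Seidel sweep: reuses already-updated entries within each sweep."""
    n = len(b)
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # sum over j != i, with x[:i] already updated in this sweep
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) <= tol:
            break
    return x

A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([1.0, 2.0])
print(gauss_seidel(A, b, np.zeros(2)))  # ~ [1/6, 1/3], in fewer sweeps than Jacobi
```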

Nonlinear equations

Consider a nonlinear equation \(f(x)=0\) with solution \(x_*\).

Definition

A function \(G\) is Lipschitz continuous with constant \(L \geq 0\) in \(\mathcal{X}\) if

\[\lVert G(x_1) -G(x_2) \rVert \leq L \lVert x_1 - x_2 \rVert, \quad \forall x_1, x_2 \in \mathcal{X} \]

Bisection Method

Based on the Intermediate Value Theorem:
If \(f\) is continuous on \([a, b]\) and \(f(a) \cdot f(b) < 0\), then there exists \(c\in (a, b)\) with \(f(c) = 0\).
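
A minimal bisection sketch (Python; the helper name `bisection` is mine). Each step halves the bracket, so the error bound shrinks linearly with rate \(1/2\):

```python
def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Bisection: requires f continuous on [a, b] with f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0 or 0.5 * (b - a) < tol:
            return c
        if fa * fc < 0:      # root lies in [a, c]
            b = c
        else:                # root lies in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

print(bisection(lambda x: x**3 - 2, 0.0, 2.0))  # ~ 1.259921 (cube root of 2)
```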

Newton’s method

From:

\[f(x) = f(x_k) + f'(x_k)(x-x_k) + o(x-x_k) \]

Then:

\[x_{k+1} := x_{k} - \dfrac{f(x_{k})}{f'(x_{k})} \]
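
A minimal sketch of this update (Python; the helper name `newton` is mine):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)   # Newton step from the tangent line
    return x

# sqrt(2) as the root of f(x) = x^2 - 2
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))  # ~ 1.41421356
```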

Quadratic convergence (assuming \(f'(x_*) \neq 0\) and \(f\) twice continuously differentiable near \(x_*\)):
Let \(e_k = x_k - x_*\). Then

\[\begin{aligned} e_{k+1} &= e_{k} - \dfrac{0 + f'(x_*)e_k + \frac{f''(x_*)}{2}e_k^2 + O(e_k^3)}{f'(x_*)+f''(x_*)e_k + O(e_k^2)} \\&= e_{k} - e_{k}\left(1 + \dfrac{f''(x_*)}{2f'(x_*)}e_k + O(e_k^2)\right) \left(1 - \dfrac{f''(x_*)}{f'(x_*)}e_k + O(e_k^2)\right) \\&= \dfrac{f''(x_*)}{2f'(x_*)}e_k^2 + O(e_k^3) \end{aligned} \]

Problem of Newton’s method

  • May cycle or even diverge (e.g., \(f(x) = \arctan(x)\) with a starting point far from the root).
  • May encounter \(f'(x_k) = 0\), so the step is undefined (e.g., \(f(x) = x^3\) at \(x_k = 0\)).
  • \(f(x_k)\) may be undefined if an iterate leaves the domain of \(f\).

Secant method

Replace the tangent line with a secant line:

\[x_{k+1} := x_{k} - \dfrac{x_{k} - x_{k-1}}{f(x_{k}) - f(x_{k-1})} f(x_{k}) \]
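
A minimal sketch (Python; the helper name `secant` is mine). Unlike Newton’s method it needs no derivative, only the last two iterates:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: replace f'(x_k) by the slope through the last two iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            break
        # assumes f(x_k) != f(x_{k-1}); a robust version would guard this
        x0, x1 = x1, x1 - (x1 - x0) / (f1 - f0) * f1
        f0, f1 = f1, f(x1)
    return x1

print(secant(lambda x: x**2 - 2, 1.0, 2.0))  # ~ 1.41421356
```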

Superlinear convergence:
Let \(e_k = x_k - x_*\), \(M=\dfrac{f''(x_*)}{2f'(x_*)}\)

\[\begin{aligned} f(x_*+e_k) &\approx e_k f'(x_*) (1+M e_k) \\ f(x_*+e_k) - f(x_*+e_{k-1}) &\approx f'(x_*) (e_k-e_{k-1}) (1+M(e_k+e_{k-1})) \\ e_{k+1} &\approx e_{k} - \dfrac{e_k (1+M e_k)}{1+M(e_k+e_{k-1})} \\ &= \dfrac{M e_{k-1}e_{k}}{1+M(e_k+e_{k-1})} \approx M e_{k-1}e_{k} \end{aligned} \]

If \(|e_{k+1}|\approx C|e_k|^p\), then

\[C|e_k|^p \approx |M||e_{k-1}||e_{k}| \\ |e_k| = C|e_{k-1}|^p \approx \left(\dfrac{|M|}{C}\right)^{\frac{1}{p-1}} |e_{k-1}|^{\frac{1}{p-1}} \]

Matching exponents forces \(p = \frac{1}{p-1}\), i.e. \(p^2 - p - 1 = 0\), so \(p = \frac{1+\sqrt{5}}{2}\) (the golden ratio) and:

\[|e_{k+1}|\approx \left| \dfrac{f''(x_*)}{2f'(x_*)} \right|^{\frac{\sqrt{5}-1}{2}}|e_k|^{\frac{\sqrt{5}+1}{2}} \]

Newton’s method for multivariable systems

Suppose we have \(F: \mathbb{R}^n \to \mathbb{R}^n\) and want to solve \(F(x) = 0\).

Then

\[\begin{gathered} \nabla F(x_k)^T d_k = -F(x_k) \\ x_{k+1} = x_k + d_k \end{gathered} \]
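
A minimal sketch of the multivariate iteration (Python/NumPy; the names `newton_system`, `F`, `J` are mine), where `J(x)` denotes the Jacobian \(\nabla F(x)^T\) and the linear system is solved exactly at each step:

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Newton for F(x) = 0: solve J(x_k) d_k = -F(x_k), then x_{k+1} = x_k + d_k."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = np.linalg.solve(J(x), -Fx)   # Newton step from the linearization
        x = x + d
    return x

# Example: intersect the circle x^2 + y^2 = 4 with the line y = x.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[1] - v[0]])
J = lambda v: np.array([[2 * v[0], 2 * v[1]], [-1.0, 1.0]])
print(newton_system(F, J, np.array([1.0, 2.0])))  # ~ [sqrt(2), sqrt(2)]
```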

Quadratic convergence:

Newton's method converges quadratically! (under nice assumptions)

  • \(F\) is continuously differentiable in an open convex set \(\mathcal{X}\subset\mathbb{R}^n\) with \(x_* \in\mathcal{X}\).
  • The Jacobian of \(F\) at \(x_*\) is invertible, with its inverse bounded in norm by \(M > 0\), i.e.,

    \[\lVert \nabla F(x_*)^{-T} \rVert _2 \leq M \]

    • If \(M\) is large, the Jacobian is nearly singular and the Newton step can be very large.
  • In some neighborhood of \(x_*\) with radius \(r > 0\) contained in \(\mathcal{X}\), i.e.,

\[\mathbb{B}(x_*, r) := \{ x\in\mathbb{R}^n \mid \lVert x-x_* \rVert _2 \leq r \}\subset \mathcal{X}, \]

the Jacobian of \(F\) is Lipschitz continuous with constant \(L\) in \(\mathbb{B}(x_*, r)\).

  • If \(L\) is large, then the gradients of the functions change rapidly.

Equations and Optimization

Linear

\[Kx = y \implies \min_x \dfrac{1}{2}\lVert Kx-y \rVert _2^2 \iff K^T(Kx-y)=0 \]
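
A minimal sketch of this equivalence (Python/NumPy; the data matrices are made up for illustration): solving the normal equations \(K^TKx = K^Ty\) and calling a least-squares solver give the same minimizer.

```python
import numpy as np

# min_x 0.5 * ||Kx - y||_2^2 via the normal equations K^T (Kx - y) = 0.
K = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])
x_normal = np.linalg.solve(K.T @ K, K.T @ y)     # normal equations
x_lstsq, *_ = np.linalg.lstsq(K, y, rcond=None)  # numerically preferred in practice
print(x_normal, x_lstsq)                         # both ~ [0.66666667, 0.5]
```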

Nonlinear

  • Nonlinear equation \(\implies\) Optimization
    • \(F(x) = 0 \implies \min_{x\in\mathbb{R}^n} f(x) := \frac{1}{2}\lVert F(x) \rVert _2^2\)
    • However, we generally do not prefer to solve the latter, since

      \[ \nabla f(x) = \nabla F(x)\, F(x), \quad \nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + \sum_{i} F_i(x) \nabla^2 F_i(x) \]

      so \(\nabla f(x) = 0\) can hold where \(\nabla F(x)\) is singular but \(F(x) \neq 0\): stationary points of \(f\) need not solve the original equation.

  • Optimization \(\implies\) Nonlinear equation
    • For any \(C^1\)-smooth \(f\),

      \[\min_{x\in\mathbb{R}^n} f(x) \implies \nabla f(x) = 0, \]

      the first-order necessary condition; for convex \(f\) it is also sufficient, so the two problems are equivalent.
