优化器刺客之limit 1--Order by col limit n 代价预估优化探索

一、现象

order by 排序加了limit后更慢了?

test=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaaaaaaaa%' order by userid ;QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------Sort  (cost=2327.46..2328.96 rows=600 width=4) (actual time=109.316..109.318 rows=0 loops=1)Sort Key: useridSort Method: quicksort  Memory: 25kB->  Bitmap Heap Scan on dba_users  (cost=61.47..2299.78 rows=600 width=4) (actual time=109.311..109.312 rows=0 loops=1)Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Rows Removed by Index Recheck: 40904Heap Blocks: exact=31502->  Bitmap Index Scan on dba_users_username_idx  (cost=0.00..61.32 rows=600 width=0) (actual time=22.520..22.520 rows=40904 loops=1)Index Cond: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Planning Time: 0.149 msExecution Time: 109.350 ms
(11 rows)test=# 
test=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaaaaaaaa%'    order by userid   limit 1 ;                                                                                                         QUERY PLAN                                                          
---------------------------------------------------------------------------------------------------------------------------------------------Limit  (cost=0.43..408.59 rows=1 width=4) (actual time=3558.960..3558.961 rows=0 loops=1)->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..244895.74 rows=600 width=4) (actual time=3558.958..3558.959 rows=0 loops=1)Filter: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Rows Removed by Filter: 6000000Planning Time: 0.171 msExecution Time: 3558.983 ms
(6 rows)test=#

dba_users有600w条数据,username符合检索条件的0行, order by userid limit 1 耗时3558.983 ms,order by userid 耗时109.350 ms。
limit 1是匹配到第一条数据后就返回,这里却更慢了? 看起来不符合预期。
表结构如下:

test=# \d+ dba_usersTable "public.dba_users"Column  |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
----------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------userid   | integer               |           | not null |         | plain    |             |              | username | character varying(64) |           |          |         | extended |             |              | password | character varying(64) |           |          |         | extended |             |              | 
Indexes:"dba_users_pkey" PRIMARY KEY, btree (userid)"dba_users_password_idx" btree (password)"dba_users_username_idx" gin (username gin_trgm_ops)
Access method: heaptest=#通常我们可以给order by字段做运算或者类型转换来矫正优化器走实际更优的执行计划。
test=# explain analyze select userid from dba_users where  username like '%aaaaaaaaaaaaaaaaaa%' order by userid + 0 limit 1;QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------Limit  (cost=2302.78..2304.28 rows=1 width=4) (actual time=109.117..109.119 rows=0 loops=1)->  Sort  (cost=2302.78..2304.28 rows=600 width=4) (actual time=109.116..109.117 rows=0 loops=1)Sort Key: ((userid + 0))Sort Method: quicksort  Memory: 25kB->  Bitmap Heap Scan on dba_users  (cost=61.47..2299.78 rows=600 width=4) (actual time=109.110..109.111 rows=0 loops=1)Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Rows Removed by Index Recheck: 40904Heap Blocks: exact=31502->  Bitmap Index Scan on dba_users_username_idx  (cost=0.00..61.32 rows=600 width=0) (actual time=20.856..20.856 rows=40904 loops=1)Index Cond: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Planning Time: 0.156 msExecution Time: 109.149 ms
(12 rows)test=#

或者order by字段和where字段建个组合索引? 多列统计信息?这些都是一些规避的方法,优化器自身为什么选择了不优的计划?

二、分析

这个问题一直被诟病,不少文章分析过很多场景,统计信息不准?索引损坏?数据分布问题?我们一起深入探索其中的奥秘。
PostgreSQL的优化器是自底向上生成执行计划,当查询小于12个表,使用的是动态规划算法,在每个计划节点执行各种可能的path和检索方法,然后计算出最小代价path作为最优解,也就是最终的执行计划。

DEBUG跟踪这个过程。
limit 的cost计算是在adjust_limit_rows_costs函数里进行的,根据上一节点subpath的startup_cost和total_cost计算出limit count后对应的cost

如下subpath是T_IndexPath, startup_cost=0.4325 total_cost=244895.745
limit 1对应的startup_cost=0.4325 total_cost= startup_cost + (input_total_cost - input_startup_cost) count_rows / input_rows;
= 0.4325 + (244895.745 - 0.4325 ) * 1/600
= 408.5913541666666667
这个对应的是order by userid limit 1执行计划的cost:Limit (cost=0.43…408.59 rows=1 width=4)

在set_cheapest里比较表扫描的最优方式,循环比较pathlist的每个节点的startup_cost和total_cost,
以下最优total_cost是T_BitmapHeapPath,最优startup_cost是T_seqScan

最终set_cheapest函数里一直角逐出Limit节点最优的startup_cost是0.4325,total_cost是408.59135416666669,这个对应到了order by userid Limit 1 这个plan的total_cost:Limit (cost=0.43…408.59 rows=1 width=4)

在get_cheapest_fractional_path函数里返回best_path 也就是以上total_cost=408.59135416666669的path,即order by userid Limit 1的plan。
并以此创建执行计划,最终执行器执行。

那么从整个过程来看,我们需要关注的是整个计划最终节点的startup_cost和total_cost,优化器会选择最优total_cost的path作为best_path。
代价预估order by userid limit 1时,total_cost为:408.59(实际执行的total_time为:3558.961 ms)
预估order by userid + 0 limit 1时,total_cost为:2304.28(实际执行的total_time为:109.119 ms)
显而易见优化器选择了total_cost更小的408.59所在的path作为执行计划。
很明显代价预估有问题,我们开始就跟踪了order by userid limit 1的cost计算,计算的数值本身没有问题。
是优化器的缺陷?再把执行计划拉出来遛遛。

test=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaaaaaaaa%'    order by userid   limit 1 ;                                                                                                         QUERY PLAN                                                          
---------------------------------------------------------------------------------------------------------------------------------------------Limit  (cost=0.43..408.59 rows=1 width=4) (actual time=3558.960..3558.961 rows=0 loops=1)->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..244895.74 rows=600 width=4) (actual time=3558.958..3558.959 rows=0 loops=1)Filter: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Rows Removed by Filter: 6000000Planning Time: 0.171 msExecution Time: 3558.983 ms
(6 rows)test=#

cost我们之前已经计算过了,但是从actual time来看,代价预估偏差还是比较大的,
cost:Index Scan startup_cost=0.43 total_cost=244895.74 , limit startup_cost=0.43 total_cost=408.59
从预估来看因为没有offset 所以startup_cost都是0.43,根据公式计算的 limit 的total_cost=408.59,比244895.74小了很多。

actual: Index Scan startup_time=3558.958 total_time=3558.959,limit startup_time=3558.960 total_time=3558.961
从实际执行来看Index startup_time=3558.958 total_time=3558.959,索引扫描startup_time启动代价3558.958ms?

startup_time可以理解为扫描到第一条数据的时间,这里可以虽然走了userid的pkey索引,但是根据username like ‘%aaaaaaaaaaaaaaaaaa%’ filter了600w行,就是说回表匹配了一遍所有行,因此耗时主要是在这里。

再看下limit 节点total_cost计算代价公式

*total_cost=startup_cost + (input_total_cost - input_startup_cost) count_rows / input_rows
总代价 = 父节点启动代价 + (总代价 - 启动代价)即父节点运行代价 * limit 行数/ 预估输出的总行数

这里在where 条件不含排序字段走排序字段索引情况下,有可能通过索引匹配到第一条符合条件的数据会比较久,就是说要考虑索引扫描的整体代价作为limit的启动代价(当前默认逻辑是通过索引很快找到第一条数据,然后输出limit n行,因此整体cost在这个场景下是偏小的),最极端的场景很可能是先扫描了整个索引并且回表去匹配数据,这个cost要预估进去。

因此这里的startup_cost 需要替换为total_cost, 计算公式可以调整为:
*total_cost=total_cost + (input_total_cost - input_startup_cost) count_rows / input_rows

三、方案

当where 条件不含order by字段走order by字段索引不进行sort的情况下,flag(limit_total_cost)会置为true,这个时候就走新的计算逻辑。

        if (count_est != 0){double          count_rows;if (count_est > 0) count_rows = (double) count_est;elsecount_rows = clamp_row_est(input_rows * 0.10);if (count_rows > *rows)count_rows = *rows;if (input_rows > 0){if (limit_total_cost){*total_cost = *total_cost + (input_total_cost - input_startup_cost)* count_rows / input_rows;}else*total_cost = *startup_cost +(input_total_cost - input_startup_cost)* count_rows / input_rows;}*rows = count_rows;if (*rows < 1)*rows = 1;}
}

四、验证

执行计划显示order by userid limit 1和之前order by userid +0 limit 1的执行计划相同,sql耗时符合预期。

test=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaaaaaaaa%' order by userid limit 1 ;QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------Limit  (cost=2302.78..2304.28 rows=1 width=4) (actual time=140.581..140.585 rows=0 loops=1)->  Sort  (cost=2302.78..2304.28 rows=600 width=4) (actual time=140.576..140.579 rows=0 loops=1)Sort Key: useridSort Method: quicksort  Memory: 25kB->  Bitmap Heap Scan on dba_users  (cost=61.47..2299.78 rows=600 width=4) (actual time=140.534..140.536 rows=0 loops=1)Recheck Cond: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Rows Removed by Index Recheck: 40904Heap Blocks: exact=31502->  Bitmap Index Scan on dba_users_username_idx  (cost=0.00..61.32 rows=600 width=0) (actual time=22.800..22.802 rows=40904 loops=1)Index Cond: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Planning Time: 1.413 msExecution Time: 141.032 ms
(12 rows)test=#

再验证一个稍微复杂一点的场景。
修改前:

test=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaaaaaaaa%'   and userid in (select city_id from measurement where logdate > '2023-02-01' and logdate < '2023-05-01' and name like '%Nickyoung%' order by city_id   limit 5) ;QUERY PLAN                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Nested Loop  (cost=2143.01..2185.00 rows=1 width=4) (actual time=8153.711..8153.717 rows=0 loops=1)->  HashAggregate  (cost=2142.57..2142.62 rows=5 width=4) (actual time=8153.710..8153.713 rows=0 loops=1)Group Key: measurement.city_idBatches: 1  Memory Usage: 24kB->  Limit  (cost=1.32..2142.56 rows=5 width=4) (actual time=8153.707..8153.710 rows=0 loops=1)->  Merge Append  (cost=1.32..582419.24 rows=1360 width=4) (actual time=8153.705..8153.708 rows=0 loops=1)Sort Key: measurement.city_id->  Index Scan using measurement_y2023m02_city_id_idx on measurement_y2023m02 measurement_1  (cost=0.43..59989.39 rows=140 width=4) (actual time=855.005..855.005 rows=0 loops=1)Filter: ((logdate > '2023-02-01'::date) AND (logdate < '2023-05-01'::date) AND ((name)::text ~~ '%Nickyoung%'::text))Rows Removed by Filter: 1399554->  Index Scan using measurement_y2023m03_city_id_idx on measurement_y2023m03 measurement_2  (cost=0.43..265374.77 rows=620 width=4) (actual time=3747.483..3747.483 rows=0 loops=1)Filter: ((logdate > '2023-02-01'::date) AND (logdate < '2023-05-01'::date) AND ((name)::text ~~ '%Nickyoung%'::text))Rows Removed by Filter: 6197829->  Index Scan using measurement_y2023m04_city_id_idx on measurement_y2023m04 measurement_3  (cost=0.43..257037.48 rows=600 width=4) (actual time=3551.214..3551.214 rows=0 loops=1)Filter: ((logdate > '2023-02-01'::date) AND (logdate < '2023-05-01'::date) AND ((name)::text ~~ '%Nickyoung%'::text))Rows Removed by Filter: 6001729->  Memoize  (cost=0.44..8.46 rows=1 width=4) (never executed)Cache Key: measurement.city_idCache Mode: logical->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..8.45 rows=1 width=4) (never executed)Index Cond: (userid = measurement.city_id)Filter: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Planning Time: 3.282 msExecution Time: 8153.877 ms
(24 rows)test=#

修改后:

test=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaaaaaaaa%'   and userid in (select city_id from measurement where logdate > '2023-02-01' and logdate < '2023-05-01' and name like '%Nickyoung%' order by city_id   limit 5) ;QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------Nested Loop  (cost=5982.58..6024.57 rows=1 width=4) (actual time=0.129..0.131 rows=0 loops=1)->  HashAggregate  (cost=5982.13..5982.18 rows=5 width=4) (actual time=0.128..0.130 rows=0 loops=1)Group Key: measurement.city_idBatches: 1  Memory Usage: 24kB->  Limit  (cost=5982.11..5982.12 rows=5 width=4) (actual time=0.126..0.127 rows=0 loops=1)->  Sort  (cost=5982.11..5985.51 rows=1360 width=4) (actual time=0.124..0.126 rows=0 loops=1)Sort Key: measurement.city_idSort Method: quicksort  Memory: 25kB->  Append  (cost=177.08..5959.52 rows=1360 width=4) (actual time=0.109..0.110 rows=0 loops=1)->  Bitmap Heap Scan on measurement_y2023m02 measurement_1  (cost=177.08..696.08 rows=140 width=4) (actual time=0.041..0.041 rows=0 loops=1)Recheck Cond: ((name)::text ~~ '%Nickyoung%'::text)Filter: ((logdate > '2023-02-01'::date) AND (logdate < '2023-05-01'::date))->  Bitmap Index Scan on measurement_y2023m02_name_idx  (cost=0.00..177.05 rows=140 width=0) (actual time=0.039..0.039 rows=0 loops=1)Index Cond: ((name)::text ~~ '%Nickyoung%'::text)->  Bitmap Heap Scan on measurement_y2023m03 measurement_2  (cost=377.60..2665.41 rows=620 width=4) (actual time=0.031..0.031 rows=0 loops=1)Recheck Cond: ((name)::text ~~ '%Nickyoung%'::text)Filter: ((logdate > '2023-02-01'::date) AND (logdate < '2023-05-01'::date))->  Bitmap Index Scan on measurement_y2023m03_name_idx  (cost=0.00..377.45 rows=620 width=0) (actual time=0.031..0.031 rows=0 loops=1)Index Cond: ((name)::text ~~ '%Nickyoung%'::text)->  Bitmap Heap Scan on measurement_y2023m04 measurement_3  (cost=377.50..2591.23 rows=600 width=4) (actual time=0.036..0.036 rows=0 loops=1)Recheck Cond: ((name)::text ~~ '%Nickyoung%'::text)Filter: ((logdate > '2023-02-01'::date) AND (logdate < '2023-05-01'::date))->  Bitmap Index Scan on measurement_y2023m04_name_idx  (cost=0.00..377.35 rows=600 width=0) (actual time=0.036..0.036 rows=0 loops=1)Index Cond: ((name)::text ~~ '%Nickyoung%'::text)->  Memoize  (cost=0.44..8.46 rows=1 width=4) (never executed)Cache Key: measurement.city_idCache Mode: logical->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..8.45 rows=1 width=4) (never executed)Index Cond: (userid = measurement.city_id)Filter: ((username)::text ~~ '%aaaaaaaaaaaaaaaaaa%'::text)Planning Time: 2.047 msExecution Time: 0.254 ms
(32 rows)test=#

五、小结

也算牛刀小试修正了下limit的cost预估。不过这种修改方式看起来是不优雅不专业的,内核中可能有很多特例,我们不可能只是一股脑堆if else switch case逻辑。就像Linus说的,排除特例完美覆盖所有情况才是好的代码。

针对这个case,我认为优化器目前limit节点的cost计算逻辑需要加强,计算公式可能需要更复杂的关系因子,或者使用更合理的数学表达式。鄙人不才,还需持续学习积累。

代价预估在一些特定场景下难免会有偏差,其中统计信息不准导致的场景可能会多一些,可以参考这篇了解下统计信息的原理<深入浅出统计信息内核原理(上):Compressed Histogram>。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/443236.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

CentOS7中安装ElasticSearch

文章目录 检测是否安装了Elasticsearch安装JDK下载java配置 下载Elasticsearch解压安装Elasticsearch修改配置文件启动Elasticsearch常见问题 ElasticSearch是一个基于Lucene的搜索服务器。它提供了一个分布式多用户能力的全文搜索引擎&#xff0c;基于RESTful web接口。Elasti…

【数据结构 02】队列

一、原理 队列通常是链表结构&#xff0c;只允许在一端进行数据插入&#xff0c;在另一端进行数据删除。 队列的特性是链式存储&#xff08;随机增删&#xff09;和先进先出&#xff08;FIFO&#xff1a;First In First Out&#xff09;。 队列的缺陷&#xff1a; 不支持随机…

【C/C++】C/C++编程——整型(一)

整型 C 中的整型是基本的数据类型之一&#xff0c;用于表示没有小数部分的数。这包括正整数、负整数以及零。C 提供了多种整型&#xff0c;以适应不同大小的数值需求和优化内存使用。 整型的种类 C 中的整型可以根据其大小&#xff08;即占用的字节数&#xff09;和能够表示…

方法阻塞的解决方案之一

1、简单使用 一个h一个cpp文件 #pragma once #include <iostream> #include <thread> #include <atomic> #include <chrono> #include <string>class Person {public:struct dog {std::string name;int age;};public:void a(std::atomic<bo…

笔记本从零安装ubuntu server系统+环境配置

文章目录 前言相关链接ubuntu Server 安装教程屏幕自动息屏关上盖子不休眠MobaXterm外网SSH内网穿透IPV6远程 为什么我要笔记本装Linux为什么要换ubuntu Server版能否连接wifi之后Linux 配置清单总结 前言 之前装了个ubuntu desktop 版&#xff0c;发现没有命令行&#xff0c;…

《HTML 简易速速上手小册》第4章:HTML 的表单与输入(2024 最新版)

文章目录 4.1 表单的基础&#xff08;&#x1f4dd;&#x1f680;&#x1f4ac; 开启沟通的大门&#xff09;4.1.1 表单基础知识点4.1.2 基础示例&#xff1a;创建一个简单的注册表单4.1.3 案例扩展一&#xff1a;创建一个调查问卷4.1.4 案例扩展二&#xff1a;创建一个预订表单…

【01】Linux 基本操作指令

带⭐的为重要指令 &#x1f308; 01、ls 展示当前目录下所有文件&#x1f308; 02、pwd 显示用户当前所在路径&#x1f308; 03、cd 进入指定目录&#x1f308; 04、touch 新建文件&#x1f308; 05、tree 以树形结构展示所有文件⭐ 06、mkdir 新建目录⭐ 07、rmdir 删除目录⭐…

【Java 数据结构】LinkedList与链表

LinkedList与链表 1. ArrayList的缺陷2. 链表2.1 链表的概念及结构2.2 链表的实现 3. LinkedList的模拟实现4.LinkedList的使用4.1 什么是LinkedList4.2LinkedList的使用 5. ArrayList和LinkedList的区别 1. ArrayList的缺陷 上节课已经熟悉了ArrayList的使用&#xff0c;并且…

虚拟机扩容后黑屏卡死解决方法

亲测有效&#xff0c;首先一般是在扩容后黑屏的&#xff0c;现象为开机后看到个横线光标不闪&#xff0c;黑屏&#xff0c;进入不了桌面。原因是硬盘已经满了&#xff0c;所以解决方法就是清理硬盘。所以首先还是要解决登录问题。 开机时按 esc 键进入 GNU GRUB&#xff0c;选择…

windows平台使用tensorRT部署yolov5详细介绍,整个流程思路以及细节。

目录 Windows平台上使用tensorRT部署yolov5 前言&#xff1a; 环境&#xff1a; 1.为什么要部署&#xff1f; 2.那为什么部署可以解决这个问题&#xff1f;&#xff08;基于tensorRT&#xff09; 3.怎么部署&#xff08;只讨论tensorRT&#xff09; 3.0部署的流程 3.1怎…

GoLang和GoLand的安装和配置

1. GoLang 1.1 特点介绍 Go 语言保证了既能达到静态编译语言的安全和性能&#xff0c;又达到了动态语言开发维护的高效率&#xff0c;使用一个表达式来形容 Go 语言&#xff1a;Go C Python , 说明 Go 语言既有 C 静态语言程序的运行速度&#xff0c;又能达到 Python 动态语…

闭包的理解?闭包使用场景

说说你对闭包的理解&#xff1f;闭包使用场景 #一、是什么 一个函数和对其周围状态&#xff08;lexical environment&#xff0c;词法环境&#xff09;的引用捆绑在一起&#xff08;或者说函数被引用包围&#xff09;&#xff0c;这样的组合就是闭包&#xff08;closure&#…