PG Optimization: Implementing a Business Requirement - Optimizing GROUP BY on a Low-Cardinality Column of a Large Table
Reading maketh a full man; conference a ready man; and writing an exact man.
Preface
Today let's take some time to talk about a topic related to cost optimization.
When it comes to cost optimization, how much of a saving would you consider a good result?
10%? 20%?
How much optimization headroom is there, really?
Today's case study approaches the question from the database side and shows how to make a business feature run more than a hundred times faster; and performance optimization is, for all practical purposes, a synonym for cost optimization.
Generally, when a database system runs into performance trouble, the ways to protect user experience and service quality fall into two categories:
1. Scale out, i.e. provide more resources for the business requests.
2. Optimize, i.e. reduce the resources each request consumes.
Simply put, the former increases the (resource) supply, while the latter reduces the (resource) demand.
Propping up service quality by adding hardware may be viable for deep-pocketed companies, but scale-out is always a stopgap, and it saves little real money.
Performance optimization, by contrast, is more effective, more thorough, and more sustainable: the first path makes the service ever more expensive to run, the second ever cheaper.
With that, let's get into it.
Business Scenario
This is a production issue I ran into the day before yesterday:
A social platform with tens of millions of users wants to show, in its mobile app, a weekly leaderboard of every object active during a given week.
An "object" here is a topic group registered on the platform; think of it as one of the small social circles created inside the platform. At the business's current scale, the number of groups active within a week is on the order of thousands.
The development team implemented the requirement as follows:
1. Table design: a distributed architecture, with the data spread out by week
create table user_active_week_rank (
uin bigint,
community_id bigint, -- requested object id: community group id
year bigint,
week int,
week_shard text default '' not null,
active bigint,
status smallint,
is_hit smallint,
update_time timestamp with time zone
) Distribute By shard(week_shard);
2. The query:
select community_id FROM user_active_week_rank
WHERE status = 0 AND year = 2021 AND week = 44
GROUP BY community_id
As the above shows, the scenario is not complicated: the requirement comes down to a single GROUP BY query.
If you were the owner of this project, or on the team, what problems would you see here?
Problems and Challenges
The business logic is simple, but as I see it, this implementation faces at least the following challenges:
1. As a feature of an app serving tens of millions of users, the table keeps growing for as long as the feature is live.
2. In social products, activity tends to concentrate on a small share of objects (think the 80/20 rule), and entertainment-leaning platforms may host a few very-high-traffic objects such as celebrities and trending topics. The table may therefore be skewed both by object id and along the time dimension (e.g. within a single week): when a celebrity runs an event, the fans in that group can become abnormally active, much like the handful of mega-merchants on Taobao.
3. Relative to the detail rows, the weekly-active object id is a low-cardinality column, so the GROUP BY has an ever-growing amount of work to do.
Sure enough, not long after the feature went live, the following slow query appeared:
2021-11-05 16:45:05.395 CST,"plat_logic_2","community",27857,coord(27857.0),"9.147.xxx.xxx:46035",6184ef09.6cd1,coord(27857.0),1,"SELECT",2021-11-05 16:44:57 CST,450/0,0,LOG,00000,"
duration: 2811.823 ms statement: SELECT community_id FROM user_active_week_rank WHERE status = 0 AND year = 2021 AND week = 44 GROUP BY community_id",,,,,,,,,""
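This entry is PostgreSQL's slow-statement log output: it appears whenever a statement runs longer than the log_min_duration_statement threshold. The configured value was not shown in the original session, but you can check it on any instance with:
db_aken=# show log_min_duration_statement;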
As you can see, the SQL behind this feature takes 2811.823 ms; roughly speaking, it takes a couple of cores to serve a single query request. For a high-concurrency business scenario, that is an extremely costly level of performance.
What does "costly" mean in concrete terms?
Take a machine spec of 32 cores + 64 GB RAM + 6 TB SSD + 10 GbE NIC at about 1,000 RMB per machine per month, and assume the business runs at QPS = 10,000/s. Supporting that load would take some 20,000 cores, i.e. 600+ machines at around 625,000 RMB per month.
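A back-of-the-envelope calculation (treating the ~2.8 s elapsed time as roughly 2 core-seconds of CPU work per request, as above):
cores needed ≈ QPS x core-seconds per query = 10,000/s x 2 s = 20,000 cores
machines ≈ 20,000 cores / 32 cores per machine = 625 machines
monthly cost ≈ 625 machines x 1,000 RMB = 625,000 RMB/month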
So how should we optimize this?
Optimization Plan One
Before optimizing, let's first see why the SQL above is so slow:
with
a as (select count(*) all_rows from user_active_week_rank),
b as (select count(*) where_sets FROM user_active_week_rank WHERE status = 0 AND year = 2021 AND week = 44)
select a.all_rows, b.where_sets, round(b.where_sets/a.all_rows::float,2)*100||'%' set_pct from a,b;
all_rows | where_sets | set_pct
----------+-------------+---------
45997686 | 1581690 | 3%
(1 row)
We can see that:
1. The table currently holds 45M+ rows (45,997,686).
2. The WHERE conditions match a result set of 1.58M rows.
3. So the whole job is: filter 1.58M rows out of 46M, GROUP BY them, and return the aggregated result in full.
The execution plan is as follows:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn001,dn002) (cost=1700413.39..1700420.77 rows=738 width=8) (actual time=2706.289..2706.852 rows=1116 loops=1)
Output: community_id
Planning time: 0.283 ms
Execution time: 2710.601 ms
(4 rows)
The optimizer chose a full table scan; the aggregation returns 1,116 rows, i.e. the per-week cardinality of community_id is 1,116.
Let's first see what a btree index can do. The table already carries one: user_active_week_rank_year_week_uin_community_id_idx.
Let's force the planner onto that index and see:
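The forced-index run itself was not captured in this write-up; a minimal sketch of how such a test is done, assuming the index columns match its name and using the stock planner knobs:
db_aken=# -- existing index, reconstructed from its name:
db_aken=# -- create index user_active_week_rank_year_week_uin_community_id_idx
db_aken=# --   on user_active_week_rank (year, week, uin, community_id);
db_aken=# set enable_seqscan = off;   -- push the planner off the full table scan
db_aken=# explain analyze
SELECT community_id FROM user_active_week_rank
WHERE status = 0 AND year = 2021 AND week = 44
GROUP BY community_id;
db_aken=# reset enable_seqscan;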
It turned out even slower than before, which confirms the PG optimizer was right to pick the full scan. The current design already uses a distributed table, which effectively buys us parallelism and physical partitioning, yet performance still falls short. So, is there really no optimization headroom left here?
If the business had chosen MySQL, Oracle or another traditional database, this is where things would get painful.
Fortunately, we are on PG, so let's start optimizing this SQL.
Plan one comes first; to make the reasoning easy to follow, it is deliberately a counter-example.
The idea: change the week column's data type from int to an int[] array and keep everything else unchanged, hoping to use PG's multi-value array columns to compress this kind of massive data set.
1. Table structure (see the reconstructed DDL below):
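The DDL itself did not make it into this write-up; reconstructed from the INSERT statement and the sample rows below, it would look roughly like this (only the week column changes type):
create table user_active_week_rank_arr01 (
uin bigint,
community_id bigint, -- requested object id: community group id
year bigint,
week int[], -- was: week int
week_shard text default '' not null,
active bigint,
status smallint,
is_hit smallint,
update_time timestamp with time zone
) Distribute By shard(week_shard);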
2. Copy the data from the original table into the new one:
db_aken=# insert into user_active_week_rank_arr01(uin,community_id,year,week,active,status,is_hit,week_shard) select uin,community_id,year,array_agg(week),active,status,is_hit,week_shard
from user_active_week_rank
group by uin,community_id,year,active,status,is_hit,week_shard;
INSERT 0 45997686 <<<<
Time: 205883.836 ms (03:25.884)
db_aken=#
As you can see, the arrays provided no compression whatsoever during the load: the new table ends up with exactly as many rows as the old one. The root cause is that the GROUP BY key is nearly unique per row, so almost every array holds a single element.
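One way to confirm this (a hypothetical check, not part of the original session): count the distinct grouping keys; a result close to the 45,997,686-row total means nearly every group can hold only one week value:
db_aken=# select count(*) from (
select 1 from user_active_week_rank
group by uin, community_id, year, active, status, is_hit, week_shard
) t;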
db_aken=# select * from user_active_week_rank_arr01 limit 3;
uin | community_id | year | week | week_shard | active | status | is_hit | update_time
-------+--------------+------+------+-------------+--------+--------+--------+-------------
10128 | 291 | 2020 | {47} | 291_2020_47 | 3 | 0 | 0 |
10165 | 407 | 2020 | {43} | 407_2020_43 | 0 | 0 | 0 |
10176 | 144 | 2021 | {23} | 144_2021_23 | 0 | 0 | 0 |
(3 rows)
Time: 9.459 ms
db_aken=#
Worse, this shape ends up slower than the original plan: even with a GIN index, the query balloons to 16 seconds (note that the test below also omits the GROUP BY and ships back all 1,581,690 matching rows).
db_aken=# create index idx_gin on user_active_week_rank_arr01 using gin(week);
CREATE INDEX
db_aken=# explain analyze SELECT community_id FROM user_active_week_rank_arr01 WHERE status = 0 AND year = 2021 AND week @> array[44];
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=41.939..16212.935 rows=1581690 loops=1)
Node/s: dn001, dn002
Planning time: 0.064 ms
Execution time: 16285.131 ms
(4 rows)
db_aken=#
So this one stands as a cautionary counter-example.
In plan two, we instead take what we know about the business scenario and combine it with the database's features to achieve a real performance gain.
Optimization Plan Two
1. Table design
The idea is still to exploit PG arrays, but this time to genuinely raise the compression ratio over the original table's rows:
create table user_active_week_rank_arr02 (
community_id bigint,
year bigint,
week int[],
status smallint
);
2. Write the full data set into new table 02. The grouping key is now only (community_id, year, status), so the 45,997,686 detail rows collapse into 1,687 aggregated rows:
db_aken=#
insert into user_active_week_rank_arr02(community_id,year,week,status) select community_id,year,array_agg(week),status
from user_active_week_rank
group by community_id,year,status;
INSERT 0 1687
Time: 158900.312 ms (02:38.900)
3. Index design: a composite btree index
db_aken=# create index idx_01 on user_active_week_rank_arr02(year,status,community_id);
CREATE INDEX
Time: 13.381 ms
4. Verify performance: latency drops from the original 2811 ms to 308.34 ms, roughly a 9x improvement.
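The exact command that produced the plan below was not preserved; judging from the Remote query line, it must have been essentially:
db_aken=# explain (analyze, verbose) SELECT community_id FROM user_active_week_rank_arr02 WHERE status = 0 AND year = 2021 AND week @> array[44];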
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=272.321..308.231 rows=1116 loops=1)
Output: user_active_week_rank_arr02.community_id
Node/s: dn001, dn002
Remote query: SELECT community_id FROM user_active_week_rank_arr02 WHERE ((status = 0) AND (year = 2021) AND (week @> ARRAY[44]))
Planning time: 0.056 ms
Execution time: 308.340 ms <<<<<
5. Verify the results are correct
db_aken=#
with
a as (SELECT community_id FROM user_active_week_rank WHERE status = 0 AND year = 2021 AND week = 44 GROUP BY community_id order by community_id),
b as (SELECT community_id FROM user_active_week_rank_arr02 WHERE status = 0 AND year = 2021 AND week @> array[44] order by community_id)
select a.community_id original_plan, b.community_id plan_02 from a,b where a.community_id=b.community_id;
 original_plan | plan_02
---------------+---------
64 | 64
66 | 66
67 | 67
... (rows omitted) ...
1300 | 1300
1302 | 1302
1304 | 1304
(1116 rows)
db_aken=#
Plan two delivers a 9x performance gain. For ease of conversion, assume performance translates directly into money (i.e. the workload is fully isolated: one DB instance runs only this one SQL); the hardware budget then shrinks to one ninth, from 625,000 RMB/month to about 69,000 RMB/month.
The effect is striking; cutting hardware cost by nearly 90% is a substantial win. Still, 300 ms per query may pass in a low-concurrency system or an OLAP-style scenario, but for high-concurrency OLTP it is nowhere near enough, and the resource bill remains too high.
Optimization Plan Three
PG is among the most feature-rich open-source databases in the world. Beyond its wealth of data types, such as the arrays used in plan two, it offers more capabilities than traditional databases like MySQL and Oracle, which gives users a much richer set of optimization levers and lets it serve a wider range of business scenarios well.
This is the real significance of a database offering many data types and features, and it is why cost optimization is closely tied to product selection: a more capable DB product usually leaves more room to optimize costs.
For this business scenario, we can build on the PG array trick and add PG's distinctive partial index and GIN index techniques to push the optimization further, to a gain of more than a hundredfold. Concretely:
1. Table design: add a hashweek column on top of plan two's table
create table user_active_week_rank_hash (
community_id bigint,
year bigint,
hashweek bigint,
week int[],
status smallint
);
2. Load the original table's full 45M+ rows into the new table
db_aken=# insert into user_active_week_rank_hash(community_id,year,hashweek,week,status)
select community_id,year,abs(mod(hashint8(week),50)),array_agg(week),status from user_active_week_rank group by 1,2,3,5;
INSERT 0 29108
db_aken=#
3. Create the indexes: 50 partial GIN indexes, one per hash bucket, generated in a loop (each statement expands as shown after the block):
do language plpgsql $$
declare
begin
for i in 0..49 loop
execute format( 'create index idx_week_p%s on user_active_week_rank_hash using gin(week) where hashweek=%s', i,i);
end loop;
end;
$$;
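Each loop iteration emits one statement of the form below (bucket 0 shown here), leaving us with 50 small partial GIN indexes, each covering only the rows whose week values hash into its bucket:
create index idx_week_p0 on user_active_week_rank_hash using gin(week) where hashweek=0;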
4. Rewrite the SQL to include the hashweek predicate
1) Before:
SELECT community_id FROM user_active_week_rank WHERE status = 0 AND year = 2021 AND week = 44 GROUP BY community_id;
2) After:
select community_id from user_active_week_rank_hash where status = 0 AND year = 2021 AND week @> array[44] and hashweek=abs(mod(hashint8(44),50));
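The extra hashweek = abs(mod(hashint8(44),50)) predicate is a constant expression over an immutable function, so the planner folds it into a single bucket number at plan time and can prove that exactly one of the 50 partial indexes applies. You can evaluate the bucket by hand (the concrete number depends on the hash implementation, so no output is shown here):
db_aken=# select abs(mod(hashint8(44), 50)) as bucket;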
5. Verify performance:
The request latency drops from 2811 ms straight down to 4.719 ms, which by those two figures is a speedup of nearly 600x.
Put differently, the workload that needed 625 machines at QPS = 10,000/s can now be carried by just one or two machines.
6. Verify the results are correct:
db_aken=# with
a as (SELECT community_id FROM user_active_week_rank WHERE status = 0 AND year = 2021 AND week = 44 GROUP BY community_id order by community_id),
b as (SELECT community_id FROM user_active_week_rank_arr02 WHERE status = 0 AND year = 2021 AND week @> array[44] order by community_id),
c as (select community_id from user_active_week_rank_hash where status = 0 AND year = 2021 AND week @> array[44] and hashweek=abs(mod(hashint8(44),50)))
select a.community_id original_plan, b.community_id plan_02, c.community_id plan_03 from a,b,c where a.community_id=b.community_id and a.community_id=c.community_id;
 original_plan | plan_02 | plan_03
---------------+---------+---------
64 | 64 | 64
66 | 66 | 66
67 | 67 | 67
... (rows omitted) ...
1299 | 1299 | 1299
1300 | 1300 | 1300
1302 | 1302 | 1302
1304 | 1304 | 1304
(1116 rows)
db_aken=#
And the benefits of this plan do not stop there. Next, let's see how it performs once the original table grows tenfold, i.e. to 450M+ rows.
1. Amplify the data tenfold (the loop below writes the aggregated result into the hash table ten more times, standing in for a 10x larger base table):
db_aken=# do language plpgsql $$
declare
begin
for i in 0..9 loop
execute format( 'insert into user_active_week_rank_hash(community_id,year,hashweek,week,status) select community_id,year,abs(mod(hashint8(week),50)),array_agg(week),status from user_active_week_rank group by 1,2,3,5');
end loop;
end;
$$;
2. Check the performance
As you can see below, the query over this hundreds-of-millions-row volume takes only 24 ms; with the initial implementation, both the performance and the user experience at this scale would have been hopeless.
db_aken=# explain (analyze on,buffers on,verbose on,timing on)
select community_id from user_active_week_rank_hash where status = 0 AND year = 2021 AND week @> array[44] and hashweek=abs(mod(hashint8(44),50));
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=7.317..23.486 rows=11160 loops=1)
Output: user_active_week_rank_10.community_id
Node/s: dn001, dn002
Remote query: SELECT community_id FROM user_active_week_rank_hash WHERE ((status = 0) AND (year = 2021) AND (week @> ARRAY[44]) AND (hashweek = abs(mod(hashint8((44)::bigint), 50))))
Planning time: 0.079 ms
Execution time: 24.373 ms
(6 rows)
db_aken=#
Summary
1. Cost optimization should start from the business scenario and pair it with the database's features, so that the optimization pays off to the fullest.
2. Cost optimization is closely tied to database selection: the features a database ships with determine how much room the business has to optimize, and PG, with its rich feature set, is an excellent product for this.
3. Routine operational work such as scaling and resource management can be delegated to automation platforms, but understanding a scenario's requirements and producing the optimization plan cannot be automated. That is the essential difference between a DBO (operation) and a DBA (administrator): optimization is like battling up the nine-story demon tower, and how far you get depends on the depth of your skill.
References
1.https://www.postgresql.org/docs/current/arrays.html
2.https://www.postgresql.org/docs/current/gin-implementation.html
3.https://www.postgresql.org/docs/current/indexes-partial.html
------ Make learning a habit - Aken