1.需求
2.sql实现
这道题先需要去分析结果集,本质上是一个迭代累加的过程,先要得到如下结果
如果在面试数仓中实现了以上结果,基本上面试官会很通过,也在短时间内可以实现,实现sql如下
with tb as (
select 1 as s,'a' as pv
union all
select 2 as s,'b' as pv
union all
select 3 as s,'c' as pv
union all
select 4 as s,'d' as pv
union all
select 5 as s,'e' as pv
union all
select 6 as s,'f' as pv
)select s,pv,concat_ws('+',collect_list(re1) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) from
(select *,concat_ws('+',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as re1 from tb) tb1;
首先开窗将pv分组连接起来作为一个结果集,然后再将结果集分组连接起来。核心知识点:collect_list,concat_ws,开窗函数及开窗中的ROWS BETWEEN。其中`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`为开始行到当前行
另一种实现方式,通过不断解析实现,分组实现,这种实现主要考察对sql的深度理解和组装,实现起来还是比较复杂的,sql如下
with tb as (
select '1' as s,'a' as pv
union all
select '2' as s,'b' as pv
union all
select '3' as s,'c' as pv
union all
select '4' as s,'d' as pv
union all
select '5' as s,'e' as pv
union all
select '6' as s,'f' as pv
),
tb1 as
(select *,
concat_ws(',',collect_list(pv) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as pv_n,
reverse(concat_ws(',',collect_list(s) OVER (ORDER BY s ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))) as s_n
from tb),
tb2 as
(select s,pv,pv_n,s_n from tb1
lateral view explode(SPLIT(pv_n,',')) tb1 as pv_n),
tb3 as
(select *,row_number() over(partition by s order by pv_n) as rn from tb2)
select s,pv,concat_ws('+',collect_list(concat(pv_n,'*',split(s_n,',')[rn-1]))) as res from tb3 group by s,pv order by s;
该种实现方式包含的知识点比较多,包括collect_list,concat_ws,lateral view explode,开窗函数。其中将开窗出来的值作为切割出来的数组下标来使用这种思路在实际中不容易想到。但这个sql本质上有隐患,其中reverse函数如果s字段超过个位数后存在问题,当做练sql吧 实际中要多想想
上图中,要取pv_n*s_n[rn-1],然后再将结果连接起来
然后再分组连接起来