MySQL 8.0 引入CTE(Common Table Expression)功能,CTE除了替代派生表以外,还有一个重要的功能,实现递归查询。在CTE功能引入之前,MySQL很难在SQL语句层实现递归查询,一种间接的方式是通过创建存储过程,而引入CTE后,SQL语句级的递归查询将变得很容易,本文将简单介绍CTE递归查询的使用。
一、什么是CTE递归查询?
CTE递归查询可以看作是一个子查询在重复调用自己,它的使用场景如下:
- 生成序列
- 遍历层次或树形结构
CTE递归语法:WITH RECURSIVE cte AS ( initial_query — "seed" member UNION ALL recursive_query — recusive member that references to the same CTE name)SELECT * FROM cte; — main query
上述CTE递归语法中,RECURSIVE关键字是必不可少的,除此之外,还有两个必不可少的成员,一个是seed member,种子成员,它是一个初始化的查询,将在第一次迭代时被执行。另外一个是recusive member,递归成员,它是递归迭代的主要过程,它将会在主查询中生成所有后续的item。
整个递归过程在任何一次迭代不再返回记录时结束,这里要注意,避免由于迭代次数过多,导致内存耗尽。为递归成员设置一个结束条件是非常重要的,当然也可以通过设置递归深度、最大执行时间来限制递归执行,以便在超过限制时,能够强制结束递归CTE。
- cte_max_recursion_depth 参数默认值为1000,限制CTE递归深度,超过阈值,将被强制终止。
- max_execution_time 参数限制查询的最大执行时间,超过该时间,也将被强制终止。
二、CTE递归的简单使用案例
2.1 单层次的序列
目标:创建一个1到10的整数序列,如下:WITH RECURSIVE natural_sequence AS ( SELECT 1 AS n — seed member: our sequence starts from 1 UNION ALL SELECT n + 1 FROM natural_sequence — recursive member: reference to itself WHERE n < 10 — stop condition )SELECT * FROM natural_sequence; — main query+——+| n |+——+| 1 || 2 || 3 || 4 || 5 || 6 || 7 || 8 || 9 || 10 |+——+
如果不加结束条件,由于递归深度超过1000,被强制终止,如下:
mysql> WITH RECURSIVE natural_sequence AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM natural_sequence ) SELECT * FROM natural_sequence;ERROR 3636 (HY000): Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.
另外一个阶乘的例子,如下:mysql> WITH RECURSIVE factorial(n, fact) AS ( SELECT 0, 1 UNION ALL SELECT n + 1, fact * (n+1) FROM factorial WHERE n < 20 ) SELECT * from factorial;+——+———————+| n | fact |+——+———————+| 0 | 1 || 1 | 1 || 2 | 2 || 3 | 6 || 4 | 24 || 5 | 120 || 6 | 720 || 7 | 5040 || 8 | 40320 || 9 | 362880 || 10 | 3628800 || 11 | 39916800 || 12 | 479001600 || 13 | 6227020800 || 14 | 87178291200 || 15 | 1307674368000 || 16 | 20922789888000 || 17 | 355687428096000 || 18 | 6402373705728000 || 19 | 121645100408832000 || 20 | 2432902008176640000 |+——+———————+
2.2 双层次的序列
实现一种序列,N+2的值由前两个值N+1与N计算而来,最典型的例子就是斐波那契数列,最开始的两个数是0,1,后面的数都是前两个数之和。使用递归CTE实现,如下:mysql> WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS ( SELECT 1, 0, 1 UNION ALL SELECT n + 1, next_fib_n, fib_n + next_fib_n FROM fibonacci WHERE n < 20 ) SELECT * FROM fibonacci;+——+——-+————+| n | fib_n | next_fib_n |+——+——-+————+| 1 | 0 | 1 || 2 | 1 | 1 || 3 | 1 | 2 || 4 | 2 | 3 || 5 | 3 | 5 || 6 | 5 | 8 || 7 | 8 | 13 || 8 | 13 | 21 || 9 | 21 | 34 || 10 | 34 | 55 || 11 | 55 | 89 || 12 | 89 | 144 || 13 | 144 | 233 || 14 | 233 | 377 || 15 | 377 | 610 || 16 | 610 | 987 || 17 | 987 | 1597 || 18 | 1597 | 2584 || 19 | 2584 | 4181 || 20 | 4181 | 6765 |+——+——-+————+
另外一个例子,日期序列。有一个需求,需要按天分组,查询每天的销售总额,传统查询方法,使用group by,如下:SELECT order_date, SUM(price) AS salesFROM salesGROUP BY order_date;+————+———+| order_date | sales |+————+———+| 2020-02-01 | 500.49 || 2020-02-02 | 1249.00 || 2020-02-04 | 1199.00 || 2020-02-06 | 1319.40 || 2020-02-07 | 609.00 |+————+———+
这个方式有一个问题,假如有一天没有卖出商品,那么那天的记录就没有,比如上面2020-02-03,2020-02-05这两天就没有数据。
使用递归CTE,就不会有这个问题,如下:WITH RECURSIVE dates(date) AS ( SELECT ‘2020-02-01’ UNION ALL SELECT date + INTERVAL 1 DAY FROM dates WHERE date < ‘2020-02-07’ )SELECT dates.date, COALESCE(SUM(price), 0) salesFROM dates LEFT JOIN sales ON dates.date = sales.order_dateGROUP BY dates.date;+————+———+| date | sales |+————+———+| 2020-02-01 | 500.49 || 2020-02-02 | 1249.00 || 2020-02-03 | 0.00 || 2020-02-04 | 1199.00 || 2020-02-05 | 0.00 || 2020-02-06 | 1319.40 || 2020-02-07 | 609.00 |+————+———+
2.3 层次数据遍历
公司的组织架构、文件夹目录、家族成员关系等等,都是层次关系的数据。以公司员工上下级关系为例,来说明使用递归CTE实现层次数据的遍历。原始数据如下:# create the tableCREATE TABLE orgchart(id INT PRIMARY KEY,name VARCHAR(20),role VARCHAR(20),manager_id INT,FOREIGN KEY (manager_id) REFERENCES orgchart(id));
# insert the rowsINSERT INTO orgchart VALUES(1,’Matthew’,’CEO’,NULL),(2,’Caroline’,’CFO’,1),(3,’Tom’,’CTO’,1),(4,’Sam’,’Treasurer’,2),(5,’Ann’,’Controller’,2),(6,’Anthony’,’Dev Director’,3),(7,’Lousie’,’Sys Admin’,3),(8,’Travis’,’Senior DBA’,3),(9,’John’,’Developer’,6),(10,’Jennifer’,’Developer’,6),(11,’Maria’,’Junior DBA’,8);
# let’s see the table, The CEO has no manager, so the manager_id is set to NULLSELECT * FROM orgchat;+—-+———-+————–+————+| id | name | role | manager_id |+—-+———-+————–+————+| 1 | Matthew | CEO | NULL || 2 | Caroline | CFO | 1 || 3 | Tom | CTO | 1 || 4 | Sam | Treasurer | 2 || 5 | Ann | Controller | 2 || 6 | Anthony | Dev Director | 3 || 7 | Lousie | Sys Admin | 3 || 8 | Travis | Senior DBA | 3 || 9 | John | Developer | 6 || 10 | Jennifer | Developer | 6 || 11 | Maria | Junior DBA | 8 |+—-+———-+————–+————+
使用CTE递归遍历这种层次结构,如下:# find the reporting chain for all the employeesmysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( SELECT id, name, CAST(name AS CHAR(100)) FROM org_chart WHERE manager_id IS NULL UNION ALL SELECT oc.id, oc.name, CONCAT(rc.path,’ -> ‘,oc.name) FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) SELECT * FROM reporting_chain;+——+———-+—————————————+| id | name | path |+——+———-+—————————————+| 1 | Matthew | Matthew || 2 | Caroline | Matthew -> Caroline || 3 | Tom | Matthew -> Tom || 4 | Sam | Matthew -> Caroline -> Sam || 5 | Ann | Matthew -> Caroline -> Ann || 6 | Anthony | Matthew -> Tom -> Anthony || 7 | Lousie | Matthew -> Tom -> Lousie || 8 | Travis | Matthew -> Tom -> Travis || 9 | John | Matthew -> Tom -> Anthony -> John || 10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer || 11 | Maria | Matthew -> Tom -> Travis -> Maria |+——+———-+—————————————+
这里比较关键的一点是使用了 CAST 函数在CTE的种子成员里,如果不使用CAST函数,则会报错,如下:mysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( SELECT id, name, name FROM org_chart WHERE manager_id IS NULL UNION ALL SELECT oc.id, oc.name, CONCAT(rc.path,’ -> ‘,oc.name) FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) SELECT * FROM reporting_chain;ERROR 1406 (22001): Data too long for column ‘path’ at row 1上面这个SQL语法上是正确的,但是问题在于path字段的类型由非递归的SELECT决定,所以它是CHAR(7),也就是字符串Matthew的长度,所以在CTE递归调用中,将会导致一个字符串截断的报错。
更进一步,我们打印层级的深度level,如下:mysql> WITH RECURSIVE reporting_chain(id, name, path, level) AS ( SELECT id, name, CAST(name AS CHAR(100)), 1 FROM org_chart WHERE manager_id IS NULL UNION ALL SELECT oc.id, oc.name, CONCAT(rc.path,’ -> ‘,oc.name), rc.level+1 FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) SELECT * FROM reporting_chain ORDER BY level;+——+———-+—————————————+——-+| id | name | path | level |+——+———-+—————————————+——-+| 1 | Matthew | Matthew | 1 || 2 | Caroline | Matthew -> Caroline | 2 || 3 | Tom | Matthew -> Tom | 2 || 4 | Sam | Matthew -> Caroline -> Sam | 3 || 5 | Ann | Matthew -> Caroline -> Ann | 3 || 6 | Anthony | Matthew -> Tom -> Anthony | 3 || 7 | Lousie | Matthew -> Tom -> Lousie | 3 || 8 | Travis | Matthew -> Tom -> Travis | 3 || 9 | John | Matthew -> Tom -> Anthony -> John | 4 || 10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer | 4 || 11 | Maria | Matthew -> Tom -> Travis -> Maria | 4 |+——+———-+—————————————+——-+
来看一个更复杂的树形结构,家谱。数据中包含祖父母、父母和孩子,原始数据如下。CREATE TABLE genealogy(id INT PRIMARY KEY,name VARCHAR(20),father_id INT,mother_id INT,FOREIGN KEY(father_id) REFERENCES genealogy(id),FOREIGN KEY(mother_id) REFERENCES genealogy(id));
# populate the tableINSERT INTO genealogy VALUES(1,’Maria’,NULL,NULL),(2,’Tom’,NULL,NULL),(3,’Robert’,NULL,NULL),(4,’Claire’,NULL,NULL),(5,’John’,2,1),(6,’Jennifer’,2,1),(7,’Sam’,3,4),(8,’James’,7,6);
SELECT * FROM genealogy;+—-+———-+———–+———–+| id | name | father_id | mother_id |+—-+———-+———–+———–+| 1 | Maria | NULL | NULL || 2 | Tom | NULL | NULL || 3 | Robert | NULL | NULL || 4 | Claire | NULL | NULL || 5 | John | 2 | 1 || 6 | Jennifer | 2 | 1 || 7 | Sam | 3 | 4 || 8 | James | 7 | 6 |+—-+———-+———–+———–+
通过CTE递归,我们可以查询某一个人的祖先及与他的关系,如下:mysql> WITH RECURSIVE ancestors AS ( SELECT *, CAST(‘son’ AS CHAR(20)) AS relationship, 0 level FROM genealogy WHERE name=’James’ UNION ALL SELECT g.*, CASE WHEN g.id=a.father_id AND level=0 THEN ‘father’ WHEN g.id=a.mother_id AND level=0 THEN ‘mother’ WHEN g.id=a.father_id AND level=1 THEN ‘grandfather’ WHEN g.id=a.mother_id AND level=1 THEN ‘grandmother’ END, level+1 FROM genealogy g, ancestors a WHERE g.id=a.father_id OR g.id=a.mother_id) SELECT * FROM ancestors;+——+———-+———–+———–+————–+——-+| id | name | father_id | mother_id | relationship | level |+——+———-+———–+———–+————–+——-+| 8 | James | 7 | 6 | son | 0 || 6 | Jennifer | 2 | 1 | mother | 1 || 7 | Sam | 3 | 4 | father | 1 || 1 | Maria | NULL | NULL | grandmother | 2 || 2 | Tom | NULL | NULL | grandfather | 2 || 3 | Robert | NULL | NULL | grandfather | 2 || 4 | Claire | NULL | NULL | grandmother | 2 |+——+———-+———–+———–+————–+——-+
2.4 图结构遍历
看一个交通线路的例子,创建一个表,包含各个站点之间的线路,以及它们之间的距离,原始数据如下:CREATE TABLE train_route(id INT PRIMARY KEY,origin VARCHAR(20),destination VARCHAR(20),distance INT);
# populate the tableINSERT INTO train_route VALUES(1,’MILAN’,’TURIN’,150),(2,’TURIN’,’MILAN’,150),(3,’MILAN’,’VENICE’,250),(4,’VENICE’,’MILAN’,250),(5,’MILAN’,’GENOA’,200),(6,’MILAN’,’ROME’,600),(7,’ROME’,’MILAN’,600),(8,’MILAN’,’FLORENCE’,380),(9,’TURIN’,’GENOA’,160),(10,’GENOA’,’TURIN’,160),(11,’FLORENCE’,’VENICE’,550),(12,’FLORENCE’,’ROME’,220),(13,’ROME’,’FLORENCE’,220),(14,’GENOA’,’ROME’,500),(15,’ROME’,’NAPLES’,210),(16,’NAPLES’,’VENICE’,800);
SELECT * FROM train_route;+—-+———-+————-+———-+| id | origin | destination | distance |+—-+———-+————-+———-+| 1 | MILAN | TURIN | 150 || 2 | TURIN | MILAN | 150 || 3 | MILAN | VENICE | 250 || 4 | VENICE | MILAN | 250 || 5 | MILAN | GENOA | 200 || 6 | MILAN | ROME | 600 || 7 | ROME | MILAN | 600 || 8 | MILAN | FLORENCE | 380 || 9 | TURIN | GENOA | 160 || 10 | GENOA | TURIN | 160 || 11 | FLORENCE | VENICE | 550 || 12 | FLORENCE | ROME | 220 || 13 | ROME | FLORENCE | 220 || 14 | GENOA | ROME | 500 || 15 | ROME | NAPLES | 210 || 16 | NAPLES | VENICE | 800 |+—-+———-+————-+———-+
返回起点为 Milan,可到达的所有目的地,如下:mysql> WITH RECURSIVE train_destination AS ( SELECT origin AS dest FROM train_route WHERE origin=’MILAN’ UNION SELECT tr.destination FROM train_route tr JOIN train_destination td ON td.dest=tr.origin) SELECT * from train_destination;+———-+| dest |+———-+| MILAN || TURIN || VENICE || GENOA || ROME || FLORENCE || NAPLES |+———-+
从一个地点出发,到达另外一个地点可以有多条路径,比如起点为Milan,终点任意,我们看下有多少条路线,每条路线的距离是多少,如下:mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS ( SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin=’MILAN’ UNION SELECT CONCAT(paths.cur_path, ‘ -> ‘, train_route.destination), train_route.destination, paths.tot_distance+train_route.distance FROM paths, train_route WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,’ -> ‘,’,’) ) ) SELECT * FROM paths;+——————————————————-+———-+————–+| cur_path | cur_dest | tot_distance |+——————————————————-+———-+————–+| MILAN | MILAN | 0 || MILAN -> TURIN | TURIN | 150 || MILAN -> VENICE | VENICE | 250 || MILAN -> GENOA | GENOA | 200 || MILAN -> ROME | ROME | 600 || MILAN -> FLORENCE | FLORENCE | 380 || MILAN -> TURIN -> GENOA | GENOA | 310 || MILAN -> GENOA -> TURIN | TURIN | 360 || MILAN -> GENOA -> ROME | ROME | 700 || MILAN -> ROME -> FLORENCE | FLORENCE | 820 || MILAN -> ROME -> NAPLES | NAPLES | 810 || MILAN -> FLORENCE -> VENICE | VENICE | 930 || MILAN -> FLORENCE -> ROME | ROME | 600 || MILAN -> TURIN -> GENOA -> ROME | ROME | 810 || MILAN -> GENOA -> ROME -> FLORENCE | FLORENCE | 920 || MILAN -> GENOA -> ROME -> NAPLES | NAPLES | 910 || MILAN -> ROME -> FLORENCE -> VENICE | VENICE | 1370 || MILAN -> ROME -> NAPLES -> VENICE | VENICE | 1610 || MILAN -> FLORENCE -> ROME -> NAPLES | NAPLES | 810 || MILAN -> TURIN -> GENOA -> ROME -> FLORENCE | FLORENCE | 1030 || MILAN -> TURIN -> GENOA -> ROME -> NAPLES | NAPLES | 1020 || MILAN -> GENOA -> ROME -> FLORENCE -> VENICE | VENICE | 1470 || MILAN -> GENOA -> ROME -> NAPLES -> VENICE | VENICE | 1710 || MILAN -> FLORENCE -> ROME -> NAPLES -> VENICE | VENICE | 1610 || MILAN -> TURIN -> GENOA -> ROME -> FLORENCE -> VENICE | VENICE | 1580 || MILAN -> TURIN -> GENOA -> ROME -> NAPLES -> VENICE | VENICE | 1820 |+——————————————————-+———-+————–+
也可以找出一个起点与一个终点之间的最短路径,比如起点为MILAN,终点为NAPLES,如下:# shortest path from MILAN to NAPLESmysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS ( SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin=’MILAN’ UNION SELECT CONCAT(paths.cur_path, ‘ -> ‘, train_route.destination), train_route.destination, paths.tot_distance+train_route.distance FROM paths, train_route WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,’ -> ‘,’,’) ) ) SELECT * FROM paths WHERE cur_dest=’NAPLES’ ORDER BY tot_distance ASC LIMIT 1+————————-+———-+————–+| cur_path | cur_dest | tot_distance |+————————-+———-+————–+| MILAN -> ROME -> NAPLES | NAPLES | 810 |+————————-+———-+————–+
三、递归CTE使用限制
之间提到过,CTE递归有递归深度和执行时间的限制,除此之外,递归CTE的SELECT不能包含如下语句:
- 聚合函数,如SUM
- group by
- order by
- distinct
- 窗口函数
这些限制只针对递归CTE,如果是非递归的CTE查询,则没有这些限制。
四、总结
CTE递归查询是MySQL 8.0 增加的非常有用的一个特性,以往使用存储过程实现递归的方式完全可以使用递归CTE代替,并且更加简单,相对于存储过程需要额外的授权,递归CTE就像普通的SQL一样,并不需要额外权限。当然,递归CTE相对于非递归CTE确实更加复杂,不仅在语法上,逻辑上也不易理解,需要仔细想想清楚。
原文:https://www.percona.com/blog/2020/02/13/introduction-to-mysql-8-0-recursive-common-table-expression-part-2/
未经允许不得转载:搬瓦工VPS_美国VPS » MySQL 8.0 CTE 递归查询