Recursive Queries, CTE (Common Table Expression) or Self Join, Recursive Views

The CTEs are like temporary tables that exist only during the execution of the query.

WITH RECURSIVE cte_name  [(col_1,..., col_n)]  AS(
    CTE_query_definition -- non-recursive term
    UNION [ALL]
    CTE_query definion  -- recursive term
) SELECT * FROM cte_name;

a simple example:

WITH RECURSIVE nums (n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM nums WHERE n+1 <= 11
)
SELECT n FROM nums;

Another example using a function

CREATE OR REPLACE FUNCTION yildiz(n integer) RETURNS text AS $$
declare star text;
BEGIN
for counter in 1..n loop
star:=concat(star,'*');
end loop;
return star;
END;
$$ LANGUAGE plpgsql;

WITH RECURSIVE nums (n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM nums WHERE n+1 <= 4
)
SELECT yildiz(n)  FROM nums;

Getting “manager tree” for some employee:

drop table if exists employees cascade;

CREATE TABLE employees (
   id serial PRIMARY KEY,
   name varchar(255) NOT NULL,
   salary integer,
   job varchar(255),
   manager_id int,
   FOREIGN KEY (manager_id) REFERENCES employees (id) ON DELETE CASCADE
);

INSERT INTO employees (id,name,manager_id,salary,job)
VALUES
(1,  'John', NULL, 10000, 'CEO'),
(2,  'Ben', 5, 1400, 'Junior Developer'),
(3,  'Barry', 6, 500, 'Intern'),
(4,  'George', 5, 1800, 'Developer'),
(5,  'James', 7, 3000, 'Manager'),
(6,  'Steven', 7, 2400, 'DevOps Engineer'),
(7,  'Alice', 1, 4200, 'VP'),
(8,  'Jerry', 1, 3500, 'Manager'),
(9,  'Adam', 8, 2000, 'Data Analyst'),
(10, 'Grace', 8, 2500, 'Developer'),
(11, 'Nicola',5,3000,'Developer'),
(12, 'Alexandra',5,2500,'System Engineer'),
(13, 'Dominic', 5,2750,'Accountant'),
(14, 'Leonard', 8,3450,'Finans Manager'),
(15, 'Eric', 4,2500,'Junior Developer'),
(16, 'Piers Paige', 8,4000,'Executive Director')
;
with recursive submanager as ( 
select id,name,job,manager_id,0 as level from employees where id=1 
union 
select e.id,e.name,e.job,e.manager_id,a.level+1 as level from employees e join submanager a on a.id=e.manager_id) 
select * from submanager;

another example with string_agg function:

WITH RECURSIVE pathman AS (
SELECT id,name, manager_id, job, 0 AS level
FROM employees
WHERE id = 3 -- Barry, the intern
UNION ALL
SELECT e.id,e.name, e.manager_id, e.job, p.level + 1 AS level
FROM employees e
JOIN pathman p ON e.id = p.manager_id
)
SELECT STRING_AGG(name || '(' ||job||')',' > ' 
ORDER BY level ASC) AS path FROM pathman;

Recursion on Graphs

Graphs as a data structure are an ideal candidate for traversing using recursion.

drop table if exists Roads;
CREATE TABLE Roads ( src integer, dest integer,distance integer);
INSERT INTO Roads ( src, dest,distance) VALUES
    (1, 2,4),(2, 3,3),(2, 4,5),(3, 4,2),(4, 1,7),(3, 5,1);

find paths as graph. For that, we will need a PostgreSQL special — ARRAY data type. Arrays are not a common relational database feature but are very handy here for storing paths. If you are not familiar with this data type, then these are the things you need for understanding SQL below:

  • Create array — ARRAY[value_1, value_2]
  • Concatenate array — ARRAY[value_1, value_2] || ARRAY[value_3]
  • ALL function – "x" != ALL(ARRAY["a", "b", "c"]) – compares "x" to each value in array, results is true if all comparisons result in true
WITH RECURSIVE paths (src, dest, distance, path) AS (
        SELECT e.src, e.dest, e.distance,ARRAY[e.src, e.dest]
        FROM roads e
    UNION
        SELECT p.src, e.dest,p.distance+e.distance, p.path || ARRAY[e.dest]
        FROM paths p
        JOIN roads e
        ON p.dest = e.src AND e.dest != ALL(p.path)
)
SELECT * FROM paths order by src,dest,distance;

Recursive VIEW

CREATE RECURSIVE VIEW paths (src, dest, distance, path) AS (
        SELECT e.src, e.dest, e.distance,ARRAY[e.src, e.dest]
        FROM roads e
    UNION
        SELECT p.src, e.dest,p.distance+e.distance, p.path || ARRAY[e.dest]
        FROM paths p
        JOIN roads e
        ON p.dest = e.src AND e.dest != ALL(p.path)
);

SELECT src, dest, path, distance FROM paths WHERE src = 4;

In Oracle the same is done by:

hierarchical_query_clause::=

Description of hierarchical_query_clause.gif follows
SELECT last_name, employee_id, manager_id, LEVEL
      FROM employees
      START WITH employee_id = 100
      CONNECT BY PRIOR employee_id = manager_id
      ORDER SIBLINGS BY last_name;

LAST_NAME                 EMPLOYEE_ID MANAGER_ID      LEVEL
------------------------- ----------- ---------- ----------
King                              100                     1
Cambrault                         148        100          2
Bates                             172        148          3
Bloom                             169        148          3
Fox                               170        148          3
Kumar                             173        148          3
Ozer                              168        148          3
Smith                             171        148          3
De Haan                           102        100          2
Hunold                            103        102          3
Austin                            105        103          4
Ernst                             104        103          4
Lorentz                           107        103          4
Pataballa                         106        103          4
Errazuriz                         147        100          2
Ande                              166        147          3
Banda                             167        147          3

Connect By – An alternative approach

An alternative mechanism for querying tree structures is to use the CONNECT BY statement (or connectby in PostgreSQL). CONNECT BY is a standard feature of ORACLE RDBMS, but is not part of the PostgreSQL core functionality, and instead, it is included in the tablefunc Additional Supplied Module.

Assuming your PostgreSQL install has access to tablefunc, you can install its functions into your database with:

CREATE EXTENSION tablefunc;

The format of connectby is:

connectby(text relname, text keyid_fld, text parent_keyid_fld [, text orderby_fld ], text start_with, int max_depth [, text branch_delim ])

select e.id,e.name,e.manager_id,orderbyManager from 
connectby('employees', 'id', 'manager_id', 'id','1', 0,'->') 
as t(id int, manager_id int, level int,orderbyManager text,managerid int) join employees e on e.id=t.id;

So why would you potentially use connectby, rather than WITH RECURSIVE? You would probably only use it where you already had legacy code that you wished to remain consistent with. Some limited testing of a slightly more complex tree containing 4,000+ records seems to show performance that is in favour of the RECURSIVE CTE approach (550ms vs 850ms). Testing was by no means exhaustive, and your mileage may vary.

Leave a Reply

Your email address will not be published. Required fields are marked *