3

Good afternoon everyone

I found just one post here within the last year about this, but it doesn't help my situation.

I have been working with MySQL and trying to improve my knowledge of recursive CTE. The version of MySQL is 8.0.19 on a Windows device. The table that I have is generated with:

DROP TABLE IF EXISTS test_table;
CREATE TABLE test_table (
  `id` int(5) NOT NULL AUTO_INCREMENT,
  `source` varchar(20) NOT NULL,
  `destination` varchar(20) NOT NULL,
  `route` varchar (200) NOT NULL,
  `open` BOOLEAN DEFAULT true,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Values are entered using:

INSERT INTO test_table VALUES 
  (1, 'STA_A', 'STA_B', 'Line1', true),
  (2, 'STA_B', 'STA_A', 'Line1', true),
  (3, 'STA_B', 'STA_C', 'Line1', true),
  (4, 'STA_C', 'STA_B', 'Line1', true),
  (5, 'STA_C', 'STA_D', 'Line1', true),
  (6, 'STA_D', 'STA_C', 'Line1', true),
  (7, 'STA_D', 'STA_E', 'Line1', true),
  (8, 'STA_E', 'STA_D', 'Line1', true),
  (9, 'STA_K', 'STA_B', 'Line2', true),
  (10, 'STA_B', 'STA_K', 'Line2', true),
  (11, 'STA_B', 'STA_L', 'Line2', true),
  (12, 'STA_L', 'STA_B', 'Line2', true),
  (13, 'STA_L', 'STA_M', 'Line2', true),
  (14, 'STA_M', 'STA_L', 'Line2', true),
  (15, 'STA_M', 'STA_N', 'Line2', true),
  (16, 'STA_N', 'STA_M', 'Line2', true);

Finally, here is the recursive CTE:

SET profiling = 1;
SET @from = 'STA_A';
SET @to = 'STA_M';
SET @via = 'STA_M';
SET @avoid = 'XXXXX';
WITH RECURSIVE cte AS (
    -- Anchor
    SELECT test_table.destination, CONCAT(test_table.source, ' => ', test_table.route, ' => ', test_table.destination) path, 1 length
    FROM test_table
    WHERE test_table.source = @from
UNION ALL

-- Recursive member
SELECT test_table.destination, CONCAT(cte.path, ' => ', test_table.route, ' => ', test_table.destination) path, cte.length + 1 length
FROM cte
INNER JOIN test_table
ON test_table.source = cte.destination
WHERE NOT FIND_IN_SET(test_table.destination, REPLACE(path, ' => ', ','))
AND open = TRUE

) SELECT * FROM cte WHERE cte.destination = @to AND FIND_IN_SET(@via, REPLACE(path, ' => ', ',')) AND NOT FIND_IN_SET(@avoid, REPLACE(path, ' => ', ',')) ORDER BY cte.length LIMIT 500; SHOW PROFILES;

This pulls successfully the route from Station A to Station M, listing the stations and line segments between the source and destination:

STA_A => Line1 => STA_B => Line2 => STA_L => Line2 => STA_M

What I need to do is remove one of the duplicate line segments (Line2) and the associated station, leaving:

STA_A => Line1 => STA_B => Line2 => STA_M

Is this possible within the CTE, possibly sequential or recursive string manipulation, or would I need to push all the results into a separate "de-duplicating" process? The real database might produce 100 different routes that would need to be de-duplicated in this way and there could be several duplicated line segments in sequence, such as:

STA_A => Line1 => STA_B => Line1 => STA_C => Line1 => STA_D => Line1 => STA_E => Line2 => STA_F => Line2 => STA_G => Line2 => STA_H => Line2 => STA_I => Line2 => STA_J

would de-duplicate to:

STA_A => Line1 => STA_E => Line2 => STA_J

Thank you for your time (and patience!).

nicodemus
  • 33
  • 3

1 Answers1

2

Instead of adding the destination of a step with that step, add it as the source of the next step. Then you can decide to only add it if the line is about to change. You'll then just need to add the final destination at the end as an extra concatenation. This removes the need to manipulate the string afterwards (it comes out right) other than adding the destination after the CTE “loop”, or producing an intermediate result (directly or via nested CTEs).

See https://dbfiddle.uk/f1ZzWy6E for a modified version of your example that outputs both paths, one with all calling/through points listed and one with only the source, line change, and destination points.

Code from the above link in case it becomes unavailable:

DROP TABLE IF EXISTS test_table;
CREATE TABLE test_table (
  `id` int(5) NOT NULL AUTO_INCREMENT,
  `source` varchar(20) NOT NULL,
  `destination` varchar(20) NOT NULL,
  `route` varchar (200) NOT NULL,
  `open` BOOLEAN DEFAULT true,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO test_table VALUES (1, 'STA_A', 'STA_B', 'Line1', true), (2, 'STA_B', 'STA_A', 'Line1', true), (3, 'STA_B', 'STA_C', 'Line1', true), (4, 'STA_C', 'STA_B', 'Line1', true), (5, 'STA_C', 'STA_D', 'Line1', true), (6, 'STA_D', 'STA_C', 'Line1', true), (7, 'STA_D', 'STA_E', 'Line1', true), (8, 'STA_E', 'STA_D', 'Line1', true), (9, 'STA_K', 'STA_B', 'Line2', true), (10, 'STA_B', 'STA_K', 'Line2', true), (11, 'STA_B', 'STA_L', 'Line2', true), (12, 'STA_L', 'STA_B', 'Line2', true), (13, 'STA_L', 'STA_M', 'Line2', true), (14, 'STA_M', 'STA_L', 'Line2', true), (15, 'STA_M', 'STA_N', 'Line2', true), (16, 'STA_N', 'STA_M', 'Line2', true);

/SET profiling = 1;/ SET @from = 'STA_A'; SET @to = 'STA_M'; SET @via = 'STA_M'; SET @avoid = 'XXXXX'; WITH RECURSIVE cte AS ( -- Anchor SELECT test_table.destination , CONCAT(test_table.source, ' => ', test_table.route, ' => ', test_table.destination) path , CONCAT(test_table.source, ' => ', test_table.route, ' => ') pathdedup , 1 length , test_table.destination ddedup , test_table.route , 1 lengthdedup FROM test_table WHERE test_table.source = @from

UNION ALL

-- Recursive member
SELECT test_table.destination
     , CONCAT(cte.path, ' => ', test_table.route, ' => ', test_table.destination) path
     , CASE WHEN test_table.route = cte.route 
            THEN cte.pathdedup
            ELSE CONCAT(cte.pathdedup, cte.ddedup, ' => ', test_table.route, ' => ') 
       END pathdedup
     , cte.length + 1 length
     , CASE WHEN test_table.route = cte.route 
            THEN cte.destination 
            ELSE test_table.destination 
       END ddedup
     , test_table.route
     , CASE WHEN test_table.route = cte.route
            THEN lengthdedup
            ELSE lengthdedup + 1
       END lengthdedup
  FROM cte
 INNER JOIN test_table ON test_table.source = cte.destination
 WHERE NOT FIND_IN_SET(test_table.destination, REPLACE(path, ' => ', ','))
   AND open = TRUE

) SELECT path, length, CONCAT(pathdedup, @to) pathdedup, lengthdedup FROM cte WHERE cte.destination = @to AND FIND_IN_SET(@via, REPLACE(path, ' => ', ',')) AND NOT FIND_IN_SET(@avoid, REPLACE(path, ' => ', ',')) ORDER BY cte.length LIMIT 500; /SHOW PROFILES;/

Notes:

  1. Your second example doesn't work for the sample data provided even in the original query, as some of the required graph points and connections are not there, but it should have the same effect even over such longer combinations.
  2. You may need to use the MAXRECURSION hint for large graphs though be careful to carefully test performance if this is likely to be needed. The default is 100, the maximum is 32,767. In fact be careful to test performance anyway, depending on the graph data the number of options could balloon in a way that consumes a lot of CPU & memory to process even before you get close to 100 steps.
David Spillett
  • 32,593
  • 3
  • 50
  • 92