Walking a BTREE index as far as possible in MySQL

Question

Suppose one has a column of words on which one builds a BTREE index:

CREATE TABLE myTable (
  words VARCHAR(25),
  INDEX USING BTREE (words)
);

LOAD DATA LOCAL INFILE '/usr/share/dict/words' INTO TABLE myTable (words);

And now one wants to find the records which share the longest common prefix with some search query, e.g. 'foobar'. I thought to do the following:

SELECT DISTINCT words
FROM   myTable
WHERE  words LIKE CASE
  WHEN NOT EXISTS (SELECT * FROM myTable WHERE words LIKE 'f%') THEN '%'
  WHEN NOT EXISTS (SELECT * FROM myTable WHERE words LIKE 'fo%') THEN 'f%'
  WHEN NOT EXISTS (SELECT * FROM myTable WHERE words LIKE 'foo%') THEN 'fo%'
  WHEN NOT EXISTS (SELECT * FROM myTable WHERE words LIKE 'foob%') THEN 'foo%'
  WHEN NOT EXISTS (SELECT * FROM myTable WHERE words LIKE 'fooba%') THEN 'foob%'
  WHEN NOT EXISTS (SELECT * FROM myTable WHERE words LIKE 'foobar%') THEN 'fooba%'
  ELSE 'foobar%'
END

Which is fine: it's very readable and performant; and it can easily be generated in application code.

However, this search should be even simpler to resolve: just walk through the index tree according to the search term until a branch does not exist, then return all the results that branch from the current node.

Granted that walking a path through the index only once instead of multiple times is probably a needless micro-optimisation, but it feels as though it should be possible: is it?

score 1 · Answer 1 · 2013-01-23T18:55:20.220

If you really want to walk just the b-tree index, using the innodb_ruby project can help http://blog.jcole.us/2013/01/14/efficiently-traversing-innodb-btrees-with-the-page-directory/

I think your logic is reversed in your query. It'll start with the shortest word.

How I would handle it is this

DROP PROCEDURE IF EXISTS `find_longest_prefix`;

DELIMITER $$

CREATE PROCEDURE `find_longest_prefix`(IN `word` varchar(255), OUT `word_prefix` varchar(255))
BEGIN
    SET max_sp_recursion_depth = 255;
    SET @nextWord = LEFT(`word`, LENGTH(`word`)-1);

    SELECT COUNT(DISTINCT `words`) FROM `myTABLE` WHERE `words` LIKE CONCAT(`word`, '%') INTO @word_count;

    IF (@word_count > 0)
    THEN
        SET `word_prefix` = `word`;
    ELSE
        IF (LENGTH(@nextWord) > 0)
        THEN
            Call `find_longest_prefix`(@nextWord, `word_prefix`);
        ELSE
            SET `word_prefix` = '';
        END IF;
    END IF;
END$$

DELIMITER ;

This uses the fact that finding a miss on a btree is fast so we just loop around calling recursively until we have a hit.

Examples

No results

mysql> CALL find_longest_prefix(';autobon', @prefix);
Query OK, 1 row affected (0.01 sec)

mysql> SELECT @prefix;
+---------+
| @prefix |
+---------+
|         |
+---------+
1 row in set (0.00 sec)

A few results

mysql> CALL find_longest_prefix('autobon', @prefix);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT @prefix;
+---------+
| @prefix |
+---------+
| auto    |
+---------+
1 row in set (0.00 sec)

And as you can see, the data is correct:

mysql> SELECT * FROM myTable WHERE words LIKE 'auto%' OR words LIKE ';auto%';
+----+-----------------+
| id | words           |
+----+-----------------+
| 19 | AUTOCOMMIT      |
| 20 | AUTOEXTEND_SIZE |
+----+-----------------+
2 rows in set (0.00 sec)

This method should be much faster then actually checking every step before the longest and getting a lot more returned data.

It should be easy for you to have the stored proc SELECT the rows once it finds the correct prefix and return them if you'd prefer.

Walking a BTREE index as far as possible in MySQL

1 Answers1