
I'm currently trying to run some queries against a data dump of Stack Overflow's comments. Here's what the schema looks like:

CREATE TABLE `socomments` (
  `Id` int(11) NOT NULL,
  `PostId` int(11) NOT NULL,
  `Score` int(11) DEFAULT NULL,
  `Text` varchar(600) NOT NULL,
  `CreationDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `UserId` int(11) NOT NULL,
  PRIMARY KEY (`Id`),
  KEY `idx_socomments_PostId` (`PostId`),
  KEY `CreationDate` (`CreationDate`),
  FULLTEXT KEY `Text` (`Text`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I ran this query against the table, and it ran incredibly slowly (the table has 29 million rows, but it also has a full-text index):

SELECT *
FROM socomments
WHERE MATCH (Text) AGAINST ('"fixed the post"' IN BOOLEAN MODE)

So I profiled it, the results of which are:

Status                   | Duration (s)
-------------------------|-------------
starting                 | 0.000058
checking permissions     | 0.000006
Opening tables           | 0.000014
init                     | 0.000019
System lock              | 0.000006
optimizing               | 0.000007
statistics               | 0.000013
preparing                | 0.000005
FULLTEXT initialization  | 207.1112
executing                | 0.000009
Sending data             | 0.000856
end                      | 0.000004
query end                | 0.000004
closing tables           | 0.000006
freeing items            | 0.000059
logging slow query       | 0.000037
cleaning up              | 0.000046
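
(For reference, a stage-by-stage profile like this can be captured with MySQL's session profiler; a minimal sketch of the commands:)

SET profiling = 1;
SELECT * FROM socomments
WHERE MATCH (Text) AGAINST ('"fixed the post"' IN BOOLEAN MODE);
SHOW PROFILES;               -- lists recent statements with their Query_IDs
SHOW PROFILE FOR QUERY 1;    -- per-stage durations for the statement with Query_ID 1
SET profiling = 0;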

As you can see, it spends a long time in FULLTEXT initialization. Is this normal? If not, how would I fix it?

RolandoMySQLDBA
hichris123

4 Answers


If you are using InnoDB FULLTEXT indexes, queries will often hang in the "FULLTEXT initialization" state when the table has a large number of deleted rows. In InnoDB's FULLTEXT implementation, deleted rows are not pruned until a subsequent OPTIMIZE operation is run against the affected table. See: https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html

To remove full-text index entries for deleted records, you must run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to rebuild the full-text index.
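
A minimal sketch of that maintenance run, using the table from the question (note that each OPTIMIZE pass only processes innodb_ft_num_word_optimize words, 2000 by default, so a large backlog can take several passes):

SET GLOBAL innodb_optimize_fulltext_only = ON;
OPTIMIZE TABLE socomments;  -- with the flag ON, this rebuilds only the FULLTEXT index
SET GLOBAL innodb_optimize_fulltext_only = OFF;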

You can also inspect the number of deleted-but-not-purged records by querying INFORMATION_SCHEMA.INNODB_FT_DELETED.
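
That view is only populated after you point innodb_ft_aux_table at the table in question; a sketch, where the schema name so is an assumption (substitute your own database name):

SET GLOBAL innodb_ft_aux_table = 'so/socomments';  -- format is 'database/table'; 'so' is a placeholder
SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_FT_DELETED;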

To resolve this, one should regularly run OPTIMIZE TABLE against tables with InnoDB FULLTEXT indexes.

Tyler

Others have found this a troublesome situation.

Since the MySQL documentation is very terse on this thread state:

FULLTEXT initialization

The server is preparing to perform a natural-language full-text search.

your only recourse is to make that preparation happen with less data. How?

SUGGESTION #1

Look at your query again. It is selecting all columns. I would refactor the query to collect only the Id column from socomments, then join those retrieved ids back to the socomments table.

SELECT B.*
FROM
(
    SELECT Id
    FROM socomments
    WHERE MATCH (Text) AGAINST ('"fixed the post"' IN BOOLEAN MODE)
) A
LEFT JOIN socomments B USING (Id);

This might produce an uglier EXPLAIN plan, but I think the profile will change for the better. The basic idea is: if you have an aggressive FULLTEXT search, make it gather the least amount of data during that FULLTEXT initialization phase, thus reducing the time.

I have recommended this many times before.

SUGGESTION #2

Please make sure you are setting the InnoDB-based FULLTEXT options, not the MyISAM ones. The two options you should be concerned with are innodb_ft_cache_size and innodb_ft_total_cache_size.

Think about it for a moment: the Text field is VARCHAR(600). Say the average comment is 300 bytes. With 29,000,000 of them, that would be a little over 8 GB of text. Increasing innodb_ft_cache_size and innodb_ft_total_cache_size may therefore also help.

Make sure you have enough RAM for larger InnoDB FULLTEXT buffers.
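
Both of those variables are read-only at runtime, so they belong in the server config and take effect on restart; a sketch with illustrative values (these sizes are assumptions to tune, not recommendations):

[mysqld]
# Illustrative values only; size these against available RAM.
innodb_ft_cache_size       = 80M     # per-table in-memory FULLTEXT cache
innodb_ft_total_cache_size = 1600M   # cap across all tables combined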

GIVE IT A TRY !!!

RolandoMySQLDBA

There is a confirmed bug in MySQL (Deleted DOCID are not maintained during OPTIMIZE of InnoDB FULLTEXT tables) that tanks performance under heavy delete loads unless you rebuild the table from scratch.

Related.

Riedsio

Full-text indexes in MySQL were not designed to support large amounts of data, so search speed drops off quite quickly as your dataset grows. One solution is to use an external full-text search engine such as Solr or Sphinx, which offer improved search functionality (relevance tuning, phrase-search support, built-in facets, snippets, etc.), extended query syntax, and much faster speeds on mid-to-large data sets.

Solr is based on the Java platform, so if you run a Java-based application it will be a natural choice for you; Sphinx is written in C++ and acts as a daemon in the same fashion as MySQL. As soon as you feed the external engine the data you want to search, you can also move some queries out of MySQL. I can't tell you which engine is better in your case; I mostly use Sphinx, and here is a usage example: http://astellar.com/2011/12/replacing-mysql-full-text-search-with-sphinx/
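
For a flavor of the Sphinx side: once the comments are indexed, searchd speaks a MySQL-like protocol (SphinxQL), so the phrase search from the question becomes something like the following, where the index name socomments_idx is a placeholder:

-- connect to searchd, e.g. mysql -h127.0.0.1 -P9306, then:
SELECT id
FROM socomments_idx               -- hypothetical Sphinx index built from socomments
WHERE MATCH('"fixed the post"');  -- same quoted-phrase syntax as the original query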

vfedorkov