I have one big table, foobar, describing a many-to-many relation. It contains millions of foos, millions of bars, and each bar has several hundred foos, so the table holds billions of rows.
CREATE TABLE `foobar` (
`foo_id` INT(10) UNSIGNED NOT NULL,
`bar_id` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`foo_id`, `bar_id`),
INDEX `bar_id_idx` (`bar_id`))
I have another table that stores, for each foo_id, how many rows in foobar reference it:
CREATE TABLE `foo_amount` (
`foo_id` INT(10) UNSIGNED NOT NULL,
`amount` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`foo_id`),
INDEX `amount_idx` (`amount`))
The counting could be done like this:
INSERT INTO foo_amount (SELECT foo_id, COUNT(*) AS amount FROM foobar GROUP BY foo_id);
But then I would have to recompute the entire table after every row inserted into or deleted from foobar.
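For comparison, a pair of row-level triggers could keep foo_amount exactly in sync without a full recompute. This is only a sketch (trigger names are made up), and with several hundred rows per bar insert the per-row trigger overhead may be just as slow as the approach below:

-- Sketch: keep foo_amount in sync via triggers (per-row overhead!)
CREATE TRIGGER foobar_after_insert AFTER INSERT ON foobar
FOR EACH ROW
  INSERT INTO foo_amount (foo_id, amount) VALUES (NEW.foo_id, 1)
  ON DUPLICATE KEY UPDATE amount = amount + 1;

CREATE TRIGGER foobar_after_delete AFTER DELETE ON foobar
FOR EACH ROW
  UPDATE foo_amount SET amount = amount - 1 WHERE foo_id = OLD.foo_id;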
An insert usually adds a new bar object with several hundred foos. For example, inserting a bar with bar_id 42 whose foos have foo_id's 3, 8, 26, 44, ... would look like:
INSERT INTO foobar VALUES (3,42), (8,42), (26,42), (44,42), ...;
My second attempt was to update the foo_amount table after every inserted bar object:
INSERT INTO foo_amount (SELECT foo_id, 1 FROM foobar WHERE bar_id = 42)
ON DUPLICATE KEY UPDATE amount = amount + 1;
But this is very slow. Do you have any ideas on how to optimize this? One option might be to accumulate new bars in a temporary table foo_amount_tmp and merge it into foo_amount every now and then. foo_amount wouldn't be up to date all the time, but that's acceptable. How would I trigger the periodic merge, though?
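To make the accumulate-and-merge idea concrete, here is a sketch of what I have in mind (table and event names are my own invention, and the merge step is not transactionally safe as written; the event scheduler must be enabled with SET GLOBAL event_scheduler = ON):

-- Small staging table holding pending increments:
CREATE TABLE foo_amount_tmp (
  foo_id INT(10) UNSIGNED NOT NULL,
  amount INT(10) UNSIGNED NOT NULL,
  PRIMARY KEY (foo_id)
);

-- After inserting a bar (e.g. bar_id 42), record increments here;
-- this table stays tiny, so the upsert is cheap:
INSERT INTO foo_amount_tmp (SELECT foo_id, 1 FROM foobar WHERE bar_id = 42)
ON DUPLICATE KEY UPDATE amount = amount + 1;

-- Periodic merge, e.g. via the MySQL event scheduler
-- (run under a changed DELIMITER in the mysql client):
CREATE EVENT merge_foo_amount
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  INSERT INTO foo_amount (SELECT foo_id, amount FROM foo_amount_tmp)
  ON DUPLICATE KEY UPDATE amount = amount + VALUES(amount);
  DELETE FROM foo_amount_tmp;
END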