
I'm writing an application that needs to flush a large number of updates to the database over an extended period of time, and I've gotten stuck on how to optimize the query. Currently I'm using INSERT INTO ... VALUES (..), (..) ON DUPLICATE KEY UPDATE, which batches all of the values into one query, but it executes excruciatingly slowly on large tables. I never actually need to insert rows.
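
To be concrete, the statements I'm generating look roughly like this (column names match the trimmed table definition below; the values and batch size are made up, and real batches contain hundreds of rows):

INSERT INTO `table` (id, a, b, c, d)
VALUES
  (1, 10, 20, '1', 'abc'),
  (2, 11, 21, '2', 'def')
ON DUPLICATE KEY UPDATE
  a = VALUES(a),
  b = VALUES(b),
  c = VALUES(c),
  d = VALUES(d);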

Other approaches I've seen are updating with SET value = CASE WHEN ... (which would be hard to generate given how I'm building the queries, and I'm not sure about the performance of CASE for hundreds or thousands of keys), or simply sending multiple concatenated UPDATE statements. Would either of these be faster than my current method?
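
For reference, the CASE-based form I mean would look something like this (made-up values; in practice there would be one CASE per column and hundreds of keys):

UPDATE `table`
SET a = CASE id WHEN 1 THEN 10 WHEN 2 THEN 11 END,
    b = CASE id WHEN 1 THEN 20 WHEN 2 THEN 21 END
WHERE id IN (1, 2);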

It baffles me that, as far as I can tell, there's no idiomatic, efficient way to do this in MySQL. If there really isn't a way that's faster than ON DUPLICATE KEY UPDATE, would it be worth switching to PostgreSQL and using its UPDATE ... FROM syntax?
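
The PostgreSQL form I have in mind is roughly this (made-up values; `table` needs quoting there since it's a reserved word):

UPDATE "table" AS t
SET a = v.a,
    b = v.b
FROM (VALUES (1, 10, 20), (2, 11, 21)) AS v(id, a, b)
WHERE t.id = v.id;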

Any other suggestions are also greatly appreciated!

Edit: here's one of the tables that gets updated frequently. I've replaced the column names with placeholders since they're irrelevant.

CREATE TABLE IF NOT EXISTS `table` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `a` bigint(20) unsigned NOT NULL DEFAULT '0',
  `b` bigint(20) unsigned NOT NULL DEFAULT '0',
  `c` enum('0','1','2') NOT NULL DEFAULT '0',
  `d` char(32) NOT NULL,
  -- trimmed --
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`),
  KEY `c` (`c`),
  KEY `d` (`d`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
jli

3 Answers

Since you're using InnoDB tables, the most obvious optimization would be to group multiple UPDATEs into a transaction.

Since InnoDB is a transactional engine, you pay not just for the UPDATE itself, but also for the transactional overhead: managing the transaction buffer and transaction log, and flushing the log to disk.

If you are logically comfortable with the idea, try to group 100-1000 UPDATEs at a time, each batch wrapped like this:

START TRANSACTION;
UPDATE ...
UPDATE ...
UPDATE ...
UPDATE ...
COMMIT;

Possible downsides:

  • One error will roll back the entire transaction (but this is easily handled in code)
  • You might wait a long time to accumulate your 1000 UPDATEs, so you may also want some kind of timeout
  • More complexity in your application code.
Shlomi Noach

Usually, when I've had to do this, I've created a temporary table, inserted my changes into it, and then done an UPDATE with a join, as shown below. I'd love to hear a DBA's thoughts on whether this is as efficient as the transaction approach recommended above.

create table foo (
  id int primary key auto_increment,
  category varchar(10),
  animal varchar(20));

create table foo_tmp (id int, category varchar(10), animal varchar(20));

insert into foo (id, category, animal) values
  (1, 'insect', 'spider'), (2, 'reptile', 'snake'), (3, 'reptile', 'gecko'),
  (4, 'amphibian', 'frog'), (5, 'primate', 'gorilla'), (6, 'primate', 'baboon'),
  (7, 'insect', 'fruit fly');

-- changes to make animal a bit more specific...
insert into foo_tmp (id, category, animal) values
  (3, 'reptile', 'house gecko'), (4, 'amphibian', 'tree frog'),
  (5, 'primate', 'mountain gorilla');

-- ... and the update from the join
update foo f join foo_tmp t on f.id = t.id
set f.category = t.category, f.animal = t.animal;

Steve L

Besides doing all the work in a small number of transactions, sort the input data by the index fields that support your WHERE clauses.

This increases the buffer pool hit rate, which speeds up the process.
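
For example, if your WHERE clauses hit the primary key, emit the statements in ascending id order within each transaction, roughly like this (made-up values):

START TRANSACTION;
-- rows pre-sorted by id, so InnoDB touches index pages in order
UPDATE `table` SET a = 10 WHERE id = 17;
UPDATE `table` SET a = 11 WHERE id = 42;
UPDATE `table` SET a = 12 WHERE id = 98;
COMMIT;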

RonJohn