
I have a program which is time based. Users have all day to enter data that will affect their score; if they haven't entered anything by midnight, their score is negatively affected. I had thought about setting up a cron job to handle this at midnight, but am then faced with a batch process that may be updating hundreds of thousands of records all at once.

One way I could reduce the load would be to process the batch in chunks of 1,000 records, release the event loop, then queue the next 1,000. Is this a good way to go about something so massive? Or would creating a second server instance and offloading the work to it be a better idea?
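The chunk-and-yield idea can be sketched as follows (in Python with asyncio standing in for the event loop; the function name, the ID list, and the per-chunk callback are all illustrative, not from the original post):

```python
import asyncio

async def apply_midnight_penalties(all_ids, apply_chunk, chunk_size=1000):
    """Process IDs in chunks of chunk_size, yielding to the event loop
    between chunks so other requests can still be served."""
    for start in range(0, len(all_ids), chunk_size):
        # e.g. apply_chunk issues one UPDATE ... WHERE id IN (...) per chunk
        apply_chunk(all_ids[start:start + chunk_size])
        await asyncio.sleep(0)  # release the event loop before the next chunk
```

For example, running this over 2,500 IDs with a callback that records each chunk yields three chunks of 1,000, 1,000, and 500 IDs.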

Michael Green
KAT

1 Answer


Please provide proof that 500k rows is a lot for your database.

Hint: In most cases, it isn't.

Although this might read better as a Comment, it is a valid Answer.

Databases deal with MILLIONS of rows at a time without a hiccup. 500k is nothing.

Things you need to be aware of when testing the single-DML approach:

  • Undo tablespace (or MySQL's equivalent)

  • Possibly temp tablespace (for indexes)

  • Locks on a heavily modified table

If you have proof that one (or more) of those is causing issues, then you should consider Chunking your UPDATE statement.
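If chunking does turn out to be necessary, the usual pattern is to repeat a LIMIT-ed UPDATE until it affects zero rows, committing between chunks so locks and undo stay small. A sketch using SQLite (the `scores` table, the `entered_today` and `penalized` columns, and the 10-point penalty are made up for illustration):

```python
import sqlite3

def chunked_penalty(conn, chunk_size=1000):
    """Apply the midnight penalty in chunks until no eligible rows remain.
    Returns the total number of rows penalized."""
    total = 0
    while True:
        cur = conn.execute(
            """UPDATE scores
               SET score = score - 10, penalized = 1
               WHERE id IN (SELECT id FROM scores
                            WHERE penalized = 0 AND entered_today = 0
                            LIMIT ?)""",
            (chunk_size,))
        conn.commit()  # each chunk is its own short transaction
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

The `penalized` flag marks rows already handled, so the loop terminates even though each pass re-selects from the same table.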

You haven't given any indication that any of those is a problem. So, the answer is: Neither. Use a single DML.
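The single-DML version is one statement; the database engine handles the rest. A minimal sketch (again with SQLite, and an illustrative `scores` schema and penalty):

```python
import sqlite3

def single_dml_penalty(conn):
    """Apply the whole midnight penalty in one UPDATE - one statement,
    one transaction. Returns the number of rows affected."""
    cur = conn.execute(
        "UPDATE scores SET score = score - 10 WHERE entered_today = 0")
    conn.commit()
    return cur.rowcount
```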

As far as "process on a separate server"...I don't see the need. I foresee too many headaches synchronizing the processed rows with any data modified during the processing.

Michael Kutz