
I have a table of yearly data (about 500 GB, ~1 billion rows) in Aurora MySQL 3.02.2. The requirement is to migrate this table to another Aurora MySQL cluster (PROD) using DMS. We initiated the DMS task with data validation enabled and with the indexes and PK already created on the target, which slowed down other production processes because of the load on the DB. I did some research and am looking for suggestions on the following:

*Note: this new table in prod will NOT be used by the application until this work is completed and the table is renamed to the current table name.*

  1. Is loading the data first using DMS and creating the indexes manually afterwards the correct approach for a large table like this? OR
  2. Should I use DMS task filtering on the bigint "datetime" column to load the data month by month into the new table, so the index build happens as the data is written? This could be spread over a few days, running the DMS task a few hours each day per month. Or is there a better method?
  3. Does index creation on such a large table generate a lot of temporary files, and will that lead to memory or disk issues? Or should I use something like ALGORITHM=INPLACE in the CREATE INDEX statement? (See the sketch after this list.)
  4. The related parameter settings I currently see are innodb_ddl_buffer_size=1MB and innodb_file_per_table=ON. Are there any other parameters I should check?
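For reference, this is roughly what I have in mind for questions 3 and 4 (the table and column names below are placeholders, not the real schema):

```sql
-- Placeholder names; the real table/column names differ.
-- Online secondary-index build after the data load:
ALTER TABLE yearly_data_new
  ADD INDEX idx_event_time (event_time),
  ALGORITHM=INPLACE, LOCK=NONE;

-- Current values of the parameters mentioned in question 4:
SHOW VARIABLES LIKE 'innodb_ddl_buffer_size';
SHOW VARIABLES LIKE 'innodb_file_per_table';
```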

The target production DB has data written to it 24x7, and I am looking for a way to avoid any slowness or performance issues while this table and its indexes are being copied over.

JollyRoger

1 Answer


Are you perhaps referring to innodb_buffer_pool_size? If so, 1M is terribly small; about 70% of RAM is a better value (unless RAM is really small).

Assume that the default algorithm for ALTER will be the fastest.

It is likely to be faster to build secondary indexes after loading the data. Having the PRIMARY KEY already in place and the data sorted by the PK is likely to be the best way to deal with the PK.
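As a rough sketch (table, column, and index names are placeholders; the real column list and datatypes will differ), "load first, index later" looks like this:

```sql
-- Only the PRIMARY KEY exists during the DMS full load, so each insert
-- maintains just the clustered index.
CREATE TABLE yearly_data_new (
  id         BIGINT NOT NULL,
  event_time BIGINT NOT NULL,   -- the bigint "datetime" column
  payload    VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- After the full load finishes, add all secondary indexes in one ALTER so the
-- table is scanned once:
ALTER TABLE yearly_data_new
  ADD INDEX idx_event_time (event_time),
  ADD INDEX idx_payload (payload);
```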

Index creation for a large table uses lots of disk space; memory usage is relatively stable.

Are you loading that billion-row table repeatedly? Maybe we should look at ways to avoid reloading.

Loading a billion rows is likely to slow down other activity on the server.

"month by month" -- Is this some form of "time-series" data? If so consider using Summary Tables . Also, if you need to delete "old" data, consider Partitioning

"Copy by month" can be very efficient if the source and destination tables are already partitioned; see "transportable tablespaces" in the above link. This is effectively a disk copy, not reading/writing/reindexing of rows, so very fast.

Caveat: The above comments are valid for MySQL/MariaDB; there may be differences with Aurora that I am unaware of.

Rick James