3

I am trying to load data into mysql database form CSV file. I found that we could use LOAD DATA INFILE command to do it. But as per the mysql documentation it is not replication safe. (See here)

Is there a better way to do it rather than to do it via application?

bond
  • 133
  • 1
  • 4

2 Answers2

5

Another answer has already been accepted, but I disagree with that answer, because your question first requires an understanding of the meaning of the term "unsafe" and the phrase "considered unsafe" when it is used related to MySQL replication.

When speaking of the “safeness” of a statement in MySQL Replication, we are referring to whether a statement and its effects can be replicated correctly using statement-based format. http://dev.mysql.com/doc/refman/5.5/en/replication-rbr-safe-unsafe.html (emphasis added)

There is no such thing as an "unsafe" statement when you are using MIXED or ROW-based replication, because the entire premise of "unsafe" is related to how statement-based logging and replication work.

Something "considered unsafe" doesn't mean it shouldn't be done -- it means the optimizer will consider it unsafe to log in statement-based mode and therefore that it will be automatically replicated using the row-based replication logging format if row-based logging is available -- that is, if you are not using binlog_format = STATEMENT -- which you should not be doing unless you have a specific reason to use it. If you are not aware of a specific reason to use it, then you most likely do not have one, and you should used MIXED logging, so that the optimizer can choose how to replicate each statement.

http://dev.mysql.com/doc/refman/5.5/en/replication-formats.html

Michael - sqlbot
  • 22,715
  • 2
  • 49
  • 76
1

You are absolutely right. it is not replication-safe. In fact, I wrote about how LOAD DATA INFILE replicates back on Jan 22, 2012 : MySql shell command not replicated to slave. Basically, the entire data file is stored in the binary logs, replicated to the Slave, manifested as a text file in /tmp, and the LOAD DATA INFILE is executed.

What could help is setting sync-binlog=1 before running LOAD DATA INFILE. Notwithstanding, Replication of a large CSV file is at the mercy of replication and the network.

SUGGESTION

For a Text File called mydata.csv, you are better off doing the following:

Step 01 : cp mydata.csv to mydata.csv2

Step 02 : Make the script LoadMaster.sql

SET SQL_LOG_BIN=0;
SET bulk_insert_buffer_size = 1024 * 1024 * 256;
LOAD DATA INFILE 'mydata.csv' INTO tb1 ... ;

Step 03 : Make the script LoadSlave.sql

SET SQL_LOG_BIN=0;
SET bulk_insert_buffer_size = 1024 * 1024 * 256;
LOAD DATA INFILE 'mydata.csv2' INTO tb1 ... ;

Step 04: Load both in parallel

mysql -hIPMaster -u... -p... < LoadMaster.sql &
mysql -hIPSlave  -u... -p... < LoadSlave.sql &
wait

If you have multiple slaves, you could do this:

mysql -hIPMaster -u... -p... < LoadMaster.sql &
mysql -hIPSlave1  -u... -p... < LoadSlave.sql &
mysql -hIPSlave2  -u... -p... < LoadSlave.sql &
mysql -hIPSlave3  -u... -p... < LoadSlave.sql &
....
wait

That way

  • both imports are done together
  • binlogs are not bloated
  • CSV file can always be deleted
RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536