
I have a requirement where I have to insert a lot of rows. For inserting a single row we use something like this:

insert into emp(name,age) values('abc',12);

This will insert only one row. For inserting multiple rows we can either write the insert query multiple times, or write a single query with multiple value lists. For example:

Condition 1

insert into emp(name,age) values('abc1',121);
insert into emp(name,age) values('abc2',122);
insert into emp(name,age) values('abc3',12);

Condition 2

insert into emp(name,age) values('abc',12),('abc2',122),('abc3',12);

My question is: do both (condition 1 and condition 2) take the same time? I guess condition 2 takes less time than condition 1. If my guess is true, what is the reason for it?

Aaron Bertrand
rocking

3 Answers


In SQL Server at least (can't speak for the other RDBMS you've mentioned, sorry), a single statement will scale better than multiple, to a point. You can test this yourself using your exact statements and data, of course; nobody here can test that for you with your specifics, and your specifics may tilt things one way or another. "Which is faster, x or y?" questions are generally discouraged here because you can test them in your own environment much faster than any of us could throw guesses and logic at you. Especially when you're trying to get an answer for all of the database platforms you've listed - nobody is an expert in all of those and any such answer would either be very biased (like this one) or far too blanket to be useful.
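To make "test this yourself" concrete, here is a minimal sketch of such a timing harness. It uses Python with an in-memory SQLite database purely as a stand-in (SQLite's parser, logging, and plan cache behave nothing like SQL Server's, so the absolute numbers are meaningless for your environment); the structure — condition 1 as per-row statements, condition 2 as one multi-row VALUES — is the part worth copying into your own platform's test:

```python
import sqlite3
import time

N = 400  # kept modest: older SQLite builds cap bound parameters at 999
rows = [("abc%d" % i, i % 90) for i in range(N)]

def run(insert_fn):
    """Create a fresh table, time the insert strategy, return (seconds, rowcount)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table emp(name text, age int)")
    start = time.perf_counter()
    insert_fn(con)
    con.commit()
    elapsed = time.perf_counter() - start
    count = con.execute("select count(*) from emp").fetchone()[0]
    con.close()
    return elapsed, count

def one_by_one(con):
    # condition 1: one INSERT statement per row
    for r in rows:
        con.execute("insert into emp(name, age) values (?, ?)", r)

def multi_values(con):
    # condition 2: a single INSERT with a multi-row VALUES clause
    sql = "insert into emp(name, age) values " + ",".join(["(?, ?)"] * len(rows))
    con.execute(sql, [v for r in rows for v in r])

t1, c1 = run(one_by_one)
t2, c2 = run(multi_values)
print("per-row: %.4fs  multi-row: %.4fs  (%d rows each)" % (t1, t2, c1))
```

Run it several times and with your real row counts and data types before drawing any conclusion; a one-off timing on a cold cache proves very little.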

In general, though, the overhead of preparing individual statements (and possibly sending them separately depending on your code and the behavior of your provider) should add up, much like @mustaccio said:

If you want to send three letters, would you go to the post office once carrying three letters, or three times carrying one letter?

This would especially be true in your scenario if each small statement is broken up across individual packets and even different connections. Again, I have no idea how your provider works or how your code sends these statements - if it's a single batch of a variable number of statements, that can actually be harder for SQL Server to optimize than individual single-statement batches, since SQL Server optimizes at the batch level.

Note that a single VALUES() clause has an arbitrary limit of 1,000 row value expressions, so you may need to issue multiple statements depending on how many rows you have. The reason is concern over compile time, as Paul White explains here. Also note: Oracle has the same restriction.

According to Martin Smith's testing, compile time at least is minimal and relatively invariable up to a little over 250 values. See these charts (for details please see his answer):

[Three charts from Martin Smith's answer showing compile time against the number of VALUES rows]

If you go with a VALUES() clause, be aware that each variation (meaning each distinct number of value sets) will generate its own plan, regardless of whether you use properly parameterized statements or just inline constants, and this is true even when the database's parameterization setting is set to simple. So you may want to consider the optimize for ad hoc workloads server setting (lots of info here and here) to prevent single-use variations from filling the plan cache - generally a good idea for most systems anyway, unless you are CPU-bound and compilation costs are proven to be excessive.

A better answer to this problem is to use table-valued parameters (TVPs), which allow you to send sets of structured data through a single parameter, giving you an efficient way to pass data and a single plan that can be reused regardless of the number of values passed. The catch here is I'm not sure Java understands what those are (in C# you can send, say, a DataTable as a Structured parameter).

Aaron Bertrand

For SQL Server: Neither.

To insert a lot of rows you should use the bulk insert APIs, and strive to achieve a minimally logged insert.

Bulk inserts can be achieved using IRowsetFastLoad (OleDB), using the Bulk Copy Functions (ODBC) or using SqlBulkCopy (.Net). All these APIs have in common that they establish a fast insert pipe with the server, and then they start pushing rows. These are not T-SQL statements, but implementations of the TDS Bulk Load message. At a higher abstraction layer you can use bcp.exe, BULK INSERT statement or an SSIS OleDB Destination with Fast Load option enabled.

A second, optional, improvement is to achieve minimal logging. See Operations That Can Be Minimally Logged (bulk insert can, ordinary INSERT cannot). Read Prerequisites for Minimal Logging in Bulk Import for details.

Finally, I urge you to spend some time reading the SQL Server Data Loading Performance Guide.

Remus Rusanu

Some thoughts: using a prepared statement will probably be a good idea, since the statement won't have to be recompiled by the DBMS for every new row. Something like:

insert into emp(name,age) values(?,?);

Sending several inserts at once will minimize the network latency; in some languages there is a batch update operation which "mimics" this. If your host language does not support this, a possible strategy is to create, say, 3 prepared statements:

ps1 = "insert into emp(name,age) values(?,?)";
ps2 = "insert into emp(name,age) values(?,?),...,(?,?)"; -- holds 10 rows
ps3 = "insert into emp(name,age) values(?,?),...,(?,?)"; -- holds 100 rows

As long as there are more than 100 rows left to insert, execute ps3. Then execute ps2 as long as more than 10 rows remain, and finally execute ps1 for the rest. For 543 rows you will execute ps3 5 times (500 rows), ps2 4 times (40 rows), and ps1 3 times (3 rows); in total you hand over 12 statements for the DBMS to execute.
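That chunking loop can be sketched in a few lines. This is a Python/SQLite stand-in, not JDBC: the `make_insert` helper and the greedy size loop are my own illustration of the ps3/ps2/ps1 idea, with parameter lists flattened to match the multi-row placeholders:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table emp(name text, age int)")

rows = [("name%d" % i, 20 + i % 50) for i in range(543)]

def make_insert(n):
    # Multi-row INSERT with n value lists, mirroring ps1/ps2/ps3 above.
    return "insert into emp(name, age) values " + ",".join(["(?, ?)"] * n)

statements = {n: make_insert(n) for n in (100, 10, 1)}

executions = 0
i = 0
while i < len(rows):
    for size in (100, 10, 1):            # pick the largest chunk that still fits
        if len(rows) - i >= size:
            chunk = rows[i:i + size]
            flat = [v for r in chunk for v in r]   # flatten to match placeholders
            con.execute(statements[size], flat)
            executions += 1
            i += size
            break

con.commit()
print(con.execute("select count(*) from emp").fetchone()[0], executions)
# 543 rows handed over in 12 statement executions (5 + 4 + 3)
```

With only three fixed statement shapes, the DBMS (or driver) can cache and reuse three plans instead of compiling a new one for every possible row count.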

Lennart - Slava Ukraini