To summarize the specifics: We need to stage approximately 5 million rows into a vendor (Oracle) database. Everything goes great for batches of 500k rows using OracleBulkCopy (ODP.NET), but when we try to scale up to 5M, the performance starts slowing to a crawl once it hits the 1M mark, gets progressively slower as more rows are loaded, and eventually times out after 3 hours or so.
I suspect it's related to a primary key on the table, but I've been trawling the Oracle forums and Stack Overflow for information and a lot of what I'm reading contradicts that (also, a lot of posts seem to contradict each other). I'm hoping that somebody can set the record straight on some closely-related questions about the process:
Does the
OracleBulkCopyclass use conventional or direct-path loading? Is there some way I can confirm this, one way or another?Assuming it does use direct-path loading: Is it true that Oracle automatically sets all indexes to unusable during the load and puts them back online afterward? I've read several statements to this effect but again, cannot confirm it.
If #2 is true, then should it make any difference what indexes are on the table before I initiate a bulk copy operation? If so, why?
Related to #3, is there any practical difference, in general, between bulk loading with an unusable index vs. actually dropping the index before the load and recreating it afterward?
If #2 is not correct, or if there are some caveats I'm not understanding, then would it make any difference to explicitly make the index unusable before the bulk load, and then explicitly rebuild it afterward?
Is there anything else, other than index builds, that could cause a bulk copy operation to grow progressively slower as more and more records are added? (Maybe something to do with logging, although I would expect that bulk operations are not logged?)
If there really is no other way to get the performance up to snuff aside from dropping the PK/index first, what steps can I take to make sure that the index doesn't completely disappear, i.e. if the connection to the database is lost in the middle of the process?