I've thought of an approach for a process that builds a fairly large table (2 million rows), and now I'm canvassing opinions on it. What's going to bite me with this renaming approach to making a table 'live'?
Background: The table (data.CatExt) is an overlaid version of our product catalogue. It builds daily in a complex procedure (getting data from lots of different areas and deciding which source trumps when there are conflicts - the complexity is in the business logic, not the technological side). All the work and any tables required to get the final output happen in a 'build' schema, and the final table lives in the 'data' schema. The schemas have different permissions.
The table is pretty wide, about 100 fields. We've taken the approach of building it from scratch each day, rather than using a 'changes/additions/deletions' approach, because of the nature of the data sources (it's horrible to work out what will require a change to the final table). So we build a new version of the table each day, and then (after it passes some automated checks) allow it to become the new live version.
New approach: So here's where the new approach comes in. What's the best way to replace the live version? The simplest option is to truncate the live table and reload it with the new data, but that seems like a lot of unnecessary data writing. We could run an update over each field, but with 100 fields that's a lot of nasty comparison code to find the differences. So I thought, why not just rename the table? It would go like this:
- Live version data.CatExt exists and is usable throughout steps 2 and 3.
- Build new version of the table build.CatExt.
- Checks, indexing etc. over build.CatExt all complete - i.e., it's perfect.
- Drop (well, I'm too chicken - probably rename) data.CatExt, rename build.CatExt to data.CatExt.
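In case it helps, here's roughly how I picture the swap step (a sketch only, assuming SQL Server; `CatExt_old` is just a hypothetical holding name for the previous live copy):

```sql
-- Do the swap in one transaction so readers never catch the table mid-rename.
BEGIN TRANSACTION;

-- Step 1: move the current live table aside rather than dropping it.
EXEC sp_rename 'data.CatExt', 'CatExt_old';

-- Step 2: move the freshly built, fully checked table into the live schema.
ALTER SCHEMA data TRANSFER build.CatExt;

COMMIT TRANSACTION;
```

One thing I did notice in the documentation: ALTER SCHEMA ... TRANSFER drops any permissions that were granted directly on the table itself, so I'd be relying on our permissions being granted at the schema level (which I believe they are).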
Voila: a new version, with no unnecessary movement of data and no long period with the live version locked while it's rewritten.
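And since I'm too chicken to drop the old copy immediately, I'd keep it for one cycle and clear it out at the start of the next day's build (again a sketch, assuming SQL Server and the hypothetical `CatExt_old` name):

```sql
-- At the start of the next daily build, discard yesterday's fallback copy
-- (if the previous run got far enough to create it).
IF OBJECT_ID('data.CatExt_old', 'U') IS NOT NULL
    DROP TABLE data.CatExt_old;
```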
Please point out the flaws. (Be gentle, I'm new! :) )