
Much of what I find online about partitioning large tables is very old. Databases are larger now, and the old methods rarely account for the constraints of Standard Edition (partitioning was Enterprise-only until SQL Server 2016 SP1). This question exists to find modern methods of partitioning large tables on Standard Edition.

I have the following constraints. What methods exist for satisfying them on Standard Edition?

  • 2 TB table. To keep things simple, assume that it has one ascending primary key and no other indexes/constraints/triggers.
  • Must be partitioned
  • The end of the table is hot. The rest is not.
  • Standard Edition, so RAM and online operations are limited
  • SQL Server 2022
  • No data loss allowed
  • Minimal downtime allowed: 20 minutes at most. This likely rules out partitioning the table in place.
  • Optimistic locking of any kind (e.g., RCSI or snapshot isolation) is not enabled
  • This process can be done slowly, e.g. over weeks.
J. Mini

4 Answers


You can partition the table using only metadata operations by creating a new partitioned table that maps all the existing rows to a single partition, then SWITCHing the existing table into that partition of the new table. After a rename/drop, you've got a partitioned table without having rewritten any of the rows, e.g.

use master 
go
create database part_test 
go
use part_test 
go
drop table if exists Foo
drop table if exists Foo_New
go
create table Foo(id int identity primary key, a int, b int)

insert into Foo(a,b) select value%5, value%17 from generate_series(1,1000*1000)

go

drop partition scheme ps_Foo
drop partition function pf_Foo

go

-- RANGE RIGHT with boundary 0: partition 1 holds id < 0, partition 2 holds id >= 0
create partition function pf_Foo(int) as range right for values (0)

CREATE PARTITION SCHEME ps_Foo AS PARTITION pf_Foo ALL TO ([Primary]);

go

create table Foo_new(id int identity primary key, a int, b int) on ps_Foo(id)

go

begin transaction

-- the check constraint proves every existing row belongs in partition 2 (id >= 0),
-- so the SWITCH is validated as a metadata-only operation
alter table Foo add constraint ck_id_gte_0 check (id >= 0)
alter table Foo switch to Foo_new partition 2

exec sp_rename 'Foo','Foo_old', 'object'
exec sp_rename 'Foo_new','Foo', 'object'

exec('drop table Foo_old')
dbcc checkident('Foo') -- reseed the identity from the max id that was switched in

-- split just past the current identity value so all new rows land in a fresh hot partition
declare @split_point int = ident_current('Foo') + 1

alter partition scheme ps_Foo next used [Primary]
alter partition function pf_Foo() split range (@split_point)

commit

All new rows will go into a new partition at the end. You can then add more partitions on the hot end over time by inserting a split point after the last value, and you can go back and split the large partition as time allows. Note that since this is RANGE RIGHT, you'll want to split from right to left, as each split moves all the rows with values equal to or greater than the split point, e.g.

alter partition scheme ps_Foo next used [Primary]
alter partition function pf_Foo() split range (900000)

alter partition scheme ps_Foo next used [Primary]
alter partition function pf_Foo() split range (800000)

alter partition scheme ps_Foo next used [Primary]
alter partition function pf_Foo() split range (700000)
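
If the cold end needs many boundaries, the right-to-left splits can be scripted. This is a minimal sketch, reusing the ps_Foo/pf_Foo objects above and assuming arbitrary 100,000-id chunks; each split physically moves the rows between the split point and the next higher boundary, so schedule it for quiet periods:

declare @split int = 900000

while @split > 0
begin
    alter partition scheme ps_Foo next used [Primary]

    -- moves only the rows between @split and the previous (higher) boundary
    alter partition function pf_Foo() split range (@split)

    set @split -= 100000  -- work right to left in 100,000-id chunks
end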

David Browne - Microsoft

Assuming you are proposing horizontal partitioning (many tables, each with the same columns and a portion of the data), and given the restrictions listed, I suggest creating the new partitioned table under a different name, then gradually copying the data from the current table into the new table over a period of time. Set up a job that moves small batches in the background; a sketch follows below. When the copy is done, use your 20 minutes to rename the old table to something else, rename the new table to the active table name, and double-check that the latest rows are up to date. Remember to check that permissions are correctly set on the new table after renaming. After a backup and a day or two of working without issues, you can drop the old table.
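A minimal sketch of such a background job, borrowing the hypothetical Foo(id, a, b) schema from the answer above and assuming the new partitioned table is named Foo_partitioned; the batch size and delay are arbitrary and should be tuned:

set identity_insert dbo.Foo_partitioned on  -- needed only if the new table keeps the identity property

declare @batch int = 10000, @last_id int = 0, @rows int = 1

while @rows > 0
begin
    -- copy the next batch in key order so progress is a single watermark
    insert into dbo.Foo_partitioned (id, a, b)
    select top (@batch) id, a, b
    from dbo.Foo
    where id > @last_id
    order by id

    set @rows = @@rowcount

    if @rows > 0
        select @last_id = max(id) from dbo.Foo_partitioned

    waitfor delay '00:00:01'  -- throttle to limit log and I/O pressure
end

set identity_insert dbo.Foo_partitioned off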

tea boy
  1. SQL Server 2022 Standard Edition is powerful enough to support a database of this scale, unless the table is very wide. The main concern is I/O performance: before implementing the partitioned table, make a storage plan, such as spreading the files across multiple disks (with RAID as a simple way to combine multiple disks) or putting the hot data on SSDs.

  2. Converting a large table to a partitioned table can use a lot of memory to sort and operate on the data concurrently, and tempdb will be used in the process, so optimizing tempdb is also required.

  3. The best way to determine if a method satisfies your constraints is to build a test environment.

  4. If, after testing, converting directly to a partitioned table does not fit the constraints, you could use a Partitioned View as a transition. The preconditions for this method are that the "cold" data is not updated or deleted, that there is a clear boundary identifying the cold data, and that you are able to modify the small amount of application code that depends on this table.

    • a. Create a table to hold the cold data and leave the hot data in the original table. Create a Partitioned View that returns data from both tables; a sketch follows after this list.

    • b. Modify the application code to query through the Partitioned View but insert/update/delete directly against the original table. There are many restrictions on updating through a Partitioned View, so this split is necessary; it is temporary, so back up the original code.

    • c. Then move the data from the original table to the cold table with a slow batch loop until all the cold data has been moved. This keeps all the data online and prepares the partition data while using few resources.

    • d. After that, convert the original table to a partitioned table. This is the most expensive step, but the table will be much smaller by now, so it should be faster and use fewer resources.

    • e. Then move the data from the cold table back into the partitioned table with a slow batch loop until all the cold data has been moved back.

    • f. Finally, restore the original code and remove the Partitioned View.
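
A minimal sketch of step a, again borrowing the hypothetical Foo(id, a, b) schema from the first answer, with Foo_cold as the cold table and an arbitrary boundary of id < 500000; note the check constraint only helps the optimizer skip the cold table once step c has actually moved the cold rows out of Foo:

create table dbo.Foo_cold
(
    id int not null primary key,
    a int,
    b int,
    constraint ck_Foo_cold_id check (id < 500000)  -- hypothetical cold/hot boundary
)
go

create view dbo.Foo_view
as
select id, a, b from dbo.Foo        -- hot data stays here
union all
select id, a, b from dbo.Foo_cold   -- cold data migrates here in step c
go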

Update: here is a way that avoids modifying the application code - using an INSTEAD OF trigger on the view.

  • In step b., you can create an INSTEAD OF trigger on the Partitioned View that applies inserts/updates/deletes to the original table, so the application code that depends on the table does not need to change. The steps are: 1. take the server offline; 2. rename the original table; 3. rename the Partitioned View to the table's original name; 4. create the INSTEAD OF trigger on the view (a sketch follows below); 5. bring the server back online.

  • In step f., there is no code to restore. Just take the server offline, drop the view, rename the partitioned table back to the original name, and bring the server back online. Done.
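
A minimal sketch of the trigger from step 4 above, assuming the view now carries the original name dbo.Foo and the original table was renamed to dbo.Foo_hot; only INSERT is shown, and the UPDATE/DELETE triggers follow the same pattern:

create trigger dbo.trg_Foo_insert
on dbo.Foo            -- the renamed Partitioned View
instead of insert
as
begin
    set nocount on

    -- route all new rows to the hot (original) table
    insert into dbo.Foo_hot (id, a, b)
    select id, a, b
    from inserted
end
go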

Rodger Kong

Regarding your concern about the log generated while executing this maintenance with methods like the one proposed in @teaboy's answer, you could reduce the impact by using the bulk-logged recovery model:

Under the full recovery model, all bulk operations are fully logged. However, you can minimize logging for a set of bulk operations by switching the database to the bulk-logged recovery model temporarily for bulk operations. Minimal logging is more efficient than full logging, and it reduces the possibility of a large-scale bulk operation filling the available transaction log space during a bulk transaction. However, if the database is damaged or lost when minimal logging is in effect, you can't recover the database to the point of failure.
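
For example (the database name and backup path here are placeholders), switch for the duration of the bulk copy and take a log backup when switching back so point-in-time recovery is re-established; note that INSERT...SELECT is only minimally logged when the conditions for minimal logging are met, such as a TABLOCK hint on the target:

alter database BigDb set recovery bulk_logged
go

-- ... run the bulk copy here ...

alter database BigDb set recovery full
go

backup log BigDb to disk = N'X:\backup\BigDb_after_bulk.trn'  -- placeholder path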

Ronaldo