0

I have a longish table ( 60M rows ) where there are only 7 columns. One is a unique ID, two are datetimes, and two are notes and descriptions.

There the notes and descriptions are very regular, except for a tag at the end of the text. So, they're technically unique. I can't take that unique tag out as the tags are signatures and these notes and descriptions are legal documents.

If it wasn't for those tags, they'd be 95% from stock descriptions - maybe 15 variations.

These descriptions are up to 8K chars long. I long for some reasonable compression, and am considering a clustered columnstore index to implement that compression, but I'm unclear as to whether the compression will even occur w/ these columns being tagged into uniqueness.

These descriptive columns comprise more than 90% of the row data.

So...rowstore indexing is appropriate for the 'key' column...but I'm wondering if I should define this after the clustered columnstore index.

Current:

create table dbo.SPECIMEN
(
  ID                        int not null,
  Specimen_Types            varchar(255) null,
  Collected_Date            datetime null,
  Received_Date             datetime null,
  Results                   varchar(4000) null,
  Notes                     varchar(max) null,
  Lab_Report_ID             int not null
)
go
create clustered index [SPECIMEN.ID.Lab_Report_ID.Fake.PrimaryKey]
on dbo.SPECIMEN(ID,Lab_Report_ID);
go
create index [SPECIMEN.Lab_Report_ID.Index]
on dbo.SPECIMEN(Lab_Report_ID);

...and I if I get good compression, I'd change the indexes this way:

go
create clustered columnstore index [SPECIMEN.CCI]
on dbo.SPECIMEN;
go
create index [SPECIMEN.ID.Lab_Report_ID.Fake.PrimaryKey]
on dbo.SPECIMEN(ID,Lab_Report_ID);
go
create index [SPECIMEN.Lab_Report_ID.Index]
on dbo.SPECIMEN(Lab_Report_ID);

Does this make sense? I have very little experience with columnstore indexing and don't want to step on my own foot.

BTW - fake primary key. It is supposed to be unique, but the app that populates the source data occasionally throws in a duplicate. This table is supposed to be an extract from that semi-stable source.

Clay
  • 101
  • 6

0 Answers0