0

We have made good progress following our previous question about SQL Server data compression. For one of the tables, we compressed it from 20 GB to 2 GB, so this proves that the best compression ratio can get up to 10 times.

Another table's size, measured as the table's reserved space, reduced from 72.55 GB to 67.06 GB. So, its compression benefit is not enough if compared with the best case above.

The same as the previous question, we are working on a data warehouse for a Magento-v2 application, and the table name is msab_magento.sales_order_item.

  • This table also contains a large number of null values in various columns.
  • However, this table contains a product_option column in type nvarchar(max) corresponding to the same column in the MySQL source in type text. And this column stores JSON format text data. Per row, the average string length on this column is 4923, or 9 KB as each character takes two bypes in nvarchar.

Regarding the JSON column, our tentative thoughts are:

  • Since SQL Server page-level compression works on the individual pages of size 8 KB each. So, the page size is smaller than the column's size. So, this may affect the compression.
  • The JSON documents come from the templates in product configuration, filled with specific information per transaction. If the consecutive transactions are not for the same product, their JSON fields will be generated from different templates, and have different keys. This makes it difficult to apply the Dictionary compression algorithm.
  • Nevertheless, in the global data set, each product will have large number orders, so the JSON keys are still repeating a lot, just maybe across longer range than one page (8 KB)'s size.

Our Question:

  • We want to make the data compression even better and need directions. Please also remind us if the bottleneck is on something else other than the JSON column.

  • We think, the sales_order_item table corresponds to this model in the magento2 repository on GitHub, right? We also need pointers on how to navigate the Magento source code, if possible.

Please also refer to the DDL of the table in our data warehouse in the below details and let me know of any questions.

We highly appreciate any hints and suggestions.

Details:

/****** Object:  Table [msab_magento].[sales_order_item]    Script Date: 11/8/2023 5:08:01 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON GO

CREATE TABLE [msab_magento].[sales_order_item]( [item_id] [bigint] IDENTITY(3233417,1) NOT NULL, [order_id] [bigint] NOT NULL, [parent_item_id] [bigint] NULL, [quote_item_id] [bigint] NULL, [store_id] [int] NULL, [created_at] [datetime] NOT NULL, [updated_at] [datetime] NOT NULL, [product_id] [bigint] NULL, [product_type] nvarchar NULL, [product_options] nvarchar NULL, [weight] [decimal](12, 4) NULL, [is_virtual] [int] NULL, [sku] nvarchar NULL, [name] nvarchar NULL, [description] nvarchar NULL, [applied_rule_ids] nvarchar NULL, [additional_data] nvarchar NULL, [is_qty_decimal] [int] NULL, [no_discount] [int] NOT NULL, [qty_backordered] [decimal](12, 4) NULL, [qty_canceled] [decimal](12, 4) NULL, [qty_invoiced] [decimal](12, 4) NULL, [qty_ordered] [decimal](12, 4) NULL, [qty_refunded] [decimal](12, 4) NULL, [qty_shipped] [decimal](12, 4) NULL, [base_cost] [decimal](12, 4) NULL, [price] [decimal](12, 4) NOT NULL, [base_price] [decimal](12, 4) NOT NULL, [original_price] [decimal](12, 4) NULL, [base_original_price] [decimal](12, 4) NULL, [tax_percent] [decimal](12, 4) NULL, [tax_amount] [decimal](20, 4) NULL, [base_tax_amount] [decimal](20, 4) NULL, [tax_invoiced] [decimal](20, 4) NULL, [base_tax_invoiced] [decimal](20, 4) NULL, [discount_percent] [decimal](12, 4) NULL, [discount_amount] [decimal](20, 4) NULL, [base_discount_amount] [decimal](20, 4) NULL, [discount_invoiced] [decimal](20, 4) NULL, [base_discount_invoiced] [decimal](20, 4) NULL, [amount_refunded] [decimal](20, 4) NULL, [base_amount_refunded] [decimal](20, 4) NULL, [row_total] [decimal](20, 4) NOT NULL, [base_row_total] [decimal](20, 4) NOT NULL, [row_invoiced] [decimal](20, 4) NOT NULL, [base_row_invoiced] [decimal](20, 4) NOT NULL, [row_weight] [decimal](12, 4) NULL, [base_tax_before_discount] [decimal](20, 4) NULL, [tax_before_discount] [decimal](20, 4) NULL, [ext_order_item_id] nvarchar NULL, [locked_do_invoice] [int] NULL, [locked_do_ship] [int] NULL, [price_incl_tax] [decimal](20, 4) NULL, [base_price_incl_tax] [decimal](20, 4) NULL, [row_total_incl_tax] [decimal](20, 4) NULL, [base_row_total_incl_tax] [decimal](20, 4) NULL, [discount_tax_compensation_amount] [decimal](20, 4) NULL, [base_discount_tax_compensation_amount] [decimal](20, 4) NULL, [discount_tax_compensation_invoiced] [decimal](20, 4) NULL, [base_discount_tax_compensation_invoiced] [decimal](20, 4) NULL, [discount_tax_compensation_refunded] [decimal](20, 4) NULL, [base_discount_tax_compensation_refunded] [decimal](20, 4) NULL, [tax_canceled] [decimal](12, 4) NULL, [discount_tax_compensation_canceled] [decimal](20, 4) NULL, [tax_refunded] [decimal](20, 4) NULL, [base_tax_refunded] [decimal](20, 4) NULL, [discount_refunded] [decimal](20, 4) NULL, [base_discount_refunded] [decimal](20, 4) NULL, [free_shipping] [int] NOT NULL, [qty_returned] [decimal](12, 4) NOT NULL, [gift_message_id] [int] NULL, [gift_message_available] [int] NULL, [weee_tax_applied] nvarchar NULL, [weee_tax_applied_amount] [decimal](12, 4) NULL, [weee_tax_applied_row_amount] [decimal](12, 4) NULL, [weee_tax_disposition] [decimal](12, 4) NULL, [weee_tax_row_disposition] [decimal](12, 4) NULL, [base_weee_tax_applied_amount] [decimal](12, 4) NULL, [base_weee_tax_applied_row_amnt] [decimal](12, 4) NULL, [base_weee_tax_disposition] [decimal](12, 4) NULL, [base_weee_tax_row_disposition] [decimal](12, 4) NULL, [gw_id] [int] NULL, [gw_base_price] [decimal](12, 4) NULL, [gw_price] [decimal](12, 4) NULL, [gw_base_tax_amount] [decimal](12, 4) NULL, [gw_tax_amount] [decimal](12, 4) NULL, [gw_base_price_invoiced] [decimal](12, 4) NULL, [gw_price_invoiced] [decimal](12, 4) NULL, [gw_base_tax_amount_invoiced] [decimal](12, 4) NULL, [gw_tax_amount_invoiced] [decimal](12, 4) NULL, [gw_base_price_refunded] [decimal](12, 4) NULL, [gw_price_refunded] [decimal](12, 4) NULL, [gw_base_tax_amount_refunded] [decimal](12, 4) NULL, [gw_tax_amount_refunded] [decimal](12, 4) NULL, [event_id] [int] NULL, [giftregistry_item_id] [int] NULL, [business_area] nvarchar NULL, [custom_options_has_private_data] [smallint] NULL, [auto_ship] [smallint] NULL, CONSTRAINT [PK_sales_order_item_item_id] PRIMARY KEY CLUSTERED ( [item_id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95, DATA_COMPRESSION = PAGE) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0)) FOR [order_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [parent_item_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [quote_item_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [store_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (getdate()) FOR [created_at] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (getdate()) FOR [updated_at] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [product_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [product_type] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [weight] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [is_virtual] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [sku] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [name] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [is_qty_decimal] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0)) FOR [no_discount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [qty_backordered] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [qty_canceled] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [qty_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [qty_ordered] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [qty_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [qty_shipped] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_cost] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [price] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_price] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [original_price] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_original_price] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [tax_percent] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [tax_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_tax_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [tax_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_tax_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [discount_percent] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [discount_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_discount_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [discount_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_discount_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [amount_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_amount_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [row_total] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_row_total] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [row_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [base_row_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [row_weight] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_tax_before_discount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [tax_before_discount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [ext_order_item_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [locked_do_invoice] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [locked_do_ship] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [price_incl_tax] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_price_incl_tax] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [row_total_incl_tax] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_row_total_incl_tax] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [discount_tax_compensation_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_discount_tax_compensation_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [discount_tax_compensation_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_discount_tax_compensation_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [discount_tax_compensation_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_discount_tax_compensation_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [tax_canceled] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [discount_tax_compensation_canceled] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [tax_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_tax_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [discount_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_discount_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0)) FOR [free_shipping] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0.0000)) FOR [qty_returned] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gift_message_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gift_message_available] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [weee_tax_applied_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [weee_tax_applied_row_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [weee_tax_disposition] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [weee_tax_row_disposition] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_weee_tax_applied_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_weee_tax_applied_row_amnt] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_weee_tax_disposition] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [base_weee_tax_row_disposition] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_base_price] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_price] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_base_tax_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_tax_amount] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_base_price_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_price_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_base_tax_amount_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_tax_amount_invoiced] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_base_price_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_price_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_base_tax_amount_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [gw_tax_amount_refunded] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [event_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [giftregistry_item_id] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT (NULL) FOR [custom_options_has_private_data] GO

ALTER TABLE [msab_magento].[sales_order_item] ADD DEFAULT ((0)) FOR [auto_ship] GO

EXEC sys.sp_addextendedproperty @name=N'MS_SSMA_SOURCE', @value=N'msab_magento.sales_order_item' , @level0type=N'SCHEMA',@level0name=N'msab_magento', @level1type=N'TABLE',@level1name=N'sales_order_item' GO

James
  • 149
  • 1
  • 6

1 Answers1

1

Bear in mind your varchar(max) are stored off-table and will not be compressed with the "flick the switch" compression modes.

Consider

a) Unpicking the JSON and storing it as relational data if possible

b) changing your column to varbinary(max) and adding a compression step at your code level, as a trigger or SP. https://learn.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-ver16

To be honest, a 20GB database does not sound worth compressing.

LoztInSpace
  • 923
  • 4
  • 8