9

I am trying to estimate the space requirements for a central database server that will be collecting data from about 200 identical field databases. I have the average daily row count for each table and now need to estimate the row size, including indexes, for each table.

Is there such an animal in existence or do I need to roll my own? If I do need to roll my own, can you suggest a good approach?

TIA

Metaphor
  • 906
  • 1
  • 11
  • 30

3 Answers3

11

Script selected as an asnwer has some drawbacks:

  1. Does not take into account some SQL Server datatypes:

    image, text, uniqueidentifier, sql_variant, ntext, hierarchyid, geometry, geography, xml, sysname

and (max) data structures structures such as varchar(max) to name a few.

  1. Takes into account computed columns which should not affect record's size.
  2. Does not take into account the fact table has NULLable columns.
  3. Does not take into account the fact that some datatypes have different size depending on specified scale (for example time(0) and time(7) should have different column length)

The following script shows the size of the record as it is on a data page. It shows maximum size that may be taken by the record, minimum size (in case one all values in the record are NULL if permitted by table structure). It shows table type (CLUSTERED/NONCLUSTERED), total number of data columns and schema where table belongs.

Note this script does not take into account indexes.

Update: Script counts each bit column as 1/8th of a byte. Minimum and maximum sizes are adjusted to ensure that records cannot be smaller than the forwarding stub size.

  SELECT 
     -- record cannot be smaller than the forwarding stub size =9 Bytes
     CASE WHEN [Max Size]>=9
     THEN [Max Size]
     ELSE 9 
     END AS [Max Size]
     -- record cannot be smaller than the forwarding stub size =9 Bytes
    , CASE WHEN [Min Size]>=9
     THEN [Min Size]
     ELSE 9 
     END AS [Min Size]
    , [Table Name]
    , [Table Type]
    , [Total Number of Columns]
    , [Schema]
    FROM
    (
SELECT 
    DISTINCT 
    -- Overhead for row header of a data row
    4
    +
    -- Overhead for NULL bitmap
    2+cast(([Total Number of Columns]+7)/8 AS BIGINT)+
    -- overhead for variable length
    CASE WHEN [IsVariableLength]>0
    THEN
    2
    ELSE
    0
    END
    +  
    --- Sum is on record level
    SUM(
    a1.[max_length]
    +
    -- Overhead for variable-length columns
    CASE WHEN 
    -- varchar
    [System Type]='varchar'
    --(([system_type_id]=167) AND ([user_type_id]=167))
    OR
    -- nvarchar 
    [System Type]='nvarchar'
    --(([system_type_id]=231) AND ([user_type_id]=231))
    OR
    -- IMAGE
    (([system_type_id]=34) OR ([user_type_id]=34))
    OR
    -- TEXT
    (([system_type_id]=35) OR ([user_type_id]=35))
    OR 
    --  NTEXT
    (([system_type_id]=99) OR ([user_type_id]=99))
    OR 
    --  SQLVARIANT
    (([system_type_id]=98) OR ([user_type_id]=98))
    OR
    -- hierarchyid geometry geography
    (([system_type_id]=240))
    THEN 2
    ELSE 0
    END
    )
    OVER (PARTITION BY a1.[Schema], a1.[Table Name]) AS [Max Size]

    , -- Overhead for row header of a data row
    4
    +
    -- Overhead for NULL bitmap
    2+cast(([Total Number of Columns]+7)/8 AS BIGINT)+
    -- overhead for variable length
    CASE WHEN ([IsVariableLength]>0) AND ([AnyFixedColumn]=0)
    THEN
    2
    ELSE
    0
    END
    +
    --- Sum is on record level
    SUM(
    -- overhead for variable length depending on number of variable columns
    CASE WHEN 
    -- varchar
    --[System Type]='varchar'
    (([system_type_id]=167) OR ([user_type_id]=167))
    OR
    -- nvarchar 
    --[System Type]='nvarchar'
    (([system_type_id]=231) OR ([user_type_id]=231))
    OR
    -- IMAGE
    (([system_type_id]=34) OR ([user_type_id]=34))
    OR
    -- TEXT
    (([system_type_id]=35) OR ([user_type_id]=35))
    OR 
    --  NTEXT
    (([system_type_id]=99) OR ([user_type_id]=99))
    --  VARBINARY
    OR
    (([system_type_id]=165) OR ([user_type_id]=165))
    OR 
    --  SQLVARIANT
    (([system_type_id]=98) OR ([user_type_id]=98))
    OR
    -- hierarchyid geometry geography
    (([system_type_id]=240))
    OR
    -- xml
    (([system_type_id]=241))
    THEN
        CASE WHEN [Is Nullable]=1
        THEN 0 
        ELSE 
        1
        END
    ELSE
        CASE
        WHEN
        -- bit
        (([system_type_id]=104) OR ([user_type_id]=104))
        and [Is Nullable]=1
        THEN 0
        ELSE
        a1.[max_length]
        END
    END
    -- 

    )
    OVER (PARTITION BY a1.[Schema], a1.[Table Name]) AS [Min Size]
    , a1.[Table Name]
    , [Table Type]
    , [Total Number of Columns]
    , a1.[Schema]
    FROM
    -- Start a1
    (   SELECT
        (SELECT [name] FROM [sys].[schemas]
        WHERE [sys].[schemas].[schema_id]=[sys].[objects].[schema_id])
        AS [Schema]
        , [sys].[objects].[name] AS [Table Name]
        , [sys].[all_columns].[name] AS [Column Name]
        , [sys].[all_columns].[system_type_id]
        , (
            SELECT name FROM [sys].[types]
            WHERE [sys].[types].[system_type_id]=[sys].[all_columns].[system_type_id]
            AND
                    [sys].[types].[user_type_id]=[sys].[all_columns].[user_type_id]
            ) AS [System Type]
        , [sys].[all_columns].[user_type_id]
        , 
        CASE 
        WHEN 
        -- IMAGE
        (([system_type_id]=34) OR ([user_type_id]=34))
        THEN 2147483647
        -- TEXT
        WHEN (([system_type_id]=35) OR ([user_type_id]=35))
        THEN 2147483647
        --  NTEXT
        WHEN (([system_type_id]=99) OR ([user_type_id]=99))
        THEN 1073741823
        -- varchar(max)
        WHEN (([system_type_id]=167) OR ([user_type_id]=167)) AND ([sys].[all_columns].[max_length]=-1)
        THEN 2147483647
        -- nvarchar(max) 
        WHEN (([system_type_id]=231) OR ([user_type_id]=231)) AND ([sys].[all_columns].[max_length]=-1)
        THEN 1073741823
        -- varbinary(max)
        WHEN (([system_type_id]=165) OR ([user_type_id]=165)) AND ([sys].[all_columns].[max_length]=-1)
        THEN 2147483647
        -- hierarchyid geometry geography
        WHEN (([system_type_id]=240))
        THEN 2147483647
        -- xml
        WHEN (([system_type_id]=241) AND ([sys].[all_columns].[max_length]=-1))
        THEN 2147483647
        -- bit
        WHEN (([system_type_id]=104) OR ([user_type_id]=104))
        THEN 1/8    
        ELSE 
        CAST([sys].[all_columns].[max_length] AS BIGINT)
        END [max_length]
        , [sys].[all_columns].[is_nullable] AS [Is Nullable]
        , 
        CASE 
        WHEN EXISTS 
            (   
                SELECT type_desc FROM sys.indexes
                WHERE type_desc='CLUSTERED'
                AND [sys].[objects].[object_id]=[sys].[indexes].[object_id]
            )
        THEN 'CLUSTERED'
        ELSE 'HEAP'
        END AS [Table Type]
        , COUNT([sys].[all_columns].[name]) OVER (PARTITION BY [sys].[objects].[object_id]) AS [Total Number of Columns]
        ,SUM (CASE WHEN 
        -- varchar
        (
        (([system_type_id]=167) AND ([user_type_id]=167))
        OR
        -- nvarchar 
        (([system_type_id]=231) AND ([user_type_id]=231))
        )
        AND [sys].[all_columns].[is_nullable]=0
        THEN 1
        ELSE 0
        END) OVER (PARTITION BY [sys].[objects].[name]) AS [IsNonNullableVariableLength]
        ,SUM (
        CASE WHEN 
        -- varchar
        (([system_type_id]=167) OR ([user_type_id]=167))
        OR
        -- nvarchar 
        (([system_type_id]=231) OR ([user_type_id]=231))
        OR
        -- IMAGE
        (([system_type_id]=34) OR ([user_type_id]=34))
        OR
        -- TEXT
        (([system_type_id]=35) OR ([user_type_id]=35))
        OR 
        --  NTEXT
        (([system_type_id]=99) OR ([user_type_id]=99))
        --  VARBINARY
        OR
        (([system_type_id]=165) OR ([user_type_id]=165))
        OR 
        --  SQLVARIANT
        (([system_type_id]=98) OR ([user_type_id]=98))
        OR
        -- hierarchyid geometry geography
        (([system_type_id]=240))        
        OR
        -- xml
        (([system_type_id]=241))    
        THEN 1
        ELSE 0
        END) OVER (PARTITION BY [sys].[objects].[name]) 
        AS [IsVariableLength]
        ,SUM (
        CASE WHEN 
        -- varchar
        (([system_type_id]=167) OR ([user_type_id]=167))
        OR
        -- nvarchar 
        (([system_type_id]=231) OR ([user_type_id]=231))
        OR
        -- IMAGE
        (([system_type_id]=34) OR ([user_type_id]=34))
        OR
        -- TEXT
        (([system_type_id]=35) OR ([user_type_id]=35))
        OR 
        --  NTEXT
        (([system_type_id]=99) OR ([user_type_id]=99))
        --  VARBINARY
        OR
        (([system_type_id]=165) OR ([user_type_id]=165))
        OR 
        --  SQLVARIANT
        (([system_type_id]=98) OR ([user_type_id]=98))
        OR
        -- hierarchyid geometry geography
        (([system_type_id]=240))    
        OR
        -- xml
        (([system_type_id]=241))    
        THEN 0
        ELSE 1
        END) OVER (PARTITION BY [sys].[objects].[name]) 
        AS [AnyFixedColumn]
    FROM [sys].[objects]
    INNER JOIN sys.all_columns
    ON [sys].[objects].[object_id]=[sys].[all_columns].[object_id]
    WHERE type_desc='USER_TABLE'
    ) a1
    ) a2
Pavel Nefyodov
  • 652
  • 8
  • 18
10

Using Aaron's link for even finer adaption of my process, I would recommend a function and a view/query to report out size per row per table, including indexes.

create function dbo.getColumnSize (@typeName SYSNAME, @max_length INT, @precision INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT CASE @typeName
        WHEN 'tinyint'          THEN 1
        WHEN 'smallint'         THEN 2
        WHEN 'int'              THEN 4
        WHEN 'bigint'           THEN 8
        WHEN 'numeric'          THEN ((@precision - 1)/2) + 1
        WHEN 'decimal'          THEN ((@precision - 1)/2) + 1
        WHEN 'real'             THEN 4
        WHEN 'float'            THEN CASE WHEN @precision <=24 THEN 4 ELSE 8 END
        WHEN 'money'            THEN 8
        WHEN 'smallmoney'       THEN 4
        WHEN 'time'             THEN 5
        WHEN 'timestamp'        THEN 5
        WHEN 'date'             THEN 3
        WHEN 'smalldatetime'    THEN 4
        WHEN 'datetime'         THEN 8
        WHEN 'datetime2'        THEN 8
        WHEN 'datetimeoffset'   THEN 10
        WHEN 'char'             THEN @max_length
        WHEN 'varchar'          THEN @max_length + 2
        WHEN 'nchar'            THEN @max_length
        WHEN 'nvarchar'         THEN @max_length + 2
        WHEN 'binary'           THEN @max_length
        WHEN 'varbinary'        THEN @max_length + 2
        WHEN 'bit'              THEN 0.125
    END)
END

select SchemaName, ObjectName, SUM(CEILING(Bytes))+4+2+2+2+(ceiling(count(distinct columnname)*1.0/8.0)) As RowSize--, FLOOR((POWER(2,30)+(POWER(2,30)-1))*1.0/SUM(CEILING(Bytes))/30/2/60/24)
from (
    select s.name as SchemaName, o.name AS ObjectName, c.name as ColumnName, t.name as TypeName
        , dbo.getColumnSize(t.name,c.max_length, c.precision) AS Bytes
    from sys.objects o
        inner join sys.schemas s on s.schema_id=o.schema_id
        inner join sys.columns c on o.object_id=c.object_id
        inner join sys.types t on c.system_type_id=t.system_type_id
            and t.user_type_id=c.user_type_id
    where o.type='U'
        AND RIGHT(t.name,4) != 'text'
    UNION ALL
    select s.name as SchemaName, o.name AS ObjectName, c.name as ColumnName, t.name as TypeName
        , dbo.getColumnSize(t.name,c.max_length, c.precision) AS Bytes
    from sys.objects o
        inner join sys.schemas s on s.schema_id=o.schema_id
        inner join sys.indexes i on i.object_id = o.object_id
        inner join sys.index_columns ic on ic.object_id = o.object_id
            and i.index_id = ic.index_id
        inner join sys.columns c on o.object_id=c.object_id
            and ic.column_id = c.column_id
        inner join sys.types t on c.system_type_id=t.system_type_id
            and t.user_type_id=c.user_type_id
    where o.type='U'
        AND RIGHT(t.name,4) != 'text'
    ) Z
group by SchemaName, ObjectName
Jaaz Cole
  • 263
  • 2
  • 8
0
---METHOD 1  : DBCC SHOWCONTIG
  dbcc showcontig ('TableName') with tableresults
--check max min and average record size

--METHOD 2 : sys.dm_db_index_physical_stats
--dbcc showcontig is to be deprecated
SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(N'DatabaseName'), OBJECT_ID(N'TableName'), NULL, NULL , 'DETAILED')

--METHOD 3 : SysColumns Only run and dont edit!!!
use db_name
create table ##tmpRowSize (TableName varchar(100),RowSizeDefinition int)
exec sp_msforeachtable 'INSERT INTO ##tmpRowSize Select ''?'' As TableName, SUM(C.Length) as Length from dbo.SysColumns C where C.id = object_id(''?'') '
select * from ##tmpRowSize order by RowSizeDefinition  desc
drop table ##tmpRowSize
مهدی
  • 101
  • 1