8

Querying sys.partitions can return an approximate row count for a table.

I've noticed that this is returning the same row count for all partitions, regardless of the actual content (even for empty partitions).

The table has a clustered columnstore index and statistics have been created on almost all of the columns. Statistics are updated daily after each data load. The table is partitioned by date.

sys.partitions query:

    SELECT   convert(date, convert(varchar,rv.[value])) as partitionDate, p.rows as syspartitions_RowCount
        FROM       sys.tables t     
        join       sys.schemas  sc on sc.schema_id = t.schema_id        
        JOIN        sys.partitions p                ON      p.[object_id]         = t.[object_id]
        JOIN        sys.indexes i                   ON      i.[object_id]         = p.[object_id]
                                                    AND     i.[index_id]          = p.[index_id]
        JOIN        sys.data_spaces ds              ON      ds.[data_space_id]    = i.[data_space_id]
        LEFT JOIN   sys.partition_schemes ps        ON      ps.[data_space_id]    = ds.[data_space_id]
        LEFT JOIN   sys.partition_functions pf      ON      pf.[function_id]      = ps.[function_id]
        LEFT JOIN   sys.partition_range_values rv   ON      rv.[function_id]      = pf.[function_id]
                                                    AND     rv.[boundary_id]+1      = p.[partition_number]
        WHERE   p.[index_id] <=1
                and t.[name] ='tbl'
                and sc.name = 'temp'
                and convert(date, convert(varchar,rv.[value])) > '2016-05-31'
                order by convert(date, convert(varchar,rv.[value])), 
                t.[name]

table query:

                    select date, count_big(*) as real_count
                from temp.tbl
                where date > '2016-05-31'
                group by date
                order by date 

Sample results from both queries:

enter image description here

Hannah Vernon
  • 70,928
  • 22
  • 177
  • 323
Neil P
  • 1,294
  • 3
  • 20
  • 38

5 Answers5

6

Try using sys.dm_db_partition_stats instead of sys.partitions, as in:

SELECT ObjectName = QUOTENAME(sc.name) + '.' + QUOTENAME(t.name)
    , RangeValue = rv.value
    , sys_partitions_RowCount = p.rows
    , sys_dm_db_partition_stats_row_count = ddps.row_count
FROM sys.tables t 
    INNER JOIN sys.schemas sc ON t.schema_id = sc.schema_id 
    INNER JOIN sys.partitions p ON t.object_id = p.object_id
    INNER JOIN sys.indexes i ON t.object_id = i.object_id
        AND p.index_id = i.index_id
    INNER JOIN sys.data_spaces ds ON ds.data_space_id = i.data_space_id
    INNER JOIN sys.partition_schemes ps ON ps.data_space_id = ds.data_space_id
    INNER JOIN sys.partition_functions pf ON pf.function_id = ps.function_id
    INNER JOIN sys.partition_range_values rv ON rv.function_id = pf.function_id
        AND (rv.boundary_id + 1) = p.partition_number
    INNER JOIN sys.dm_db_partition_stats ddps ON t.object_id = ddps.object_id 
        AND p.partition_id = ddps.partition_id
WHERE p.index_id <= 1
    and t.name ='tbl'
    and sc.name = 'temp'
ORDER BY sc.name
    , t.name
    , rv.value;

For Azure SQL Data Warehouse, you'll need to use sys.dm_pdw_nodes_db_partition_stats instead of sys.dm_db_partition_stats, even though they contain the same details.

Note, I removed the CONVERT(date,...) functionality so this code is compatible with all partition schemes, not just those with date range values.

In the on-prem version of SQL Server, sys.partitions gets its row counts from the internal table ALUCOUNT or sys.sysrowsets, if ALUCOUNT.rows is NULL. The definition of sys.partitions is:

CREATE VIEW sys.partitions AS
    SELECT rs.rowsetid AS partition_id
        , rs.idmajor AS object_id
        , rs.idminor AS index_id
        , rs.numpart AS partition_number
        , rs.rowsetid AS hobt_id
        , isnull(ct.rows, rs.rcrows) AS rows
        , rs.fgidfs AS filestream_filegroup_id
        , cmprlevel AS data_compression
        , cl.name AS data_compression_desc
    FROM sys.sysrowsets rs OUTER APPLY OpenRowset(TABLE ALUCOUNT, rs.rowsetid, 0, 0) ct
    LEFT JOIN sys.syspalvalues cl ON cl.class = 'CMPL' AND cl.value = cmprlevel

The on-prem version of sys.dm_db_partition_stats gets its row counts differently, from the internal table, PARTITIONCOUNTS:

CREATE VIEW sys.dm_db_partition_stats AS
    SELECT c.partition_id
        , i.object_id
        , i.index_id
        , c.partition_number
        , c.in_row_data_page_count
        , c.in_row_used_page_count
        , c.in_row_reserved_page_count
        , c.lob_used_page_count
        , c.lob_reserved_page_count
        , c.row_overflow_used_page_count
        , c.row_overflow_reserved_page_count
        , c.used_page_count
        , c.reserved_page_count
        , c.row_count
FROM sys.indexes$ i 
CROSS APPLY OpenRowSet(TABLE PARTITIONCOUNTS, i.object_id, i.index_id, i.rowset) c

While both sys.partitions and sys.dm_db_partition_stats should both have correct row counts, I'd put more trust in the PARTITIONCOUNTS internal table.

Hannah Vernon
  • 70,928
  • 22
  • 177
  • 323
2

I had the same problem and came across this gem.

Found in this article: https://www.red-gate.com/simple-talk/sql/bi/azure-sql-data-warehouse-explaining-architecture-system-views/

SELECT pnp.partition_number,t.name,nps.[row_count],nps.[used_page_count]*8.0/1024 as usedSpaceMB,nt.distribution_id
 FROM
   sys.tables t
INNER JOIN sys.indexes i
    ON  t.[object_id] = i.[object_id]
    AND i.[index_id] <= 1 /* HEAP = 0, CLUSTERED or CLUSTERED_COLUMNSTORE =1 */
INNER JOIN sys.pdw_table_mappings tm
    ON t.[object_id] = tm.[object_id]
INNER JOIN sys.pdw_nodes_tables nt
    ON tm.[physical_name] = nt.[name]
INNER JOIN sys.pdw_nodes_partitions pnp 
    ON nt.[object_id]=pnp.[object_id] 
    AND nt.[pdw_node_id]=pnp.[pdw_node_id] 
    AND nt.[distribution_id] = pnp.[distribution_id]
INNER JOIN sys.dm_pdw_nodes_db_partition_stats nps
    ON nt.[object_id] = nps.[object_id]
    AND nt.[pdw_node_id] = nps.[pdw_node_id]
    AND nt.[distribution_id] = nps.[distribution_id]
    AND pnp.[partition_id]=nps.[partition_id]
WHERE t.name='FactProductInventory'
ORDER BY usedSpaceMB DESC;
Alex KeySmith
  • 297
  • 1
  • 4
  • 16
0

Use the old sybase floating view sysindexes to resolve this issue, (id matches up with object_id in the newer system views); that always returns the correct rowcount.

Mr SQL
  • 1
0

With some heap tables you will find that both sys.dm_db_partition_stats, column row_count and sysindexes, column rowcnt are incorrect.

In order to correct this, create a clustered index on the table in question (if its small enough and you are allowed to do so). Note, I attempted to rebuild the table by running DBCC DBREINDEX, and that never fixed the inconsistency.

If the above cannot be performed you will just have to make a note of the inconsistency (arrg!!).

PS Anyone from Microsoft have an alternative for the above, and NOT running COUNT(*)?

0

For Azure Synapse, you can try using:

DBCC PDW_SHOWSPACEUSED ("temp.tbl");

Reference.

mustaccio
  • 28,207
  • 24
  • 60
  • 76