-1

Today I realized another fascinating thing about sql databases: They don't do paging well at all.

By paging I mean selecting a slice of data defined by a starting record number and a record count according to an order one has an index on - this operation is used by user interface grids with pagers or scrollbars with lazy loading.

In my case, I noticed how the beginning of the data in my grid the ui was very responsive, but lower down the data it became sluggish. That was with Sql Server.

I checked the execution plan, and it was reading the data through an index in the correct order, but for the data lower down it would still need to scan through all the previous data to get the correct starting row.

Now that I think about it, it's fairly obvious: Efficient seeking is usually supported by a search key, not by record number.

However, it doesn't have to be that way. It very much depends on how the underlying data structure of the respective database engine is implemented. Traditional b-trees can not efficiently find a record by number, but a variant of it can: The tree would have to store the number of all records in the whole respective subtree for each node. That way, even a lookup by row number can be served in logarithmic time.

With such an implementation, paging could always be efficient.

I'm wondering which database engines have that feature, if any - which is why the question is tagged with various different sql database engine tags.

Do you know what your favorite database does on paging? Does it do a scan or can it indeed seek by record number?

Michael Green
  • 25,255
  • 13
  • 54
  • 100
John
  • 775
  • 3
  • 17

2 Answers2

4

SQL Databases ... don't do paging well at all.

There are smart ways to query DBs involving indexes with varying support from different RDBMS.

"Paging" hardly defines a task. That's why your question is hard to answer. You have to specify how you define pages first.

Read-only tables are trivial. Just add a row number in a materialized view and a plain btree index on it. Every RDBMS can do that. The true art is in handling concurrent write load. You have to define exact requirements first. And that's where your question is leaving out ...

for the data lower down it would still need to scan through all the previous data to get the correct starting row.

Your description indicates that you are using a lowly OFFSET - FETCH in SQL Server - other RDBMS use different syntax. A large offset is bound to be slow this way.
Instead, remember unique (!) index column value(s) of first and last row for the current page and you can retrieve the next or previous n rows very quickly using a matching btree index - in logarithmic time.

In particular, Postgres also supports this technique for multicolumn keys (row values). Details in these related answers:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633
1

Pagination via OFFSET is both slow, and error prone. This discusses how to "remember where you left off" as a better way.

Rick James
  • 80,479
  • 5
  • 52
  • 119