Why You Shouldn't Use OFFSET and LIMIT for Pagination

Why You Shouldn't Use OFFSET and LIMIT for Pagination

Play this article

If you have been doing backend or database architecture for a while you have probably already done paging queries, like this:

SELECT * FROM table_name LIMIT 10 OFFSET 40

Right?

But if you did build your paginations such as this, it's very inefficient as the offset keeps increasing

What is wrong with OFFSET and LIMIT?

OFFSET and LIMIT work good for small amounts of data.

But when we have a lot of data, using OFFSET and LIMIT can slow things down. This is because the database has to read through all the records from the start to the offset, even if they aren’t needed.

For that to happen the database will need to perform an inefficient Full Table Scan everytime we request a pagination (insertions and deletes may happen meanwhile and we don’t want outdated data!).

What is a Full Table Scan?
A Full Table Scan (aka Sequential Scan) is a scan made in the database where every row in a table is sequentially read and the columns encountered are then checked for the validity of a condition. This type of Scan is known to be the slowest due to the heavy amount of I/O reads from the disk consisting of multiple seeks as well as costly disk to memory transfers.

That means that if we have 1 million users and we are requesting an OFFSET of 500K, it will need to fetch all those records (that will not even be needed!), put them in memory, and only after, get the 20 results specified in the LIMIT.

Google Pagination

Benchmarks

Let's consider a Sample Schema, here primary key is id

| id | first_name | last_name |
|----|------------|-----------|
|  1 | abcdsd     | rth       |
|  2 | hcfesd     | cdc       |
|  3 | dddbje     | enf       |

I inserted 1 million random Records in this table, I'll fetch a page with 10 documents, from the database. I'll start with offset 0 and go all the way to an offset of 900k in increments of 100k.

-- 0% OFFSET
SELECT * from name ORDER BY id LIMIT 10 OFFSET 0

-- 10% OFFSET
SELECT * from name ORDER BY id LIMIT 10 OFFSET 100000

-- 20% OFFSET
SELECT * from name ORDER BY id LIMIT 10 OFFSET 200000



-- 90% OFFSET
SELECT * from name ORDER BY id LIMIT 10 OFFSET 900000

Results

1 Million Documents

Offset pagination (ms)
0% offset3.73
10% offset52.39
20% offset96.83
30% offset144.13
40% offset216.06
50% offset257.71
60% offset313.06
70% offset371.03
80% offset424.63
90% offset482.71

The Query Time grew 130X


What we Should Use Instead ?

We should use Cursor based pagination.

See the Prev, next tabs here, this is an example of cursor based pagination

How Cursor based Pagination looks like

Example :

SELECT * FROM name WHERE id > 10 ORDER BY id LIMIT 20

let's say this returns this

Idfirst_namelast_name
11LeonardoMishra
14CristopherGupta
.......
43MarlonGarg

here the last received primary key is 43, so next page is retrieved by this query

SELECT * FROM name WHERE id > 43 ORDER BY id LIMIT 20

Instead of storing the current OFFSET and LIMIT locally and passing it with each request, we should be storing the last received primary key (usually an ID) and the LIMIT, so the query could end up being similar to this one.

Why?
Because by explicitly passing the latest read row, we are telling our DB exactly where to start the search based on an efficient indexed key and won’t have to consider any rows outside of that range.

Benefits

  • Stable Query time irrespective of data size

  • Resilience to shifting rows. For example, if a record is deleted, the next record that would have followed is still displayed since the query is working off of the cursor rather than a specific offset.

Drawbacks

  • Keep track of that little piece of state: the cursor.

  • Impossible to address a specific page directly. For instance, if the requirement is to jump directly to page five, it's not possible to do so since the pages themselves are not explicitly numbered

Results

Offset pagination (ms)Cursor pagination (ms)
0% offset3.736.27
10% offset52.394.07
20% offset96.835.15
30% offset144.135.29
40% offset216.066.65
50% offset257.717.26
60% offset313.064.61
70% offset371.036.00
80% offset424.635.60
90% offset482.715.05

Cursor based Pagination is exponentially faster than Offset based Pagination as data higher pages are retreived


Conclusion :)

Offset/limit is nice because it's easy to implement and understand, and we can directly address pages. Some downsides are that it can be slower as we navigate deeper into the pages, and it is more prone to drift.

Cursor-based pagination is nice because it is more performant and more resilient to shifting rows. Some of the downsides are that it is more complicated to implement, and we cannot directly address pages.