Indexes are a perfect tool for finding a certain value or a range of values in a table. By avoiding a sequential scan on a large table, a query can be sped up many times over. This kind of behavior is widely known and can be observed in any relational database system.
What is interesting to note is that indexes are not only good for searching data – they also offer a good way to provide you with sorted output.
Sorted output
Let us try to demonstrate things with a simple example. For the sake of simplicity we create a table with 100,000 rows:
test=# CREATE TABLE t_test (id int4);
CREATE TABLE
test=# INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
INSERT 0 100000
The goal of this example is to show what happens if you want to find the top 5 IDs. The query is pretty simple:
test=# SELECT * FROM t_test ORDER BY id DESC LIMIT 5;
   id
--------
 100000
  99999
  99998
  99997
  99996
(5 rows)
All we need is an ORDER BY clause as well as a LIMIT clause. There is nothing special about this query. What is more important is the execution plan used for this query:
test=# explain analyze SELECT * FROM t_test ORDER BY id DESC LIMIT 5;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Limit  (cost=3103.96..3103.98 rows=5 width=4) (actual time=35.758..35.761 rows=5 loops=1)
   ->  Sort  (cost=3103.96..3353.96 rows=100000 width=4) (actual time=35.757..35.757 rows=5 loops=1)
         Sort Key: id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on t_test  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.007..12.372 rows=100000 loops=1)
 Total runtime: 35.794 ms
(6 rows)
PostgreSQL has to scan the table completely and perform a top-N heapsort. The larger the table grows, the longer this takes. In other words: you cannot run this kind of query on a very large table if you need deterministic runtimes.
An index can help:
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
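One side note (not part of the original example): index-only scans rely on the visibility map, which is maintained by VACUUM. On a freshly loaded table it can be worth vacuuming first so that the planner can skip the heap entirely and "Heap Fetches" stays at zero:

test=# VACUUM ANALYZE t_test;
VACUUM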
The plan we had before needed 35 ms to perform the query. Once the index has been defined, PostgreSQL can make use of its sorted content:
test=# explain analyze SELECT * FROM t_test ORDER BY id DESC LIMIT 5;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.29..0.42 rows=5 width=4) (actual time=0.054..0.056 rows=5 loops=1)
   ->  Index Only Scan Backward using idx_id on t_test  (cost=0.29..2604.29 rows=100000 width=4) (actual time=0.052..0.054 rows=5 loops=1)
         Heap Fetches: 0
 Total runtime: 0.080 ms
(4 rows)
All PostgreSQL has to do now is read the index backwards and take the first 5 rows. Note that there is no WHERE clause involved here. We are merely taking advantage of the index's sorted order.
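The same trick combines naturally with a WHERE clause on the indexed column. A quick variant on the same 100,000-row table (timings and plan details will of course vary on your system; EXPLAIN ANALYZE will show an index scan backward here as well):

test=# SELECT * FROM t_test WHERE id < 50000 ORDER BY id DESC LIMIT 5;
  id
-------
 49999
 49998
 49997
 49996
 49995
(5 rows)

PostgreSQL can descend the btree to the highest id below 50000 and read backwards from there, so the LIMIT is again satisfied without sorting anything.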
MIN and MAX
The same concept can be applied to min and max. The max value is the highest non-NULL entry in the column, and the min value is the lowest non-NULL entry – NULLs are simply ignored by both aggregates.
For quite some time now, the PostgreSQL optimizer has been able to handle this kind of query efficiently:
test=# explain SELECT min(id), max(id) FROM t_test;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Result  (cost=0.64..0.65 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.29..0.32 rows=1 width=4)
           ->  Index Only Scan using idx_id on t_test  (cost=0.29..2854.29 rows=100000 width=4)
                 Index Cond: (id IS NOT NULL)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.29..0.32 rows=1 width=4)
           ->  Index Only Scan Backward using idx_id on t_test t_test_1  (cost=0.29..2854.29 rows=100000 width=4)
                 Index Cond: (id IS NOT NULL)
(9 rows)
The query executes in well under a millisecond because all it does is scan the index twice – once from each end – and return the data. This is a major benefit when it comes to performance.
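To confirm the NULL handling mentioned above, here is a small follow-up experiment (not part of the original post): aggregates skip NULLs, so adding a NULL row changes neither result, and the query itself returns:

test=# INSERT INTO t_test VALUES (NULL);
INSERT 0 1
test=# SELECT min(id), max(id) FROM t_test;
 min |  max
-----+--------
   1 | 100000
(1 row)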
credit: https://www.cybertec-postgresql.com/en/speeding-up-min-and-max/