Sequential Scan calculation in PostgreSQL

testdb=# CREATE TABLE tbl (id int PRIMARY KEY, data int);
testdb=# CREATE INDEX tbl_data_idx ON tbl (data);
testdb=# INSERT INTO tbl SELECT generate_series(1,10000),generate_series(1,10000);
testdb=# ANALYZE;

The cost of the sequential scan is estimated by the cost_seqscan() function. In this subsection, we explore how to estimate the sequential scan cost of the following query.

testdb=# SELECT * FROM tbl WHERE id < 8000;

In the sequential scan, the start-up cost is equal to 0, and the run cost is defined by the following equation:

run cost 
=cpu run cost+disk run cost    
=(cpu_tuple_cost+cpu_operator_cost)×Ntuple+seq_page_cost×Npage

where seq_page_costcpu_tuple_cost and cpu_operator_cost are set in the postgresql.conf file, and the default values are 1.00.01, and 0.0025, respectively; NtupleNtuple and NpageNpage are the numbers of all tuples and all pages of this table, respectively, and these numbers can be shown using the following query:

Thus,

run cost=
(0.01+0.0025)×10000+1.0×45=170.0

Finally,

total cost=0.0+170.0=170

For confirmation, the result of the EXPLAIN command of the above query is shown below:

In first Line, we can find that the start-up and total costs are 0.00 and 170.00, respectively, and it is estimated that 8000 rows (tuples) will be selected by scanning all rows.

In the next Line a filter ‘Filter:(id < 8000)’ of the sequential scan is shown. More precisely, it is called a table level filter predicate. Note that this type of filter is used when reading all the tuples in the table, and it does not narrow the scanned range of table pages.

As understood from the run-cost estimation, PostgreSQL assumes that all pages will be read from storages; that is, PostgreSQL does not consider whether the scanned page is in the shared buffers or not.

Ref: https://www.interdb.jp/pg/pgsql03.html

Leave a Reply

Your email address will not be published. Required fields are marked *