Selectivity vs Cardinality

What is selectivity?

  • Selectivity is the estimated proportion of the rows that will return from our query based on the total rows of that table.
  • Let us assume We have a sales table That contains 9700 records.

select * from cust_sales where promo_id=55;

  • It is returning row count is about 8100. So our query returns a large scale of the records.

select * from cust_sales where promo_id=33;

  • It is returning 2000 rows.
  • So the optimizer will most probably pick different execution plans for these two queries.
  • As you can see, even if we write the same statement, the selectivity of our query changes the execution plan.

The selectivity will be calculated through the below formula.

selectivity = No.of rows returning from the query / total no of rows

  • As we can understand from here, the selectivity is calculated by the number of rows that we expect to return from our query, divided by the total number of rows.
  • If the selectivity is 0 means our query is “high selectivity “ which returns the low no of rows, if the selectivity is 1 means query is “low selectivity “ which returns the huge volume of rows.

What is cardinality?

  • The cardinality is the expected number of rows returned from our query.

Cardinality = total no of rows * selectivity

  • The estimator estimates the I/O cost of our execution. Selectivity affects the sorting cost. If the returning rows will be so many, it’s sort cost will be high. So it will be considered while creating an execution plan.
  • Cardinality is used to determine to join, sort and filter costs. By using cardinality, the optimizer decides which join method to use, it will be filtering to reduce the data while using an index, etc. Incorrect selectivity and cardinality = incorrect plan
  • Sometimes the optimizer is not able to predict the number of rows that a given operator will return (because of missing table statistics). This can prevent Oracle from estimating the cost of a query plan correctly, which can lead to the selection of a suboptimal plan.
  • Cardinality estimation errors can cause slow running queries.
  • Let us assume we have two examples here

Ex-1:

in these tables having 10 rows, you want to know the biggest EMP_NO from the EMP table.

select MAX(EMP_NO) from EMP;

  • In these above queries having the selectivity & cardinality details

selectivity = 10/10 = 1

  • As per the selectivity rule if the result is closely 1 it is “low selectivity”

cardinality = number of rows accessed = 10

Ex-2:

In the same example, I am adding the filter in the query

SELECT MAX(EMP_NO) from EMP where LAST_NAME=’DHONI’;

  • Assuming that there are only 2 Employees with last_name as ‘DHONI’

selectivity = number of rows accessed/total number of rows = 2/10 = 0.2 (40% of the rows were accessed)

cardinality = number of rows accessed = 2

ref: https://medium.com/nerd-for-tech/explain-about-a-selectivity-and-cardinality-in-oracle-c4417213b6f3#