12c: Optimizer_Dynamic_Sampling = 11

With default sampling level of 2 (from 10g onwards) , dynamic sampling is performed only for the objects for which statistics do not exist. If the statistics are stale or insufficient, dynamic sampling is not done.

12c introduces a new value of 11 for OPTIMIZER_DYNAMIC_SAMPLING . This value allows the optimizer to automatically perform dynamic sampling using an appropriate level for a SQL statement, even if all basic table statistics exist but they are found to be stale or insufficient. The results of the dynamically sampled queries are persisted in the cache, as dynamic statistics, allowing other SQL statements to share these statistics. This level will no doubt generate a better plan during the first execution of the statement itself but will also lead to dynamic sampling being triggered more frequently and sometime unnecessarily as well.

This example demonstrates that if OPTIMIZER_DYNAMIC_SAMPLING is set to 11, dynamic sampling will be performed even in case of stale or insufficient statistics. Moreover dynamic sampling may be unnecessary triggered in some scenarios.

Insufficient Statistics

I have created a table HR.BIRTHDAYS having 10000 rows whose column MM is indexed and contains numeric month of birth with NDV = 12. The data distribution in the column is skewed . Statistics have been gathered for the table without histogram.

DB12c>select mm, count(*) from hr.birthdays group by mm order by mm;

MM   COUNT(*)
---------- ----------
1       9989
2         1
3         1
4         1
5         1
6         1
7         1
8         1
9         1
10        1
11        1
12        1

12 rows selected.

If OPTIMIZER_DYNAMIC_SAMPLING were set to 2 (default), dynamic sampling will not be done, as statistics are present for the table. However, if the parameter is set to 11 (new in 12c), in view of skewed data distribution, existing statistics are found to be insufficient (missing histogram) and dynamic sampling is performed leading to accurate cardinality estimates for both MM = 1 which occurs 0.01% times and MM = 12 which occurs 99.89% times .

DB12c>alter session set optimizer_dynamic_sampling=11;
set autot trace explain

select * from hr.birthdays where mm = 12;

Execution Plan
----------------------------------------------------------
Plan hash value: 3569291752
-----------------------------------------------------------------------------
|Id |Operation                          |Name    |Rows|Bytes|Cost(%CPU)|Time|
-----------------------------------------------------------------------------
|  0|SELECT STATEMENT                    |         | 1 |  37| 2(0)| 00:00:01|
|  1| TABLE ACCESS BY INDEX ROWID BATCHED|BIRTHDAYS| 1 |  37| 2(0)| 00:00:01|
|* 2|  INDEX RANGE SCAN                  |BDAY_IDX | 1 |    | 1(0)| 00:00:01|
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("MM"=12)

Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)

DB12c>select * from hr.birthdays where mm = 1;
set autot off

Execution Plan
----------------------------------------------------------
Plan hash value: 3605468880

-----------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time   |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  9989 |   360K|    17   (0)|00:00:01|
|*  1 |  TABLE ACCESS FULL| BIRTHDAYS |  9989 |   360K|    17   (0)|00:00:01|
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("MM"=1)
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)

Stale Statistics

Now to make the statistics stale, I have modified the data in the table and have not refreshed statistics. Note that there are no rows for MM = 1 or 2, so that actual NDV = 10.
Although there are 8388 rows in table, dictionary statistics still shows NUM_ROWS = 10000 and NDV = 12

DB12c> select count(*), count(distinct mm) from hr.birthdays;

COUNT(*) COUNT(DISTINCTMM)
---------- -----------------
 8388                10

DB12c>select owner, table_name, num_rows
from dba_tables
where owner = 'HR'
and table_name = 'BIRTHDAYS';

OWNER   TABLE_NAME        NUM_ROWS
------- --------------- ----------
HR      BIRTHDAYS            10000

DB12c>select owner, table_name, column_name, num_distinct
from dba_tab_cols
where table_name= 'BIRTHDAYS' and column_name = 'MM';

OWNER   TABLE_NAME      COLUMN_NAME     NUM_DISTINCT
------- --------------- --------------- ------------
HR      BIRTHDAYS       MM                        12

If OPTIMIZER_DYNAMIC_SAMPLING = 2 (default), dynamic sampling will not be done, as statistics (although stale) are present for the table.

With OPTIMIZER_DYNAMIC_SAMPLING =11, since statistics are stale, dynamic sampling is performed and almost correct no. of rows are estimated for both MM = 1 and 4

DB12c>alter session set optimizer_dynamic_sampling=11;
select /*+ gather_plan_statistics */ count(*)
from hr.birthdays where mm = 1;

select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
SQL_ID  ghg0pr81m1ha3, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from hr.birthdays where
mm = 1

Plan hash value: 4218648105
-----------------------------------------------------------------------------
|Id  |Operation             |Name    |Starts|E-Rows|A-Rows|  A-Time| Buffers|
-----------------------------------------------------------------------------
|  0 |SELECT STATEMENT      |        |    1 |      |    1 |00:00:00.01|  40 |
|  1 | SORT AGGREGATE       |        |    1 |    1 |    1 |00:00:00.01|  40 |
|* 2 |  INDEX FAST FULL SCAN|BDAY_IDX|    1 |    1 |    0 |00:00:00.01|  40 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("MM"=1)
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)

DB12c>select /*+ gather_plan_statistics */ count(*)
from hr.birthdays where mm = 4;

select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------SQL_ID  bhrdb027v2pnt, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from hr.birthdays where
mm = 4

Plan hash value: 3164848757
-----------------------------------------------------------------------------
|Id  |Operation         |Name    |Starts|E-Rows|A-Rows|  A-Time   | Buffers |
-----------------------------------------------------------------------------|  0 |SELECT STATEMENT  |        |    1 |      |    1 |00:00:00.0 |       5 |
|  1 | SORT AGGREGATE   |        |    1 |    1 |    1 |00:00:00.0 |       5 |
|* 2 |  INDEX RANGE SCAN|BDAY_IDX|    1 |  810 |  810 |00:00:00.0 |       5 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("MM"=4)

Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)

Unnecessary Sampling

To demonstrate unnecessary sampling due to sampling level = 11, I have created unique index on NAME column and deleted the statistics for the table.
A search for a NAME would not have triggered dynamic sampling with a sampling level of 2 since there is a unique index on NAME column.
On the contrary, if OPTIMIZER_DYNAMIC_SAMPLING =11, dynamic sampling is unnecessarily performed .

DB12c> alter session set optimizer_dynamic_sampling=11;

select /*+ gather_plan_statistics */ count(*)
from hr.birthdays where NAME = 'NAME OCTOBER  8802';


select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------SQL_ID  d79yg9wq02swy, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from hr.birthdays where
NAME = 'NAME OCTOBER  8802'

Plan hash value: 480407801
-----------------------------------------------------------------------------
|Id  |Operation          |Name         |Starts|E-Rows|A-Rows|A-Time |Buffers|
-----------------------------------------------------------------------------
|  0 |SELECT STATEMENT   |             |    1 |      |    1 |00:00:00.01| 2 |
|  1 | SORT AGGREGATE    |             |    1 |    1 |    1 |00:00:00.01| 2 |
|* 2 |  INDEX UNIQUE SCAN|BDAY_NAME_IDX|    1 |    1 |    1 |00:00:00.01| 2 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='NAME OCTOBER  8802')
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)

Hence, If OPTIMIZER_DYNAMIC_SAMPLING = 11, sampling is performed in case of
•   missing statistics (as earlier with sampling level of 2),
•   stale statistics and
•   insufficient statistics
possibly leading to an optimal plan during first execution of the statement itself. But this has the disadvantage that dynamic sampling will be unnecessarily triggered in some cases leading to performance degradation. To have the best of both worlds, sampling level can be set to its default value of 2 and SPD’s can be employed to guide the optimizer to perform dynamic sampling in case of stale / insufficient statistics too.

ref: http://oracleinaction.com/12c-optimizer_dynamic_sampling-11/

Insufficient Statistics

Stale Statistics

Unnecessary Sampling

Related posts: