Using the Oracle 11GR2 database flash cache

Oracle just released a patch which allows you to use the database flash cache on Oracle Enterprise Linux even if you don’t have exadata storage.  The patch is the obscurely named:

  •    8974084:META BUG FOR FLASH CACHE 11.2PL BUGS TO BACKPORT TO 11.2.0.1 OEL

Once you install the patch you can use any old flash device as a database flash cache.  Below documented some initial dabbling on a very old server and a cheap usb flash device.   The results are not representative of the performance you’d get on quality hardware, but are still interesting.

Setup and configuration

If you just want to experiment using an USB flash device, then you first need to get that device mounted. On test machine create a directory “/mnt/usbflash” then create an /etc/fstab entry like this:

   /dev/sda1               /mnt/usbflash           vfat    noauto,users,rw,umask=0 0 0

On your system you might need to change “/dev/sda1” to another device depending on how your fixed disks are configured.  You should then be able to mount the flashdrive by typiing “mount /dev/sda1”.  Make sure that the mount point is writable by oracle (chmod 777 /mnt/usbflash). 

Once mounted, you configure the flash cache by setting the parameters DB_FLASH_CACHE_FILE and DB_FLASH_CACHE_SIZE.  My settings are shown below:

Note that the value of DB_FLASH_CACHE_FILE needs to be a file on the flash drive, not the flash drive mount point itself.

Once these parameters are set, the flash cache will be enabled and will act as a secondary cache to the buffer cache.  When a block is removed from the primary cache, it will still exist in the flash cache, and can be read back without a physical read to the spinning disk.

Monitoring

There’s a few ways to examine how the flash cache is being used.  Firstly,  V$SYSSTAT contains some new statistics showing the number of blocks added to the cache and the number of “hits” that were satisfied from cache (script here):

Some new wait events are now visible showing the waits incurred when adding or reading from the flash  cache.   Below we see that ‘db flash’ waits are higher than ‘db file sequential read’, though reads from the flash cache are much quicker than reads from disk (but there are a lot more of them):

Now, remember that I could not have picked worst hardware for this test – an old two CPU intel box and a cheap thumb drive.  Even so, it’s remarkable how high the write overhead is in comparison to the total time.    Although the flash reads save time when compared to db file sequential reads, the overhead of maintaining the cache can be high because flash based SSD has a relatively severe write penalty.

All flash-based Solid State Disk have issues with write performance.  However, cheap Multi Level Cell (MLC) flash take about 3 times as long to write as the more expensive Single Level Cell (SLC).  When flash drives are new, the empty space can be written to in single page increments (usually 4KB).  However, when the flash drive is older, writes typically require erasing a complete 128 page block which is very much slower.  My cheap USB drive was old and MLC, so it had very poor write performance.   But even the best flash based SSD is going to be much slower for writes than for reads, and in some cases using a  flash cache might slow a database down as a result.  So monitoring is important.

There’s a couple of other V$SYSSTAT statistics of interest:

To examine the contents of the cache, examine the V$BH view.  Buffers in the flash cache have STATUS values such as ‘flashcur’, allowing us to count the buffers from each object in the main buffer cache and in the flash cache (script is here):

In this case, the TXN_DATA table has 85,833 blocks in the flash cache and 28,753 blocks in the main buffer cache. 

For the 11.2..0.2 and higher code line no need to apply the patch with an up to data Oracle distro.

Unfortunately, the flash cache is only available in Oracle operating systems – Oracle Enterprise Linux or Solaris. The ORA-00439 error will be encountered if you try to use it on any other systems.

When to use the flash_cache feature

The Oracle documentation suggests enabling the flash_cache when the data buffer advisor suggests that Oracle wants more RAM (a case which is almost always true!), when you are disk I/O bound and when you have spare CPU cycles.  They mention CPU because moving to solid-state flash storage removes disk I/O, but changes the workload bottleneck to CPU in many cases.

In other words, flash_cache is for every system that is not already running a SSD back end!

  • The Buffer Pool Advisory section of your Automatic Workload Repository (AWR) report or STATSPACK report indicates that doubling the size of the buffer cache would be beneficial.
  • db file sequential read is a top wait event.
  • You have spare CPU.

KEEP pool vs. flash_cache

It’s important to note that flash_cache is different from the KEEP pool in both the media type (RAM vs. flash SSD) as well as the arguments.  When using the KEEP pool, you can write automated scripts to detect popular object and cache them to stay in RAM as long a the instance is running:

Conversely, with smart flash cache, Oracle moves aged-out data blocks from the RAM data buffer into the flash cache area, instead of letting it age-out to disk.

The flash_cache is a new storage clause argument, very similar to the KEEP pool:

alter|create table|index   objectname storage
(
   buffer_pool { keep | recycle | default }
   flash_cache { keep | none    | default }
);

One confusing aspect of the flash_cache is that Oracle is treating the onboard flash memory as if it were RAM, and not disk!  For example, note that the flash_cache syntax also include a KEEP argument, just like the buffer_pool assignment:

alter table fred storage (flash_cache keep);

alter table   fred storage (flash_cache none);

alter table    fred storage (flash_cache default);

Disabling Smart Flash Cache

Resetting the initialization parameters disables the Smart Flash Cache.

ALTER SYSTEM RESET DB_FLASH_CACHE_FILE SCOPE=SPFILE;
ALTER SYSTEM RESET DB_FLASH_CACHE_SIZE SCOPE=SPFILE;
SHUTDOWN IMMEDIATE;
STARTUP;

SELECT * FROM v$flashfilestat;

no rows selected

You can also display details about the statistics for the flash file statistics:

set lines 100
col name format a20

select   * from   v$flashfilestat;

In addition to object segregation, the flash_cache may also be beneficial to RAC systems, since RAC databases love to run on flash drives because it removes bandwidth saturation at the cache fusion layer as a source of contention:

DRAM SSD vs. Flash SSD

With all the talk about the Oracle “flash cache”, it is important to note that there are two types of SSD, and only DRAM SSD is suitable for Oracle database storage.  The flash type SSD suffers from serious shortcomings, namely a degradation of access speed over time.  At first, Flash SSD is 5 times faster than a platter disk, but after some usage the average read time becomes far slower than a hard drive.  For Oracle, only rack-mounted DRAM SSD is acceptable for good performance:

 Avg. Read speed Avg. write speed
Platter disk10.0 ms.  7.0 ms.
DRAM SSD 0.4 ms.   0.4 ms.
Flash SSD     1.7 ms.94.5 ms.

Conclusion

Good to see the flash cache working, even with this crappy hardware. it’s really good that Oracle opened this up to non-exadata hardware. 

It’ll be getting a better setup soon so that It can seen how this works with a decent commerical SSD flash drive.

Better to have at least two layers – one for mass storage, the other for fast retrieval.   We should be cautious however, because the flash write penalty may result in performance problems similar to those we’ve seen with RAID5.

credit: http://guyharrison.squarespace.com/blog/2009/11/24/using-the-oracle-11gr2-database-flash-cache.html
credit: https://www.dba-oracle.com/t_flash_cache.htm