Input/output operations per second (IOPS) and data throughput are two popular performance characteristics of any storage system. In the early days of computing, when typical data transfers were limited to small database queries, IOPS was the main metric that gave system administrators a general picture of storage system performance.
Today, with the rise of big data, online video streaming, distributed data processing, and the transfer of huge volumes of data across the Internet, data throughput is gaining relevance. The data-intensive operations behind these technologies require better storage and network performance, not just in terms of IOPS but in terms of throughput as well.
In this article, we’ll shed light on the relationship between IOPS and throughput and how DevOps specialists can use both metrics to evaluate storage system performance. We’ll also discuss how to measure IOPS and throughput, along with the additional factors you need to consider to get a comprehensive picture of storage system performance.
What is IOPS?
The IOPS metric shows how many read and/or write operations a storage device can perform per second. A single operation is performed on one block of data. Hard Disk Drives (HDDs) normally have 512 B or 4 KB blocks, whereas modern Solid State Drives (SSDs) expose storage memory in pages joined into blocks that can reach 512 KB in size.
The IOPS metric by itself says nothing about the amount of data that a drive can process. That amount depends on both the IOPS value and the block size (the largest number of bytes that can be allocated to a single I/O operation). For example, given the same IOPS value, more data can be processed (read or written) by the drive with the bigger block size.
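To put numbers on this: at 10,000 IOPS, a drive using 4 KB blocks moves at most 10,000 × 4 KB = 40,000 KB/s, or about 40 MB/s, while the same 10,000 IOPS with 64 KB blocks moves about 640 MB/s.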
In addition, IOPS can differ depending on whether data is accessed sequentially or randomly. For HDDs in particular, IOPS is typically higher for sequential operations because the disk head can easily traverse contiguous blocks, whereas random reads and writes require the head to seek to the needed location. IOPS values can also differ between read and write operations.
For these reasons, IOPS can be divided into four categories (a measurement sketch follows the list):
- IOPS of random reads
- IOPS of random writes
- IOPS of sequential reads
- IOPS of sequential writes
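To make the four categories concrete, below is a minimal Python sketch that times each combination of sequential/random and read/write on a scratch file. The file path, file size, block size, and test duration are illustrative assumptions, and buffered I/O passes through the OS page cache, so treat the output as a rough comparison rather than a drive benchmark (dedicated tools bypass the cache).

```python
import os
import random
import time

TEST_FILE = "/tmp/iops_scratch.bin"  # hypothetical scratch file
BLOCK = 4096                         # 4 KB block size
FILE_SIZE = 64 * 1024 * 1024         # 64 MB scratch file
SECONDS = 2                          # duration of each test

def measure(write: bool, sequential: bool) -> float:
    """Return operations per second for one read/write x seq/random mix."""
    blocks = FILE_SIZE // BLOCK
    buf = os.urandom(BLOCK)
    fd = os.open(TEST_FILE, os.O_RDWR)
    ops, offset = 0, 0
    deadline = time.monotonic() + SECONDS
    try:
        while time.monotonic() < deadline:
            if not sequential:
                offset = random.randrange(blocks) * BLOCK
            if write:
                os.pwrite(fd, buf, offset)   # one write operation
            else:
                os.pread(fd, BLOCK, offset)  # one read operation
            offset = (offset + BLOCK) % FILE_SIZE
            ops += 1
    finally:
        os.close(fd)
    return ops / SECONDS

if __name__ == "__main__":
    with open(TEST_FILE, "wb") as f:         # create the scratch file once
        f.write(b"\0" * FILE_SIZE)
    for write in (False, True):
        for sequential in (True, False):
            kind = ("sequential " if sequential else "random ") + \
                   ("writes" if write else "reads")
            print(f"{kind:>18}: {measure(write, sequential):,.0f} IOPS")
```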
What is throughput?
Storage throughput (also called data transfer rate) measures the amount of data transferred to and from the storage device per second. Normally, throughput is measured in MB/s. Throughput is closely related to IOPS and block size. Given the same IOPS, block size can make a substantial difference in terms of throughput. In particular, smaller block sizes result in lower throughput because each I/O reads or writes data in non-divisible blocks.
As with IOPS, data throughput also depends on the type of I/O: random or sequential. For sequential I/O, data throughput comes close to the maximum sustained bandwidth. For HDDs, the throughput of random I/O is significantly smaller than that of sequential I/O, while on modern SSDs the two are comparable.
Why IOPS?
The performance of data-intensive applications, including databases, data warehouses, and multimedia processing systems, is directly affected by the number of input/output operations that a storage system can execute. Therefore, the storage media you choose should depend on the performance requirements of these applications. High IOPS (e.g., 100,000 for modern SSDs) can be a good indicator that a storage system will meet the requirements.
Also, DevOps and infrastructure operators can implement a mixed storage system that combines media with different IOPS characteristics. For example, applications that use storage for redundancy, such as storing “cold” (rarely written) indexes, may run on HDDs with average IOPS, which can still be quite performant for read-heavy workloads. In contrast, applications that perform high-frequency random I/O will be more efficient on SSDs, which deliver consistent performance for random access.
Combining storage devices with different IOPS profiles can help optimize the storage expenditures of your company and make DevOps more flexible in provisioning different storage media to applications.
Limitations of IOPS
While IOPS can serve as a general storage performance metric, it says more about the potential of a storage system than about its actual performance. In isolation, the IOPS metric reveals the maximum number of I/O operations a given drive can perform, but it tells us nothing about the data workload the drive can process, because IOPS does not take parameters like block size and response time into account. The same IOPS value can result in very different storage performance depending on the block size.
Also, “factory” IOPS does not indicate the performance of applications with mixed I/O profiles. You must test IOPS in a real production setting to assess the actual needs of data-intensive applications.
Why data throughput?
In contrast to IOPS, throughput provides a concrete assessment of storage performance because it tells you how much data can be processed. You can use this metric to conclude whether a given storage drive meets your application requirements.
Throughput depends on IOPS and block size, as well as on the type of I/O (random or sequential). In random access patterns, data is read from or written to various locations on the disk, so the average seek time plays a major role in the storage system’s performance. Generally speaking, random I/O is poorly suited to HDDs and far more efficient on modern SSDs, where random access speed depends on the device’s internal controller and memory interface.
In sequential I/O, you are dealing with the maximum sustained bandwidth, measured in MB/s. During a sequential operation, the drive works on contiguous storage blocks, so the total throughput is limited mainly by the IOPS the drive can sustain.
Limitations of throughput
Throughput is more useful for assessing the performance of storage systems for large sequential operations than it is for random I/O. In the former case, it’s perfectly aligned with the IOPS metric and can be described as maximum sustained bandwidth.
In random I/O, which typically involves small data chunks, latency and queue length cause performance to deviate substantially from the “factory” throughput value: issuing random I/O requests can introduce high latency. Thus, it’s generally agreed that operations on large files and data streams are better served by sequential I/O, and under random workloads the “factory” throughput figure says little about actual storage performance.
Comparison of IOPS and throughput
Metric | What it measures | Meaning | Measurement complexity | Advantages | Limitations | Recommended usage |
---|---|---|---|---|---|---|
IOPS | I/O operations per second | Tells how many operations a disk drive can perform per second | Hard to measure because of differing I/O profiles and block sizes | Good measure of potential storage performance | Does not account for block size or the mix of I/O profiles | Baseline assessment of potential storage performance in an ideal scenario |
Throughput | Amount of data read/written per second | Tells how much data can be processed per second | Easy to measure for sequential operations; more testing is required for random I/O | Good measure of sequential operations with large files and data chunks | Does not equal maximum sustained bandwidth for random I/O; does not account for queue length and latency | Estimating requirements of data-intensive applications (especially multimedia and big data) |
Measuring IOPS
You can measure IOPS with open-source testing tools, such as Vdbench, which work by simulating I/O workloads on the target storage devices. However, these tools are often criticized for averaging IOPS, which limits their applicability for real application scenarios.
Generally speaking, measuring IOPS is difficult because real-world applications combine different I/O profiles. To get a more precise measure of IOPS, it’s recommended to assess storage performance based on application metrics, such as the speed of storage-related queries, rather than the exact number of operations.
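As a minimal sketch of that application-level approach, the snippet below times a storage-bound operation and reports operations per second plus tail latency. The `run_query` stand-in and the iteration count are our own illustrative assumptions; substitute a real storage-related query from your application.

```python
import time

def profile(op, iterations=1000):
    """Run op() repeatedly; return ops/sec and p99 latency in ms."""
    latencies = []
    start = time.monotonic()
    for _ in range(iterations):
        t0 = time.monotonic()
        op()                                  # one storage-related operation
        latencies.append(time.monotonic() - t0)
    elapsed = time.monotonic() - start
    latencies.sort()
    p99 = latencies[int(len(latencies) * 0.99) - 1] * 1000
    return iterations / elapsed, p99

if __name__ == "__main__":
    def run_query():                          # trivial stand-in workload
        with open("/etc/hostname", "rb") as f:
            f.read()
    ops_per_sec, p99_ms = profile(run_query)
    print(f"{ops_per_sec:,.0f} ops/sec, p99 latency {p99_ms:.2f} ms")
```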
Measuring throughput
Given the IOPS and block size, you can calculate storage throughput for sequential writes as follows:
Throughput = IOPS × block_size
So, for example, if the IOPS is 3,000 and the block size is 512 KB, the throughput of the drive is:
3,000 × 512 KB = 1,536,000 KB/s ≈ 1,536 MB/s (about 1.5 GB/s)
This value can be regarded as the maximum sustained bandwidth because, in sequential I/O, the blocks are contiguous. In the case of random I/O, the throughput will also depend on the average seek time and disk latency.
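The same calculation in code form, as a quick sanity check (the function name is ours; the KB-to-MB conversion uses a factor of 1,000 for simplicity):

```python
def throughput_mb_per_s(iops: int, block_size_kb: int) -> float:
    """Sequential throughput from the formula: IOPS x block size."""
    return iops * block_size_kb / 1000  # KB/s -> MB/s

print(throughput_mb_per_s(3_000, 512))  # 1536.0 MB/s, ~1.5 GB/s
```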
Throughput may also differ depending on the application’s data size. The OS and storage device will normally combine smaller data chunks to fill the entire block or split them if they are bigger than the block size. The throughput measurement will be more accurate for application data that matches a block size.
Conclusion
IOPS and throughput are quite useful for assessing storage performance. However, in isolation, IOPS is less informative because it only shows the potential performance, while throughput provides concrete information about data bandwidth.
When assessing storage performance, DevOps and infrastructure operators should take other parameters into account, such as network latency, I/O profile, and block size. To assess storage performance, avoid using only a storage drive’s “factory” metric. Rather, test its actual performance using real production workloads.