All about hard drive cache

How does a hard drive cache work EXACTLY

The short answer is, EXACTLY, no one knows, how a hard drive cache works is a manufacturer secret and differs from drive to drive depending on the drive’s purpose, BUT, we have a lot of clues, some through the SATA specification (And PATA), others through industry standard commands, and it is also not so hard to get what we want from black box reverse engineering, we might not get the actual algorithm (or variant of the algorithm) from such an endeavor, but we can know enough to predict how it will work

Hard drives are not simple machines in any sense of the word, as soon as you are familiar with them, and if you are familiar with computer science, specifically algorithms, you will come to conclusions concerning where complexities lie ! and it is not all in the hardware, much of it is in the hard drive’s software (Firmware)

The hard drive’s raison d’être

You see, a hard drive spins at a certain speed (Most commonly 5400 or 7200 rpm), some spin even faster, the hard drive has to do all it can to do what it is asked in the most efficient way, so for example, it allows the OS (through the controller’s driver) to tell it all about what data it wants in advance so that it can plan the heads shortest path to getting all that data (Native Command Queuing and before it Tagged Command Queuing), but let us not get carried away here, we are here to find out how cache works ! NCQ is a topic for a different day (Or is it)

Im here for the recipes

There are very few recipes and interactions that you are able to make use of, but let me try to come up with the most common ones you will probably want.

IMPORTANT: please note that all this is lost when you switch your computer off, to make this stuff permanent, you will need to add them to /etc/rc.local or use udev rules

write caching

First, here are the commands to probe for state, enable and disable

# Check status (=0 means disabled)
sudo hdparm -W /dev/sdX
# Enable
sudo hdparm -W1 /dev/sdX
# Disable
sudo hdparm -W0 /dev/sdX

read ahead caching

First, here are the commands to probe for state, enable and disable

# Check for state (Zero means disabled, other values are sectors to cache)
sudo hdparm -a /dev/sdX
# Enable (Ask for a 256 sector read ahead)
sudo hdparm -a 256 /dev/sdX
# Disable
sudo hdparm -A 0 /dev/sdX

Operating system level caching for a device

# Set read ahead for a disk into ram (Unit: Memory blocks)
blockdev --setra xxx /dev/sda
# Set write caching in system memory (Percentage of ram)
echo 10 > /proc/sys/vm/dirty_ratio
# Fstab entry to create a hard drive (Block device) in RAM (percentage or size Ex: 20G)
tmpfs /mnt/tmpfs tmpfs size=50%,rw,nosuid,nodev 0 0

In this day and age, do we still need spinning hard drives anyway ?

Well, yes, and no, in my case, I burn through hard drives and SSDs very quickly, but with a little tweaking, hard drives live a bit longer (Can only be achieved by also managing the vibration of multiple disks with a heavy computer case, but that is a topic for a different post), my use case is all about continuous writing, SSDs don’t seem to like this.

If this does not apply to you, and SSD cost is what is stopping you from going all in SSD, then maybe you would be interested in a post about adding an SSD caching layer in front of your inexpensive spinning disk

Why this is important to me (and you)

It is important to me because I have a mysql database spread across a big bunch of spinning disks, those disks are being written to ALL THE TIME, and this is precisely why using SSDs here is a bad idea, the data is short lived but the drive is hammered with writes continuously !

I am not saying that hard drives don’t take a considerable hit when they are hammered with writes continuously, but a disk constantly busy seeking while writing vs a disk writing sequentially do not bear the same kind of penalty, in fact, from my experiments, a hard disk with a write load designed to destroy it, will last much less than an SSD ! and the hit on SSDs also depends on the workload (Check write amplification), so yeah, this subject can get out of hand quickly

Is a hard drive’s cache used for reading or writing

Both, you will be told online (On some very authoritative popular places) that it is mostly for reading, but I fail to see what that means, it is mostly for whatever you are doing more ! Here is a bad example, It’s as if you are asking if a dolly is more concerned with sending goods to the truck or bringing them from the truck to the warehouse, it depends on whether you are loading or unloading the truck

Why is this a bad example you ask, well, because a hard drive is not a dolly that is being used to unload a truck, operating systems and database engines and hard drives are not a sheet of metal on 4 wheels (More like a sheet of oxidized metal on one bearing, but that is besides the point), A database operation will typically require many reads before it does any writes, and those reads are also handled by the database engine’s cache and the operating system’s cache, you get the idea and complexity…. but this still doesn’t mean that cache is concerned with reads more than writes or the other way around. it will depend on your workload, and on the correct disk firmware for that workload (EX: WD purple vs WD Blue, VS WD Black for example).

the firmware will always determine the priorities of the disk when caching, so certain firmwares will lean towards caching writes over reads while other firmwares will do the opposite.

NCQ already !

Well, since me and my big mouth already got us into NCQ, let me start with that and get it out of the way

NCQ is not possible without a chache, the cache is used to

  • Store operating system’s requests, reordering them according to their locations on the disk, and fetch them
  • Some requests may be served immediately from the cache before that cache is overwritten
  • Write Coalescing and Deferred Writes, writes can be “acknowledged” before being written and wait their turn to truly be written, and are only written to disk when they are combined into a larger write for optimization (There is a feature in NCQ that allows the OS to know if it was written to the disk or just the cache, but you don’t need that in your applications, you shouldn’t care)

Okay, so let us get back to what we were saying….

Hard drive cache for reading

hard drive designers are certainly well aware of the operating system’s cache in ram, so what good could come from caching in a measly 64MBs on the disk

this is a very good question, you see the operating system will not attempt to read neighboring areas of the disk just because they have zero overhead, but the disk will, it is free potential prefetch so why wouldn’t it fill its cache with it

There are many reasons why it would and why it would not, the cache size is limited, so there are priorities to what gets done with this cache, but also, the required processing is not little, so you don’t want to push that hard drive processor making a bottleneck out of it, remember when western digital came out with their black series and promoted them as having 2 processors (Micro-controllers is probably the correct term, but why complicate the jargon), that is because there is plenty of processing tasks to be done ?

So let us get to the reading business, if you ask AI, you will get very outdated or irrelevant data, when i asked AI, it seems to return advantages that are nulled by operating system disk-to-ram caching, so let me tell you what is still true and what is not

  1. Prefetching and Read-Ahead Optimization also known as (read-lookahead feature) and (read-ahead caching): Since the hard drive has knowledge of its own physical layout and access patterns, it can intelligently prefetch adjacent data into cache. Unlike the operating system, which only caches frequently used files or blocks, the hard drive itself can anticipate sequential reads and load data preemptively at a very little to no overhead (because it is reading data in the head’s way mostly). This is particularly useful for sequential reads (Mostly contiguous) . the drive itself has the facility to detect whether the read is sequential or not from the request addresses, SO TO AVOID LOST SPINS DON’T COMPLETELY DISABLE IT… MAKE IT LOWER IF YOU MUST, EXPERIMENTATION ON THE BEST SIZE IS KEY
  2. Interaction with OS-Level Caching: While the operating system also caches data in RAM, the drive’s internal cache is the first line of defense against performance bottlenecks. The OS might not always know the drive’s specific access patterns, whereas the drive’s firmware can optimize for known workloads in real-time.
  3. Adaptive Algorithms: Some hard drives (probably all modern ones) employ adaptive caching techniques, where they analyze access patterns over time and adjust caching strategies accordingly. For example, a drive may increase its read-ahead buffer if it detects frequent sequential reads but prioritize different caching strategies when dealing with random access patterns.

Hard drive cache for writing

Writing to a hard drive is not as straightforward as it might seem. The cache plays a crucial role in optimizing write performance and improving the overall lifespan of the drive. When data is written to a hard drive, it doesn’t necessarily go straight to the platters. Instead, the cache temporarily holds this data before it is written in an optimized manner.

This is beneficial for a few reasons:

  1. Write Coalescing: The hard drive can combine multiple small write requests into a single, larger, more efficient write operation. This reduces the number of disk rotations required to complete a task.
  2. Reducing Latency: If an application writes small amounts of data frequently, the cache allows the drive to acknowledge the write operation almost instantly before the data is physically committed to the disk.
  3. Deferring Writes: Some writes can be held in cache temporarily, allowing the drive to prioritize more urgent tasks before actually writing the data to disk.

However, this raises an important issue: data integrity. Since data is often held in volatile cache before being written permanently, there is always a risk of data loss in the event of a power failure or unexpected system shutdown. To mitigate this, many enterprise-grade drives implement write-through caching or battery-backed cache systems that ensure data is not lost before it is written.

Does Cache Improve Write Speed?

Yes, but only under certain conditions. For bursty, short writes, the cache significantly improves performance because the hard drive doesn’t have to immediately seek and rotate to a specific position on the disk. Instead, it temporarily holds the data and commits it at an optimal time. However, for sustained, sequential writes that exceed the cache size, the drive eventually has to flush the cache and write directly to disk, which means the cache offers diminishing returns.

Another critical aspect to consider is firmware tuning. Some manufacturers optimize their firmware for different workloads. Consumer drives often prioritize read-heavy workloads, while enterprise drives optimize caching strategies for sustained writes and improved data integrity.

Cache Eviction and Management

Since cache size is limited (typically between 8MB and 256MB on modern drives), the firmware must decide what stays in cache and what gets discarded. The general approach follows:

  • Least Recently Used (LRU): Frequently accessed data is kept in cache, while older, less-used data is replaced.
  • Write Prioritization: If a large sequential write is detected, the drive may flush other cache contents to prioritize this operation.
  • Predictive Read-Ahead: The drive may determine patterns in disk access and prefetch data into cache for anticipated future reads.

The Role of the OS in Caching

The operating system also plays a major role in caching, with its own layer of RAM-based disk caching. It can reorder and batch disk operations before passing them to the hard drive. This means that even if a hard drive’s cache is relatively small, the OS can compensate by managing frequently accessed data in RAM, which is significantly faster than any onboard hard drive cache.

When Cache Doesn’t Help

While cache is incredibly useful for many workloads, there are scenarios where it does little to nothing:

  • Purely Sequential Writes: If you are writing large files that exceed the cache size, the drive will quickly bypass the cache and write directly to disk.
  • Heavy Random Workloads: If your workload is entirely random writes that do not benefit from coalescing or deferred writes, the cache provides minimal advantage.
  • Database Applications (Like MySQL): Many database engines already perform their own caching and optimizations, sometimes making CERTAIN TYPES OF CACHING on the hard drive’s cache redundant, and making other caching mechanisms more valuable (Why i research hard drive caching).

Final Thoughts

Hard drive cache is a critical but often misunderstood component. It plays a dynamic role in both read and write operations, helping to bridge the performance gap between slow spinning platters and fast system memory. While the actual caching algorithms remain proprietary, we can infer their behavior from real-world testing and performance characteristics.

For database-heavy workloads like MySQL, tuning both the database and disk caching mechanisms can lead to significant performance gains. Understanding when and how a hard drive’s cache is utilized can help in selecting the right drive for your specific use case.

Leave a Reply

Your email address will not be published. Required fields are marked *