All about hard drive cache

How does a hard drive cache work EXACTLY

The short answer is: EXACTLY, no one knows. How a hard drive cache works is a manufacturer secret, and it differs from drive to drive depending on the drive's purpose. BUT we have a lot of clues, some through the SATA specification (and PATA before it), others through industry standard commands, and it is also not so hard to get what we want from black box reverse engineering. We might not get the actual algorithm (or variant of the algorithm) from such an endeavor, but we can learn enough to predict how it will behave.

Hard drives are not simple machines in any sense of the word. Once you are familiar with them, and if you are familiar with computer science, specifically algorithms, you will come to your own conclusions about where the complexities lie! And it is not all in the hardware; much of it is in the hard drive's software (firmware).

The hard drive’s raison d’être

You see, a hard drive spins at a certain speed (most commonly 5400 or 7200 rpm; some spin even faster), and the drive has to do all it can to carry out what it is asked in the most efficient way. For example, it allows the OS (through the controller's driver) to tell it in advance about all the data it wants, so that it can plan the heads' shortest path to fetching that data (Native Command Queuing, and before it Tagged Command Queuing). But let us not get carried away here, we are here to find out how cache works! NCQ is a topic for a different day (or is it?)

I'm here for the recipes

There are very few recipes and interactions you can actually make use of, but let me list the most common ones you will probably want.

IMPORTANT: please note that all these settings are lost when you switch your computer off. To make them permanent, you will need to add them to /etc/rc.local or use udev rules.
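
For the udev route, here is a minimal sketch of a rule; the model string is just an example of mine, match it to your own drive (find it with udevadm info --query=all --name=/dev/sdX):

# /etc/udev/rules.d/69-hdparm.rules (example model string, adjust to your drive)
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?", ATTRS{model}=="ST2000DM001*", RUN+="/usr/sbin/hdparm -W1 /dev/%k"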

write caching

First, here are the commands to probe for state, enable and disable

# Check status (=0 means disabled)
sudo hdparm -W /dev/sdX
# Enable
sudo hdparm -W1 /dev/sdX
# Disable
sudo hdparm -W0 /dev/sdX

read ahead caching

First, here are the commands to probe for state, enable and disable

# Check the drive's read-lookahead feature (a 0 means disabled)
sudo hdparm -A /dev/sdX
# Enable
sudo hdparm -A1 /dev/sdX
# Disable
sudo hdparm -A0 /dev/sdX
# Note: lowercase -a is a different knob, it gets/sets the OS-level
# read-ahead for the device in 512-byte sectors, for example:
sudo hdparm -a 256 /dev/sdX

Operating system level caching for a device

# Set the OS read-ahead for a disk (unit: 512-byte sectors)
blockdev --setra xxx /dev/sda
# Let dirty (not-yet-written) data grow to this percentage of RAM
echo 10 > /proc/sys/vm/dirty_ratio
# Fstab entry for a RAM-backed filesystem (tmpfs); size can be a percentage or absolute, e.g. 20G
tmpfs /mnt/tmpfs tmpfs size=50%,rw,nosuid,nodev 0 0
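
Before changing the OS read-ahead, you may want to note the current value so you can go back:

# Read the current read-ahead value (512-byte sectors)
blockdev --getra /dev/sda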

In this day and age, do we still need spinning hard drives anyway ?

Well, yes and no. In my case, I burn through hard drives and SSDs very quickly, but with a little tweaking, hard drives live a bit longer (that can only be achieved by also managing the vibration of multiple disks with a heavy computer case, but that is a topic for a different post). My use case is all about continuous writing, and SSDs don't seem to like that.

If this does not apply to you, and SSD cost is what is stopping you from going all in on SSDs, then maybe you would be interested in a post about adding an SSD caching layer in front of your inexpensive spinning disk.

Why this is important to me (and you)

It is important to me because I have a MySQL database spread across a big bunch of spinning disks. Those disks are being written to ALL THE TIME, and this is precisely why using SSDs here is a bad idea: the data is short lived, but the drive is hammered with writes continuously!

I am not saying that hard drives don't take a considerable hit when they are hammered with writes continuously, but a disk constantly busy seeking while writing and a disk writing sequentially do not bear the same kind of penalty. In fact, from my experiments, a hard disk with a write load designed to destroy it will last much less than an SSD! And the hit on SSDs also depends on the workload (check write amplification), so yeah, this subject can get out of hand quickly.

Is a hard drive’s cache used for reading or writing

Both. You will be told online (in some very authoritative, popular places) that it is mostly for reading, but I fail to see what that means; it is mostly for whatever you are doing more! Here is a bad example: it's as if you were asking whether a dolly is more concerned with sending goods to the truck or bringing them from the truck to the warehouse; it depends on whether you are loading or unloading the truck.

Why is this a bad example, you ask? Well, because a hard drive is not a dolly being used to unload a truck. Operating systems, database engines, and hard drives are not a sheet of metal on 4 wheels (more like a sheet of oxidized metal on one bearing, but that is beside the point). A database operation will typically require many reads before it does any writes, and those reads are also handled by the database engine's cache and the operating system's cache; you get the idea and the complexity… but this still doesn't mean that cache is concerned with reads more than writes or the other way around. It will depend on your workload, and on the correct disk firmware for that workload (e.g. WD Purple vs WD Blue vs WD Black).

The firmware will always determine the disk's priorities when caching, so certain firmwares will lean towards caching writes over reads while others will do the opposite.

NCQ already !

Well, since me and my big mouth already got us into NCQ, let me start with that and get it out of the way.

NCQ is not possible without a cache; the cache is used to:

  • Store the operating system's requests, reorder them according to their locations on the disk, and fetch them
  • Serve some requests immediately from the cache, before that cache is overwritten
  • Do write coalescing and deferred writes: writes can be "acknowledged" before being written, wait their turn, and are only committed to disk once combined into a larger write for optimization (there is a feature in NCQ that lets the OS know whether the data hit the disk or just the cache, but you don't need that in your applications, you shouldn't care)
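
If you want to see NCQ at work on Linux, the queue depth in use is exposed through sysfs; a quick sketch (replace sdX with your drive):

# A queue depth of 31 usually means NCQ is active
cat /sys/block/sdX/device/queue_depth
# Lower it (1 effectively serializes requests, handy for testing)
echo 1 | sudo tee /sys/block/sdX/device/queue_depth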

Okay, so let us get back to what we were saying….

Hard drive cache for reading

Hard drive designers are certainly well aware of the operating system's cache in RAM, so what good could come from caching in a measly 64MB on the disk?

This is a very good question. You see, the operating system will not attempt to read neighboring areas of the disk just because they have zero overhead, but the disk will; it is free potential prefetch, so why wouldn't it fill its cache with it?

There are many reasons why it would and why it would not. The cache size is limited, so there are priorities to what gets done with it, but also, the required processing is not little, so you don't want to push that hard drive processor into becoming a bottleneck. Remember when Western Digital came out with their Black series and promoted them as having 2 processors (micro-controllers is probably the correct term, but why complicate the jargon)? That is because there are plenty of processing tasks to be done.

So let us get to the reading business. If you ask AI, you will get very outdated or irrelevant data; when I asked, it returned advantages that are nulled by the operating system's disk-to-RAM caching, so let me tell you what is still true and what is not:

  1. Prefetching and Read-Ahead Optimization, also known as the read-lookahead feature or read-ahead caching: since the hard drive has knowledge of its own physical layout and access patterns, it can intelligently prefetch adjacent data into cache. Unlike the operating system, which only caches frequently used files or blocks, the hard drive itself can anticipate sequential reads and load data preemptively at very little to no overhead (because it is mostly reading data already in the head's way). This is particularly useful for sequential (mostly contiguous) reads. The drive itself can detect whether a read is sequential from the request addresses, SO TO AVOID LOST SPINS, DON'T COMPLETELY DISABLE IT… MAKE IT LOWER IF YOU MUST; EXPERIMENTATION ON THE BEST SIZE IS KEY
  2. Interaction with OS-Level Caching: While the operating system also caches data in RAM, the drive’s internal cache is the first line of defense against performance bottlenecks. The OS might not always know the drive’s specific access patterns, whereas the drive’s firmware can optimize for known workloads in real-time.
  3. Adaptive Algorithms: Some hard drives (probably all modern ones) employ adaptive caching techniques, where they analyze access patterns over time and adjust caching strategies accordingly. For example, a drive may increase its read-ahead buffer if it detects frequent sequential reads but prioritize different caching strategies when dealing with random access patterns.
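
A crude way to observe point 1 for yourself is to toggle the drive's read-lookahead feature and time buffered reads; hdparm -t bypasses the OS page cache, so any difference is mostly the drive's doing:

sudo hdparm -A0 /dev/sdX   # disable the drive's read-lookahead
sudo hdparm -t /dev/sdX    # timed buffered disk reads
sudo hdparm -A1 /dev/sdX   # re-enable it
sudo hdparm -t /dev/sdX    # compare the two numbers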

Hard drive cache for writing

Writing to a hard drive is not as straightforward as it might seem. The cache plays a crucial role in optimizing write performance and improving the overall lifespan of the drive. When data is written to a hard drive, it doesn’t necessarily go straight to the platters. Instead, the cache temporarily holds this data before it is written in an optimized manner.

This is beneficial for a few reasons:

  1. Write Coalescing: The hard drive can combine multiple small write requests into a single, larger, more efficient write operation. This reduces the number of disk rotations required to complete a task.
  2. Reducing Latency: If an application writes small amounts of data frequently, the cache allows the drive to acknowledge the write operation almost instantly before the data is physically committed to the disk.
  3. Deferring Writes: Some writes can be held in cache temporarily, allowing the drive to prioritize more urgent tasks before actually writing the data to disk.

However, this raises an important issue: data integrity. Since data is often held in volatile cache before being written permanently, there is always a risk of data loss in the event of a power failure or unexpected system shutdown. To mitigate this, many enterprise-grade drives implement write-through caching or battery-backed cache systems that ensure data is not lost before it is written.
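
If you ever need to make sure nothing is left sitting in that volatile cache (before cutting power, for instance), you can request an explicit flush; a small sketch:

# Flush the OS page cache down to the drive
sync
# Ask the drive itself to flush its onboard write cache to the platters
sudo hdparm -F /dev/sdX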

Does Cache Improve Write Speed?

Yes, but only under certain conditions. For bursty, short writes, the cache significantly improves performance because the hard drive doesn’t have to immediately seek and rotate to a specific position on the disk. Instead, it temporarily holds the data and commits it at an optimal time. However, for sustained, sequential writes that exceed the cache size, the drive eventually has to flush the cache and write directly to disk, which means the cache offers diminishing returns.
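
You can observe those diminishing returns with a rough test: a burst that fits in the caches is acknowledged quickly, while a sustained write settles down to platter speed. A sketch (the path is a placeholder on the drive being tested; conv=fdatasync forces a flush so the timing includes the physical write):

# Small burst (64MB), mostly absorbed by the caches
dd if=/dev/zero of=/mnt/disk/testfile bs=1M count=64 conv=fdatasync
# Sustained write (8GB), the caches fill up and the platters set the pace
dd if=/dev/zero of=/mnt/disk/testfile bs=1M count=8192 conv=fdatasync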

Another critical aspect to consider is firmware tuning. Some manufacturers optimize their firmware for different workloads. Consumer drives often prioritize read-heavy workloads, while enterprise drives optimize caching strategies for sustained writes and improved data integrity.

Cache Eviction and Management

Since cache size is limited (typically between 8MB and 256MB on modern drives), the firmware must decide what stays in cache and what gets discarded. The general approach follows:

  • Least Recently Used (LRU): Frequently accessed data is kept in cache, while older, less-used data is replaced.
  • Write Prioritization: If a large sequential write is detected, the drive may flush other cache contents to prioritize this operation.
  • Predictive Read-Ahead: The drive may determine patterns in disk access and prefetch data into cache for anticipated future reads.

The Role of the OS in Caching

The operating system also plays a major role in caching, with its own layer of RAM-based disk caching. It can reorder and batch disk operations before passing them to the hard drive. This means that even if a hard drive’s cache is relatively small, the OS can compensate by managing frequently accessed data in RAM, which is significantly faster than any onboard hard drive cache.
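
You can watch that OS-level cache at work: dirty pages (written by applications but not yet on disk) are reported in /proc/meminfo:

# Data held in RAM that has not yet reached the disk
grep -E 'Dirty|Writeback' /proc/meminfo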

When Cache Doesn’t Help

While cache is incredibly useful for many workloads, there are scenarios where it does little to nothing:

  • Purely Sequential Writes: If you are writing large files that exceed the cache size, the drive will quickly bypass the cache and write directly to disk.
  • Heavy Random Workloads: If your workload is entirely random writes that do not benefit from coalescing or deferred writes, the cache provides minimal advantage.
  • Database Applications (like MySQL): many database engines already perform their own caching and optimizations, sometimes making CERTAIN TYPES of hard-drive-level caching redundant, and making other caching mechanisms more valuable (which is why I research hard drive caching).

Final Thoughts

Hard drive cache is a critical but often misunderstood component. It plays a dynamic role in both read and write operations, helping to bridge the performance gap between slow spinning platters and fast system memory. While the actual caching algorithms remain proprietary, we can infer their behavior from real-world testing and performance characteristics.

For database-heavy workloads like MySQL, tuning both the database and disk caching mechanisms can lead to significant performance gains. Understanding when and how a hard drive’s cache is utilized can help in selecting the right drive for your specific use case.

12TB disk does not show up

I have been using an Intel D525MW Atom system as network attached storage for some time now. I have an extra SATA PCIe card (Silicon Image SiI 3132) so that I can connect 4 disks. When the 12TB Western Digital disk (HGST HUH721212AL) is connected to this extra SATA card, it does not show up, meaning an "fdisk -l" does not list it!

So the next thing to do is swap the SATA connection with a different disk connected to the motherboard, and suddenly it works. Amazing, but I need to know where the problem comes from.

The first theory is that disks that are SFF-8447 compliant (rather than the old IDEMA standard) are not supported by this controller!

Hard drive power draw at startup

The maximum power draw of a PC with many hard drives happens at boot time. In my case, the PC is an Intel Atom D525MW, which hardly draws any power.

What this means is that I need an oversized power supply that only earns its keep at startup, then becomes an inefficient power supply right after. Why this is particularly important: this computer runs on a UPS, and the number of minutes it can stay up matters a lot.

The solution is to enable PUIS (power-up in standby). What this does is allow the disks not to spin up as soon as they get power, but instead spin up upon receiving a command from the controller; in effect, the disks are spun up sequentially (in turn).
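
hdparm can toggle PUIS on drives that support it; a hedged sketch (hdparm itself flags this option as dangerous, because a controller or BIOS that never sends the spin-up command will leave the drive invisible):

# Enable power-up in standby; the drive waits for a spin-up command
sudo hdparm -s1 /dev/sdX
# Disable it again
sudo hdparm -s0 /dev/sdX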

Mounting QCOW2 (KVM/QEMU) directly

First, the tools you need

apt-get install qemu-utils

Now, enable NBD

modprobe nbd max_part=8

Once that is enabled, connect the file as a block device

qemu-nbd --connect=/dev/nbd0 /hds/usb/virts/Windows/main.qcow2

Now, the block device should appear like any other, alongside the partitions inside !

fdisk -l

On my machine, this resulted in

Disk /dev/nbd0: 95 GiB, 102005473280 bytes, 199229440 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xc5324c42

Device      Boot     Start       End   Sectors  Size Id Type
/dev/nbd0p1 *         2048    104447    102400   50M  7 HPFS/NTFS/exFAT
/dev/nbd0p2         104448 198138958 198034511 94.4G  7 HPFS/NTFS/exFAT
/dev/nbd0p3      198139904 199225343   1085440  530M 27 Hidden NTFS WinRE

This qcow2 file only occupied around 40GB of real disk space, but fdisk reports the full virtual size of the image, about 100GB in this case! Let us mount the drive

mount /dev/nbd0p2 /hds/loop

Now, in this case in particular, as with any other block device that held a Windows operating system, more often than not you will get a message saying

The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount.
Falling back to read-only mount because the NTFS partition is in an
unsafe state. Please resume and shutdown Windows fully (no hibernation
or fast restarting.)
Could not mount read-write, trying read-only

The solution to that is simple: follow these two steps to fix the file system, then force mount it using remove_hiberfile

ntfsfix /dev/nbd0p2
mount -t ntfs-3g -o remove_hiberfile /dev/nbd0p2 /hds/loop

The result of NTFSFIX was

Mounting volume... The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount.
FAILED
Attempting to correct errors...
Processing $MFT and $MFTMirr...
Reading $MFT... OK
Reading $MFTMirr... OK
Comparing $MFTMirr to $MFT... OK
Processing of $MFT and $MFTMirr completed successfully.
Setting required flags on partition... OK
Going to empty the journal ($LogFile)... OK
Checking the alternate boot sector... OK
NTFS volume version is 3.1.
NTFS partition /dev/nbd0p2 was processed successfully.

And the following mount command worked as you would expect, silently

Now, if you want to disconnect the NBD image, you need to unmount (Like you normally would) THEN

#Disconnect the image from the NBD device
qemu-nbd --disconnect /dev/nbd0;
#Unload the NBD module
rmmod nbd;

Linux badblocks cheat sheet

1- Large disks need to have their block size specified; without it, disks like my 6TB and my 8TB hard drives will not work, and badblocks will report the following error:

badblocks: Value too large for defined data type invalid end block (5860522584): must be 32-bit value

So the solution is to add the block size, like the following for example (this one is destructive):

badblocks -b 4096 -wsv /dev/sdb

It is a good idea to LOG THE BAD SECTORS (this is the command I usually use for a destructive test):

badblocks -b 4096 -o /root/badblockslog.txt -wsv /dev/sdb

In the command above, -w means do a destructive write-read test, -s is for show progress, -v is for verbose (report the errors you encounter), and -o followed by a file name is where to keep the log.
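
If the disk holds data you care about, the same logging works with the safe read-only test, which is simply the same command without -w:

# Non-destructive, read-only scan with a log file
badblocks -b 4096 -o /root/badblockslog.txt -sv /dev/sdb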

Updating the firmware on my 2TB Seagate Barracuda

Why update the firmware !

My answer here is a bit unconventional, and certainly not a fact… I even think I am wrong, but it can't hurt, so here it goes.

Seagate recommends you update the disk's firmware to improve the performance and longevity of the hard drive; I, on the other hand, have an extra mission…

The firmware of a hard drive is stored partially on a chip on the PCB, and partially on the disk itself! I know that disk platters have a data retention life of around 10 years, and the area where the firmware is written is never refreshed, since it is only read when the disk boots up. So I am hoping (even though doubtful) that the firmware update might re-write this area of the disk and breathe new life into it.

A certain disk application claims to refresh the data on that area of the disk; after testing that application I will come back here and add my findings accordingly.

Getting the firmware

Let us start by downloading the firmware! To download firmware from Seagate's website, you will need to know your hard drive's serial number. To get it, open an elevated command prompt and run the following command

wmic diskdrive get model,serialnumber

The result of running that command shows that I have a hard drive of the model ST2000DM001-1CH164, along with its serial number, which I now have (I am masking serial numbers just in case Seagate has a problem with me publishing them, since the serial number is what allows you to download the firmware)…

Now that I have the serial number, I can go to Seagate's firmware download page and grab the firmware… once done, I unzip it, and a few folders appear, among them the bootable tools and the firmware itself.

Creating a bootable USB flash disk

Now, inside the bootable tools folder, there is a file (SeaChest_RC_2.7.4_10-18-2018.usbBootMaker.exe) that will create a bootable flash stick for you. Insert a flash stick of any size (its contents will be erased by this application) and run the application; now you have a bootable drive, but without the firmware, so copy the firmware folder from the unzipped download onto the flash stick, and you are ready to boot from it. For instructions on how to boot from a flash stick, check your motherboard manufacturer's documentation; it is usually a simple thing such as hitting F11 at boot time.

Updating the firmware

Once booted, you should be presented with a linux command prompt, where we can run commands to update the firmware

To see what disks are on your system, run the following command

SeaChest_Firmware --scan

The scan should give you the handle for the drive. If you have never used Linux before, the handle starts with /dev/ (short for device), and SATA disks usually show up as sdX (where X starts at "a" and ends with the letter corresponding to the last disk in your system); old PATA disks usually show up as hdX… but that is usually not something you need, as PATA disks are virtually non-existent at this stage.

Now, execute the firmware update command like so

SeaChest_Firmware -d /dev/sg3 --downloadFW /firmware/filename.LOD

Now, if you want to know whether the update was successful, just run the scan command again and note the firmware version it reports!

My Problem !

I have 3 firmware files, named 1TB, 2TB, and 3TB. When I ran the command above, the system claimed that the update was successful, but it didn't really update the firmware; I was still stuck with version 26 rather than 29!

So I decided to use Seagate's own configuration file to do the update, with the command

SeaChest_Firmware -d /dev/sg3 --fwdlConfig GPCC2949.CFS

The surprise was that I got the following error

model matched but the current firmware version does not match the available updates

So, I went back in time and remembered that for this particular disk, I had changed the PCB before (trying to get a 3TB disk to work by moving a certain chip from one board to the other; diagnosis showed the problem was not the PCB)… So instead of flashing the 2TB firmware file, I flashed the 3TB one, and what do you know, it worked.

Anyway, I will come back with screenshots of the whole thing… and more data for those who are having trouble updating their firmware. Until then, hang in there.

Mounting unclean NTFS windows drive in Linux

Whenever I get the following message

mount /dev/sdd1 /hds/sgt2tb
The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount.
Falling back to read-only mount because the NTFS partition is in an
unsafe state. Please resume and shutdown Windows fully (no hibernation
or fast restarting.)
Could not mount read-write, trying read-only

The command

ntfsfix /dev/sdd1

resolves the issue, and produces the following message

Mounting volume... The disk contains an unclean file system (0, 0).
Metadata kept in Windows cache, refused to mount.
FAILED
Attempting to correct errors...
Processing $MFT and $MFTMirr...
Reading $MFT... OK
Reading $MFTMirr... OK
Comparing $MFTMirr to $MFT... OK
Processing of $MFT and $MFTMirr completed successfully.
Setting required flags on partition... OK
Going to empty the journal ($LogFile)... OK
Checking the alternate boot sector... OK
NTFS volume version is 3.1.
NTFS partition /dev/sdd1 was processed successfully

The same mount command you see here will now work flawlessly

mount /dev/sdd1 /hds/sgt2tb

I am still unsure which of the processes mentioned above is responsible, as this oftentimes pops up on drives that were never system drives, so there is no hibernation file problem.

Resume bad blocks where it was stopped

The answer to this should be simple. I initiated the test with

badblocks -nsv /dev/sdb

First, interrupt badblocks with ctrl+c; the output should be

Checking for bad blocks in non-destructive read-write mode
From block 0 to 1953514583
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern:   0.92% done, 49:38 elapsed. (0/0/0 errors)
 21.32% done, 18:49:24 elapsed. (0/0/0 errors)

Interrupted at block 416437376

Interrupt caught, cleaning up

Okay, so we know what blocks it was supposed to check (0 through 1953514583), and where it was interrupted (416437376)

So I will ask it to resume testing from just before where it stopped, up to the end; note that badblocks takes the last block first, then the starting block

badblocks -nsv /dev/sdb 1953514583 416437375

n = non-destructive read-write test
s = show progress
v = verbose, tell us about what you find!

The new run should report the percentage correctly, but the time counter resets to zero, as it only counts how long the current run has been going.

One thing to note is that the badblocks list can be used to instruct the filesystem to avoid the bad blocks; writing to a bad block also gives the disk's firmware a chance to substitute it with a spare block, so the disk may work again with no intervention from your end!

So for my 2TB hard drive…

416437375 = 21% (13 hours)
619014719 = 31.6% (+23:22)
627995199 = 32.15% (+1:04)
667782398 = 34.18% (+4:46)
715469885 = 36.62% (+5:44)
827834875 = 42.38%

While running the tests, you might want to keep an eye on the hard drive temperature with a command like

hddtemp /dev/sdb

To create a log file of the bad blocks, every run should have its own file!

badblocks -nsv -o /root/badblocks3.txt /dev/sdb 1953514583 627995198

The concatenation of the files you are creating is very useful when creating a file system, if you ever decide to format the drive later! The recommended way, though, is to let the other disk tools invoke badblocks directly (e2fsck -c, or mke2fs -c).
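
If you do go the manual route, the merged log can be handed to mke2fs at format time; a sketch (the block size must match the -b 4096 used during testing, and the block numbers are relative to the device that was scanned, so create the filesystem on that same device):

# Merge the per-run logs and feed them to mke2fs
cat /root/badblocks*.txt > /root/allbadblocks.txt
mke2fs -b 4096 -l /root/allbadblocks.txt /dev/sdb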

While the test is running, you will see 3 numbers that correspond to read errors / write errors / corruption errors.

BCACHE – how to setup

About this tutorial

Despite being lengthy, this tutorial is in fact easy and fast; I have split it into parts so that you can get down to business instantly if you need to.

Worth mentioning: this simple procedure presents itself as rocket science, but it is not, so I advise you to dive in (experimenting on a separate computer first may be a good idea). Again, I assure you it is VERY STRAIGHT FORWARD; the length is because I am elaborating to make it easy.

Disclaimer

This is an effort to put all the information I need about bcache in one place, for my reference and your benefit. But please beware: bcache should be run with backup (you will have to come up with a scheme, as RAID will render the cache redundant, for example, and rsync for big files might make your CPU do a lot of work). In any case, I am not responsible and will not be held liable for any damage you may endure.

SSDs are the future

When it comes to SSDs, I would say they have come a long way in terms of price, and one day they will replace hard drives, I have no doubt about that; there is no advantage in a hard drive that an SSD can't eventually match (you might argue TBs written, maybe, but have you checked the reliability of a hard drive stressed to the level needed to achieve those TBs written?).

What is bcache for

Spinning hard drives are fast beasts when it comes to sequential reads, but when it comes to random reads, where the head has to go seek the data, they become very, very slow; you can be reading at 200MB/s and suddenly drop to 2MB/s. SSDs do not suffer as much from random reads (slower than sequential, yes, but nothing close to the gap you see in spinning disks); in a spinning disk, the speed difference can be 100-fold OR MORE.

History (Windows)

The earliest attempt I can remember was Intel Robson (2005). Intel Robson, or Intel Turbo Memory, was a feature alongside the Core 2 CPUs, but I don't think it made it to the Core i generation. It was not very popular, and for a good reason: at the extra cost, OEMs could add more RAM instead; not only was that better for marketing, it also made more sense, as Windows was already introducing memory caching for disks with Windows Vista.

Some time later, Microsoft came up with ReadyBoost (with Windows Vista). ReadyBoost relied on fast pen drives to cache data from the spinning disk, and it was not a very popular feature at the time, for many reasons: it had to be designed so the stick could be pulled out without affecting data integrity, putting restrictions on write caching (writethrough only, no writeback), and it was still doing the stuff that RAM did perfectly. Not to mention that affordable pen drives were not that fast to begin with.

Caching today.

As it is today, caching still makes sense; I would argue it makes more sense than ever. Spinning hard disk drives are still much cheaper than SSDs: a good 1TB SSD from Samsung is at around $340 for the EVO and $460 for the Pro (Jul 2017); compare that to a spinning disk with a price tag averaging $40, and you will see that the difference is still around 10-fold, even more as you go up in size. So what do we do?

The answer is: cache the disk. Now is a better time than ever to use caching, with super fast SSDs that employ wear leveling and are connected through a more stable and persistent connection (SATA, inside the computer).

SSD caching On Windows.

On Windows, the answer may be ISR (Intel Smart Response). I have not tried it myself, but I have heard many good things about it: you get into your BIOS and set the disk mode to RAID, then use the Intel Management Engine software to cache the spinning disk on the SSD, that simple.

I could almost swear Intel had a software solution for this that was a bit pricey, but I can't seem to find it; I remember watching a video about it many years ago.

In any case, I am not very experienced with windows, so I will just leave it here.

SSD disk caching on Linux

On Linux, there are many solutions; the one I will be showing you how to use right now is bcache, because it is fast, efficient, and works on block devices.

So, I am assuming you have installed Debian Stretch (9), logged in, and have networking et al running. Now, let us get to installing bcache. Mind you, bcache has been part of the Linux kernel since Jessie or even before, so all you need is bcache-tools; in Jessie, you had to compile it with a few lines, in Stretch, there is a package for it.

** BCACHE **

To help avoid confusion: you can use your big hard disk (set up as a bcache backing device) on its own before attaching an SSD; you can then, whenever you want, attach an SSD to it to start the performance gain.

Installing bcache tools in Debian Jessie (8)

** IF YOU ARE INSTALLING ON JESSIE, BCACHE TOOLS WERE NOT PACKAGED FOR JESSIE**

apt-get install git make gcc pkg-config uuid openssl util-linux uuid-dev libblkid-dev

cd /usr/src
git clone https://github.com/g2p/bcache-tools.git
cd bcache-tools
make
make install

** END OF FOR JESSIE **

Installing bcache tools in Debian Stretch (9)

apt-get install bcache-tools

Planning how to setup the drives

In this article, I will be setting up 2 separate disks that are not system disks: one is a 4TB spinning disk, the other a 1TB SSD. There are a few rules you need to be aware of though:

1- You can cache as many backing devices as you wish with one SSD
2- You can not cache one backing device with more than one SSD
3- There are memory requirements for bcache, so for example dropping the disks into a 486 computer with 256MB of RAM and serving them over iSCSI is not a good idea.

My setup

The backing device is your large spinning disk, the caching device is the SSD

My backing device is a 4TB hard drive that is connected as /dev/sde
My caching device is a 1TB Samsung 850 EVO (there are alignment considerations here since it is a TLC disk; the Pro is MLC and works like a regular disk with no alignment issues), connected as /dev/sdc

Setting up the backing device (sde), mounting and populating it with data

You may want to start with the following command to clear any existing filesystem from the drives (change sde to your own drive designation)

wipefs -a /dev/sde

Now, let’s format SDE as backing, and SDC as caching

1- Run parted for backing device

parted /dev/sde
mklabel gpt
mkpart primary ext4 0% 100%

2- Make it a bcache backing partition

Using make-bcache, you will use the -B switch to tell the system that this is the backing device, meaning the spinning disk

make-bcache -B /dev/sde1

output from the above will be something like

UUID:                   19d92bc8-8f49-479a-9480-33ca659b91b2
Set UUID:               0e3f386a-ec62-42b9-b0f3-025a09253946
version:                1
block_size:             1
data_offset:            16

3- Format it as ext4 or whatever filesystem you fancy

mkfs.ext4 /dev/bcache0

4- Mounting it like you would mount any other partition

mount /dev/bcache0 /hds/bcache0

5- If you like, you can now copy your data to it and get things ready before installing the caching device (before attaching the SSD as cache).

I prefer to copy all the files to the spinning disk before attaching the SSD: when copying sequentially, the SSD mostly does not cache anyway, and what it does cache would not be the things we use frequently. So I copy my files first, then attach the SSD.

Setting up the caching device (sdc), then attaching it to the backing device

1- Create a partition on the caching device (you choose the size you want to use as cache), but I would recommend that if you want to use the whole disk, you leave 10% unpartitioned for over-provisioning.

wipefs -a /dev/sdc

parted /dev/sdc
mklabel gpt
mkpart primary ext4 0% 90%

Using make-bcache, you will use the -C switch to tell the system that this is the caching device, meaning the solid state disk (SSD)

make-bcache -C /dev/sdc1

output from the above will be something like

UUID: eeda3570-eb1b-4983-8c53-76322a654585
Set UUID: 92dbf6ca-0f0b-44d5-b70e-8f1e7919838d
version: 0
nbuckets: 1716964
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1

Now, even if just to get a feel for it rather than for a technical purpose, try running the command below; it should report "no cache", because we have not attached a cache yet

cat /sys/block/bcache0/bcache/state

DO NOT Format the caching partition as ext4

This time, we won't be formatting it as ext4 like the backing device above (think about it: the OS sees the backing device and, at this abstraction layer, should not even know about this one, so why would it carry a file system other than the one bcache itself understands); we will simply be attaching it to the disk.

Attaching the caching device

If you take a look at the result of the make-bcache -C command, you will notice a Set UUID. We need this unique ID to tell bcache which SSD to attach to which bcache device; the only one we have so far is bcache0, as you can see above. Here is how we attach it.

echo 92dbf6ca-0f0b-44d5-b70e-8f1e7919838d > /sys/block/bcache0/bcache/attach

Now, if we run the command above again

cat /sys/block/bcache0/bcache/state

It should read "clean" or "dirty" instead of "no cache" (I would bet it reads clean at this stage), depending on whether data has been written to the cache that has not yet reached the backing device.

Setup all done, unless you want to fine tune it for your purpose, then read on.

Tuning the cache.

1- Caching mode

to inspect what caching mode we are using now

cat /sys/block/bcache0/bcache/cache_mode

Which will probably result in

[writethrough] writeback writearound none

By default, the system uses writethrough (better data integrity), but if you are like me, and have made 100% sure the electricity won't ever go down, or if you back up the data in real time, you might want to switch to writeback; writeback gives much faster write operations, which is not necessarily a requirement for all applications.

echo writeback > /sys/block/bcache0/bcache/cache_mode

2- sequential read cutoff

The other thing you might wish to tune is the sequential read/write cutoff: requests shorter than this are considered worth caching. By default it is 4MB, so everything under 4MB of sequential access will be cached. I personally like to take that down to 1MB, judging by the fact that files larger than 1MB read pretty fast directly from the disk! But surely, this will depend on your application and on experimentation with it.

cache 1 megabyte and smaller

echo 1M > /sys/block/bcache0/bcache/sequential_cutoff

cache everything (a special value; it does not follow the same "less than" logic as the others)

echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

back to caching 4 megabytes and smaller (default)

echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

3- Percentage of dirty data to allow on SSD.

I personally like it the way it is (10% of the SSD's size), but you can change that, and sometimes you have to change it temporarily for certain purposes.

Flush all dirty data to disk as soon as you can

echo 0 > /sys/block/bcache0/bcache/writeback_percent

Allow 10% dirty data

echo 10 > /sys/block/bcache0/bcache/writeback_percent

The first (value 0) is very useful when you want to disconnect the cache: to disconnect, you want the dirty data on the SSD to be zero, so start by issuing the first line above; then, as soon as all the data is flushed to the backing device, you can disconnect the SSD as I will show you further down.

Manipulating the setup

Sometimes you want to swap your SSD for a larger, smaller, or newer one; other times you want to disconnect it and use the backing device without a cache; and other times you want to use the same caching device to cache more disks. Here I will show you how.

Assuming you want to disconnect the SSD: you will need to go through a couple of steps; first, make sure there is no dirty data, and second, detach it from the backing device.

For the first step, we inform bcache that we don't want any dirty data; by default, bcache allows 10% of the size of the SSD to be dirty data, and we need to make that ZERO percent.

echo 0 > /sys/block/bcache0/bcache/writeback_percent

Remember: if you reattach the cache or otherwise keep using it, you should set it back to ten percent in the same way.

echo 10 > /sys/block/bcache0/bcache/writeback_percent
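
For the second step, once dirty_data reaches zero, the detach mirrors the attach we did earlier; a sketch reusing the Set UUID from above:

# Detach the cache set; bcache0 keeps working, just without a cache
echo 92dbf6ca-0f0b-44d5-b70e-8f1e7919838d > /sys/block/bcache0/bcache/detach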

Monitoring cache and cache performance

1- How much dirty data is on the SSD: assuming that "/sys/block/bcache0/bcache/state" reads dirty, you can see how much data is dirty with the command.

cat /sys/block/bcache0/bcache/dirty_data

2- Caching statistics

tail /sys/block/bcache0/bcache/stats_total/*
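
Among those statistics, the first one I would look at is the hit ratio (the exact set of stat files can vary a little between kernel versions):

# Overall cache hit ratio since the cache was created (a percentage)
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio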

Force mount hibernated NTFS volume

This problem is one I face often. Because of how older versions worked, the answers online no longer apply; online, you will find that

ntfsfix /dev/sdc2

should do the trick; in reality it will not: ntfsfix reports success, as you can see below, but the volume still refuses to mount

Mounting volume... OK
Processing of $MFT and $MFTMirr completed successfully.
Checking the alternate boot sector... OK
NTFS volume version is 3.1.
NTFS partition /dev/sdc1 was processed successfully.

The solution, in reality, is asking ntfs-3g's mount to remove the hiberfile.

WHAT YOU NEED – YOU WILL LOSE THE HIBERFILE

mount -t ntfs-3g -o remove_hiberfile /dev/sdc2 /hds/intelssd

Without the remove_hiberfile option, you will probably get an error message such as

Windows is hibernated, refused to mount.
Failed to mount '/dev/sdc2': Operation not permitted
The NTFS partition is in an unsafe state. Please resume and shutdown
Windows fully (no hibernation or fast restarting), or mount the volume
read-only with the 'ro' mount option.

You can instead mount it read-only (if you do not need to write to it) with the line

mount -o ro /dev/sdc2 /hds/intelssd