Over-provisioning an SSD in Linux

Over provisioning a Samsung 1TB 850 EVO

Mind you, don't follow this tutorial step by step unless you have a 1TB Samsung 850 EVO; if you have a different disk, you need to adapt the numbers to your SSD 😉

Over-provisioning a flash disk is simply some un-partitioned space at the end of the disk, but you need to tell the SSD's controller about that free space so that it can use it for its housekeeping. You also need to find out whether Tejun Heo's on-demand HPA unlocking patch applies to your distro; if it does, you need to get kernel patching out of the way first.

First of all, the controller will usually use the cache RAM to do the over-provisioning, or at least this is what I understood from some text on the Samsung website; you can make things faster by allowing it to use flash space while it erases a 1.5MB flash area to put the data in.

1- How big should the over-provisioning area be?

Samsung recommends 10% of the disk's space. Somewhere in a PDF hidden on their website, they explain that OP space can be anywhere between 7% and 50%! We will use 10%, since our writing patterns are not that harsh. But mind you, a database that alters a few rows every second can probably make the most use of such OP space.

2- Won't that 10% wear out before the rest?

No. There is a mapping function inside the controller, so that space is in fact wherever the controller thinks is appropriate. The wear-leveling algorithm kicks in at a stage below the logical stage of partitions and so on; it is blind to the file system and to the over-provisioning area. It simply remaps any address you give it to a physical address that is not already mapped; at flash erase, those mappings are deleted, and other areas of the disk get assigned to that address range. I have no idea whether it uses a random algorithm or simply keeps a record of flash chip usage (at this sample size, that won't make any difference).

3- Are you sure we are informing the controller and not just telling Linux what the last address is?

Sure I’m sure, ask the controller DIRECTLY yourself with the command

smartctl -i /dev/sdb

Before the operation we are doing in this article, it will say 1,000,204,886,016 bytes, and after it will say

User Capacity:    900,184,411,136 bytes [900 GB]

Meaning that the disk's S.M.A.R.T. info now tells us that this much is available to the user after the over-provisioning operation.

So, how do we over-provision in Linux?

First, see the last sector of your SSD:

hdparm -N /dev/sdb

In my case, my Samsung 850 EVO reports the following; notice that the same number appears twice (x out of x), and HPA is disabled.

max sectors = 1953525168/1953525168, HPA is disabled

Now, 1953525168 * 512 = 1,000,204,886,016 (1 TB !)

Now, we want to set a maximum address; anything after this address is a PROTECTED AREA that the controller knows about. I will multiply the number above by 0.9 to get the maximum address, taking the integer part alone.
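The shell can do that arithmetic for us. A quick sketch; note that integer math gives 1758172651 here, a few sectors below the 1758172678 I used (I rounded from the target byte count instead), and anything around the 90% mark is fine:

```shell
TOTAL_SECTORS=1953525168                   # total sectors, from "hdparm -N"
MAX_SECTORS=$((TOTAL_SECTORS * 9 / 10))    # keep 90% visible, 10% for OP
echo "$MAX_SECTORS"                        # the value to pass to hdparm -Np
```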

hdparm -Np1758172678 --yes-i-know-what-i-am-doing /dev/sdb (a plain hdparm -Np1758172678 /dev/sdb will ask you whether you know what you are doing)

 setting max visible sectors to 1758172678 (permanent)
 max sectors   = 1758172678/1953525168, HPA is enabled

Now again, hdparm -N /dev/sdb

max sectors = 1758172678/1953525168, HPA is enabled

Now, to make sure we are not suffering from that dreaded bug, let's reboot the system and check again after that. I am using Debian Jessie, so it is unlikely that I am affected.

Yup, hdparm -N /dev/sdb still gives us a smaller maximum address than the actual physical one.

Now, we seem to be ready to talk fdisk business.

fdisk /dev/sdb

Now, if you use o (create a new empty partition table), then p (print), you should get a line such as

Disk /dev/sdb: 838.4 GiB, 900184411136 bytes, 1758172678 sectors

This means that fdisk understands, and asking it to create a partition (the n command) will yield this

/dev/sdb1 2048 1758172677 1758170630 838.4G 83 Linux

Aren't we happy people.

Now, let's mount with TRIM support and enjoy all the beautiful abilities an SSD will bless us with.

tune2fs -o journal_data_writeback /dev/sdb1
tune2fs -O ^has_journal /dev/sdb1
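The tune2fs tweaks above are about journaling; for TRIM itself, the file system has to be mounted with the discard option (or trimmed periodically with fstrim). A sketch of an /etc/fstab line, assuming the hypothetical mount point /hds/ssd:

```
/dev/sdb1   /hds/ssd   ext4   defaults,noatime,discard   0   2
```

Many people prefer a periodic fstrim run (from cron or the systemd fstrim.timer) over the continuous discard option, since continuous TRIM can slow some workloads down; either way, the controller gets told which blocks are free.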

NOTE: in the event that you are presented with an error such as the following

/dev/sde:
 setting max visible sectors to 850182933 (permanent)
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0a 04 51 40 01 21 04 00 00 a0 14 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 max sectors   = 1000215216/1000215216, HPA is disabled

The most likely cause is the SATA controller (try executing the hdparm -Np command using a different SATA controller); another possible cause is that some disks require being trimmed before this action.

Alliance ProMotion 6410

One little problem with modern VGA cards is HEAT: they consume over 30W at idle, and those 30 watts go into the case. So I looked through my old computers and found one dating back to 1995-1996, pulled out its VGA card, and installed it in a modern i3 computer for testing, pending installation in an i7 with 64GB of RAM and what have you.

On eBay, you can find such PCI cards for around $10: Cirrus Logic, SiS, ATI, or S3. They should all work; if the ProMotion card works, those should work too.

Now I ran the Debian Jessie installer and the installation went fine. When rebooting, the system boots with the PCI card but then switches to the embedded graphics (which comes with the i3 CPU); the BIOS does not allow me to disable that, so, rather than looking for a solution, I will test the adapter on an i7 (which does not come with built-in graphics).

I have a good feeling that it will work right away. Here is some information about my 20-year-old graphics card (I will post some photos too when I pull it out)

    Made by: Alliance
    Codename: ProMotion 6410
    Bus: PCI
    Memory Size: 1MB
    Max Memory Size: 4MB
    Memory Type: FPM
    Year: 1995
    Card Type: VGA
    Made in: USA
    Owned by: Palcal
    Outputs: 15 pin D-sub
    Power consumption (W): 1.5
    Video Acceleration: MPEG-1 (VCD)
    Core: 64bit
    Memory Bandwidth (MB/s): 213
    Sold by: miro
    Press info: Freelibrary


Recovering deleted files from ext4 partition

Update: although extundelete restored most of my files, some files could only be restored without file names through photorec, an application that is installed with testdisk.

So, what happened was that I added a directory to Eclipse, a message appeared, I hit enter accidentally, and all the files in the web directory were lost. No backup, years of programming…

Instantly, I shut down the computer so that I would not overwrite the disk space with new files, logs, and the like. I got a larger disk (1.5TB) and did the dd first (I recommend gddrescue in place of dd, just in case your disk has bad sectors).

I installed Linux (Debian 7) on the new disk, installed the hard drive from the other computer in the new PC, then installed the software I always use to recover files: testdisk. TestDisk did not work as expected on both disks; when it came to the ext4 partition, the process that ended in an error went as follows

testdisk
create log file
Chose the 1TB disk (the one with the deleted files)
Partition type (INTEL)
Advanced
Chose the main partition *(ext4) and chose List (left and right arrow keys)
Damn, the error.

TestDisk 6.13, Data Recovery Utility, November 2011
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
 1 * Linux                    0  32 33 119515  60 33 1920010240
Can't open filesystem. Filesystem seems damaged.

So, I quit TestDisk and installed
apt-get install extundelete

extundelete /dev/sdb1 --restore-directory /var/www

This way, I only restore the files from that directory; if you want all the deleted files, you could surely use something like

extundelete /dev/sda4 --restore-all

Anyway, my files are back. A few are missing, but I am sure I can deal with that.

Intel processor Lithography explained

In short, it is the average space between the processor’s logic gates (transistors).

It makes all the difference in speed, and a considerable difference in power consumption.

For example, I ran a certain task on both of the following processors

E3300, a low-cost Celeron processor with a lithography of 45nm and (1M Cache, 2.50 GHz, 800 MHz FSB)
Q6600, a much more expensive quad core (at the time when both were purchased) with a lithography of 65nm and (8M Cache, 2.40 GHz, 1066 MHz FSB)

When comparing a single core's throughput, the cheap Celeron beat the quad core by a very considerable margin, much higher than the difference in clock speed. The actual numbers would need me to explain many factors, such as the nature of the millions of records that needed processing, how they were processed, how jobs were distributed between computers, how the random sample is guaranteed to be random, and so on, and I don't think this is very relevant to you.

So, lithography is something you should really consider when buying a processor; the lower the better. My laptop's i7 is built with a lithography of 22nm, the best number as of 2013.

Disk spindown in Linux, specifying spindown idle time

Disk Spin down (Tested with Bullseye 2022)

Even though everything concerning block devices in Linux has shifted to unique identifiers, hdparm has not, and will still use the old /dev/sdX names

To control disk spindown, and to manually issue commands, you will need to have the package installed

apt-get install hdparm

There is a problem with disk spindown via hdparm: you must address a disk as /dev/sdc, a name which can change in the case of USB media and other disks, even when you add slaves.

hdparm -Y /dev/sdb will spin a disk down instantly
hdparm -S 240 /dev/sdb will set this disk to sleep after 20 minutes of idle time (the value is in units of 5 seconds here)

or adding, at the bottom of the file /etc/hdparm.conf, a section such as

/dev/sdc {
spindown_time = 240
}

to make those changes persistent across reboots.
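Since the /dev/sdX letters can move around between boots, hdparm.conf should also accept the persistent symlinks under /dev/disk/by-id/; a sketch with a made-up drive ID (check yours with ls -l /dev/disk/by-id/):

```
/dev/disk/by-id/ata-ExampleVendor_ExampleModel_SERIAL123 {
spindown_time = 240
}
```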

To check the status of a disk, here is what you do

hdparm -C /dev/sde

You could get one of the following results
When spun down…
drive state is: standby
When active
drive state is: active/idle

Don’t make your disks spin-down too often, 20 minutes is good for me almost in all circumstances.
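For -S values from 1 to 240, the unit is 5 seconds, so converting a desired idle time in minutes into the -S argument is a one-liner:

```shell
MINUTES=20                    # desired idle time before spindown
echo $((MINUTES * 60 / 5))    # the -S value: 240 for 20 minutes
```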

If the disks don’t spin down, chances are that selftest is enabled…

Check if it is enabled with

smartctl -a /dev/sdb
if it reads
Auto Offline Data Collection: Enabled.
then you need to disable it with
smartctl --offlineauto=off /dev/sdb

then wait for them to finish (if a test is running) then spin down.

Can I mount a disk image created with dd, ddrescue, or dd_rescue on Windows?

The lowdown: Yes you can, try the free OSFMount.

How did I find out about it? A friend sent me his laptop to un-delete files for him. I didn't have time to see how I could un-delete under Windows, so (with his permission) I mounted his laptop hard drive on my computer (Linux), then dd'd the whole drive to a 250GB image file, put the hard drive back where it was (in the laptop), and sent it back to him so that he could continue using it. Once I found the time, I simply copied the image to a Windows computer, mounted it with OSFMount, then un-deleted everything with Recuva (the best un-delete software in my opinion), put his files on an external hard drive, and sent it his way.

Images created with dd, ddrescue, or dd_rescue are not formatted; they are a direct copy of a whole disk, including boot records, partition tables, and file systems, so mounting such images should not be hard at all. And indeed, it turns out there is a program that can mount them under Windows (I would not be surprised if there are hundreds that do that), but for now, this one seems to be a champ, and it seems to be free.

Yet, this program seems to be more than a mounting tool for direct disk images: it also mounts CD images (I guess the one I currently use, Virtual CloneDrive, is obsolete), creates RAM disks, and can open a big bunch of other image formats (nrg, SDI, AFF, AFM, AFD, VMDK, E01, S01).

So there you are, all you need for your disk mounting needs in one program 😀

Cheers

DD_RESCUE ( GDDRESCUE’s ddrescue ) for disks with Advanced Format ( AF ) 4KiB ( 4096 byte ) sectors

1- Before using dd, ddrescue, or dd_rescue, you need to know which disk is which. You can do that simply by using the command “fdisk -l”; in my case, the old disk turned out to be /dev/sdb and the new un-partitioned disk /dev/sdc.

So, I have been cloning a 2TB hard drive ( WD20EARS ) to a WD20EARX, the same disk but with a few differences.

WD20EARS is SATA 2 and the other is SATA 3; another difference is that, using “hdparm -I /dev/sdb”, the older WD20EARS reports the following (which should not be true)

WD20EARS

Logical/Physical Sector size:           512 bytes

while with “hdparm -I /dev/sdc” the newer WD20EARX reports

        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes
        Logical Sector-0 offset:                  0 bytes

The first clone did not work for a reason unknown to me. I cloned my NTFS disk with ddrescue (gddrescue) on Linux (because I don't know how to clone on Windows) and then plugged it into Windows, where it simply did not work; Disk Management reported the disk as un-partitioned space. So now I want to do the thing again, but I don't want that slow performance, so I increased the block size to 4KiB. (UPDATE: the new copy with 4KiB did work, but I don't know if the 4KiB size is relevant; maybe you should take a look at the second difference between the disks at the beginning of the post.)

For now, I will try the cloning with the command below (only change the block size for Advanced Format hard drives).

Note: the block-size option no longer works; it is now called sector-size, but the short letter for it, -b, is still the same, so we change the first line below into the second
ddrescue --block-size=4KiB /dev/sdb /dev/sdc rescue2.log
ddrescue -b 4096 /dev/sdb /dev/sdc rescue2.log

And if all of your data is important, you can ask ddrescue to retry every bad block 3 times (or as many times as you wish) with the -r option

ddrescue --sector-size=4096 -r3 /dev/sdb /dev/sdc rescue2.log
ddrescue -b 4096 -r3 /dev/sdb /dev/sdc rescue2.log

And what do you know, the disk now works on my WINDOWS machine 😀 no errors and no nothing. Great, so now for some details about the copy.

The result so far is that I am reading at a maximum of 129MB/s, while the average (over the first 60GB) is 93018 kB/s; if this continues, I will be done in less than 6 hours.

The part that does not make any sense to me is that Western Digital states clearly in the specs that the maximum sustained host-to/from-drive rate is 110 MB/s for both drives; it must be that I need to wait a bit more and see what that actually means.

rescued:         0 B,  errsize:       0 B,  errors:       0
Current status
rescued:    74787 MB,  errsize:       0 B,  current rate:     119 MB/s
   ipos:    74787 MB,   errors:       0,    average rate:   93018 kB/s
   opos:    74787 MB,     time from last successful read:       0 s
Copying non-tried blocks...
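The "less than 6 hours" figure is just disk size divided by the average rate; a quick sketch with the numbers ddrescue printed above:

```shell
DISK_BYTES=2000398934016                     # the 2TB drive
RATE_KB_S=93018                              # average rate in kB/s
TOTAL_S=$((DISK_BYTES / (RATE_KB_S * 1000)))
echo "$TOTAL_S"                              # roughly 21500 seconds
echo $((TOTAL_S / 3600))                     # whole hours (5), so under 6 hours
```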

Now, once done, you can have the OS reload the partition table without having to restart; you can simply use the command partprobe

partprobe
or
partprobe /dev/sdc

To use partprobe, you need to install parted

apt-get install parted

If it were a Linux drive, an Advanced Format drive would not have its first partition start at sector 63 but rather at sector 2048 (exactly 1MiB in); it could (but usually does not) start at any other value divisible by 8.

Windows probably does something similar for our AF disk, so asking parted about our NTFS disk, this is what parted says

Model: ATA WDC WD20EARS-00M (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  2000GB  2000GB  primary  ntfs

The 1049kB is parted rounding in SI units: the partition actually starts at 1,048,576 bytes (1 MiB), which is sector 2048, and 2048 is divisible by 8, so the partition is aligned to the 4KiB physical sectors.
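The alignment check is plain division; parted's 1049kB is its SI rounding of 1,048,576 bytes (the usual 1MiB alignment), and a partition start is 4KiB-aligned when its sector number divides by 8 (8 × 512B = 4KiB):

```shell
START_BYTES=1048576                  # partition start; parted shows "1049kB"
START_SECTOR=$((START_BYTES / 512))  # logical sectors are 512 bytes
echo "$START_SECTOR"                 # 2048
echo $((START_SECTOR % 8))           # 0 means aligned to 4KiB physical sectors
```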

NOTES:
-There is a tool specifically for cloning NTFS volumes called ntfsclone; I am not sure what NTFS-specific extra features it provides, as I have never used it. With my disk that has bad blocks, I can only rely on gddrescue.
-A block is 512 bytes on regular drives and 4096 on newer ones. If you want to back up the hard drive's geometry, you can do one of the following.
Back up the first 63 blocks (MBR + bootloader) on a “non advanced format” drive

dd if=/dev/sda of=/mnt/storage/sda.vbr bs=512 count=63

On an advanced format drive, we can try

dd if=/dev/sda of=/mnt/storage/sda.vbr bs=4096 count=63

Which will make us read 258048 bytes rather than the traditional 32256 bytes (around 250K rather than 32K)

Rescuing a failed hard drive

This article is a work in progress; I have started the ddrescue and am waiting for it to finish before I go on with this post.

One thing you should note is that this is GNU ddrescue, from the package gddrescue, not the old script dd_rescue that is a wrapper around the dd program.

It has been some time since I found out that dd_rescue has been replaced by the newer, rewritten ddrescue from the gddrescue package; to be more specific, since I posted this back in March 2011.

So, now I have yet another busted disk with 3 partitions, but not like that one; this one simply has very many bad sectors. It's a 2TB Western Digital Caviar Black that is causing me trouble.

I couldn't find a 2TB Caviar Black, so to be on the safe side, I got a 3TB Western Digital Green, partitioned it, and formatted it as I describe here.

So, now that I have a hard drive that needs rescuing, let's revise what we need to do.

1- Install the new ddrescue tool (gddrescue):
apt-get install gddrescue
2- Run ddrescue; make sure to use a log file so it can resume in case we get interrupted (otherwise days of rescue work can be lost and need to be done again, assuming the data has not been further damaged by disk deterioration in the meantime):

ddrescue /dev/sdb /hds/3tb/2tb.img /root/resumelog.log

If you lose power, get interrupted, or need to restart your computer, ddrescue will resume ONLY if you use the same exact line above once again; it will then use the log file to continue filling in the existing output file.

Now we have an image; we can mount that image and take a look, so we mount the image on a loop device.

You could have partitions on the original disk; in my case I had 3 ext2 partitions, and the data I need is on the third partition.

So I entered parted (the Debian package) and did the following

Using /hds/3tb/2tb.img
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit
Unit?  [compact]? B
(parted) print
Model:  (file)
Disk /hds/3tb/2tb.img: 2000398934016B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start          End             Size            Type     File system  Flags
 1      32256B         400077619199B   400077586944B   primary  ext2
 2      400077619200B  700144058879B   300066439680B   primary  ext2
 3      700144058880B  2000396321279B  1300252262400B  primary  ext2

So now I know that the partition I want (the third) starts at byte 700144058880; that's all I need to know to mount this as a loop device.
I am mounting a loop device simply because I want to run fsck (disk check) on the partition before actually mounting it.
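As an aside, if your tool reports the partition start in sectors rather than bytes (fdisk does), the byte offset is just start sector × logical sector size; 700144058880 / 512 gives sector 1367468865 for my third partition:

```shell
START_SECTOR=1367468865       # start of partition 3, in 512-byte sectors
echo $((START_SECTOR * 512))  # 700144058880, the offset handed to losetup -o
```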

First, what is the next free/available loop device number?

losetup -f

this should produce output such as

/dev/loop1

So now we know that we need to attach this to loop1, since it is the next available (not in use) loop device.

losetup /dev/loop1 /hds/3tb/2tb.img -o 700144058880

So, now loop1 has my partition; you should omit the -y below if you want to manually agree to every repair fsck wants to make

fsck.ext2 -y /dev/loop1

Great, now we should have a clean file system to mount.
Even though we can mount the attached loop device directly, I will demonstrate both how to mount the loop device and how to mount the image on a loop in one go; I am doing this so that this tutorial has a complete reference of the tools and command parameters you might need.

The first method: detach/release the loop device, then mount the image directly in one go, as follows
(-d means detach)

losetup -d /dev/loop1

Then, we mount with the following command; notice how we use the starting offset exactly like we did when attaching to a loop device.

mount -o loop,offset=700144058880 /hds/3tb/2tb.img /hds/img

The other way is simply to mount our already attached loop device as follows

mount -t ext2 /dev/loop1 /hds/img

Now, we can mount this partition

mount -o loop,offset=700144058880 harddrive.img /hds/img

Or if you like, you can mount it read only

mount -o ro,loop,offset=700144058880 harddrive.img /hds/img
mount | grep /hds/3tb/2tb.img
/hds/3tb/2tb.img on /hds/img type ext2 (ro,loop=/dev/loop1,offset=700144058880)

Installing my 3TB hard drive on Debian linux step by step

It is simple, here is what you need to know

You can format it ext4, but ext2 and ext3 are also OK! ext2 and ext3 allow disks of up to 16TB and file sizes of up to 2TB; ext4 allows much more.

Any linux kernel newer than 2.6.31 should work just fine with “Advanced format” drives using the exact same steps in this article.

MBR only supports drives up to 2TB; beyond that you need GPT, so let us get started

1- apt-get update
2- apt-get install parted
3- parted /dev/sdc
4- mklabel gpt
5- Answer yes to: Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want to continue?
Yes/No? yes
6- mkpart primary ext4 0% 100% (to make a partition as big as the disk; it will start at the first megabyte, for alignment, and run to the end of the disk)
7- quit

FYI, if you want multiple partitions, here are the 2 lines that should replace step 6
6- mkpart primary ext4 0% 40%
6- mkpart primary ext4 40% 100%

and remember to format both (sdc1 and sdc2) when you are done with parted

Now to formatting the drive

mkfs.ext4 /dev/sdc1

Before mounting it: I like ext4, but I don't want a journaling file system on this drive, which is not the system drive, so I will need to do a few things to the drive first

Lazy writeback

tune2fs -o journal_data_writeback /dev/sdc1

No Journaling

tune2fs -O ^has_journal /dev/sdc1

Now to check what we have

dumpe2fs /dev/sdc1 |grep 'Filesystem features'


Or maybe if you want the whole thing on the screen

dumpe2fs /dev/sdc1 |more

If the has_journal option is present when executing the first command, you still have a journal on the file system

And there we are. Now we need to mount it at boot time by adding it to fstab; to do that, we will need the partition's unique ID!

8- Now executing the following command will give you the unique ID of this new partition for use with fstab (The disk list we will edit below in step 10)
blkid /dev/sdc1
9- create the directory where you want to mount your hard disk, for example
mkdir /hds
mkdir /hds/3tb
10- Now, we add the following line to fstab. Notice that noatime increases performance, but some applications might need or rely on access times; Postfix does not, and I have verified that.

UUID=b7a491b1-a690-468f-882f-fbb4ac0a3b53       /hds/3tb            ext4     defaults,noatime                0       1

defaults and noatime are but only a couple of options, here are more options that you can add
nofail = if the disk is not present, continue booting
nobootwait = do not hold up the boot process waiting for this disk
noauto = don't mount it until I issue a “mount /dev/sdb1” or “mount /hds/thisdisk” command

11- Now execute
mount -a

You are done. If you execute
df -h
You should see your 2+TB hard drive in there !

To make sure the drive is aligned correctly, I like to write a file to it and see how fast that goes… so let us use a 2GB file

dd if=/dev/zero of=/hds/WD2000_3/deleteme.img bs=1M count=2000

The outcome for a Western Digital Black 2TB:
First run: 2097152000 bytes (2.1 GB) copied, 5.94739 s, 353 MB/s
Consecutive runs: 2097152000 bytes (2.1 GB) copied, 11.1405 s, 188 MB/s
The outcome for a Western Digital Green 3TB:
First run: 2097152000 bytes (2.1 GB) copied, 8.32337 s, 252 MB/s
Consecutive runs: 2097152000 bytes (2.1 GB) copied, 14.376 s, 146 MB/s

The consecutive runs give close results; what I printed here is the average.

FAQ of hard disk errors and data retrieval

Section 1: My hard drive has bad sectors / Blocks / area

Do I need to change it?
Not necessarily, but if it is under warranty and they allow you to replace it, a new one is not a bad idea; otherwise, read on.

It all depends on whether the bad sectors are expanding or not. If they are not, they were probably caused by a shock to the hard drive; usually, it is enough to mark them as bad using “chkdsk /r” on Windows and leave the drive working.

To find out whether your bad sectors are spreading, do a “chkdsk /r” four times and make sure the same number appears the second, third, and fourth time (forget the first). Then, if the second is different but the third and fourth are the same, do the test 2 more times and make sure you get the same number of bad sectors for runs 3, 4, 5, and 6; if so, your bad sectors are not spreading.

You did not mention backup in the answer above; do we need to back up?
People will typically ask you to back up just in case. I say you should always have a backup of your most important files; non-spreading bad sectors, in my humble experience, do not contribute negatively to reliability, so my answer is that backup should be done regardless.

How do I know how many bad sectors are marked on an NTFS hard drive?
There is a tool called nfi.exe that comes in a bundle Microsoft makes available at http://support.microsoft.com/kb/253066/en-us (the OEM Support Tools); it can tell you everything about a disk formatted with NTFS.