Calculating SSD Wearout

From RoseWiki

SSDs have a rated write endurance, beyond which they can be expected to fail. This is because SSDs use NAND flash, whose cells gradually lose the ability to reliably store data as they are written and erased. Determining the wear status of your SSD is vital, as knowing how worn it is lets you schedule replacements before it fails.

First, you need to know the drive's rated TBW (terabytes written, i.e. total bytes written), as this is what we'll be calculating against. It is the total amount of data the manufacturer rates the drive to handle over its lifetime. You can usually find it on the drive's spec sheet or by using this useful database. It's not exhaustive, but it's usually good enough. For this example, our drive is rated for 400TB (400 TBW).

Next, you need your drive's Total_LBAs_Written value: the number of logical blocks (sectors) that have been written to the drive. You can read it from the drive's SMART data with smartctl:

root@example-pve:~# smartctl --all /dev/sdX
--- snippet ---
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       1
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       384578442888
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12018275091
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       84402429335
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       9380
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       4
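If you want to script this, the raw value can be pulled out of the smartctl text output. The sketch below is a minimal example, assuming the ATA attribute table layout shown above (ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); `smart_raw_value` is a hypothetical helper name, not part of smartmontools.

```python
import re

# Sample taken from the smartctl snippet above. In practice you might capture it with
# subprocess.run(["smartctl", "--all", "/dev/sdX"], capture_output=True, text=True).stdout
SAMPLE = """\
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       1
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       384578442888
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       12018275091
"""


def smart_raw_value(smartctl_output: str, attribute_name: str) -> int:
    """Return the RAW_VALUE of a SMART attribute from `smartctl --all` text output."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the attribute name.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute_name:
            # RAW_VALUE is the tenth column; keep only its leading integer, since
            # some attributes append extra text such as "(Min/Max ...)".
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    raise KeyError(f"SMART attribute {attribute_name!r} not found")


print(smart_raw_value(SAMPLE, "Total_LBAs_Written"))  # 384578442888
```

Attribute names and IDs vary by vendor, so check your own output before relying on the name.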

The exact attributes reported differ from model to model. Some drives even have an explicit attribute for expected lifetime remaining, but not all do, so calculating wearout against the TBW rating may be the best option.

Then, we multiply this value (the RAW_VALUE in the rightmost column, which counts sectors, not bytes) by the drive's sector size. The same smartctl output reports it:

Sector Size:      512 bytes logical/physical
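The sector size line can be parsed the same way. This is a sketch under two assumptions: smartctl prints either `Sector Size: ... bytes logical/physical` or `Sector Sizes: ... bytes logical, ... bytes physical` (for drives whose logical and physical sizes differ), and the LBA count is in logical blocks, so the logical size is the one to use. `logical_sector_size` is a hypothetical helper name.

```python
import re


def logical_sector_size(smartctl_output: str) -> int:
    """Return the logical sector size in bytes from `smartctl --all` text output."""
    # Matches "Sector Size:      512 bytes logical/physical" as well as
    # "Sector Sizes:     512 bytes logical, 4096 bytes physical".
    match = re.search(r"^Sector Sizes?:\s+(\d+) bytes logical", smartctl_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Sector Size' line found")
    return int(match.group(1))


print(logical_sector_size("Sector Size:      512 bytes logical/physical"))  # 512
```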

Now we can get the actual terabyte amount. Drive manufacturers state TBW in decimal terabytes (1 TB = 10^12 bytes), so divide by 1000^4 rather than 1024^4:

(total_lbas_written * sector_size) / (1000^4)
(384578442888 * 512) / (1000^4)
≈ 196.9TB

Dividing by 1024^4 instead gives binary tebibytes (≈ 179.1TiB), which understates the amount written relative to a decimal TBW rating by about 9%.

Now we know that about 197TB has been written to this drive. Remember how our drive is rated for about 400TB? That means our drive currently has roughly 197/400 ≈ 0.49, or about 49% wearout.

Proxmox may calculate wearout differently (for example, from a lifetime attribute reported by the drive itself), so its figure can differ from this calculation by a few percent. In this case, Proxmox reports 49% wearout for this drive, which matches the decimal calculation above. Either way, that's close enough to schedule on.
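The whole calculation fits in a few lines. This is a minimal sketch assuming the TBW rating is in decimal terabytes (10^12 bytes) and that the LBA count is in units of the logical sector size; the function names are illustrative only.

```python
def terabytes_written(total_lbas_written: int, sector_size: int) -> float:
    """Convert an LBA count to decimal terabytes (1 TB = 10**12 bytes)."""
    return total_lbas_written * sector_size / 10**12


def wearout_percent(total_lbas_written: int, sector_size: int, rated_tbw: float) -> float:
    """Estimate wearout as data written relative to the rated TBW, in percent."""
    return terabytes_written(total_lbas_written, sector_size) / rated_tbw * 100


lbas, sector, rated = 384578442888, 512, 400
print(f"{terabytes_written(lbas, sector):.1f} TB written")  # 196.9 TB written
print(f"{lbas * sector / 2**40:.1f} TiB written")  # 179.1 TiB written (binary units)
print(f"{wearout_percent(lbas, sector, rated):.0f}% wearout")  # 49% wearout
```

Using decimal units on both sides keeps the ratio consistent with how the rating is specified.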