Calculating SSD Wearout
SSDs have a predetermined wearout point at which they are expected to enter a failed state. This is because SSDs use NAND flash, which eventually loses its ability to accurately respond to I/O requests. Knowing how far along your SSD is toward that point is vital, as it lets you schedule much-needed replacements before the drive fails.
First, you need to know the drive's rated TBW (terabytes written, sometimes expanded as total bytes written), as this is what we'll be calculating against. It represents how much data the drive is rated to handle over its lifetime. You can usually find it on the drive's spec sheet or by using this useful database. It's not exhaustive, but it's usually good enough. For this example, our drive is rated for 400TB.
Next, you need to know your drive's Total_LBAs_Written - a measurement of how many sectors have been written or modified on the drive. This can be determined by using the following:
root@example-pve:~# smartctl --all /dev/sdX
--- snippet ---
206 Write_Error_Rate        0x000e   100   100   000    Old_age    Always       -       1
246 Total_LBAs_Written      0x0032   100   100   000    Old_age    Always       -       384578442888
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age    Always       -       12018275091
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age    Always       -       84402429335
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail   Always       -       9380
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age    Always       -       4
The exact values available on your device differ from model to model. Some even have an explicit field for expected lifetime remaining, but not all do, so this method may be the best option.
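If you'd rather not pick the number out by eye, the attribute table can be parsed with a few lines of Python. This is a minimal sketch, not a robust parser: the function name raw_attribute is made up for illustration, the sample text is copied from the output above, and it assumes the usual ten-column attribute layout with the raw value in the last column.

```python
import re
from typing import Optional

# Two rows copied from the smartctl output shown above.
SAMPLE = """\
206 Write_Error_Rate        0x000e   100   100   000    Old_age    Always       -       1
246 Total_LBAs_Written      0x0032   100   100   000    Old_age    Always       -       384578442888
"""


def raw_attribute(smart_output: str, name: str) -> Optional[int]:
    """Return the raw value of the named SMART attribute, or None if absent.

    Hypothetical helper. Assumes rows of the form:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smart_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == name:
            # Some drives append annotations to the raw column,
            # so keep only the leading integer.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


print(raw_attribute(SAMPLE, "Total_LBAs_Written"))  # 384578442888
```

On a real system you would feed it the text printed by smartctl --all /dev/sdX instead of SAMPLE. If your drive doesn't report Total_LBAs_Written at all, the function returns None and you'll need a different attribute.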
Then, we multiply this value (the raw value in the rightmost column, which is a count of sectors rather than bytes) by the Sector Size of the drive. This also appears in the output of the same command:
Sector Size: 512 bytes logical/physical
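The sector size line can be pulled out the same way. The sketch below assumes the two line shapes smartctl commonly prints, "Sector Size: 512 bytes logical/physical" and "Sector Sizes: 512 bytes logical, 4096 bytes physical", and returns the logical size, on the assumption that Total_LBAs_Written counts logical blocks on your drive. The function name logical_sector_size is hypothetical.

```python
import re
from typing import Optional


def logical_sector_size(smart_output: str) -> Optional[int]:
    """Return the logical sector size in bytes, or None if no such line exists.

    Hypothetical helper; matches both the "Sector Size:" and
    "Sector Sizes:" variants of the line.
    """
    for line in smart_output.splitlines():
        if line.strip().startswith("Sector Size"):
            match = re.search(r"(\d+)\s+bytes\s+logical", line)
            if match:
                return int(match.group(1))
    return None


print(logical_sector_size("Sector Size:      512 bytes logical/physical"))  # 512
```

Some vendors count written data in other units than one logical sector, so treat the result as a starting point and sanity-check it against how long the drive has been in service.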
Now we can get the actual terabyte amount using this formula:
(total_lbas_written * sector_size) / (1024^4)
(384578442888 * 512) / (1024^4) ≈ 179TB
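Here is the same arithmetic as a short Python sketch, using the example values from this article. It also prints the result in decimal terabytes (10^12 bytes), since dividing by 1024^4 strictly gives tebibytes and spec sheets often quote TBW in decimal units; which one your rating uses is an assumption you should check. The names below are illustrative only.

```python
TIB = 1024 ** 4  # bytes per tebibyte, the divisor used in the formula above
TB = 1000 ** 4   # bytes per decimal terabyte


def bytes_written(total_lbas_written: int, sector_size: int) -> int:
    """Total bytes written, assuming each LBA is one sector of sector_size bytes."""
    return total_lbas_written * sector_size


total = bytes_written(384578442888, 512)
print(total)                 # 196904162758656
print(f"{total / TIB:.1f}")  # 179.1 (binary units, as in the formula above)
print(f"{total / TB:.1f}")   # 196.9 (decimal units)
```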
So about 179TB has been written to this drive. Remember how our drive is rated for 400TB? That means that our drive currently has (179/400) ≈ 0.45 ≈ 45% wearout.
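The last step is a one-line division, sketched here with the numbers from this example. The function name wearout_percent is hypothetical, and both arguments are assumed to be in the same unit.

```python
def wearout_percent(tb_written: float, rated_tbw: float) -> float:
    """Share of the rated endurance already used, as a percentage."""
    if rated_tbw <= 0:
        raise ValueError("rated_tbw must be positive")
    return 100.0 * tb_written / rated_tbw


print(wearout_percent(179, 400))         # 44.75
print(round(wearout_percent(179, 400)))  # 45
```

A result above 100 simply means the drive has outlived its rating. That does not guarantee it has failed, but it is a good reason to have a replacement ready.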
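Proxmox reports its own wearout figure, which may come from a slightly more sophisticated internal algorithm or a self-discovered max TBW rating, and that could explain discrepancies of a few percent. Case in point, Proxmox tells us that this drive actually has 49% wearout. Units may account for part of the gap too: dividing by 1024^4 gives tebibytes, while TBW ratings are normally quoted in decimal terabytes (10^12 bytes). In decimal units this drive has written about 197TB, which is roughly 49% of 400TB. Either way, that's really close! Close enough to schedule on.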