ZFS Failed Disk Replacement

From RoseWiki
Latest revision as of 05:29, 1 December 2024

Copy partitions from good disk sda to blank disk sdb

sgdisk /dev/sda -R /dev/sdb   # replicate the partition table FROM /dev/sda (good disk) TO /dev/sdb (new disk)
sgdisk -G /dev/sdb            # randomize the GUIDs on the new disk, since they were copied from the other drive
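
Swapping the two device names would overwrite the good disk's partition table, so it can help to print the commands and review them before running anything. The following is a minimal sketch, not part of the original procedure: the helper name `print_replicate_cmds` is hypothetical, and it only echoes the two sgdisk commands shown above without executing them.

```shell
# Hypothetical helper: print (do not run) the sgdisk commands for review.
# $1 = good source disk, $2 = blank target disk.
print_replicate_cmds() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: print_replicate_cmds SOURCE_DISK TARGET_DISK" >&2
        return 2
    fi
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and target are the same disk ($src)" >&2
        return 1
    fi
    # Same two commands as in the section above, with the names filled in.
    echo "sgdisk $src -R $dst"
    echo "sgdisk -G $dst"
}

# Example: print_replicate_cmds /dev/sda /dev/sdb
```

Once the printed lines look right, they can be copied and run by hand as root.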

Using Parted to verify the partition table of /dev/sdb

(parted) select /dev/sdb
Using /dev/sdb
(parted) p
Model: ATA WDC WD2000FYYZ-0 (scsi)
Disk /dev/sdb: 2000398934016B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
 1 1048576B 2097151B 1048576B Grub-Boot-Partition bios_grub
 2 2097152B 136314879B 134217728B fat32 EFI-System-Partition boot, esp
 3 136314880B 2000397885439B 2000261570560B zfs PVE-ZFS-Partition

(OK, partitions copied)
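
Instead of comparing the two listings by eye, the partition rows can be compared mechanically. This is a sketch under stated assumptions: the helper names `partition_rows` and `same_layout` are hypothetical, and the listings are assumed to have been saved from parted's non-interactive mode with byte units.

```shell
# Hypothetical helper: keep only the partition rows (lines whose first
# field is a partition number) from saved parted output on stdin.
# Assigning $1 to itself makes awk rebuild the line with single spaces.
partition_rows() {
    awk '$1 ~ /^[0-9]+$/ { $1 = $1; print }'
}

# Compare two saved listings; returns 0 only when both contain at least
# one partition row and all rows are identical.
same_layout() {
    a=$(partition_rows < "$1")
    b=$(partition_rows < "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (read-only; assumes parted is installed and run as root):
#   parted -s /dev/sda unit B print > /tmp/sda.txt
#   parted -s /dev/sdb unit B print > /tmp/sdb.txt
#   same_layout /tmp/sda.txt /tmp/sdb.txt && echo "layouts match"
```

Header lines such as "Disk /dev/sdb: ..." are ignored, so only start, end, size, file system, name, and flags are compared.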

Copy data from /dev/sda1 to /dev/sdb1

dd if=/dev/sda1 of=/dev/sdb1 bs=512   # this is the BIOS boot partition

Example run:

root@folkvang:~# dd if=/dev/sda1 of=/dev/sdb1 bs=512
2014+0 records in
2014+0 records out
1031168 bytes (1.0 MB) copied, 0.10164 s, 10.1 MB/s
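
The dd summary can be sanity-checked with arithmetic: 2014 records of 512 bytes is 2014 x 512 = 1031168 bytes, which matches the reported total. To confirm the copy itself, the two partitions can be compared byte for byte. The sketch below uses a hypothetical helper name, `copy_and_verify`, and assumes source and target are the same size, as they are after the partition table has been replicated.

```shell
# Hypothetical helper: copy one partition (or file) to another with dd,
# then confirm that the two are byte-identical with cmp.
# $1 = source, $2 = target. Assumes both are the same size.
copy_and_verify() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=512 2>/dev/null || return 1
    cmp -s "$src" "$dst"
}

# Example (as root, overwrites the target partition):
#   copy_and_verify /dev/sda1 /dev/sdb1 && echo "copy verified"
```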

Replace the failed partition in the zpool

Find the ID of the failed block device

root@folkvang:~# zpool status
    pool: rpool
    state: DEGRADED
    status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
    see: http://zfsonlinux.org/msg/ZFS-8000-4J
    scan: scrub repaired 0 in 0h25m with 0 errors on Sun May  8 11:20:27 2016
    config:
    NAME                    STATE     READ WRITE CKSUM
    rpool                   DEGRADED     0     0     0
      mirror-0              DEGRADED     0     0     0
        993077023721924477  FAULTED      0     0     0  was /dev/sdk2
        sdk2                ONLINE       0     0     0
    errors: No known data errors
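
The failed device's ID is the first column of the row whose state is FAULTED. A small sketch for pulling it out of the status text, assuming the column layout shown above; the function name `faulted_devices` is hypothetical.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name column of every row whose state column is FAULTED.
faulted_devices() {
    awk '$2 == "FAULTED" { print $1 }'
}

# Example: zpool status rpool | faulted_devices
```

With the status output above as input, this prints the numeric ID that is then passed to zpool replace.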

Call zpool to replace the failed device

root@folkvang:~# zpool replace -f rpool 993077023721924477 /dev/sdl2

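Wait until the resilver has finished before rebooting. Note that this transcript comes from a host (folkvang) where the surviving disk was sdk, the new disk was sdl, and the ZFS partition was partition 2. With the partition layout shown in the parted output above, the ZFS partition is partition 3, so substitute the device and partition names that match your system.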

root@folkvang:~# zpool status
    pool: rpool
    state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
    scan: resilver in progress since Fri Sep  2 16:45:53 2016
    13.2M scanned out of 8.83G at 902K/s, 2h50m to go
    12.9M resilvered, 0.15% done
    config:
    NAME                      STATE     READ WRITE CKSUM
    rpool                     DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        replacing-0           UNAVAIL      0     0     0
          993077023721924477  FAULTED      0     0     0  was /dev/sdk2
          sdl2                ONLINE       0     0     0  (resilvering)
        sdk2                  ONLINE       0     0     0
    errors: No known data errors
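
Rather than re-running zpool status by hand, the wait can be scripted. This is a minimal sketch that assumes the scan line contains the text "resilver in progress" while a resilver is running, as in the output above; the function names `resilver_running` and `wait_for_resilver` are hypothetical.

```shell
# Returns 0 while the status text on stdin reports a running resilver.
resilver_running() {
    grep -q 'resilver in progress'
}

# Hypothetical wait loop: poll every $2 seconds (default 60) until the
# pool named in $1 no longer reports a resilver in progress.
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_running; do
        sleep "$interval"
    done
    echo "no resilver in progress on $pool"
}

# Example: wait_for_resilver rpool 30
```

The loop only reports that no resilver is running; check zpool status afterwards to confirm that the pool is ONLINE. Newer OpenZFS releases also provide a zpool wait subcommand, where available, that serves a similar purpose.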

After fixing the drive, we need to ensure that the boot sectors are configured.

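proxmox-boot-tool format /dev/sdb2   # format the new disk's EFI system partition

proxmox-boot-tool init /dev/sdb2     # set up the boot loader on it and register it for kernel syncing

proxmox-boot-tool refresh            # copy kernels and boot configuration to all registered partitions

proxmox-boot-tool status             # list the registered partitions and their boot mode

proxmox-boot-tool clean              # drop registrations for partitions that no longer exist

On the folkvang host (disks sdk and sdl), GRUB was also reinstalled on both disks as a precaution:

grub-install /dev/sdk
grub-install /dev/sdl
update-grub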
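
The page lists both proxmox-boot-tool and grub-install steps without saying which firmware mode each host used. As background, Linux exposes the directory /sys/firmware/efi only when the system was booted through UEFI, so a quick check like the one below can show which mode the running host is in. The function name `boot_mode` and its optional root-directory argument (there to make the check testable) are hypothetical.

```shell
# Hypothetical helper: report the firmware mode the running system was
# booted in. $1 is an optional root directory (default: /).
boot_mode() {
    root=${1:-/}
    if [ -d "$root/sys/firmware/efi" ]; then
        echo uefi
    else
        echo bios
    fi
}

# Example: boot_mode
```

The output of proxmox-boot-tool status can be read alongside this to see how each registered partition is set up.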