I need help with RAID error

Amadex · September 2021

Hello, I've never had to deal with RAID issues (I'm new to dedicated servers since we switched from VPSes).

This is the error that I received by email:

A Fail event had been detected on md device /dev/md/2.

It could be related to component device /dev/nvme0n1p3.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md2 : active raid1 nvme1n1p3[1] nvme0n1p3[0](F)
      3745885504 blocks super 1.2 [2/1] [_U]
      bitmap: 3/28 pages [12KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      4189184 blocks super 1.2 [2/2] [UU]

md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      523264 blocks super 1.2 [2/2] [UU]

unused devices: <none>

A Fail event had been detected on md device /dev/md/1.

It could be related to component device /dev/nvme0n1p2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md2 : active raid1 nvme1n1p3[1]
      3745885504 blocks super 1.2 [2/1] [_U]
      bitmap: 4/28 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0](F)
      4189184 blocks super 1.2 [2/1] [_U]

md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0](F)
      523264 blocks super 1.2 [2/1] [_U]

unused devices: <none>

and there's another mail with:

A Fail event had been detected on md device /dev/md/0.

It could be related to component device /dev/nvme0n1p1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md2 : active raid1 nvme1n1p3[1]
      3745885504 blocks super 1.2 [2/1] [_U]
      bitmap: 4/28 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1]
      4189184 blocks super 1.2 [2/1] [_U]

md1 : active raid1 nvme1n1p2[1]
      523264 blocks super 1.2 [2/1] [_U]

unused devices: <none>

What should I do? Thanks.

Amadex · September 2021

Here's more info, from commands that I Googled:

[root@blue ~]# smartctl -H /dev/nvme1n1p2
smartctl 7.1 2020-08-23 r5080 [x86_64-linux-4.18.0-305.17.1.lve.el8.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@blue ~]# smartctl -H /dev/nvme1n1p1
smartctl 7.1 2020-08-23 r5080 [x86_64-linux-4.18.0-305.17.1.lve.el8.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@blue ~]# smartctl -H /dev/nvme1n1p3
smartctl 7.1 2020-08-23 r5080 [x86_64-linux-4.18.0-305.17.1.lve.el8.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[root@blue ~]# fdisk -l /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3
Disk /dev/nvme1n1p1: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes


Disk /dev/nvme1n1p2: 512 MiB, 536870912 bytes, 1048576 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes


Disk /dev/nvme1n1p3: 3.5 TiB, 3835922030080 bytes, 7492035215 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes

Amadex · September 2021

And the partitions:

[root@blue ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G     0   63G   0% /dev/shm
tmpfs            63G  952K   63G   1% /run
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/md2        3.5T   50G  3.3T   2% /
/dev/md1        485M  332M  128M  73% /boot
tmpfs            13G     0   13G   0% /run/user/0

The server is a Hetzner AX101, and I did the partitioning following their tutorial for disks larger than 2 TB with installimage (from the rescue system, during OS install):

PART swap swap 4G
PART /boot ext3 512M
PART / ext4 all

SGraf · September 2021

Helped a bit via Discord... the mdadm array should be back in sync for now.

Not_Oles · September 2021

@SGraf Thanks for helping! Thanks also for posting about the resolution so the thread wouldn't be left just dangling.

@Amadex @SGraf Could you guys please post a brief note about

what the problem was,
what caused the problem, and
how you fixed it?

Best wishes and kindest regards from a clueless™ guy in the desert! 🏜️

SagnikS · September 2021

You should get the NVMe that was removed from the array replaced ASAP. I had that happen too and put it back in the RAID array because a badblocks and SMART test came out clean. A few hours later, the node started behaving extremely weirdly (high iowait) and eventually crashed.

CamoYoshi · September 2021

I don't understand why in 2021 people still do mdadm arrays... Logical Volume Groups, btrfs, or ZFS are the way to go. There are too many issues with write holes and desyncing on mdadm that require manual intervention for my tastes.

Not_Oles · September 2021

@SagnikS said: the NVMe that was removed from the array

Sorry, how do we know that an NVMe was removed from the array?

SagnikS · September 2021

@Not_Oles said:

Sorry, how do we know that an NVMe was removed from the array?

Got an alert from our monitoring software, and an email too. cat /proc/mdstat will also show your array as degraded.
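
For anyone who wants to check this from a script rather than by eye, here is a minimal Python sketch that parses `/proc/mdstat`-style text and flags degraded arrays. The sample text is copied from the first alert mail in this thread; the function name `parse_mdstat` and the regexes are my own and only cover active arrays in the layout shown above.

```python
import re

# Sample copied from the first alert mail in this thread.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 nvme1n1p3[1] nvme0n1p3[0](F)
      3745885504 blocks super 1.2 [2/1] [_U]
      bitmap: 3/28 pages [12KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      4189184 blocks super 1.2 [2/2] [UU]

md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      523264 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

# "md2 : active raid1 <members...>" (active arrays only; a sketch, not a full parser)
ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
# "[2/1] [_U]" means 2 members expected, 1 active; "_" marks the missing slot
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
# "nvme0n1p3[0](F)" is a member device, optionally followed by flags such as (F)
MEMBER_RE = re.compile(r"(\S+?)\[\d+\]((?:\([A-Z]\))*)")


def parse_mdstat(text):
    """Return {array_name: info} with a 'degraded' flag and a list of failed members."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, state, level, members = header.groups()
            current = {
                "state": state,
                "level": level,
                "failed": [dev for dev, flags in MEMBER_RE.findall(members)
                           if "(F)" in flags],
                "degraded": False,
            }
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active, slots = status.groups()
            current["slots"] = slots
            current["degraded"] = int(active) < int(expected) or "_" in slots
    return arrays


if __name__ == "__main__":
    for name, info in sorted(parse_mdstat(SAMPLE_MDSTAT).items()):
        print(name, "DEGRADED" if info["degraded"] else "ok", info["failed"])
```

On a live system you would pass it `open("/proc/mdstat").read()` instead of the sample string.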

SagnikS · September 2021

@CamoYoshi said:
I don't understand why in 2021 people still do mdadm arrays... Logical Volume Groups, btrfs, or ZFS are the way to go. There are too many issues with write holes and desyncing on mdadm that require manual intervention for my tastes.

LVM RAID uses the same underlying driver iirc, btrfs is still not stable and ZFS has a performance overhead/you need to tune it properly. mdadm still works just fine ootb and is a tested solution.

Amadex · September 2021

@SGraf helped me a lot. Thanks again! 🙌

@Not_Oles
Problem: I've got email that RAID has failed
What caused the problem: dunno
Fixed: @SGraf was troubleshooting

Idk if I should replace the disk or keep it for now.

SGraf · September 2021

@Amadex said:
@SGraf helped me a lot. Thanks again! 🙌

@Not_Oles
Problem: I've got email that RAID has failed
What caused the problem: dunno
Fixed: @SGraf was troubleshooting

Idk if I should replace the disk or keep it for now.

As I said in the chat, get that disk replaced. Just because we put the mdadm RAID back together for now doesn't mean it will be stable in the future.

We saw I/O write errors on the SSD before the system dropped the SSD completely. The disk came back after a reboot and we re-added + synced the SSD.
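
The exact commands used in the chat aren't recorded in this thread. As a rough illustration only, a typical sequence for putting a dropped member back into an md RAID1 might look like the dry-run sketch below. It only builds and prints `mdadm` command lines and executes nothing. The array/member pairs are taken from the alert mails above; the helper name `readd_plan` and the order of steps are assumptions, not a record of what was actually run.

```python
import shlex

# Array -> dropped member, as named in the alert mails earlier in this thread.
DROPPED = [
    ("/dev/md0", "/dev/nvme0n1p1"),
    ("/dev/md1", "/dev/nvme0n1p2"),
    ("/dev/md2", "/dev/nvme0n1p3"),
]


def readd_plan(pairs):
    """Build (but do not run) a plausible list of commands for re-adding members."""
    commands = []
    for array, member in pairs:
        # Inspect the array first to see whether the member is still listed as faulty.
        commands.append(["mdadm", "--detail", array])
        # Only needed if the faulty member is still attached; it errors harmlessly otherwise.
        commands.append(["mdadm", "--manage", array, "--remove", member])
        # --re-add can use the write-intent bitmap (md2 has one) for a short resync;
        # if it is refused, --add triggers a full resync instead.
        commands.append(["mdadm", "--manage", array, "--re-add", member])
    # Watch resync progress afterwards.
    commands.append(["cat", "/proc/mdstat"])
    return commands


if __name__ == "__main__":
    for cmd in readd_plan(DROPPED):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the plan first and running each line by hand keeps a human in the loop, which seems sensible when the remaining disk holds the only good copy.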

Falzo · September 2021

run smartctl -a against your nvmes to get an idea how worn out they are...
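
If you'd rather pull the wear-related numbers out with a script, here is a small Python sketch that reads the "SMART/Health Information" lines from `smartctl -a` text output. The sample is a subset of the nvme1n1 output posted in this thread; the helper names are made up for illustration. The 512,000-byte size of a data unit follows the NVMe convention of reporting data units in thousands of 512-byte blocks, which also matches the bracketed TB figure smartctl prints.

```python
import re

# Subset of the nvme1n1 health section as posted in this thread.
SAMPLE_HEALTH = """\
Critical Warning:                   0x00
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    135,580 [69.4 GB]
Data Units Written:                 12,285,985 [6.29 TB]
Power On Hours:                     74
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
"""

BYTES_PER_DATA_UNIT = 512 * 1000  # one NVMe "data unit" = 1000 blocks of 512 bytes


def parse_health(text):
    """Split 'Key:   value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Parse the leading number of a value such as '0x00', '100%' or '12,285,985 [6.29 TB]'."""
    match = re.match(r"(0x[0-9a-fA-F]+|[\d,]+)", value)
    if match is None:
        raise ValueError(f"no leading number in {value!r}")
    token = match.group(1)
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token.replace(",", ""))


def wear_summary(fields):
    """Collect the handful of fields that say something about wear and errors."""
    spare = leading_int(fields["Available Spare"])
    threshold = leading_int(fields["Available Spare Threshold"])
    return {
        "percentage_used": leading_int(fields["Percentage Used"]),
        "spare_ok": spare > threshold,
        "critical_warning": leading_int(fields["Critical Warning"]),
        "media_errors": leading_int(fields["Media and Data Integrity Errors"]),
        "tb_written": leading_int(fields["Data Units Written"]) * BYTES_PER_DATA_UNIT / 1e12,
    }


if __name__ == "__main__":
    print(wear_summary(parse_health(SAMPLE_HEALTH)))
```

A clean summary here only says the drive reports no wear and no logged media errors; it does not rule out a link or controller problem.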

Amadex · September 2021

@Falzo

[root@blue ~]# smartctl -a /dev/nvme0n1
smartctl 7.1 2020-08-23 r5080 [x86_64-linux-4.18.0-305.17.1.lve.el8.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQL23T8HCLS-00A07
Serial Number:                      S64HNE0R514262
Firmware Version:                   GDC5302Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
Number of Namespaces:               32
Namespace 1 Size/Capacity:          3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization:            154,588,065,792 [154 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Thu Sep 30 12:26:00 2021 CEST
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f):   Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     83 Celsius
Namespace 1 Features (0x1a):        NA_Fields No_ID_Reuse *Other*

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W   14.00W       -    0  0  0  0       70      70
 1 +     8.00W  0.0800W       -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    10,941,488 [5.60 TB]
Data Units Written:                 1,419,512 [726 GB]
Host Read Commands:                 14,992,414
Host Write Commands:                7,351,883
Controller Busy Time:               118
Power Cycles:                       5
Power On Hours:                     74
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               37 Celsius
Temperature Sensor 2:               47 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

[root@blue ~]# smartctl -a /dev/nvme1n1
smartctl 7.1 2020-08-23 r5080 [x86_64-linux-4.18.0-305.17.1.lve.el8.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQL23T8HCLS-00A07
Serial Number:                      S64HNE0R514263
Firmware Version:                   GDC5302Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
Number of Namespaces:               32
Namespace 1 Size/Capacity:          3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization:            3,840,755,978,240 [3.84 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Thu Sep 30 12:26:38 2021 CEST
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x005f):   Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     83 Celsius
Namespace 1 Features (0x1a):        NA_Fields No_ID_Reuse *Other*

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    25.00W   14.00W       -    0  0  0  0       70      70
 1 +     8.00W  0.0800W       -    1  1  1  1       70      70

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    135,580 [69.4 GB]
Data Units Written:                 12,285,985 [6.29 TB]
Host Read Commands:                 645,231
Host Write Commands:                22,273,118
Controller Busy Time:               30
Power Cycles:                       5
Power On Hours:                     74
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               36 Celsius
Temperature Sensor 2:               46 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

[root@blue ~]# sudo sfdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 3.5 TiB, 3840755982336 bytes, 7501476528 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes
Disklabel type: gpt
Disk identifier: 8A555D66-C2F6-4F35-8677-55BDFC148408

Device           Start        End    Sectors  Size Type
/dev/nvme0n1p1    4096    8392703    8388608    4G Linux RAID
/dev/nvme0n1p2 8392704    9441279    1048576  512M Linux RAID
/dev/nvme0n1p3 9441280 7501476494 7492035215  3.5T Linux RAID
/dev/nvme0n1p4    2048       4095       2048    1M BIOS boot

Partition table entries are not in disk order.
[root@blue ~]# sudo sfdisk -l /dev/nvme1n1
Disk /dev/nvme1n1: 3.5 TiB, 3840755982336 bytes, 7501476528 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes
Disklabel type: gpt
Disk identifier: 95B72003-3BE5-4905-ABFB-F6DB8851BA89

Device           Start        End    Sectors  Size Type
/dev/nvme1n1p1    4096    8392703    8388608    4G Linux RAID
/dev/nvme1n1p2 8392704    9441279    1048576  512M Linux RAID
/dev/nvme1n1p3 9441280 7501476494 7492035215  3.5T Linux RAID
/dev/nvme1n1p4    2048       4095       2048    1M BIOS boot

Partition table entries are not in disk order.

Amadex · September 2021

[root@blue ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 nvme0n1p3[0] nvme1n1p3[1]
      3745885504 blocks super 1.2 [2/2] [UU]
      bitmap: 6/28 pages [24KB], 65536KB chunk

md0 : active raid1 nvme0n1p1[2] nvme1n1p1[1]
      4189184 blocks super 1.2 [2/2] [UU]

md1 : active raid1 nvme0n1p2[2] nvme1n1p2[1]
      523264 blocks super 1.2 [2/2] [UU]

unused devices: <none>

Falzo · September 2021

@Amadex said:

these NVMe are brand new. there is nothing to argue over for having them replaced.

did you power off the server via panel at some point after the first installation? maybe a power loss led to the broken initial raid sync in the first place...

Amadex · September 2021

@Falzo said:

these NVMe are brand new. there is nothing to argue over for having them replaced.

did you power off the server via panel at some point after the first installation? maybe a power loss led to the broken initial raid sync in the first place...

I never did that. After the Plesk installation + CentOS 8 > CloudLinux 8 conversion I did a normal reboot via ssh. The server was bought on 27.09.2021 and everything was installed on that day + rebooted. Since then I've touched nothing.

Falzo · September 2021

@Amadex said:

I never did that. After the Plesk installation + CentOS 8 > CloudLinux 8 conversion I did a normal reboot via ssh. The server was bought on 27.09.2021 and everything was installed on that day + rebooted. Since then I've touched nothing.

weird... I doubt, however, that there is anything wrong with either of the NVMes at all, whatever hiccup that was.

Amadex · September 2021

@Falzo said:

weird... I doubt, however, that there is anything wrong with either of the NVMes at all, whatever hiccup that was.

I will wait and see if it happens again. Thanks, everyone, for the replies.

CamoYoshi · September 2021

@SagnikS said:

LVM RAID uses the same underlying driver iirc, btrfs is still not stable and ZFS has a performance overhead/you need to tune it properly. mdadm still works just fine ootb and is a tested solution.

LVM at least has the benefit of self-healing in RAID1 scenarios despite calling mdadm for the underlying RAID functionality, and offers greater flexibility over mdadm.

btrfs is considered stable for RAID1: https://btrfs.wiki.kernel.org/index.php/Status - Performance tuning just needs to happen to the code and then it'll be a lot more viable, but that being said it is quite performant as it is now.

ZFS performance overhead is way overblown; the "1TB of storage needs 1GB of RAM" is only for enterprise level applications with many clients simultaneously reading and writing to the array, and the recommended tuning settings are well documented and understood.

ZFS performance tuning official recommendations: https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html

ZFS developer on the "1GB for 1TB" rule:
https://www.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrh5iv/
https://www.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrngar/

SagnikS · September 2021

@CamoYoshi said:

LVM at least has the benefit of self-healing in RAID1 scenarios despite calling mdadm for the underlying RAID functionality, and offers greater flexibility over mdadm.

btrfs is considered stable for RAID1: https://btrfs.wiki.kernel.org/index.php/Status - Performance tuning just needs to happen to the code and then it'll be a lot more viable, but that being said it is quite performant as it is now.

ZFS performance overhead is way overblown; the "1TB of storage needs 1GB of RAM" is only for enterprise level applications with many clients simultaneously reading and writing to the array, and the recommended tuning settings are well documented and understood.

ZFS performance tuning official recommendations: https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html

ZFS developer on the "1GB for 1TB" rule:
https://www.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrh5iv/
https://www.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrngar/

I'm not sure what you're referring to by LVM's self-healing.

As for Btrfs's stability: a few weeks back I had a friend lose data due to a power loss (OpenSUSE + Btrfs). This rarely happens with ext4.

And yes, not sure where that RAM requirement came from. I run a few large (100+TB) Proxmox servers that run ZFS (that's the only logical choice, xfs has been having problems with files disappearing apparently, ext4 is limited) and they run on very little RAM. I find it really nice that the writehole problems are patched there, but there are a few quirks. IOWait was much higher when the VM was in a zvol than when it was in a file in the same zpool that the zvol was in. Another problem with ZFS is the lack of mainstream linux support atm. It should improve in future though hopefully.

I have high hopes for a project called bcachefs; it seems really cool.
