I need advice about repairing (or not repairing) a somewhat corrupted BTRFS volume.
I have a fairly big BTRFS RAID1 volume, currently consisting of 6 physical devices (HDDs). The volume survived many hardware failures and drive replacements.
After all that has happened, the volume is in a relatively satisfactory, but far from ideal condition. It mounts, most of the data is readable, new data is written. But at the same time:
The last replacement of a failed disk did not complete and cannot be completed, for the reason described below.
Balancing the data on the volume cannot be completed because of corruption of the logical file system structure on one of the volumes. When I attempt a balance, multiple diagnostic messages (shown below) appear in the system log and the balance process hangs forever. After that it can be neither interrupted nor killed.
Some data still cannot be read from the volume, and I suspect that if I leave the volume in its current state and keep writing to it, the amount of unreadable data may increase (although I am not sure).
An offline check of the volume with "btrfs check" produces some diagnostic messages (shown below). The messages look reasonable and give hope that the volume can be repaired with "btrfs check --repair". But the manual instructs: "Do not use --repair unless you are advised to do so by a developer or an experienced user". So I came here (where I can probably find experienced users) to ask for such advice.
More specifically I want to understand the following:
If I run "btrfs check --repair", what are the chances of losing all the remaining data?
If I do not run "btrfs check --repair", what are the chances that the logical structure corruption will grow and affect new data?
The data on the volume are not vitally important, but it would be much better to save them than to lose them.
The technical details that may help in giving the right advice follow:
Normally the server runs Oracle Unbreakable Linux 6 with the 4.1.12-124.48.6.el6uek.x86_64 kernel and btrfs-progs v4.2.2. "btrfs check" was run from an Ubuntu 22.04 live CD with kernel 5.15 and btrfs-progs 5.16.2. Unlike on Unbreakable Linux, running btrfs tools on the Ubuntu live CD (e.g. "btrfs dev del missing") does not cause uninterruptible blocking, and the btrfs program can at least be killed.
The current state of the volume:
[root@monster ~]# btrfs fi show
Label: 'Data' uuid: 3728eb0c-b062-4737-962b-b6d59d803bc3
Total devices 7 FS bytes used 4.53TiB
devid 1 size 1.82TiB used 1.66TiB path /dev/sda
devid 3 size 1.82TiB used 1.66TiB path /dev/sdd
devid 4 size 931.51GiB used 772.00GiB path /dev/sdb
devid 5 size 1.82TiB used 1.66TiB path /dev/sde
devid 6 size 1.82TiB used 1.66TiB path /dev/sdf
devid 7 size 1.82TiB used 1.66TiB path /dev/sdc
Some devices missing
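As a rough cross-check of the listing above, the per-device "used" figures can be added up and compared with "FS bytes used". This is only a sketch: the numbers are copied by hand from the output, it assumes that RAID1 stores two copies of every chunk, and it assumes that devids are numbered 1 through 7 without gaps.

```python
# Figures copied by hand from the "btrfs fi show" output above.
GIB_PER_TIB = 1024

# devid -> space used on that device, in GiB
used_gib = {
    1: 1.66 * GIB_PER_TIB,
    3: 1.66 * GIB_PER_TIB,
    4: 772.00,
    5: 1.66 * GIB_PER_TIB,
    6: 1.66 * GIB_PER_TIB,
    7: 1.66 * GIB_PER_TIB,
}
total_devices = 7    # "Total devices 7"
fs_used_tib = 4.53   # "FS bytes used 4.53TiB"

raw_used_tib = sum(used_gib.values()) / GIB_PER_TIB

# Assumption: RAID1 keeps two copies of everything,
# so logical usage is roughly half of the raw usage.
logical_estimate_tib = raw_used_tib / 2

# Assumption: devids run from 1 to total_devices with no gaps.
missing_devids = sorted(set(range(1, total_devices + 1)) - set(used_gib))

print(f"raw used: {raw_used_tib:.2f} TiB")
print(f"estimated logical usage: {logical_estimate_tib:.2f} TiB "
      f"(reported: {fs_used_tib} TiB)")
print(f"devids not listed: {missing_devids}")
```

With these figures the estimate comes out at about 4.53 TiB, in line with the reported value, and devid 2 is the only one absent from the list. That appears to agree with the "2" in the btrfs_scrub_dev message further down.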
The kernel messages that appear (many times) when the data balancing process hangs:
Aug 16 08:44:16 monster kernel: [156480.131059] INFO: task btrfs:3068 blocked for more than 120 seconds.
Aug 16 08:44:16 monster kernel: [156480.131790] btrfs D ffff88007fa98680 0 3068 3049 0x00000080
Aug 16 08:44:16 monster kernel: [156480.132282] [<ffffffffc0188195>] btrfs_start_ordered_extent+0xf5/0x130 [btrfs]
Aug 16 08:44:16 monster kernel: [156480.132311] [<ffffffffc01886df>] btrfs_wait_ordered_range+0xdf/0x140 [btrfs]
Aug 16 08:44:16 monster kernel: [156480.132336] [<ffffffffc01c08a2>] btrfs_relocate_block_group+0x262/0x2f0 [btrfs]
Aug 16 08:44:16 monster kernel: [156480.132361] [<ffffffffc019606e>] btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
Aug 16 08:44:16 monster kernel: [156480.132385] [<ffffffffc01972fc>] __btrfs_balance+0x4dc/0x8d0 [btrfs]
Aug 16 08:44:16 monster kernel: [156480.132409] [<ffffffffc0197978>] btrfs_balance+0x288/0x600 [btrfs]
Aug 16 08:44:16 monster kernel: [156480.132445] [<ffffffffc01a4113>] btrfs_ioctl_balance+0x3c3/0x440 [btrfs]
Aug 16 08:44:16 monster kernel: [156480.132470] [<ffffffffc01a5d70>] btrfs_ioctl+0x600/0x2a70 [btrfs]
The kernel messages that appear (many times) when attempting to read the unreadable data (or scrub the volume):
Aug 10 10:39:25 monster kernel: [12185191.075904] btrfs_dev_stat_print_on_error: 25 callbacks suppressed
Aug 10 10:39:30 monster kernel: [12185196.077024] btrfs_dev_stat_print_on_error: 60097 callbacks suppressed
Aug 10 10:39:35 monster kernel: [12185201.079721] btrfs_dev_stat_print_on_error: 191515 callbacks suppressed
Aug 10 10:39:40 monster kernel: [12185206.081052] btrfs_dev_stat_print_on_error: 192818 callbacks suppressed
Aug 10 10:39:45 monster kernel: [12185211.114693] btrfs_dev_stat_print_on_error: 91855 callbacks suppressed
Aug 10 10:39:48 monster kernel: [12185213.769604] btrfs_end_buffer_write_sync: 5 callbacks suppressed
Aug 10 10:39:50 monster kernel: [12185216.218880] btrfs_dev_stat_print_on_error: 57 callbacks suppressed
Aug 10 10:39:55 monster kernel: [12185221.227411] btrfs_dev_stat_print_on_error: 138 callbacks suppressed
Aug 10 10:40:02 monster kernel: [12185227.611771] btrfs_dev_stat_print_on_error: 167 callbacks suppressed
Aug 10 10:40:07 monster kernel: [12185232.904970] btrfs_dev_stat_print_on_error: 63 callbacks suppressed
Aug 10 10:40:12 monster kernel: [12185237.955002] btrfs_dev_stat_print_on_error: 54 callbacks suppressed
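To get a feel for the scale hidden behind these rate-limited lines, the suppressed counts can be added up per function. A minimal sketch, assuming the log lines are pasted in as text; the variable names are only illustrative.

```python
import re
from collections import Counter

# Log lines copied from the excerpt above.
log = """\
Aug 10 10:39:25 monster kernel: [12185191.075904] btrfs_dev_stat_print_on_error: 25 callbacks suppressed
Aug 10 10:39:30 monster kernel: [12185196.077024] btrfs_dev_stat_print_on_error: 60097 callbacks suppressed
Aug 10 10:39:35 monster kernel: [12185201.079721] btrfs_dev_stat_print_on_error: 191515 callbacks suppressed
Aug 10 10:39:40 monster kernel: [12185206.081052] btrfs_dev_stat_print_on_error: 192818 callbacks suppressed
Aug 10 10:39:45 monster kernel: [12185211.114693] btrfs_dev_stat_print_on_error: 91855 callbacks suppressed
Aug 10 10:39:48 monster kernel: [12185213.769604] btrfs_end_buffer_write_sync: 5 callbacks suppressed
Aug 10 10:39:50 monster kernel: [12185216.218880] btrfs_dev_stat_print_on_error: 57 callbacks suppressed
Aug 10 10:39:55 monster kernel: [12185221.227411] btrfs_dev_stat_print_on_error: 138 callbacks suppressed
Aug 10 10:40:02 monster kernel: [12185227.611771] btrfs_dev_stat_print_on_error: 167 callbacks suppressed
Aug 10 10:40:07 monster kernel: [12185232.904970] btrfs_dev_stat_print_on_error: 63 callbacks suppressed
Aug 10 10:40:12 monster kernel: [12185237.955002] btrfs_dev_stat_print_on_error: 54 callbacks suppressed
"""

# Matches "<function name>: <count> callbacks suppressed".
pattern = re.compile(r"(\w+): (\d+) callbacks suppressed")

totals = Counter()
for line in log.splitlines():
    match = pattern.search(line)
    if match:
        totals[match.group(1)] += int(match.group(2))

for function, count in totals.most_common():
    print(f"{function}: {count} suppressed messages")
```

For this excerpt the total is 536789 suppressed btrfs_dev_stat_print_on_error messages in under a minute, plus 5 from btrfs_end_buffer_write_sync.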
The kernel messages that appeared when I attempted to replace the failed drive (the failed drive is unrelated to the issue at hand and has since been physically removed):
Aug 10 11:22:52 monster kernel: [ 1458.081598] BTRFS: btrfs_scrub_dev(<missing disk>, 2, /dev/sdc) failed -5
Aug 10 11:22:52 monster kernel: [ 1458.082080] WARNING: CPU: 0 PID: 4051 at fs/btrfs/dev-replace.c:418 btrfs_dev_replace_start+0x2dd/0x330 [btrfs]()
Aug 10 11:22:52 monster kernel: [ 1458.082111] Modules linked in: autofs4 coretemp ipmi_devintf ipmi_si ipmi_msghandler sunrpc 8021q mrp garp stp llc ipt_REJECT nf_reject_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 iTCO_wdt iTCO_vendor_support pcspkr e1000 serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core sg acpi_cpufreq shpchp i3200_edac edac_core ext4 jbd2 mbcache2 btrfs raid6_pq xor sr_mod cdrom aacraid sd_mod ahci libahci mpt3sas scsi_transport_sas raid_class floppy dm_mirror dm_region_hash dm_log dm_mod
Aug 10 11:22:52 monster kernel: [ 1458.082114] CPU: 0 PID: 4051 Comm: btrfs Not tainted 4.1.12-124.48.6.el6uek.x86_64 #2
Aug 10 11:22:52 monster kernel: [ 1458.082152] [<ffffffffc01c16ed>] btrfs_dev_replace_start+0x2dd/0x330 [btrfs]
Aug 10 11:22:52 monster kernel: [ 1458.082169] [<ffffffffc01883d2>] btrfs_ioctl+0x1c62/0x2a70 [btrfs]
Aug 10 11:29:06 monster kernel: [ 1831.770194] BTRFS: btrfs_scrub_dev(<missing disk>, 2, /dev/sdc) failed -5
Aug 10 11:29:06 monster kernel: [ 1831.770654] WARNING: CPU: 1 PID: 4335 at fs/btrfs/dev-replace.c:418 btrfs_dev_replace_start+0x2dd/0x330 [btrfs]()
Aug 10 11:29:06 monster kernel: [ 1831.771030] Modules linked in: autofs4 coretemp ipmi_devintf ipmi_si ipmi_msghandler sunrpc 8021q mrp garp stp llc ipt_REJECT nf_reject_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 iTCO_wdt iTCO_vendor_support pcspkr e1000 serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core sg acpi_cpufreq shpchp i3200_edac edac_core ext4 jbd2 mbcache2 btrfs raid6_pq xor sr_mod cdrom aacraid sd_mod ahci libahci mpt3sas scsi_transport_sas raid_class floppy dm_mirror dm_region_hash dm_log dm_mod
The output of the "btrfs check":
root@ubuntu-server:~# btrfs check --readonly -p /dev/sda
Opening filesystem to check...
Checking filesystem on /dev/sda
UUID: 3728eb0c-b062-4737-962b-b6d59d803bc3
[1/7] checking root items (0:06:22 elapsed, 2894917 items checked)
Invalid mapping for 11707729661952-11707729666048, got 14502780010496-14503853752320
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
bad tree block 11707729661952, bytenr mismatch, want=11707729661952, have=0
ref mismatch on [11707729661952 4096] extent item 0, found 1
tree backref 11707729661952 root 7 not found in extent tree
backpointer mismatch on [11707729661952 4096]
owner ref check failed [11707729661952 4096]
bad extent [11707729661952, 11707729666048), type mismatch with chunk
[2/7] checking extents (0:06:58 elapsed, 1398310 items checked)
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache (0:07:38 elapsed, 4658 items checked)
Invalid mapping for 11707729661952-11707729666048, got 14502780010496-14503853752320
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
bad tree block 11707729661952, bytenr mismatch, want=11707729661952, have=0
Invalid mapping for 11707729661952-11707729666048, got 14502780010496-14503853752320
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
bad tree block 11707729661952, bytenr mismatch, want=11707729661952, have=0
Invalid mapping for 11707729661952-11707729666048, got 14502780010496-14503853752320
skipped many repetitions --------------------
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
bad tree block 11707729661952, bytenr mismatch, want=11707729661952, have=0
Invalid mapping for 11707729661952-11707729666048, got 14502780010496-14503853752320
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
bad tree block 11707729661952, bytenr mismatch, want=11707729661952, have=0
Invalid mapping for 11707729661952-11707729666048, got 14502780010496-14503853752320
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
Couldn't map the block 11707729661952
skipped many repetitions --------------------
bad tree block 11707729661952, bytenr mismatch, want=11707729661952, have=0
root 5 inode 1025215 errors 500, file extent discount, nbytes wrong
Found file extent holes:
start: 50561024, len: 41848832
root 5 inode 1025216 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 275 namelen 29 name ft-v05.2024-04-06.112000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025217 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 277 namelen 29 name ft-v05.2024-04-06.112500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025218 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 279 namelen 29 name ft-v05.2024-04-06.113000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025219 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 281 namelen 29 name ft-v05.2024-04-06.113500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025220 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 283 namelen 29 name ft-v05.2024-04-06.114000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025221 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 285 namelen 29 name ft-v05.2024-04-06.114500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025222 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 287 namelen 29 name ft-v05.2024-04-06.115000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025223 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 289 namelen 29 name ft-v05.2024-04-06.115500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025224 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 291 namelen 29 name ft-v05.2024-04-06.120000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025225 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 293 namelen 29 name ft-v05.2024-04-06.120500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025226 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 295 namelen 29 name ft-v05.2024-04-06.121000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025227 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 297 namelen 29 name ft-v05.2024-04-06.121500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025228 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 299 namelen 29 name ft-v05.2024-04-06.122000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025229 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 301 namelen 29 name ft-v05.2024-04-06.122500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025230 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 303 namelen 29 name ft-v05.2024-04-06.123000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025231 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 305 namelen 29 name ft-v05.2024-04-06.123500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025232 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 307 namelen 29 name ft-v05.2024-04-06.124000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025233 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 309 namelen 29 name ft-v05.2024-04-06.124500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025234 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 311 namelen 29 name ft-v05.2024-04-06.125000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025235 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 313 namelen 29 name ft-v05.2024-04-06.125500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025236 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 315 namelen 29 name ft-v05.2024-04-06.130000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025237 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 317 namelen 29 name ft-v05.2024-04-06.130500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025238 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 319 namelen 29 name ft-v05.2024-04-06.131000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025239 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 321 namelen 29 name ft-v05.2024-04-06.131500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025240 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 323 namelen 29 name ft-v05.2024-04-06.132000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025241 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 325 namelen 29 name ft-v05.2024-04-06.132500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025242 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 327 namelen 29 name ft-v05.2024-04-06.133000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025243 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 329 namelen 29 name ft-v05.2024-04-06.133500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025244 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 331 namelen 29 name ft-v05.2024-04-06.134000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025245 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 333 namelen 29 name ft-v05.2024-04-06.134500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025246 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 335 namelen 29 name ft-v05.2024-04-06.135000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025247 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 337 namelen 29 name ft-v05.2024-04-06.135500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025248 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 339 namelen 29 name ft-v05.2024-04-06.140000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025249 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 341 namelen 29 name ft-v05.2024-04-06.140500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025250 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 343 namelen 29 name ft-v05.2024-04-06.141000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025251 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 345 namelen 29 name ft-v05.2024-04-06.141500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025252 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 347 namelen 29 name ft-v05.2024-04-06.142000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025253 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 349 namelen 29 name ft-v05.2024-04-06.142500+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025254 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 351 namelen 29 name ft-v05.2024-04-06.143000+0300 filetype 1 errors 4, no inode ref
root 5 inode 1025255 errors 2001, no inode item, link count wrong
unresolved ref dir 1025079 index 353 namelen 29 name ft-v05.2024-04-06.143500+0300 filetype 1 errors 4, no inode ref
skipped many repetitions --------------------
root 5 inode 1032304 errors 2001, no inode item, link count wrong
unresolved ref dir 997350 index 6 namelen 7 name 2024-05 filetype 2 errors 4, no inode ref
root 5 inode 1041264 errors 2001, no inode item, link count wrong
unresolved ref dir 997350 index 7 namelen 7 name 2024-06 filetype 2 errors 4, no inode ref
root 5 inode 1049935 errors 2001, no inode item, link count wrong
unresolved ref dir 997350 index 8 namelen 7 name 2024-07 filetype 2 errors 4, no inode ref
root 5 inode 1058895 errors 2001, no inode item, link count wrong
unresolved ref dir 997350 index 9 namelen 7 name 2024-08 filetype 2 errors 4, no inode ref
[4/7] checking fs roots (0:12:36 elapsed, 10657 items checked)
ERROR: errors found in fs roots
found 4984662896640 bytes used, error(s) found
total csum bytes: 4846592840
total tree bytes: 5727440896
total fs tree bytes: 155164672
total extent tree bytes: 321896448
btree space waste bytes: 234524798
file data blocks allocated: 4978935451648
referenced 4975629070336
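A few of the numbers in this output can be checked against each other with simple arithmetic. The sketch below only restates values copied by hand from the output; the interpretation in the comments is my reading and may be wrong.

```python
# Numbers copied by hand from the "btrfs check" output above.
wanted_start, wanted_end = 11707729661952, 11707729666048
got_start, got_end = 14502780010496, 14503853752320

# Size of the block that could not be mapped.
wanted_len = wanted_end - wanted_start
# Size of the mapping that was returned instead.
got_len = got_end - got_start
# Standard half-open interval overlap test.
overlaps = wanted_start < got_end and got_start < wanted_end

found_used = 4984662896640      # "found ... bytes used"
tree_bytes = 5727440896         # "total tree bytes"
data_allocated = 4978935451648  # "file data blocks allocated"
difference = found_used - (tree_bytes + data_allocated)

print(f"wanted block: {wanted_len} bytes")
print(f"returned mapping: {got_len / 2**30:.0f} GiB, "
      f"overlaps wanted block: {overlaps}")
print(f"found used: {found_used / 2**40:.2f} TiB")
print(f"used minus (tree + data): {difference} bytes")
```

On these numbers the block being looked up is 4096 bytes, the mapping returned is exactly 1 GiB and does not contain that block, and the reported usage (about 4.53 TiB, matching "btrfs fi show") exceeds tree bytes plus allocated data by 4096 bytes. Whether that last difference corresponds to the single bad tree block is only a guess.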