zfs regular read pauses

Question

We have a zfs file system running on openindiana with an almost purely read-only workload. Recently we have found that the io is blocked or pauses every 1 minute for around 2-3 seconds. Does anyone know why this happens and how to stop it occurring?

pool: data

state: ONLINE

status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.

action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.

scan: none requested

config:

        NAME                       STATE     READ WRITE CKSUM
        data01                     ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c6t5E83A97A6BAB722Fd0  ONLINE       0     0     0
            c6t5E83A97BBE7FE8F1d0  ONLINE       0     0     0
            c6t5E83A97CB2D1609Fd0  ONLINE       0     0     0
            c6t5E83A97E0D2BC6ACd0  ONLINE       0     0     0
            c6t5E83A97E1E2A6F66d0  ONLINE       0     0     0
            c6t5E83A97E4C627FC0d0  ONLINE       0     0     0
            c6t5E83A97E44EB7FC0d0  ONLINE       0     0     0
            c6t5E83A97E124F8168d0  ONLINE       0     0     0
            c6t5E83A97E86762E4Dd0  ONLINE       0     0     0
            c6t5E83A97E25716311d0  ONLINE       0     0     0
            c6t5E83A97EBEEB8B60d0  ONLINE       0     0     0
            c6t5E83A97FCF5551F5d0  ONLINE       0     0     0

errors: No known data errors

As requested by @jlliagre, here are the results from calling hotkernel during good and bad times.

good times

zfs`vdev_is_dead 1 0.0%
ip`conn_ip_output 1 0.0%
sd`sdintr 1 0.0%
mpt_sas`mptsas_start_cmd0 1 0.0%
genunix`new_mstate 1 0.0%
genunix`segvn_faulta 1 0.0%
unix`swtch 1 0.0%
unix`tsc_scalehrtime 1 0.0%
zfs`ddt_stat_add 1 0.0%
unix`lwp_segregs_save 1 0.0%
rootnex`rootnex_get_sgl 1 0.0%
genunix`ldi_strategy 1 0.0%
mpt_sas`mptsas_scsi_init_pkt 1 0.0%
genunix`ddi_fm_dma_err_get 1 0.0%
zfs`zio_create 1 0.0%
zfs`zio_destroy 1 0.0%
unix`strcmp 1 0.0%
unix`cpu_acpi_idle 1 0.0%
ip`ip_xmit_attach_llhdr 1 0.0%
genunix`mdi_pi_get_vhci_private 1 0.0%
genunix`lbolt_event_driven 1 0.0%
unix`disp 1 0.0%
genunix`cpu_update_pct 1 0.0%
unix`tsc_gethrtime 1 0.0%
ip`tcp_build_hdrs 1 0.0%
genunix`dnlc_lookup 1 0.0%
unix`page_create_va 1 0.0%
genunix`kmem_zalloc 1 0.0%
unix`mmapobj_map_elf 1 0.0%
bnx`bnx_intr_1lvl 1 0.0%
genunix`btopr 1 0.0%
genunix`fsflush 1 0.0%
genunix`gexec 1 0.0%
unix`mmu_tlbflush_entry 1 0.0%
unix`hment_alloc 1 0.0%
specfs`spec_lockcsp 1 0.0%
zfs`vdev_stat_update 1 0.0%
genunix`stk_getptr 1 0.0%
unix`resume 1 0.0%
unix`do_splx 1 0.0%
unix`page_clr_all_props 1 0.0%
bnx`bnx_xmit_ring_xmit_mblk 1 0.0%
genunix`list_create 1 0.0%
scsi`scsi_uscsi_handle_cmd 1 0.0%
unix`page_lookup_nowait 1 0.0%
genunix`syscall_mstate 1 0.0%
genunix`kmem_free 1 0.0%
genunix`as_map_locked 1 0.0%
unix`cpu_idle_adaptive 1 0.0%
unix`page_lookup_create 1 0.0%
sd`sd_start_cmds 1 0.0%
unix`0xfffffffffb85 1 0.0%
unix`atomic_cas_64 1 0.0%
genunix`restorectx 1 0.0%
genunix`vmem_is_populator 1 0.0%
unix`disp_anywork 1 0.0%
unix`prefetch_page_r 1 0.0%
sd`xbuf_iostart 1 0.0%
zfs`zfs_read 1 0.0%
unix`hwblkclr 1 0.0%
specfs`sfind 1 0.0%
genunix`swapfs_getvp 1 0.0%
zfs`vdev_cache_allocate 1 0.0%
zfs`buf_hash_find 1 0.0%
genunix`taskq_thread 2 0.0%
unix`tsc_read 2 0.0%
unix`tsc_gethrtimeunscaled 2 0.0%
genunix`fop_getpage 2 0.0%
unix`inflate_table 2 0.0%
unix`ddi_get32 2 0.0%
unix`mutex_exit 2 0.0%
unix`idle 3 0.0%
zfs`lzjb_compress 3 0.0%
unix`mutex_enter 6 0.1%
unix`page_nextn 9 0.1%
genunix`fsflush_do_pages 9 0.1%
unix`disp_getwork 12 0.2%
unix`acpi_cpu_cstate 441 5.7%
unix`i86_mwait 7118 92.7%

bad times

genunix`cpu_update_pct 1 0.0%
unix`splr 1 0.0%
bnx`bnx_timer 1 0.0%
unix`av_check_softint_pending 1 0.0%
genunix`devid_str_decode_id 1 0.0%
pcplusmp`apic_send_ipi 1 0.0%
genunix`callout_downheap 1 0.0%
unix`prefetch_page_r 1 0.0%
dtrace`dtrace_state_clean 1 0.0%
genunix`fsflush 1 0.0%
unix`link_ptp 1 0.0%
ip`ip_input_cksum_v4 1 0.0%
ip`tcp_timer_callback 1 0.0%
unix`page_pptonum 1 0.0%
unix`do_splx 1 0.0%
unix`page_nextn 5 0.1%
genunix`fsflush_do_pages 6 0.1%
unix`acpi_cpu_cstate 782 17.7%
unix`i86_mwait

ewwhite · Answer 1 · 2014-01-02T23:01:20.830

Well, there are a few things wrong here...

Your zpool version is old. zpool upgrade will fix that.
The pool design does not lend itself to high performance. You have the effective IOPS potential of one disk. (this is likely the biggest issue here)
You have too many drives in your RAIDZ1. You shouldn't go over 7...
There are likely other tunables that may help here. Can you show zfs list?

Take a read of: http://nex7.blogspot.com/2013/03/readme1st.html for more general guidelines.

jlliagre · Accepted Answer · 2014-01-03T09:48:39.863

Perhaps is there an issue with power management on your machine.

Try disabling it by adding the following line to the /etc/power.conf file

cpupm disable

next run

pmconfig

and see if the I/O behavior changes.

Update:

As disabling power management doesn't change anything, the fact is the kernel is essentially idle during the "Bad times" with zero ZFS activity,.

I would then infer the application is not I/O bound and is simply doing something else during these periods of time.

zfs regular read pauses

2 Answers2