The ZFS tuning parameters in use are listed below.
arc_reduce_dnlc_percent
If the ARC detects low memory (via arc_reclaim_needed()), it calls arc_kmem_reap_now() and subsequently dnlc_reduce_cache(), which reduces the number of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT).
For that reason dnlc_nentries is worth watching, especially if it is much smaller than ncsize.
Default: 0x3
How to change:
# echo arc_reduce_dnlc_percent/W0t2 | mdb -kw
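The same mdb pattern recurs throughout this list, so here is a minimal sketch that only prints the command instead of running it. The helper name mdb_write_cmd is hypothetical; the /W write format and the 0t (decimal) and 0x (hexadecimal) prefixes are standard mdb notation.

```shell
# Hypothetical helper: print (do not run) the mdb command that writes a
# 32-bit value to a live kernel variable.
mdb_write_cmd() {
    # $1 = kernel variable name
    # $2 = value with an mdb radix prefix: 0t<decimal> or 0x<hex>
    printf 'echo %s/W%s | mdb -kw\n' "$1" "$2"
}

# Reproduces the command shown above.
mdb_write_cmd arc_reduce_dnlc_percent 0t2
```

A change made this way applies to the running kernel only, so review the printed command before piping it into a root shell.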
zfs_arc_max, zfs_arc_min (deprecated in 11.2)
Determines the maximum/minimum size of the ZFS Adaptive Replacement Cache (ARC). Solaris 11.2 deprecates the zfs_arc_max kernel parameter in favor of user_reserve_hint_pct.
Default:
How to change:
arc_shrink_shift
This variable controls the amount of RAM that arc_shrink will try to reclaim. The default is 5, which equates to shrinking by 1/32 of arc_max. We tuned this to 11, which is 1/2048 of arc_max. With that setting the ARC shrinks by about 100MB per shrink event rather than about 6GB.
Every second a process checks whether data can be removed from the ARC and evicts it. By default at most 1/32 of the ARC can be evicted at a time. The amount is limited because evicting large amounts of data from the ARC stalls all other processes. Back when 8GB was a lot of memory, 1/32 meant at most 256MB at a time. With 196GB of memory, 1/32 is about 6.1GB, which can cause up to 20-30 seconds of unresponsiveness (depending on the record size).
(A value of 11 means 1/2^11, or 1/2048; 10 means 1/2^10, or 1/1024; and so on. Choose the value according to the amount of RAM in your system.)
Default: 0x5
How to change:
# echo arc_shrink_shift/W0xa | mdb -kw
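To pick a value, it helps to see how much memory one shrink event would reclaim. The sketch below follows the description above (arc_max divided by 2^arc_shrink_shift); the function name and the 192 GiB ARC limit are assumptions chosen for illustration.

```shell
# Bytes reclaimed per shrink event: arc_max / 2^arc_shrink_shift.
arc_shrink_bytes() {
    # $1 = ARC maximum size in bytes, $2 = arc_shrink_shift
    echo $(( $1 >> $2 ))
}

arc_max=$(( 192 * 1024 * 1024 * 1024 ))   # assumed 192 GiB ARC limit

for shift in 5 10 11; do
    bytes=$(arc_shrink_bytes "$arc_max" "$shift")
    printf 'shift=%s: %s MiB per shrink event\n' "$shift" "$(( bytes / 1024 / 1024 ))"
done
```

With these assumed numbers, a shift of 5 gives 6144 MiB per event, 10 gives 192 MiB, and 11 gives 96 MiB.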
zfs_mdcomp_disable
This parameter controls compression of ZFS metadata (indirect blocks only). ZFS data block compression is controlled by the ZFS compression property that can be set per file system.
Default: 0
How to change:
# echo zfs_mdcomp_disable/W0t1 | mdb -kw
zfs_prefetch_disable
This parameter controls a file-level prefetching mechanism called zfetch. The mechanism looks at the patterns of reads to files and anticipates some reads, thereby reducing application wait times. Setting it to 1 disables prefetching.
Default: 0
How to change:
# echo zfs_prefetch_disable/W0t1 | mdb -kw
metaslab_aliquot
Metaslab granularity, in bytes. This is roughly similar to what would be referred to as the “stripe size” in traditional RAID arrays. In normal operation, ZFS will try to write this amount of data to a top-level vdev before moving on to the next one.
The traditional VDEV space re-balancing occurred by means of a bias based on a 512K metaslab_aliquot and the number of VDEV children. This bias mechanism will not function correctly with large allocation sizes. An alternate method may need to be devised to allow effective re-balancing when streams of large allocations occur.
Intel is currently working on an alternate re-balancing solution for large blocks.
Default: 0x80000
How to change:
# echo metaslab_aliquot/W0x90000 | mdb -kw
spa_max_replication_override
The number of DVAs (data virtual addresses) in a block pointer, the so-called ditto blocks.
Default: 0x3
How to change:
spa_mode_global
Defines the mode in which a zpool is opened internally by ZFS. It is typically read/write (FREAD | FWRITE, which matches the default of 0x3).
Default: 0x3
How to change:
zfs_flags
Set additional debugging flags
flag value | symbolic name | description |
---|---|---|
0x1 | ZFS_DEBUG_DPRINTF | Enable dprintf entries in the debug log |
0x2 | ZFS_DEBUG_DBUF_VERIFY | Enable extra dbuf verifications |
0x4 | ZFS_DEBUG_DNODE_VERIFY | Enable extra dnode verifications |
0x8 | ZFS_DEBUG_SNAPNAMES | Enable snapshot name verification |
0x10 | ZFS_DEBUG_MODIFY | Check for illegally modified ARC buffers |
0x20 | ZFS_DEBUG_SPA | Enable spa_dbgmsg entries in the debug log |
0x40 | ZFS_DEBUG_ZIO_FREE | Enable verification of block frees |
0x80 | ZFS_DEBUG_HISTOGRAM_VERIFY | Enable extra spacemap histogram verifications |
0x100 | ZFS_DEBUG_METASLAB_VERIFY | Verify space accounting on disk matches in-core range_trees |
0x200 | ZFS_DEBUG_SET_ERROR | Enable SET_ERROR and dprintf entries in the debug log |
Default: 0x0
How to change:
# echo zfs_flags/W0x8 | mdb -kw
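Since zfs_flags is a bitmask, several debug flags can be enabled at once by OR-ing their values from the table. A minimal sketch, with a hypothetical helper name, that computes the combined value to write:

```shell
# Flag values taken from the table above.
ZFS_DEBUG_SNAPNAMES=0x8
ZFS_DEBUG_MODIFY=0x10

# Hypothetical helper: OR all arguments together and print the mask in hex.
zfs_flags_mask() {
    mask=0
    for flag in "$@"; do
        mask=$(( mask | $flag ))
    done
    printf '0x%x\n' "$mask"
}

# Snapshot name verification plus ARC buffer modification checks.
zfs_flags_mask "$ZFS_DEBUG_SNAPNAMES" "$ZFS_DEBUG_MODIFY"
```

The printed mask (0x18 for these two flags) is the value that would go after /W in the mdb command.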
zfs_txg_synctime_ms
This sets how often (in milliseconds) the cache is flushed to disk (txg sync).
Default: 0x1388
How to change:
# echo zfs_txg_synctime_ms/W0x2000 | mdb -kw
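The defaults in this list are printed in hexadecimal, which is easy to misread for time and size values. A small conversion sketch (both function names are hypothetical):

```shell
# Convert a 0x-prefixed hex value, as printed in this list, to decimal.
hex_to_dec() {
    echo $(( $1 ))
}

# Convert a decimal value to the 0x form used after /W in mdb.
dec_to_hex() {
    printf '0x%x\n' "$1"
}

hex_to_dec 0x1388   # default zfs_txg_synctime_ms, in milliseconds
hex_to_dec 0x2000   # value written by the command above
dec_to_hex 5000
```

Here 0x1388 is 5000 ms and 0x2000 is 8192 ms.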
zfs_ssd_txg_synctime_ms
This sets how often (in milliseconds) the cache is flushed to SSDs (txg sync). Applies to SSDs only.
Default: 0x2170
How to change:
# echo zfs_ssd_txg_synctime_ms/W0x21700 | mdb -kw
zfs_txg_timeout
Seconds between transaction group (txg) commits, that is, the delay before pending changes are committed to the pool.
Default: 0x5
How to change:
# echo zfs_txg_timeout/W0t120 | mdb -kw
zfs_write_limit_min
Minimum txg write limit, in bytes.
Default: 0x800000
How to change:
zfs_write_limit_max
Maximum txg write limit, in bytes.
Default: 0xff98dc00
How to change:
zfs_write_limit_shift
Base-2 logarithm of the fraction of physical memory allowed per txg (int). The default of 3 corresponds to 1/8 of memory.
Default: 0x3
How to change:
zfs_write_limit_override
Override txg write limit
Default: 0x0
How to change:
# echo zfs_write_limit_override/W0t402653184 | mdb -kw
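The write-limit parameters relate to each other roughly as sketched below. This assumes the historical OpenSolaris behaviour, in which the maximum limit is physical memory shifted right by zfs_write_limit_shift and never falls below zfs_write_limit_min; the function name and the 32 GiB memory size are hypothetical.

```shell
# Sketch under the assumption stated above, not the kernel's actual code.
txg_write_limit_max() {
    # $1 = physical memory in bytes
    # $2 = zfs_write_limit_shift
    # $3 = zfs_write_limit_min (bytes, decimal or 0x-prefixed hex)
    limit=$(( $1 >> $2 ))
    if [ "$limit" -lt "$(( $3 ))" ]; then
        limit=$(( $3 ))
    fi
    echo "$limit"
}

physmem=$(( 32 * 1024 * 1024 * 1024 ))   # assumed 32 GiB of RAM
txg_write_limit_max "$physmem" 3 0x800000

# The override written above, 402653184 bytes, expressed in MiB.
echo $(( 402653184 / 1024 / 1024 ))
```

With 32 GiB of RAM and a shift of 3, the computed limit is 4 GiB, while the override in the command above pins the limit at 384 MiB.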
zfs_no_write_throttle
Disable write throttling
Default: 0x0
How to change:
# echo zfs_no_write_throttle/W 1 | mdb -kw
zfs_vdev_cache_max
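Reads smaller than this value are inflated to the vdev cache read size (see zfs_vdev_cache_bshift). Setting it low enough essentially disables the vdev cache, as the random I/Os are not going to be smaller than XXX.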
Default: 0x4000
How to change:
zfs_vdev_cache_size
Total size of the per-disk cache
Default: 0x0
How to change:
zfs_vdev_cache_bshift
The base-2 logarithm of the size used to read from disks. The default of 0x10 (16) means 64KB reads.
Default: 0x10
How to change:
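As a quick check of the vdev cache values, the read size implied by zfs_vdev_cache_bshift is 1 shifted left by that value. A minimal sketch; the function name is hypothetical.

```shell
# Hypothetical helper: read size implied by zfs_vdev_cache_bshift.
vdev_cache_read_bytes() {
    # $1 = zfs_vdev_cache_bshift (decimal or 0x-prefixed hex)
    echo $(( 1 << $1 ))
}

vdev_cache_read_bytes 0x10   # default from this list
```

For the default of 0x10 this prints 65536, that is 64 KiB, compared with the zfs_vdev_cache_max default of 0x4000 (16 KiB).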
zfs_vdev_max_pending
This parameter controls how many I/O requests can be pending per vdev. For example, with 100 disks visible to the OS and zfs:zfs_vdev_max_pending set to 2, at most 200 requests are outstanding. With 100 disks hidden behind a storage controller that presents a single LUN, at most 2 requests are pending.
Default: 0xa
How to change:
# echo zfs_vdev_max_pending/W0t35 | mdb -kw
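The arithmetic behind the example is simply the number of devices visible to ZFS multiplied by the per-vdev limit. A sketch with a hypothetical function name:

```shell
# Upper bound on outstanding I/O requests across all visible vdevs.
max_outstanding_io() {
    # $1 = number of devices (LUNs) visible to ZFS
    # $2 = zfs_vdev_max_pending
    echo $(( $1 * $2 ))
}

max_outstanding_io 100 2   # 100 individually visible disks
max_outstanding_io 1 2     # the same disks behind a single LUN
max_outstanding_io 1 35    # single LUN with the value set above
```

This is why a single large LUN usually calls for a higher zfs_vdev_max_pending than a set of individually visible disks.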
zfs_vdev_min_pending
Same as above, but for the minimum number of pending I/O requests per vdev.
Default: 0x4
How to change:
zfs_scrub_limit
maximum number of scrub/resilver I/O per leaf vdev
Default: 0xa
How to change:
zfs_vdev_time_shift
Deadline time shift for vdev I/O
Default: 0x6
How to change:
zfs_vdev_ramp_rate
Exponential I/O issue ramp-up rate
Default: 0x2
How to change:
zfs_vdev_aggregation_limit
Max vdev I/O aggregation size
Default: 0x20000
How to change:
zfs_nocacheflush
This parameter controls ZFS write cache flushes for the entire system. Oracle’s Sun hardware should not require tuning this parameter. If you need to tune cache flushing, consider tuning it per hardware device. See the general instructions below. Contact your storage vendor for instructions on how to tell the storage devices to ignore the cache flushes sent by ZFS.
Default: 0x1
How to change:
zil_replay_disable
Disable intent log replay. Can be set for recovery from a corrupted ZIL. If zil_replay_disable = 1, then when a volume or filesystem is brought online, no attempt to replay the ZIL is made and any existing ZIL is destroyed. This can result in loss of data without notice.
Default: 0x0
How to change:
metaslab_df_alloc_threshold
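The minimum allocation size, in bytes, that forces the dynamic allocator to change its allocation strategy. Once the space map cannot satisfy an allocation of this size, it switches to a more aggressive strategy (searching by size rather than by offset).
Default: 0x100000
How to change:
metaslab_df_free_pct
The minimum free space, in percent, which must be available in a space map to continue allocations in a first-fit fashion. Once the space map’s free space drops below this level we dynamically switch to using best-fit allocations.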
Default: 0x4
How to change:
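How the two metaslab_df_* parameters interact can be sketched as follows. This assumes the illumos dynamic-fit allocator semantics, where metaslab_df_alloc_threshold is a size in bytes and metaslab_df_free_pct is a percentage; the function name and the sample numbers are hypothetical.

```shell
# Sketch of the dynamic-fit decision under the assumptions stated above.
df_alloc_strategy() {
    # $1 = largest contiguous free segment in the metaslab (bytes)
    # $2 = free space in the metaslab (percent)
    # $3 = metaslab_df_alloc_threshold
    # $4 = metaslab_df_free_pct
    if [ "$1" -lt "$(( $3 ))" ] || [ "$2" -lt "$(( $4 ))" ]; then
        echo best-fit
    else
        echo first-fit
    fi
}

df_alloc_strategy 2097152 10 0x100000 0x4   # plenty of room
df_alloc_strategy 2097152 3 0x100000 0x4    # less than 4% free
df_alloc_strategy 524288 10 0x100000 0x4    # largest segment below 1 MiB
```

Only the first call prints first-fit; either condition alone is enough to switch to best-fit.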
zio_injection_enabled
Enable fault injection.
To handle fault injection, we keep track of a series of zinject_record_t structures which describe which logical block(s) should be injected with a fault. These are kept in a global list. Each record corresponds to a given spa_t and maintains a special hold on the spa_t so that it cannot be deleted or exported while the injection record exists. Device level injection is done using the ‘zi_guid’ field. If this is set, it means that the error is destined for a particular device, not a piece of data. This is a rather poor data structure and algorithm, but we don’t expect more than a few faults at any one time, so it should be sufficient for our needs.
Default: 0x0
How to change:
zfs_immediate_write_sz
Limit on data size being sent to the ZIL. (Synchronous writes are either written directly to the pool or written to the slog. The default is 32k. Writes larger than this value are performed directly in the pool.)
Default: 0x8000
How to change:
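The rule described for zfs_immediate_write_sz can be sketched as a simple size comparison. This is a simplification that ignores other factors (such as whether a slog is present); the function name is hypothetical.

```shell
# Simplified sketch: where the data of a synchronous write ends up.
sync_write_path() {
    # $1 = write size in bytes
    # $2 = zfs_immediate_write_sz (decimal or 0x-prefixed hex)
    if [ "$1" -gt "$(( $2 ))" ]; then
        echo pool   # larger than the limit: written directly to the pool
    else
        echo log    # at or below the limit: data goes through the ZIL/slog
    fi
}

sync_write_path 16384 0x8000    # 16 KiB write
sync_write_path 131072 0x8000   # 128 KiB write
```

With the default of 0x8000 (32 KiB), the 16 KiB write goes through the log and the 128 KiB write goes directly to the pool.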
zfs_read_chunk_size
Bytes to read per chunk
Default: 0x100000
How to change:
zfs_vdev_max_queue_wait
A factor used to trigger I/O starvation avoidance. It is used together with zfs_vdev_max_pending to track the earliest I/O that has been issued. If more than zfs_vdev_max_queue_wait full pending queues have been issued since then, that I/O is considered starved: no more I/Os are accepted, and the pending queue drains until the starved I/O is processed.
Default: 0x4
How to change:
zfetch_max_streams
Max number of streams per zfetch (prefetch streams per file).
Default: 0x8
zfetch_min_sec_reap
Min time before an active prefetch stream can be reclaimed
Default: 0x2
zfetch_block_cap
Max number of blocks to prefetch at a time
Default: 0x100
zfetch_array_rd_sz
If prefetching is enabled, disable prefetching for reads larger than this size.
Default: 0x100000
zfs_no_scrub_io
Set for no scrub I/O. Use 1 for yes and 0 for no (default).
Default: 0x0
zfs_no_scrub_prefetch
Set for no scrub prefetching. Use 1 for yes and 0 for no (default).
Default: 0x0
Unknown:
- 11.3
fzap_default_block_shift = 0xe
metaslab_gang_threshold = 0x100001
vdev_mirror_shift = 0x15
zvol_immediate_write_sz = 0x8000
zfs_no_scan_io = 0x0
zfs_no_scan_prefetch = 0x0
zfetch_maxbytes_ub = 0x2000000
zfetch_maxbytes_lb = 0x400000
zfetch_target_blks = 0x100
zfetch_throttle_interval = 0xa
zfetch_num_hash_buckets = 0x400000
zfetch_ageout = 0xa
zfetch_ageout_sleep_time = 0x2
zfs_default_bs = 0x9
zfs_default_ibs = 0xe
zfs_vdev_future_reads = 0x2
zfs_vdev_future_read_bytes = 0x40000
zfs_vdev_future_writes = 0x2
zfs_vdev_future_write_bytes = 0x40000
- 11.2 / 11.1
zfs_vdev_future_pending = 0xa
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html