ZFS
openzfs
install
- (Reusing this space may break pool import) Given a whole disk, zfs leaves a small partition at the beginning/end and marks the disk with a wholedisk property.
That small space is useful for a UEFI bootloader (zfs raidz expansion uses this space since 2.2.0):
mkfs.vfat -F 16 /dev/disk/by-uuid/XXXXXX
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=DISK1
A larger partition allows a proper fat32 fs type, a grub installation (--boot-directory), and kernel/initramfs storage.
- add the zfs kmod to the initramfs via mkinitcpio/dracut
- ex. add zfs to HOOKS in
/etc/mkinitcpio.conf
- regen initramfs
mkinitcpio -P
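- ex. for dracut, a sketch enabling the zfs dracut module and rebuilding the image:
/etc/dracut.conf.d/zfs.conf
add_dracutmodules+=" zfs "
dracut --force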
- regenerate the grub config (update-grub / grub-mkconfig) if root is on zfs
- PAM module (pam_zfs_key) for auto decrypt/mount of the home dataset on user login; the dataset key must match the login password
- /etc/pam.d/zfs-key
auth     optional pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
session  [success=1 default=ignore] pam_succeed_if.so service = systemd-user quiet
session  optional pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
password optional pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
- /etc/pam.d/system-auth and /etc/pam.d/su-l
auth     include zfs-key
session  include zfs-key
password include zfs-key
- Manual pam script
/etc/pam.d/system-auth
auth optional pam_exec.so expose_authtok /sbin/zfs-pam-login
- zfs-pam-login
#!/bin/bash
# pam_exec's expose_authtok delivers the password on stdin
PASS=$(cat -)
# ZFS_HOME_VOL is a placeholder, e.g. zroot/data/home/$PAM_USER
zfs load-key "${ZFS_HOME_VOL}" <<< "${PASS}" || exit 1
zfs mount "${ZFS_HOME_VOL}" || true
- add systemd services for device scanning/import/automounting
- set cache if not scanning for pools
zpool set cachefile=/etc/zfs/zpool.cache POOL
systemctl enable zfs-import-cache
systemctl enable zfs-import.target
- enable mounts if not using ZED
systemctl enable zfs-mount
systemctl enable zfs.target
- set ARC memory limits via kernel params (grub), the initramfs, /etc/default/zfs, or module params in /etc/modprobe.d/zfs.conf
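- ex. a sketch capping ARC at 8GiB via module params (value is bytes; size to your system):
/etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592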
usage
zpool
- scrub (error check), resilver (rebuild from parity), trim (ssd), adding/removing disks
zfs
- mounting, keys, snapshots, rollbacks
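- ex. common invocations (pool/dataset names are placeholders):
zpool scrub tank
zpool trim tank
zfs snapshot tank/data@pre-upgrade
zfs rollback tank/data@pre-upgrade
zfs load-key tank/secure && zfs mount tank/secure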
notes
- If you lose a top-level vdev in a pool you LOSE THE POOL
- autoexpand allows the 'safe' default of sizing to the smallest partition while still letting the pool grow once every device is larger (sketch below). Raidz expansion of pool size is WIP.
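- ex. a sketch growing a pool after replacing disks with larger ones (names are placeholders):
zpool set autoexpand=on tank
zpool online -e tank sda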
- Manual pool config can get more out of smaller disks with the same redundancy
- Linux 4.12+: use a udev IO-scheduler rule (in place of the deprecated
zfs_vdev_scheduler
module param) to reduce cpu for manually partitioned vdevs; see the rule sketch below
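- ex. a rule sketch setting the scheduler to none for zfs member partitions (adjust the device match for your hardware):
/etc/udev/rules.d/66-zfs-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]*", ENV{ID_FS_TYPE}=="zfs_member", ATTR{../queue/scheduler}="none"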
- When expanding, rebalancing is not done, leaving potentially higher resilver times in the future and increasing the chance of cascading failure.
- snapshot, make a tmp dataset, send | recv into it to redistribute blocks, destroy the old dataset, rename the new one (sketch below)
- manual de-dupe can be done with
cp --reflink
as of 2.2.0 (block cloning)
- sudo zfs send zroot/data/tmp@snap-1 | sudo zfs receive -Fduv POOL
will create POOL/data/tmp@snap-1
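- ex. a sketch of the redistribution dance (dataset names are placeholders; verify the copy before destroying):
zfs snapshot -r tank/data@rebal
zfs send -R tank/data@rebal | zfs receive -u tank/data-new
zfs destroy -r tank/data
zfs rename tank/data-new tank/data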
- Sparse files can be useful for testing/migrating setups if enough storage is actually present (piecemeal the datasets)
truncate -s 1G /zpool-file
zpool create test /zpool-file
zpool attach POOL EXISTING_DEVICE_ID NEW_DEVICE_ID
to create a mirror
- zfs set xattr=sa ztank/dataset
and
zfs set acltype=posixacl ztank/dataset
for inode-stored xattrs and posix ACLs
- zpool set feature@large_blocks=enabled ztank
to enable larger blocks
- zpool set feature@encryption=enabled ztank
to enable encryption (sketch below)
- zfs set canmount=noauto ztank/dataset
to disable auto mounting
- relatime for normal timestamps
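- ex. with the encryption feature enabled, a sketch creating an encrypted dataset:
zfs create -o encryption=on -o keyformat=passphrase ztank/secure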
- SLOG requires devices with power-loss protection (in-flight writes must reach stable storage on power loss)
- SPECIAL vdevs store metadata (good for ssd) but need redundancy since losing one takes the pool down (sketches below)
- spare drive helps resilver time (zed auto replace)
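- ex. sketches attaching the auxiliary vdev classes above (device names are placeholders):
zpool add tank log mirror nvme0n1 nvme1n1
zpool add tank special mirror nvme2n1 nvme3n1
zpool add tank spare sdf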
- Single-device zfs can use the copies property to help redundancy
- set sync=disabled on /tmp-style datasets
- enable sharing on a dataset for nfs via sharenfs
- set snapdir=visible on a dataset to expose .zfs/snapshot (sketches of these below)
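- ex. sketches of those properties (names are placeholders):
zfs set copies=2 ztank/important
zfs set sync=disabled ztank/tmp
zfs set sharenfs=on ztank/share
zfs set snapdir=visible ztank/data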
- L2ARC/ssd cache with persistence(2.0+) for arc speed
- L2ARC has a default
l2arc_write_max
of 8MiB/s plus an 8MiB/s boost while the cache fills (l2arc_write_boost)
- indexing uses ARC RAM (more for smaller blocks)
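- ex. attach a cache device and raise the fill rate to 64MiB/s (device name is a placeholder; value is bytes):
zpool add tank cache nvme0n1p4
echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max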
- zfs may use cpu even when pools are idle
- https://github.com/kimono-koans/httm for snapshot traversal
zfs create -o recordsize=X ztank/dataset
16K for bittorrent, 1M for sequential, 64K for sqlite (match page_size), 4K for vms/monero (system page size); changing recordsize later only affects newly written data
- ztest is userspace w/o zvols
- openebs cstor uses userspace openzfs with custom zrepl (based on 0.7.9)
- https://github.com/iomesh/zfs/tree/uzfs-dev is userspace fork using fuse vfs
- a block is only stored compressed if the result is at most 7/8 of the original size
- the non-default zstd compression algo is SIMD-optimized (check support on older chips)
- RAM integrity check for ARC buffers (useful without ECC)
modprobe zfs zfs_flags=16
- https://jro.io/truenas/openzfs/
- a 6-disk raidz2 eliminates allocation overhead (power-of-two data disks); likewise 3/5/9/17 disks for raidz1 and 7/11/19 for raidz3
- slop space: space reserved so operations (frees, destroys) can complete under high usage
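- ex. slop is pool_size / 2^spa_slop_shift (default shift 5, i.e. 1/32), readable as a module param:
cat /sys/module/zfs/parameters/spa_slop_shift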