ZFS

openzfs

install

  • When given a whole disk, zfs leaves a small partition at the beginning/end and marks the vdev with a wholedisk property. That small space can hold a UEFI bootloader (warning: this may break pool import, and zfs raidz expansion uses this space since 2.2.0): format it with mkfs.vfat -F 16 /dev/disk/by-uuid/XXXXXX, then run grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=DISK1. A larger partition allows a proper FAT32 filesystem, a full grub installation (--boot-directory), and kernel/initramfs storage.
  • add the zfs kmod to the initramfs via mkinitcpio/dracut
    • ex. add zfs to HOOKS in /etc/mkinitcpio.conf
    • regenerate the initramfs with mkinitcpio -P
  • update-grub with zfs root (if on root)
  • Pam module for auto decrypt/mount on user login
    • /etc/pam.d/zfs-key
auth       optional                    pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
session [success=1 default=ignore]     pam_succeed_if.so service = systemd-user quiet
session    optional                    pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
password   optional                    pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
  • /etc/pam.d/system-auth and /etc/pam.d/su-l
auth       include                     zfs-key
session    include                     zfs-key
password   include                     zfs-key
  • Manual pam script
    • /etc/pam.d/system-auth
auth       optional                    pam_exec.so expose_authtok /sbin/zfs-pam-login
      • zfs-pam-login (excerpt; the continue implies the last two lines run inside a loop over home volumes)
PASS=$(cat -)
zfs load-key "${ZFS_HOME_VOL}" <<< "${PASS}" || continue
zfs mount "${ZFS_HOME_VOL}" || true
  • add systemd services for device scanning/import/automounting
    • set cache if not scanning for pools zpool set cachefile=/etc/zfs/zpool.cache POOL
      • systemctl enable zfs-import-cache
      • systemctl enable zfs-import.target
    • enable mounts if not using ZED
      • systemctl enable zfs-mount
      • systemctl enable zfs.target
  • set arc memory in kernel params(grub), initramfs /etc/default/zfs or modprobe params /etc/modprobe.d/zfs.conf
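
The ARC limit from the last item can be written as a modprobe option line. A minimal sketch, assuming the limit is given in whole GiB; the helper name arc_max_option is hypothetical, while zfs_arc_max is the OpenZFS module parameter and takes a byte count.

```shell
#!/usr/bin/env bash
# Print a modprobe.d line that caps the ARC at the given size in GiB.
# arc_max_option is a hypothetical helper; zfs_arc_max expects bytes.
arc_max_option() {
    gib="$1"
    # Reject empty input, non-digits, and values with a leading zero.
    case "$gib" in
        ''|*[!0-9]*|0*)
            echo "usage: arc_max_option GIB" >&2
            return 1
            ;;
    esac
    echo "options zfs zfs_arc_max=$(( gib * 1024 * 1024 * 1024 ))"
}

# Example: cap the ARC at 4 GiB.
# As root, redirect the output to /etc/modprobe.d/zfs.conf.
arc_max_option 4
```

The same byte value can be passed on the kernel command line as zfs.zfs_arc_max=4294967296. If the module is loaded from the initramfs, regenerate the initramfs after editing the modprobe file.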

usage

  • zpool scrub(error check), resilver(parity), trim(ssd), adding/removing disks
  • zfs Mounting, keys, snapshots, rollbacks
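
The usage items above map to commands roughly as follows. This is a dry-run sketch that only prints the commands; the pool name ztank is borrowed from the notes, while the dataset ztank/data, the snapshot name, and the helper names are placeholders.

```shell
#!/usr/bin/env bash
# Dry-run helper: print each command instead of running it.
# Replace the body with "$@" to execute for real (as root).
run() { echo "+ $*"; }

# maintenance POOL: scrub (checksum verification), trim (SSD discard),
# then show the pool status where progress and errors are reported.
maintenance() {
    run zpool scrub "$1"
    run zpool trim "$1"
    run zpool status "$1"
}

# snap_and_rollback DATASET NAME: take a snapshot, then roll back to it.
snap_and_rollback() {
    run zfs snapshot "$1@$2"
    run zfs rollback "$1@$2"
}

# unlock_and_mount DATASET: load the encryption key, then mount.
unlock_and_mount() {
    run zfs load-key "$1"
    run zfs mount "$1"
}

maintenance ztank
snap_and_rollback ztank/data before-upgrade
unlock_and_mount ztank/data
```

Keeping every call behind run makes it easy to review the sequence before touching a real pool.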

notes

  • https://github.com/OpenIndiana/time-slider python time slider tool for backups
  • GPTZFSBOOT for bios zfs boot
  • If you lose a vdev in a pool you LOSE THE POOL
  • autoexpand takes the 'safe' approach: a vdev is sized by its smallest partition and can grow once every device in it is larger. Expanding a raidz vdev's size is WIP.
    • Manual pool config can get more out of smaller disks with the same redundancy
      • Linux 4.12+ udev IO rule for zfs_vdev_scheduler to reduce cpu for manual partition
  • Expanding a pool does not rebalance existing data, which can leave higher resilver times in the future and increase the chance of cascading failure.
    • snapshot, make tmp dataset, send | recv to new dataset to redistribute blocks, destroy old snapshot, rename dataset
      • manual de-dupe can be done with cp --reflink as of 2.2.0
      • sudo zfs send zroot/data/tmp@snap-1 | sudo zfs receive -Fduv POOL will create POOL/data/tmp@snap-1
  • Sparse files can be useful for testing/migrating setups, as long as enough storage is actually present (piecemeal the datasets)
    • dd if=/dev/zero of=/zpool-file bs=1M count=1024 (fully allocated; truncate -s 1G /zpool-file creates a truly sparse file)
      • zpool create test /zpool-file
  • zpool attach POOL EXISTING_DEVICE_ID NEW_DEVICE_ID to create a mirror
  • zfs set xattr=sa
  • zfs set acltype=posixacl
  • zpool set feature@large_blocks=enabled ztank to enable larger blocks
  • zpool set feature@encryption=enabled ztank to enable encryption
  • zfs set canmount=noauto ztank/dataset to disable auto mounting
  • relatime for normal timestamps
  • SLOG requires devices that will write data on power loss
  • SPECIAL vdevs store metadata (good for ssd) but need redundancy as they can take the pool down
  • spare drive helps resilver time (zed auto replace)
  • Single device zfs can use the COPIES attribute to help redundancy
  • /tmp sync off
  • enable sharing on dataset for nfs
  • set snapdir of dataset to visible for .zfs/snapshots
  • L2ARC/ssd cache with persistence(2.0+) for arc speed
    • L2ARC has default l2arc_write_max of 8MiB/s and 8MiB/s burst (to fill up cache)
    • uses arc ram (more for smaller blocks) to index
  • zfs using cpu despite not being in use
  • https://github.com/kimono-koans/httm for snapshot traversal
  • zfs create -o recordsize=X ztank/dataset: 16k for bittorrent, 1M for sequential, 64k for sqlite (w/ matching page size), 4K for vms/monero (system page size)
    • can be changed later with zfs set recordsize=X, but only newly written files use the new size
  • ztest is userspace w/o zvols
  • a compressed block is only kept if it comes out at 7/8 of the original size or smaller
    • non default compression algo zstd is simd optimized (older chips)
  • RAM ECC check for ARC modprobe zfs zfs_flags=16
  • https://jro.io/truenas/openzfs/
    • a 6-wide raidz2 vdev eliminates allocation overhead; likewise 3/5/9/17-wide raidz1 and 7/11/19-wide raidz3
  • slop space for reserved space under high usage to ensure operations can complete
  • block cloning (bclone, tracked in the BRT) lets commands like cp copy almost instantly by only incrementing a block reference count
    • v2.2.0 dedupe
    • only different blocks will be overwritten (common file parts shared)
    • cannot be maintained across send/recv (depends on on-disk references)
    • zpool sync if creating in a loop to ensure the TX finishes
  • https://scholarworks.wm.edu/cgi/viewcontent.cgi?article=2720&context=honorstheses chapter 3 for description of zfs architecture
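
The raidz widths quoted in the notes above (3/5/9/17 for raidz1, 6 for raidz2, 7/11/19 for raidz3) all have the form 2^n data disks plus the parity disks. That pattern is inferred from the listed numbers rather than stated in the notes, and the helper name raidz_widths is hypothetical. A small sketch that enumerates such widths:

```shell
#!/usr/bin/env bash
# raidz_widths PARITY: print vdev widths of the form 2^n + PARITY for n = 1..4.
# Assumption: the widths in the notes follow this pattern (a power-of-two
# number of data disks plus the parity disks).
raidz_widths() {
    parity="$1"
    out=""
    n=1
    while [ "$n" -le 4 ]; do
        # Append the next width, space-separated.
        out="${out}${out:+ }$(( (1 << n) + parity ))"
        n=$(( n + 1 ))
    done
    echo "$out"
}

for p in 1 2 3; do
    echo "raidz${p}: $(raidz_widths "$p")"
done
```

The output is a superset of the widths in the notes (for example it also lists 4-wide raidz2 and 5-wide raidz3), so treat it as a starting point and check it against the linked article.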
#!/usr/bin/env -S guix shell --pure bash coreutils sed parted dosfstools util-linux whois zfs --

#
# GuixSD install script synthesized from:
#
#   - mx00s's install.sh (https://gist.github.com/mx00s/ea2462a3fe6fdaa65692fe7ee824de3e)
#   - Erase Your Darlings (https://grahamc.com/blog/erase-your-darlings)
#   - ZFS Datasets for NixOS (https://grahamc.com/blog/nixos-on-zfs)
#   - NixOS Manual (https://nixos.org/nixos/manual/)
#
# It expects the name of the block device (e.g. 'sda') to partition
# and install GuixSD on and an authorized public ssh key to log in as
# 'root' remotely. The script must also be executed as root.
#
# Example: `sudo ./install.sh sde "ssh-rsa AAAAB..."`
#

set -euo pipefail

################################################################################

export COLOR_RESET="\033[0m"
export RED_BG="\033[41m"
export BLUE_BG="\033[44m"

function err {
    echo -e "${RED_BG}$1${COLOR_RESET}"
}

function info {
    echo -e "${BLUE_BG}$1${COLOR_RESET}"
}

################################################################################

# With `set -u`, a bare $1/$2 aborts before the checks below can print a
# helpful message, so default both to empty strings and test for emptiness.
export DISK="${1:-}"
export AUTHORIZED_SSH_KEY="${2:-}"

if [[ -z "$DISK" ]]; then
    err "Missing argument. Expected block device name, e.g. 'sda'"
    exit 1
fi

export DISK_PATH="/dev/${DISK}"

if ! [[ -b "$DISK_PATH" ]]; then
    err "Invalid argument: '${DISK_PATH}' is not a block special file"
    exit 1
fi

if [[ -z "$AUTHORIZED_SSH_KEY" ]]; then
    err "Missing argument. Expected public SSH key, e.g. 'ssh-rsa AAAAB...'"
    exit 1
fi

if (( EUID != 0 )); then
    err "Must run as root"
    exit 1
fi

export ZFS_POOL="rpool"

# ephemeral datasets
export ZFS_LOCAL="${ZFS_POOL}/local"
export ZFS_DS_ROOT="${ZFS_LOCAL}/root"
export ZFS_DS_GUIX="${ZFS_LOCAL}/guix"
export ZFS_DS_VAR_GUIX="${ZFS_LOCAL}/var-guix"

# persistent datasets
export ZFS_SAFE="${ZFS_POOL}/safe"
export ZFS_DS_HOME="${ZFS_SAFE}/home"
export ZFS_DS_PERSIST="${ZFS_SAFE}/persist"

export ZFS_BLANK_SNAPSHOT="${ZFS_DS_ROOT}@blank"

################################################################################

info "Running the UEFI (GPT) partitioning and formatting directions from the NixOS manual ..."
parted "$DISK_PATH" -- mklabel gpt
parted "$DISK_PATH" -- mkpart primary 512MiB 100%
parted "$DISK_PATH" -- mkpart ESP fat32 1MiB 512MiB
parted "$DISK_PATH" -- set 2 boot on
export DISK_PART_ROOT="${DISK_PATH}1"
export DISK_PART_BOOT="${DISK_PATH}2"

info "Formatting boot partition ..."
mkfs.fat -F 32 -n boot "$DISK_PART_BOOT"

info "Creating '$ZFS_POOL' ZFS pool for '$DISK_PART_ROOT' ..."
zpool create -f "$ZFS_POOL" "$DISK_PART_ROOT"

info "Enabling compression for '$ZFS_POOL' ZFS pool ..."
zfs set compression=on "$ZFS_POOL"

info "Creating '$ZFS_DS_ROOT' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_ROOT"

info "Configuring extended attributes setting for '$ZFS_DS_ROOT' ZFS dataset ..."
zfs set xattr=sa "$ZFS_DS_ROOT"

info "Configuring access control list setting for '$ZFS_DS_ROOT' ZFS dataset ..."
zfs set acltype=posixacl "$ZFS_DS_ROOT"

info "Creating '$ZFS_BLANK_SNAPSHOT' ZFS snapshot ..."
zfs snapshot "$ZFS_BLANK_SNAPSHOT"

info "Mounting '$ZFS_DS_ROOT' to /mnt/guix ..."
mkdir /mnt/guix
mount -t zfs "$ZFS_DS_ROOT" /mnt/guix

info "Mounting '$DISK_PART_BOOT' to /mnt/guix/boot ..."
mkdir /mnt/guix/boot
mount -t vfat "$DISK_PART_BOOT" /mnt/guix/boot

info "Creating '$ZFS_DS_GUIX' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_GUIX"

info "Disabling access time setting for '$ZFS_DS_GUIX' ZFS dataset ..."
zfs set atime=off "$ZFS_DS_GUIX"

info "Mounting '$ZFS_DS_GUIX' to /mnt/guix/gnu ..."
mkdir /mnt/guix/gnu
mount -t zfs "$ZFS_DS_GUIX" /mnt/guix/gnu

info "Creating '$ZFS_DS_VAR_GUIX' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_VAR_GUIX"

info "Mounting '$ZFS_DS_VAR_GUIX' to /mnt/guix/var/guix ..."
mkdir -p /mnt/guix/var/guix
mount -t zfs "$ZFS_DS_VAR_GUIX" /mnt/guix/var/guix

info "Creating '$ZFS_DS_HOME' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_HOME"

info "Mounting '$ZFS_DS_HOME' to /mnt/guix/home ..."
mkdir /mnt/guix/home
mount -t zfs "$ZFS_DS_HOME" /mnt/guix/home

info "Creating '$ZFS_DS_PERSIST' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_PERSIST"

info "Mounting '$ZFS_DS_PERSIST' to /mnt/guix/persist ..."
mkdir /mnt/guix/persist
mount -t zfs "$ZFS_DS_PERSIST" /mnt/guix/persist

info "Permit ZFS auto-snapshots on ${ZFS_SAFE}/* datasets ..."
zfs set com.sun:auto-snapshot=true "$ZFS_DS_HOME"
zfs set com.sun:auto-snapshot=true "$ZFS_DS_PERSIST"

info "Creating persistent directory for host SSH keys ..."
mkdir -p /mnt/guix/persist/etc/ssh

info "Enter password for the root user ..."
ROOT_PASSWORD_HASH="$(mkpasswd -m sha-512 | sed 's/\$/\\$/g')"

info "Enter personal user name ..."
read -r USER_NAME

info "Enter password for '${USER_NAME}' user ..."
USER_PASSWORD_HASH="$(mkpasswd -m sha-512 | sed 's/\$/\\$/g')"

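info "Writing GuixSD configuration to /persist/guix-config/config.scm ..."
mkdir -p /mnt/guix/persist/guix-config
# The heredoc is quoted so bash leaves the Scheme text alone: unquoted, the
# '#$(...)' forms, '$1'/'$3' and backquotes below would be expanded by the
# shell. sed then fills in the two ${...} placeholders (and turns the \$
# escapes in the password hash back into plain $).
sed -e "s|[$]{USER_NAME}|${USER_NAME}|g" \
    -e "s|[$]{USER_PASSWORD_HASH}|${USER_PASSWORD_HASH}|g" \
    <<'EOF' > /mnt/guix/persist/guix-config/config.scm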
;; -*- mode: scheme; -*-
;; This is an operating system configuration template
;; for a "desktop" setup with Xfce where the root
;; partition is on ZFS and rolled back to @blank
;; before boot.

(use-modules (gnu) (gnu system nss) (guix modules) (guix utils) (ice-9 match))
(use-service-modules desktop linux sddm)
(use-package-modules certs file-systems gawk gnome)

;; This is our first monkey-patch.
(set! (@ (gnu system file-systems) %pseudo-file-system-types)
  (cons "zfs" %pseudo-file-system-types))

(define %initrd/import-device-zpool
  #~(lambda (device)
      (let ((zpool (substring device 0 (or (string-index device #\/) 0)))
            (present? (lambda (device)
                        (and (not (zero? (string-length device)))
                             (zero? (system* #$(file-append zfs "/sbin/zfs")
                                             "list" device))))))
        (unless (or (zero? (string-length zpool))
                    (present? device))
          (invoke #$(file-append zfs "/sbin/zpool") "import" zpool)

          ;; Here's where the rollback happens.
          ;;
          ;; In my actual config I have an ugly loop that handles multiple
          ;; zpools and decryption via load-key, hence the more dynamic parsing
          ;; above.
          ;;
          ;; We're just gonna do this for illustrative purposes:
          (when (equal? zpool "rpool")
            (system* #$(file-append zfs "/sbin/zfs")
                     "rollback" "rpool/local/root@blank"))))))

;; Defined after %initrd/import-device-zpool because the #$ reference below
;; is evaluated when this gexp is constructed.
(define %initrd/pre-mount
  (with-imported-modules (source-module-closure
                          '((guix build syscalls)
                            (guix build utils)))
    #~(begin
        (use-modules (gnu build file-systems)
                     (gnu build linux-boot)
                     ((guix build syscalls)
                      #:hide (file-system-type))
                     (guix build utils))

        ;; XXX: Major Hack! Enables mounting ZFS datasets via legacy mountpoints.
        (let ((orig (@ (gnu build file-systems) canonicalize-device-spec)))
          (set! (@ (gnu build file-systems) canonicalize-device-spec)
            (lambda (spec)
              (let ((device (if (file-system-label? spec)
                                (file-system-label->string spec)
                                spec)))
                (if (and (string? device)
                         (char-set-contains? char-set:letter (string-ref device 0))
                         (#$%initrd/import-device-zpool device))
                    device
                    (orig spec))))))

        ;; In my actual config this is where I run plymouth and decrypt keyfiles
        ;; (but call `load-key' in a per-dataset loop below).
        )))

(define (%initrd file-systems . kwargs)
  (apply raw-initrd
    (cons file-systems
          (substitute-keyword-arguments kwargs
            ((#:linux linux)
             #;OMITTED)
            ((#:pre-mount pre-mount #t)
             #~(begin #$%initrd/pre-mount
                      #$pre-mount))))))

(define %users
  (cons (user-account
          (name "${USER_NAME}")
          (uid 1000) ; Put a pin in this.
          (group "users")
          (password "${USER_PASSWORD_HASH}")
          (supplementary-groups '("wheel" "netdev"
                                  "audio" "video")))
        %base-user-accounts))

(operating-system
  (host-name "antelope")
  (timezone "America/Los_Angeles")
  (locale "en_US.utf8")

  ;; Use the UEFI variant of GRUB with the EFI System
  ;; Partition mounted on /boot/efi.
  (bootloader (bootloader-configuration
                (bootloader grub-efi-bootloader)
                (targets '("/boot/efi"))))

  ;; ====================
  ;; SUBSTANTIAL OMISSION
  ;; ====================
  ;;
  ;; The kernel package needs to have the ZFS module either built-in or in
  ;; its `modules' output. This is left as an exercise to the reader because
  ;; my current solution involves building the kernel several times,
  ;; desperately needs re-worked, and is too long / abstracted to
  ;; include here. Said monstrosity also ensures that the ZFS module
  ;; is built against the correct kernel by setting the package's `#:linux'
  ;; argument.
  ;;
  (kernel #;OMITTED)
  (initrd %initrd)

  ;; The rest of the necessary ZFS bits and bobs *are* included.
  (initrd-modules
    (cons "zfs" %base-initrd-modules))

  (file-systems (append
                 ;; ZFS datasets, all mounted via legacy mountpoints.
                 ;; Each entry is a (dataset . mount-point) pair.
                 (map (match-lambda
                        ((d . mp)
                         (file-system
                           (mount-point mp)
                           (device d)
                           (type "zfs")
                           (needed-for-boot? #t))))
                      '(("rpool/local/root"     . "/")
                        ("rpool/local/guix"     . "/gnu")
                        ("rpool/local/var-guix" . "/var/guix")
                        ("rpool/safe/home"      . "/home")
                        ("rpool/safe/persist"   . "/persist")))
                 (list (file-system
                         (mount-point "/boot")
                         (device (uuid "6f62e623-5aa9-4681-a6da-9e0a68e7fbfb"))
                         (type "ext4"))
                       (file-system
                         (device (uuid "1234-ABCD" 'fat))
                         (mount-point "/boot/efi")
                         (type "vfat")))
                 %base-file-systems))

  (users %users)

  (packages (append (list
                      zfs
                      nss-certs ;; for HTTPS access
                      gvfs)     ;; for user mounts
                    %base-packages))

  (services (cons* (service xfce-desktop-service-type)
                   (simple-service 'zfs-mod-loader
                                   kernel-module-loader-service-type
                                   '("zfs"))
                   (simple-service 'zfs-udev-rules
                                   udev-service-type
                                   `(,zfs))
                   ;;
                   ;; Some directories may have already been populated by other
                   ;; activation services on first run, so the function below
                   ;; will move them into /persist before creating a symlink.
                   ;;
                   ;; I've thought about doing this in the initfs, but I don't
                   ;; think we have a hook between file-system-mounts and
                   ;; activation so we'd have to mount/unmount the datasets
                   ;; ourselves ahead of when Guix mounts them...
                   ;;
                   (simple-service 'symlink-activation activation-service-type
                     (with-imported-modules (source-module-closure
                                             '((guix build utils)))
                       #~(begin
                           ;; (ice-9 popen) provides open-pipe*/close-pipe,
                           ;; (ice-9 rdelim) provides read-line.
                           (use-modules (ice-9 match)
                                        (ice-9 popen)
                                        (ice-9 rdelim)
                                        (guix build utils))
                           (map (lambda (lst)
                                  (apply (lambda* (dest src #:optional mode user group)
                                           (let ((users '#$(map (lambda (u) (cons (user-account-name u) (user-account-uid u)))
                                                                %users))
                                                 (groups '#$(map (lambda (g) (cons (user-group-name g) (user-group-id g)))
                                                                 %base-groups))
                                                 (get-id (lambda (name file)
                                                           (let* ((port (open-pipe* OPEN_READ #$(file-append gawk "/bin/gawk")
                                                                                    "-F:" "$1 == NAME {print $3}" (string-append "NAME=" name)
                                                                                    file))
                                                                  (str (read-line port)))
                                                             (close-pipe port)
                                                             (string->number str)))))
                                             (unless (or (not user) (number? user))
                                               (set! user (or (assoc-ref users user)
                                                              (get-id user "/etc/passwd"))))
                                             (unless (or (not group) (number? group))
                                               (set! group (or (assoc-ref groups group)
                                                               (get-id group "/etc/group")))))

                                           ;; src->dest = persist->root-fs, like a symlink:
                                           (mkdir-p (dirname dest))
                                           (let ((perms-target (if src src dest))
                                                 (tempfile (string-append dest ".tmp")))
                                             (if (string-suffix? "/" perms-target)
                                                 (mkdir-p perms-target)
                                                 (mkdir-p (dirname perms-target)))
                                             (when (and src (file-exists? dest))
                                               (unless (file-exists? src)
                                                 (copy-recursively dest src
                                                                   #:keep-permissions? #t))
                                               (delete-file-recursively dest))
                                             (when src
                                               (when (file-exists? tempfile)
                                                 (delete-file tempfile))
                                               (symlink src tempfile)
                                               (rename-file tempfile dest))
                                             (when (file-exists? perms-target)
                                               (chown perms-target (or user -1) (or group -1))
                                               (when mode (chmod perms-target mode)))))
                                         lst))
                             ;; Fresh parent directories and omitted modes default to '#o755 root:root'.
                             ;; TODO: Please use the specified permissions for fresh parent directories.
                             '(("/etc/NetworkManager/system-connections" "/persist/etc/NetworkManager/system-connections/")
                               ("/etc/machine-id"                        "/persist/etc/machine-id" #o644)
                               ("/etc/ssh"                               "/persist/etc/ssh/")
                               ;; Vim won't start without =/var/tmp=.
                               ("/var/tmp"                               #f))))))
                   %desktop-services)))