zfs

ZFS

openzfs

install

(This may break pool import) If given whole disk zfs will leave small partition at begin/end and mark it with a wholedisk property. That small space is useful for uefi bootloader via: (zfs raidz expansion uses this since 2.2.0) mkfs.vfat -F 16 /dev/disk/by-uuid/XXXXXX and grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=DISK1. A larger partition can allow proper fat32 fs type, grub installation(--boot-directory) and kernel/initramfs storage.
add to kmod to initramfs via mkinitcpio/dracut
- ex. add zfs to HOOKS in /etc/defaults/mkinitcpio.conf
- regen initramfs mkinitcpio -P
update-grub with zfs root (if on root)
Pam module for auto decrypt/mount on user login
- /etc/pam.d/zfs-key

auth       optional                    pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
session [success=1 default=ignore]     pam_succeed_if.so service = systemd-user quiet
session    optional                    pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key
password   optional                    pam_zfs_key.so homes=zroot/data/home runstatedir=/run/pam_zfs_key

/etc/pam.d/system-auth and /etc/pam.d/su-l auth include zfs-key session include zfs-key password include zfs-key

Manual pam script
- /etc/pam.d/system-auth auth optional pam_exec.so expose_authtok /sbin/zfs-pam-login
  - zfs-pam-login PASS=$(cat -) zfs load-key "${ZFS_HOME_VOL}" <<< "${PASS}" || continue zfs mount "${ZFS_HOME_VOL}" || true
add systemd services for device scanning/import/automounting
- set cache if not scanning for pools zpool set cachefile=/etc/zfs/zpool.cache POOL
  - systemctl enable zfs-import-cache
  - systemctl enable zfs-import.target
- enable mounts if not using ZED
  - systemctl enable zfs-mount
  - systemctl enable zfs.target
set arc memory in kernel params(grub), initramfs /etc/default/zfs or modprobe params /etc/modprobe.d/zfs.conf

usage

zpool scrub(error check), resilver(parity), trim(ssd), adding/removing disks
zfs Mounting, keys, snapshots, rollbacks

notes

zfs anyraid for different sized disk pools
https://github.com/OpenIndiana/time-slider python time slider tool for backups
GPTZFSBOOT for bios zfs boot
If you lose a vdev in a pool you LOSE THE POOL
Autoexpand allows the 'safe' thing of smallest partition that can grow. WIP raidz expand pool size.
- Manual pool config can get more out of smaller disks with the same redundancy
  - Linux 4.12+ udev IO rule for zfs_vdev_scheduler to reduce cpu for manual partition
When expanding rebalancing is not done leaving potentially higher resilver times in the future increasing the chance of cascading failure.
- snapshot, make tmp dataset, send | recv to new dataset to redistribute blocks, destroy old snapshot, rename dataset
  - manual de-dupe can be done with cp --reflink as of 2.2.0
  - sudo zfs send zroot/data/tmp@snap-1 | sudo zfs receive -Fduv POOL will create POOL/data/tmp@snap-1
Sparse files can be useful for testing/migrating setups if the enough storage is actually present(piecemeal the datasets)
- dd if=/dev/zero of=/zpool-file bs=1M count=1024
  - zpool create test /zpool-file
zpool attach POOL EXISTING_DEVICE_ID NEW_DEVICE_ID to create a mirror
zfs set xattr=sa
zfs set acltype=posixacl
zpool set feature@large_blocks=enabled ztank to enable larger blocks
zpool set feature@encryption=enabled ztank to enable encryption
zfs set canmount=noauto ztank/dataset to disable auto mounting
relatime for normal timestamps
SLOG requires devices that will write data on power loss
SPECIAL vdevs store metadata (good for ssd) but need redundancy as they can take the pool down
spare drive helps resilver time (zed auto replace)
Single device zfs can use the COPIES attribute to help redundancy
/tmp sync off
enable sharing on dataset for nfs
set snapdir of dataset to visible for .zfs/snapshots
L2ARC/ssd cache with persistence(2.0+) for arc speed
- L2ARC has default l2arc_write_max of 8MiB/s and 8MiB/s burst (to fill up cache)
- uses arc ram (more for smaller blocks) to index
zfs using cpu despite not being in use
https://github.com/kimono-koans/httm for snapshot traversal
zfs create -o recordsize=X ztank/dataset for bittorrent 16k, 1M for sequential, 64k for sqlite(w/pagesize), 4K for vms/monero(system page size)
- must be set at creation time
ztest is userspace w/o zvols
- openebs cstor uses userspace openzfs with custom zrepl (based on 0.7.9)
- https://github.com/iomesh/zfs/tree/uzfs-dev is userspace fork using fuse vfs
compresses with blocks 7/8 of original size
- non default compression algo zstd is simd optimized (older chips)
RAM ECC check for ARC modprobe zfs zfs_flags=16
https://jro.io/truenas/openzfs/
- 6 vdev raidz2 eliminates allocation overhead, 3/5/9/17 raidz1, 7/11/19 raidz3
slop space for reserved space under high usage to ensure operations can complete
bclone or BRT allows commands like cp to copy almost instantly by only incrementing the block counter
- v2.2.0 dedupe
- only different blocks will be overwritten (common file parts shared)
- can not be maintained across send/recv (depends on-disk references)
- zpool sync if creating in a loop to ensure the TX finishes
https://scholarworks.wm.edu/cgi/viewcontent.cgi?article=2720&context=honorstheses chapter 3 for description of zfs architecture

#!/usr/bin/env -S guix shell --pure bash coreutils zfs --

#
# GuixSD install script synthesized from:
#
#   - mx00s's install.sh (https://gist.github.com/mx00s/ea2462a3fe6fdaa65692fe7ee824de3e)
#   - Erase Your Darlings (https://grahamc.com/blog/erase-your-darlings)
#   - ZFS Datasets for NixOS (https://grahamc.com/blog/nixos-on-zfs)
#   - NixOS Manual (https://nixos.org/nixos/manual/)
#
# It expects the name of the block device (e.g. 'sda') to partition
# and install GuixSD on and an authorized public ssh key to log in as
# 'root' remotely. The script must also be executed as root.
#
# Example: `sudo ./install.sh sde "ssh-rsa AAAAB..."`
#

set -euo pipefail

################################################################################

export COLOR_RESET="\033[0m"
export RED_BG="\033[41m"
export BLUE_BG="\033[44m"

function err {
    echo -e "${RED_BG}$1${COLOR_RESET}"
}

function info {
    echo -e "${BLUE_BG}$1${COLOR_RESET}"
}

################################################################################

export DISK=$1
export AUTHORIZED_SSH_KEY=$2

if ! [[ -v DISK ]]; then
    err "Missing argument. Expected block device name, e.g. 'sda'"
    exit 1
fi

export DISK_PATH="/dev/${DISK}"

if ! [[ -b "$DISK_PATH" ]]; then
    err "Invalid argument: '${DISK_PATH}' is not a block special file"
    exit 1
fi

if ! [[ -v AUTHORIZED_SSH_KEY ]]; then
    err "Missing argument. Expected public SSH key, e.g. 'ssh-rsa AAAAB...'"
    exit 1
fi

if [[ "$EUID" > 0 ]]; then
    err "Must run as root"
    exit 1
fi

export ZFS_POOL="rpool"

# ephemeral datasets
export ZFS_LOCAL="${ZFS_POOL}/local"
export ZFS_DS_ROOT="${ZFS_LOCAL}/root"
export ZFS_DS_GUIX="${ZFS_LOCAL}/guix"
export ZFS_DS_VAR_GUIX="${ZFS_LOCAL}/var-guix"

# persistent datasets
export ZFS_SAFE="${ZFS_POOL}/safe"
export ZFS_DS_HOME="${ZFS_SAFE}/home"
export ZFS_DS_PERSIST="${ZFS_SAFE}/persist"

export ZFS_BLANK_SNAPSHOT="${ZFS_DS_ROOT}@blank"

################################################################################

info "Running the UEFI (GPT) partitioning and formatting directions from the NixOS manual ..."
parted "$DISK_PATH" -- mklabel gpt
parted "$DISK_PATH" -- mkpart primary 512MiB 100%
parted "$DISK_PATH" -- mkpart ESP fat32 1MiB 512MiB
parted "$DISK_PATH" -- set 2 boot on
export DISK_PART_ROOT="${DISK_PATH}1"
export DISK_PART_BOOT="${DISK_PATH}2"

info "Formatting boot partition ..."
mkfs.fat -F 32 -n boot "$DISK_PART_BOOT"

info "Creating '$ZFS_POOL' ZFS pool for '$DISK_PART_ROOT' ..."
zpool create -f "$ZFS_POOL" "$DISK_PART_ROOT"

info "Enabling compression for '$ZFS_POOL' ZFS pool ..."
zfs set compression=on "$ZFS_POOL"

info "Creating '$ZFS_DS_ROOT' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_ROOT"

info "Configuring extended attributes setting for '$ZFS_DS_ROOT' ZFS dataset ..."
zfs set xattr=sa "$ZFS_DS_ROOT"

info "Configuring access control list setting for '$ZFS_DS_ROOT' ZFS dataset ..."
zfs set acltype=posixacl "$ZFS_DS_ROOT"

info "Creating '$ZFS_BLANK_SNAPSHOT' ZFS snapshot ..."
zfs snapshot "$ZFS_BLANK_SNAPSHOT"

info "Mounting '$ZFS_DS_ROOT' to /mnt/guix ..."
mkdir /mnt/guix
mount -t zfs "$ZFS_DS_ROOT" /mnt/guix

info "Mounting '$DISK_PART_BOOT' to /mnt/guix/boot ..."
mkdir /mnt/guix/boot
mount -t vfat "$DISK_PART_BOOT" /mnt/guix/boot

info "Creating '$ZFS_DS_GUIX' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_GUIX"

info "Disabling access time setting for '$ZFS_DS_GUIX' ZFS dataset ..."
zfs set atime=off "$ZFS_DS_GUIX"

info "Mounting '$ZFS_DS_GUIX' to /mnt/guix/gnu ..."
mkdir /mnt/guix/gnu
mount -t zfs "$ZFS_DS_GUIX" /mnt/guix/gnu

info "Creating '$ZFS_DS_VAR_GUIX' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_VAR_GUIX"

info "Mounting '$ZFS_DS_VAR_GUIX' to /mnt/guix/var/guix ..."
mkdir -p /mnt/guix/var/guix
mount -t zfs "$ZFS_DS_VAR_GUIX" /mnt/guix/var/guix

info "Creating '$ZFS_DS_HOME' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_HOME"

info "Mounting '$ZFS_DS_HOME' to /mnt/guix/home ..."
mkdir /mnt/guix/home
mount -t zfs "$ZFS_DS_HOME" /mnt/guix/home

info "Creating '$ZFS_DS_PERSIST' ZFS dataset ..."
zfs create -p -o mountpoint=legacy "$ZFS_DS_PERSIST"

info "Mounting '$ZFS_DS_PERSIST' to /mnt/guix/persist ..."
mkdir /mnt/guix/persist
mount -t zfs "$ZFS_DS_PERSIST" /mnt/guix/persist

info "Permit ZFS auto-snapshots on ${ZFS_SAFE}/* datasets ..."
zfs set com.sun:auto-snapshot=true "$ZFS_DS_HOME"
zfs set com.sun:auto-snapshot=true "$ZFS_DS_PERSIST"

info "Creating persistent directory for host SSH keys ..."
mkdir -p /mnt/guix/persist/etc/ssh

info "Enter password for the root user ..."
ROOT_PASSWORD_HASH="$(mkpasswd -m sha-512 | sed 's/\$/\\$/g')"

info "Enter personal user name ..."
read USER_NAME

info "Enter password for '${USER_NAME}' user ..."
USER_PASSWORD_HASH="$(mkpasswd -m sha-512 | sed 's/\$/\\$/g')"

info "Writing GuixSD configuration to /persist/guix-config/config.scm ..."
cat <<EOF > /mnt/guix/persist/guix-config/config.scm
;; -*- mode: scheme; -*-
;; This is an operating system configuration template
;; for a "desktop" setup with Xfce where the root
;; partition is on ZFS and rolled back to @blank
;; before boot.

(use-modules (gnu) (gnu system nss) (guix utils))
(use-service-modules desktop sddm)
(use-package-modules certs gnome)

;; This is our first monkey-patch.
(set! (@ (gnu system file-systems) %pseudo-file-system-types)
  (cons "zfs" %pseudo-file-system-types))

(define %initrd/pre-mount
  (with-imported-modules (source-module-closure
                          '((guix build syscalls)
                            (guix build utils)))
    #~(begin
        (use-modules (gnu build file-systems)
                     (gnu build linux-boot)
                     ((guix build syscalls)
                      #:hide (file-system-type))
                     (guix build utils))

        ;; XXX: Major Hack! Enables mounting ZFS datasets via legacy mountpoints.
        (let ((orig (@ (gnu build file-systems) canonicalize-device-spec)))
          (set! (@ (gnu build file-systems) canonicalize-device-spec)
            (lambda (spec)
              (let ((device (if (file-system-label? spec)
                                (file-system-label->string spec)
                                spec)))
                (if (and (string? device)
                         (char-set-contains? char-set:letter (string-ref device 0))
                         (#$%initrd/import-device-zpool device))
                    device
                    (orig spec))))))

        ;; In my actual config this is where I run plymouth and decrypt keyfiles
        ;; (but call `load-key' in a per-dataset loop below).
        )))

(define %initrd/import-device-zpool
  #~(lambda (device)
      (let ((zpool (substring device 0 (or (string-index device #\/) 0)))
            (present? (lambda (device)
                        (and (not (zero? (string-length device)))
                             (zero? (system* #$(file-append zfs "/sbin/zfs")
                                             "list" device))))))
        (unless (or (zero? (string-length zpool))
                    (present? device))
          (invoke #$(file-append zfs "/sbin/zpool") "import" zpool)

          ;; Here's where the rollback happens.
          ;;
          ;; In my actual config I have an ugly loop that handles multiple
          ;; zpools and decryption via load-key, hence the more dynamic parsing
          ;; above.
          ;;
          ;; We're just gonna do this for illustrative purposes:
          (when (equal? zpool "zpool")
            (system* #$(file-append zfs "/sbin/zfs")
                     "rollback" "zpool/local/root@blank"))))))

(define (%initrd file-systems . kwargs)
  (apply raw-initrd
    (cons file-systems
          (substitute-keyword-arguments kwargs
            ((#:linux linux)
             #;OMITTED)
            ((#:pre-mount pre-mount #t)
             #~(begin #$%initrd/pre-mount
                      #$pre-mount))))))

(define %users
  (cons (user-account
                  (name "${USER_NAME}")
                  (id 1000) ; Put a pin in this.
                  (password "${USER_PASSWORD_HASH}")
                  (supplementary-groups '("wheel" "netdev"
                                          "audio" "video")))
                 %base-user-accounts))

(operating-system
  (host-name "antelope")
  (timezone "America/Los_Angeles")
  (locale "en_US.utf8")

  ;; Use the UEFI variant of GRUB with the EFI System
  ;; Partition mounted on /boot/efi.
  (bootloader (bootloader-configuration
                (bootloader grub-efi-bootloader)
                (targets '("/boot/efi"))))

  ;; ====================
  ;; SUBSTANTIAL OMISSION
  ;; ====================
  ;;
  ;; The kernel package needs to have the ZFS module either built-in or in
  ;; its `modules' output. This is left as an exercise to the reader because
  ;; my current solution involves building the kernel several times,
  ;; desperately needs re-worked, and is too long / abstracted to
  ;; include here. Said monstrosity also ensures that the ZFS module
  ;; is built against the correct kernel by setting the package's `#:linux'
  ;; argument.
  ;;
  (kernel #;OMITTED)
  (initrd %initrd)

  ;; The rest of the neccessary ZFS bits and bobs *are* included.
  (initrd-modules
    (cons "zfs" %base-initrd-modules))

  (file-systems (append
                 (list (file-system
                         (mount-point "/")
                         (device "rpool/local/root")
                         (type "zfs"))
                       (map (match-lambda
                              ((d mp)
                               (file-system
                                 (mount-point mp)
                                 (device d)
                                 (type "zfs")
                                 (needed-for-boot? #t))))
                            '(("rpool/local/root"     . "/")
                              ("rpool/local/guix"     . "/gnu")
                              ("rpool/local/var-guix" . "/var/guix")
                              ("rpool/safe/home"      . "/home")
                              ("rpool/safe/persist"   . "/persist")))
                       (file-system
                         (mount-point "/boot")
                         (device (uuid "6f62e623-5aa9-4681-a6da-9e0a68e7fbfb"))
                         (type "ext4"))
                       (file-system
                         (device (uuid "1234-ABCD" 'fat))
                         (mount-point "/boot/efi")
                         (type "vfat")))
                 %base-file-systems))

  (users %users)

  (packages (append (list
                      zfs
                      nss-certs ;; for HTTPS access
                      gvfs)     ;; for user mounts
                    %base-packages))

  (services (cons* (service xfce-desktop-service-type)
                   (simple-service 'zfs-mod-loader
                                   kernel-module-loader-service-type
                                   '("zfs"))
                   (simple-service 'zfs-udev-rules
                                   udev-service-type
                                   `(,zfs)))
                   ;;
                   ;; Some directories may have already been populated by other
                   ;; activation services on first run, so the function below
                   ;; will move them into /persist before creating a symlink.
                   ;;
                   ;; I've thought about doing this in the initfs, but I don't
                   ;; think we have a hook between file-system-mounts and
                   ;; activation so we'd have to mount/unmount the datsets
                   ;; ourselves ahead of when Guix mounts them...
                   ;;
                   (simple-service 'symlink-activation activation-service-type
                     (with-imported-modules (source-module-closure
                                             '((guix build utils)))
                       #~(begin
                           (use-modules (ice-9 match)
                                        (guix build utils))
                           (map (lambda (lst)
                                  (apply (lambda* (dest src #:optional mode user group)
                                           (let ((users '#$(map (lambda (u) (cons (user-account-name u) (user-account-uid u)))
                                                                %users))
                                                 (groups '#$(map (lambda (g) (cons (user-group-name g) (user-group-id g)))
                                                                 %base-groups))
                                                 (get-id (lambda (name file)
                                                           (let* ((port (open-pipe* OPEN_READ #$(file-append gawk "/bin/gawk")
                                                                                    "-F:" "$1 == NAME {print $3}" (string-append "NAME=" name)
                                                                                    file))
                                                                  (str (read-line port)))
                                                             (close-pipe port)
                                                             (string->number str)))))
                                             (unless (or (not user) (number? user))
                                               (set! user (or (assoc-ref users user)
                                                              (get-id user "/etc/passwd"))))
                                             (unless (or (not group) (number? group))
                                               (set! group (or (assoc-ref groups group)
                                                               (get-id group "/etc/group")))))

                                           ;; src->dest = persist->root-fs, like a symlink:
                                           (mkdir-p (dirname dest))
                                           (let ((perms-target (if src src dest))
                                                 (tempfile (string-append dest ".tmp")))
                                             (if (string-suffix? "/" perms-target)
                                                 (mkdir-p perms-target)
                                                 (mkdir-p (dirname perms-target)))
                                             (when (and src (file-exists? dest))
                                               (unless (file-exists? src)
                                                 (copy-recursively dest src
                                                                   #:keep-permissions? #t))
                                               (delete-file-recursively dest))
                                             (when src
                                               (when (file-exists? tempfile)
                                                 (delete-file tempfile))
                                               (symlink src tempfile)
                                               (rename-file tempfile dest))
                                             (when (file-exists? perms-target)
                                               (chown perms-target (or user -1) (or group -1))
                                               (when mode (chmod perms-target mode)))))
                                         lst))
                             ;; Fresh parent directories and omitted modes default to '#o755 root:root'.
                             ;; TODO: Please use the specified permissions for fresh parent directories.
                             '(("/etc/NetworkManager/system-connections" "/persist/etc/NetworkManager/system-connections/")
                               ("/etc/machine-id"                        "/persist/etc/machine-id" #o644)
                               ("/etc/ssh"                               "/persist/etc/ssh/")
                               ;; Vim won't start without =/var/tmp=.
                               ("/var/tmp"                               #f))))))
                   %desktop-services))