
datashake - level out zpools by rewriting files

Background

I have a niche problem: my storage server's ZFS pool is lumpy!

NAME                        SIZE  ALLOC   FREE  FRAG    CAP  HEALTH
zones                      32.6T  12.2T  20.4T    3%    37%  ONLINE
 mirror                    3.62T  2.21T  1.41T    5%  61.1%  ONLINE
  c0t5000CCA25DE8EBF4d0        -      -      -     -      -  ONLINE
  c0t5000CCA25DEEC08Ad0        -      -      -     -      -  ONLINE
 mirror                    3.62T  2.22T  1.40T    6%  61.3%  ONLINE
  c0t5000CCA25DE6FD92d0        -      -      -     -      -  ONLINE
  c0t5000CCA25DEEC738d0        -      -      -     -      -  ONLINE
 mirror                    3.62T  2.28T  1.34T    6%  63.0%  ONLINE
  c0t5000CCA25DEAA3EEd0        -      -      -     -      -  ONLINE
  c0t5000CCA25DE6F42Ed0        -      -      -     -      -  ONLINE
 mirror                    3.62T  2.29T  1.33T    5%  63.2%  ONLINE
  c0t5000CCA25DE9DB9Dd0        -      -      -     -      -  ONLINE
  c0t5000CCA25DEED5B7d0        -      -      -     -      -  ONLINE
 mirror                    3.62T  2.29T  1.34T    5%  63.1%  ONLINE
  c0t5000CCA25DEB0F42d0        -      -      -     -      -  ONLINE
  c0t5000CCA25DEECB9Dd0        -      -      -     -      -  ONLINE
 mirror                    3.62T   237G  3.39T    1%  6.38%  ONLINE
  c0t5000CCA24CF36876d0        -      -      -     -      -  ONLINE
  c0t5000CCA249D4AA59d0        -      -      -     -      -  ONLINE
 mirror                    3.62T   236G  3.39T    0%  6.36%  ONLINE
  c0t5000CCA24CE9D1CAd0        -      -      -     -      -  ONLINE
  c0t5000CCA24CE954D2d0        -      -      -     -      -  ONLINE
 mirror                    3.62T   228G  3.40T    0%  6.13%  ONLINE
  c0t5000CCA24CE8C60Ed0        -      -      -     -      -  ONLINE
  c0t5000CCA24CE9D249d0        -      -      -     -      -  ONLINE
 mirror                    3.62T   220G  3.41T    0%  5.93%  ONLINE
  c0t5000CCA24CF80849d0        -      -      -     -      -  ONLINE
  c0t5000CCA24CF80838d0        -      -      -     -      -  ONLINE

You can probably guess what happened: I had a zpool with five mirrors, and then expanded it by adding four more mirrors. ZFS doesn't automatically rebalance existing data, but does skew writes of new data so that more go to the newer mirrors.

Rebalancing the data manually is straightforward:

for file in dataset,
  * copy the file to a temporary directory in another dataset
  * delete the original file
  * copy from the temporary directory to recreate the original file
  * delete the temporary directory

As files are rewritten, the newer mirrors fill up while the older mirrors free up space. Eventually, the utilization of all mirrors should converge.
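To get a feel for where "converged" would be, here is a rough back-of-the-envelope sketch using the ALLOC and SIZE columns from the listing above. It assumes the G and T figures are binary units (1T = 1024G) and that a perfectly balanced pool would have every mirror at the pool-wide utilization; the `mirror` type and `targetUtilization` function are illustrative names only.

```go
package main

import "fmt"

// mirror holds the ALLOC and SIZE columns for one top-level vdev, in T.
type mirror struct {
	alloc, size float64
}

// targetUtilization returns the pool-wide allocated fraction, which is
// the level every mirror would sit at if the data were spread evenly.
func targetUtilization(ms []mirror) float64 {
	var alloc, size float64
	for _, m := range ms {
		alloc += m.alloc
		size += m.size
	}
	if size == 0 {
		return 0
	}
	return alloc / size
}

func main() {
	const g = 1.0 / 1024 // one G expressed in T (assumes binary units)
	pool := []mirror{
		{2.21, 3.62}, {2.22, 3.62}, {2.28, 3.62}, {2.29, 3.62}, {2.29, 3.62},
		{237 * g, 3.62}, {236 * g, 3.62}, {228 * g, 3.62}, {220 * g, 3.62},
	}

	t := targetUtilization(pool)
	fmt.Printf("target utilization: %.1f%%\n", 100*t)

	// Positive numbers mean the mirror needs to gain data,
	// negative numbers mean it needs to shed data.
	for i, m := range pool {
		fmt.Printf("mirror %d: %+.2fT to reach target\n", i, t*m.size-m.alloc)
	}
}
```

The target should land close to the 37% shown in the pool's CAP column, so each older mirror has to shed somewhat under 1T and each newer mirror has to gain a bit over 1T.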

Solution

The datashake program aims to automate the rebalancing process, while also adding some robustness and heuristics.

  • Gracefully handle shutdowns (e.g. Ctrl-C) to prevent files from getting lost.
  • Keep track of processed files, so that if the program stops and resumes, it can skip those files.
  • Write a journal of operations so that, if shut down ungracefully, files in the temporary directory can be identified and recovered.
  • Don't bother processing really small files.