diff --git a/README.md b/README.md
index 2c73b71..2cff83f 100644
--- a/README.md
+++ b/README.md
@@ -2,65 +2,21 @@
 
 ## Background
 
-I have a niche problem: my storage server's ZFS pool is lumpy!
+See [this blog post](https://blog.humancabbage.net/posts/datashake) for the
+motivation behind this program. Basically, this program copies files back-and-
+forth between ZFS datasets to attempt to address unbalanced utilization among
+vdevs.
 
-```
-NAME                        SIZE  ALLOC   FREE  FRAG    CAP  HEALTH
-zones                      32.6T  12.2T  20.4T    3%    37%  ONLINE
-  mirror                   3.62T  2.21T  1.41T    5%  61.1%  ONLINE
-    c0t5000CCA25DE8EBF4d0      -      -      -     -      -  ONLINE
-    c0t5000CCA25DEEC08Ad0      -      -      -     -      -  ONLINE
-  mirror                   3.62T  2.22T  1.40T    6%  61.3%  ONLINE
-    c0t5000CCA25DE6FD92d0      -      -      -     -      -  ONLINE
-    c0t5000CCA25DEEC738d0      -      -      -     -      -  ONLINE
-  mirror                   3.62T  2.28T  1.34T    6%  63.0%  ONLINE
-    c0t5000CCA25DEAA3EEd0      -      -      -     -      -  ONLINE
-    c0t5000CCA25DE6F42Ed0      -      -      -     -      -  ONLINE
-  mirror                   3.62T  2.29T  1.33T    5%  63.2%  ONLINE
-    c0t5000CCA25DE9DB9Dd0      -      -      -     -      -  ONLINE
-    c0t5000CCA25DEED5B7d0      -      -      -     -      -  ONLINE
-  mirror                   3.62T  2.29T  1.34T    5%  63.1%  ONLINE
-    c0t5000CCA25DEB0F42d0      -      -      -     -      -  ONLINE
-    c0t5000CCA25DEECB9Dd0      -      -      -     -      -  ONLINE
-  mirror                   3.62T   237G  3.39T    1%  6.38%  ONLINE
-    c0t5000CCA24CF36876d0      -      -      -     -      -  ONLINE
-    c0t5000CCA249D4AA59d0      -      -      -     -      -  ONLINE
-  mirror                   3.62T   236G  3.39T    0%  6.36%  ONLINE
-    c0t5000CCA24CE9D1CAd0      -      -      -     -      -  ONLINE
-    c0t5000CCA24CE954D2d0      -      -      -     -      -  ONLINE
-  mirror                   3.62T   228G  3.40T    0%  6.13%  ONLINE
-    c0t5000CCA24CE8C60Ed0      -      -      -     -      -  ONLINE
-    c0t5000CCA24CE9D249d0      -      -      -     -      -  ONLINE
-  mirror                   3.62T   220G  3.41T    0%  5.93%  ONLINE
-    c0t5000CCA24CF80849d0      -      -      -     -      -  ONLINE
-    c0t5000CCA24CF80838d0      -      -      -     -      -  ONLINE
+## Usage
+
+```text
+$ datashake --source /tank/stuff --temp /tank/temp --concurrency 2
 ```
 
-You can probably guess what happened: I had a zpool with five mirrors, and then
-expanded it by adding four more mirrors. ZFS doesn't automatically rebalance
-existing data, but does skew writes of new data so that more go to the newer
-mirrors.
+## Shortcomings
 
-To rebalance the data, the algorithm is straightforward:
-
-* for file in dataset,
-  * copy the file to a temporary directory in another dataset
-  * delete the original file
-  * copy from the temporary directory to recreate the original file
-  * delete the temporary directory
-
-As the files get rewritten, not only do the newer mirrors get more full, but
-also the older mirrors free up space. Eventually, the utilization of all mirrors
-should converge.
-
-## Solution
-
-The `datashake` program aims to automate the rebalancing process, while also
-adding some robustness and heuristics.
-
-* Gracefully handle shutdowns (e.g. Ctrl-c) to prevent files from getting lost.
-* Keep track of processed files, so that if the program stops and resumes, it
-  can skip those files.
-* Write a journal of operations so that, if shut down ungracefully, files in
-  the temporary directory can be identified and recovered.
-* Don't bother processing really small files.
+* The way actions and errors are logged in-memory and only persisted at the
+  end is not robust enough. Program crashes or system power loss can cause
+  files to be lost in the temporary directory. In the meantime, the program
+  still writes to `stdout` for each copy operation, so piping the output to
+  `tee` should suffice for now.
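
The copy-out-and-back loop that this diff removes from the README can be sketched roughly as below. This is a Python illustration only, not `datashake`'s actual code: `shake_file`, `shake_tree`, and the `MIN_SIZE_BYTES` threshold are hypothetical names and values chosen for the example, and the sketch has no journal, resume tracking, or signal handling.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical cutoff for "really small" files; the README names no value.
MIN_SIZE_BYTES = 1024 * 1024


def shake_file(original: Path, temp_root: Path) -> None:
    """Rewrite one file by round-tripping it through temp_root."""
    # A fresh subdirectory per file avoids name clashes inside temp_root.
    work_dir = Path(tempfile.mkdtemp(dir=temp_root))
    staged = work_dir / original.name
    shutil.copy2(original, staged)   # copy the file to the temporary directory
    original.unlink()                # delete the original file
    shutil.copy2(staged, original)   # copy back to recreate the original file
    # Only reached if every step above succeeded, so a failed copy-back
    # leaves the staged copy in place for manual recovery.
    shutil.rmtree(work_dir)          # delete the temporary directory


def shake_tree(source: Path, temp_root: Path,
               min_size: int = MIN_SIZE_BYTES) -> int:
    """Round-trip every regular file under source; return how many were rewritten."""
    processed = 0
    # sorted() materializes the listing before any file is touched.
    for path in sorted(source.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_size < min_size:
            continue  # skip really small files
        shake_file(path, temp_root)
        processed += 1
    return processed
```

With the paths from the usage example, a call would look like `shake_tree(Path("/tank/stuff"), Path("/tank/temp"))`. A crash between the `unlink` and the copy-back leaves the only copy of the file in the temporary directory, which is the failure mode the Shortcomings entry describes.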