Shorten the README; save the story for the blog post.

Sam Fredrickson 2023-12-06 18:33:26 -08:00
parent 62dcffc2c9
commit 0346b96449


@@ -2,65 +2,21 @@
## Background
I have a niche problem: my storage server's ZFS pool is lumpy!
See [this blog post](https://blog.humancabbage.net/posts/datashake) for the
motivation behind this program. In short, it copies files back and forth
between ZFS datasets to even out unbalanced utilization among vdevs.
```
NAME                        SIZE  ALLOC   FREE  FRAG    CAP  HEALTH
zones                      32.6T  12.2T  20.4T    3%    37%  ONLINE
  mirror                   3.62T  2.21T  1.41T    5%  61.1%  ONLINE
    c0t5000CCA25DE8EBF4d0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEEC08Ad0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.22T  1.40T    6%  61.3%  ONLINE
    c0t5000CCA25DE6FD92d0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEEC738d0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.28T  1.34T    6%  63.0%  ONLINE
    c0t5000CCA25DEAA3EEd0      -      -      -     -      -  ONLINE
    c0t5000CCA25DE6F42Ed0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.29T  1.33T    5%  63.2%  ONLINE
    c0t5000CCA25DE9DB9Dd0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEED5B7d0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.29T  1.34T    5%  63.1%  ONLINE
    c0t5000CCA25DEB0F42d0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEECB9Dd0      -      -      -     -      -  ONLINE
  mirror                   3.62T   237G  3.39T    1%  6.38%  ONLINE
    c0t5000CCA24CF36876d0      -      -      -     -      -  ONLINE
    c0t5000CCA249D4AA59d0      -      -      -     -      -  ONLINE
  mirror                   3.62T   236G  3.39T    0%  6.36%  ONLINE
    c0t5000CCA24CE9D1CAd0      -      -      -     -      -  ONLINE
    c0t5000CCA24CE954D2d0      -      -      -     -      -  ONLINE
  mirror                   3.62T   228G  3.40T    0%  6.13%  ONLINE
    c0t5000CCA24CE8C60Ed0      -      -      -     -      -  ONLINE
    c0t5000CCA24CE9D249d0      -      -      -     -      -  ONLINE
  mirror                   3.62T   220G  3.41T    0%  5.93%  ONLINE
    c0t5000CCA24CF80849d0      -      -      -     -      -  ONLINE
    c0t5000CCA24CF80838d0      -      -      -     -      -  ONLINE
```
## Usage
```text
$ datashake --source /tank/stuff --temp /tank/temp --concurrency 2
```
You can probably guess what happened: I had a zpool with five mirrors, and then
expanded it by adding four more mirrors. ZFS doesn't automatically rebalance
existing data, but does skew writes of new data so that more go to the newer
mirrors.
## Shortcomings
The rebalancing algorithm is straightforward:
* for each file in the dataset:
  * copy the file to a temporary directory in another dataset
  * delete the original file
  * copy from the temporary directory to recreate the original file
  * delete the temporary copy
As the files get rewritten, not only do the newer mirrors get more full, but
also the older mirrors free up space. Eventually, the utilization of all mirrors
should converge.
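The per-file shuffle can be sketched in plain shell. This is a simplified stand-in for what `datashake` automates, not its actual implementation: `rebalance` is a hypothetical helper, and a real tool would snapshot the file list up front and verify checksums before deleting anything.

```shell
#!/bin/sh
# Sketch of the rebalance loop: copy each file out to a temp directory on
# another dataset, delete the original, then copy it back so the rewrite
# lands on the emptier vdevs. Illustrative only; no checksum verification.
set -u

rebalance() {
    src=$1; tmp=$2
    find "$src" -type f | while IFS= read -r f; do
        rel="${f#$src/}"                  # path relative to the dataset root
        mkdir -p "$tmp/$(dirname "$rel")"
        cp -p "$f" "$tmp/$rel"            # 1. copy out
        rm "$f"                           # 2. delete original (frees old vdevs)
        cp -p "$tmp/$rel" "$f"            # 3. copy back
        rm "$tmp/$rel"                    # 4. remove the temp copy
    done
}
```

The copy-back step is what actually moves the data: ZFS allocates the rewritten blocks according to current vdev free space, so the new copy skews toward the emptier mirrors.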
## Solution
The `datashake` program aims to automate the rebalancing process, while also
adding some robustness and heuristics:
* Gracefully handle shutdowns (e.g. Ctrl-c) to prevent files from getting lost.
* Keep track of processed files, so that if the program stops and resumes, it
can skip those files.
* Write a journal of operations so that, if shut down ungracefully, files in
the temporary directory can be identified and recovered.
* Don't bother processing really small files.
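The journal idea can be sketched like this: write a record before touching a file and another after restoring it, so a restart can skip finished files and flag interrupted ones. The line format and helper names below are illustrative assumptions, not `datashake`'s actual on-disk format.

```shell
#!/bin/sh
# Illustrative journal: a "BEGIN path" line before a file is moved and a
# "DONE path" line after it is restored. On resume, files with a DONE entry
# are skipped; a BEGIN without a matching DONE means the file may still be
# sitting in the temporary directory and needs recovery.
# This format is an assumption, not datashake's real journal.

journal_begin() { printf 'BEGIN %s\n' "$2" >> "$1"; }
journal_done()  { printf 'DONE %s\n'  "$2" >> "$1"; }

# True if the journal records this file as fully processed.
journal_is_done() {
    grep -Fxq "DONE $2" "$1" 2>/dev/null
}

# List files that were begun but never finished (candidates for recovery).
journal_interrupted() {
    grep '^BEGIN ' "$1" 2>/dev/null | cut -d' ' -f2- | while IFS= read -r p; do
        journal_is_done "$1" "$p" || printf '%s\n' "$p"
    done
}
```

Appending a line per operation is cheap and crash-friendly: even if the process dies mid-copy, the BEGIN record is already on disk and points recovery at the temp directory.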
* Actions and errors are logged in memory and only persisted at the end,
  which is not robust enough: a program crash or system power loss can leave
  files stranded in the temporary directory. In the meantime, the program
  still writes to `stdout` for each copy operation, so piping the output to
  `tee` should suffice for now.
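Concretely, that workaround is just the normal invocation piped through `tee` (the log filename here is arbitrary):

```text
$ datashake --source /tank/stuff --temp /tank/temp --concurrency 2 | tee datashake.log
```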