Shorten the README; save the story for the blog post.
parent 62dcffc2c9
commit 0346b96449
72
README.md
@@ -2,65 +2,21 @@

## Background

I have a niche problem: my storage server's ZFS pool is lumpy!
See [this blog post](https://blog.humancabbage.net/posts/datashake) for the
motivation behind this program. Basically, this program copies files back-and-
forth between ZFS datasets to attempt to address unbalanced utilization among
vdevs.

```
NAME                        SIZE  ALLOC   FREE  FRAG    CAP  HEALTH
zones                      32.6T  12.2T  20.4T    3%    37%  ONLINE
  mirror                   3.62T  2.21T  1.41T    5%  61.1%  ONLINE
    c0t5000CCA25DE8EBF4d0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEEC08Ad0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.22T  1.40T    6%  61.3%  ONLINE
    c0t5000CCA25DE6FD92d0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEEC738d0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.28T  1.34T    6%  63.0%  ONLINE
    c0t5000CCA25DEAA3EEd0      -      -      -     -      -  ONLINE
    c0t5000CCA25DE6F42Ed0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.29T  1.33T    5%  63.2%  ONLINE
    c0t5000CCA25DE9DB9Dd0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEED5B7d0      -      -      -     -      -  ONLINE
  mirror                   3.62T  2.29T  1.34T    5%  63.1%  ONLINE
    c0t5000CCA25DEB0F42d0      -      -      -     -      -  ONLINE
    c0t5000CCA25DEECB9Dd0      -      -      -     -      -  ONLINE
  mirror                   3.62T   237G  3.39T    1%  6.38%  ONLINE
    c0t5000CCA24CF36876d0      -      -      -     -      -  ONLINE
    c0t5000CCA249D4AA59d0      -      -      -     -      -  ONLINE
  mirror                   3.62T   236G  3.39T    0%  6.36%  ONLINE
    c0t5000CCA24CE9D1CAd0      -      -      -     -      -  ONLINE
    c0t5000CCA24CE954D2d0      -      -      -     -      -  ONLINE
  mirror                   3.62T   228G  3.40T    0%  6.13%  ONLINE
    c0t5000CCA24CE8C60Ed0      -      -      -     -      -  ONLINE
    c0t5000CCA24CE9D249d0      -      -      -     -      -  ONLINE
  mirror                   3.62T   220G  3.41T    0%  5.93%  ONLINE
    c0t5000CCA24CF80849d0      -      -      -     -      -  ONLINE
    c0t5000CCA24CF80838d0      -      -      -     -      -  ONLINE
## Usage

```text
$ datashake --source /tank/stuff --temp /tank/temp --concurrency 2
```

You can probably guess what happened: I had a zpool with five mirrors, and then
expanded it by adding four more mirrors. ZFS doesn't automatically rebalance
existing data, but does skew writes of new data so that more go to the newer
mirrors.
## Shortcomings

To rebalance the data, the algorithm is straightforward:

* for file in dataset,
  * copy the file to a temporary directory in another dataset
  * delete the original file
  * copy from the temporary directory to recreate the original file
  * delete the temporary directory

As the files get rewritten, not only do the newer mirrors get more full, but
also the older mirrors free up space. Eventually, the utilization of all mirrors
should converge.
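
The copy, delete, copy-back cycle described above can be sketched in a few lines of Python. This is an illustrative sketch only, not `datashake`'s actual implementation: the function names `rewrite_file` and `rebalance` are hypothetical, and the sketch has none of the crash safety a real tool needs. If the process dies between the delete and the copy back, the only copy of the file is the one in the temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def rewrite_file(path: Path, temp_root: Path) -> None:
    """Rewrite one file in place by round-tripping it through temp_root."""
    # A fresh temporary directory; temp_root is assumed to be in another dataset.
    temp_dir = Path(tempfile.mkdtemp(dir=temp_root))
    temp_copy = temp_dir / path.name
    shutil.copy2(path, temp_copy)   # 1. copy the file to the temporary directory
    path.unlink()                   # 2. delete the original file
    shutil.copy2(temp_copy, path)   # 3. recreate the original from the copy
    shutil.rmtree(temp_dir)         # 4. delete the temporary directory


def rebalance(source: Path, temp_root: Path) -> int:
    """Rewrite every regular file under source; return how many were rewritten."""
    # Build the full listing first so rewrites cannot disturb the directory walk.
    files = sorted(
        p for p in source.rglob("*") if p.is_file() and not p.is_symlink()
    )
    for path in files:
        rewrite_file(path, temp_root)
    return len(files)
```

Calling `rebalance(Path("/tank/stuff"), Path("/tank/temp"))` would correspond to the `--source` and `--temp` paths in the usage example, processing one file at a time.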

## Solution

The `datashake` program aims to automate the rebalancing process, while also
adding some robustness and heuristics.

* Gracefully handle shutdowns (e.g. Ctrl-c) to prevent files from getting lost.
* Keep track of processed files, so that if the program stops and resumes, it
  can skip those files.
* Write a journal of operations so that, if shut down ungracefully, files in
  the temporary directory can be identified and recovered.
* Don't bother processing really small files.
* The way actions and errors are logged in-memory and only persisted at the
  end is not robust enough. Program crashes or system power loss can cause
  files to be lost in the temporary directory. In the meantime, the program
  still writes to `stdout` for each copy operation, so piping the output to
  `tee` should suffice for now.
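
One possible way to close that gap would be to persist each journal record as the operation happens rather than at exit. The sketch below assumes a hypothetical JSON-lines journal with `start` and `done` events; the format and the function names `journal_append` and `unfinished` are illustrative assumptions, not how `datashake` stores its log.

```python
import json
import os
from pathlib import Path


def journal_append(journal: Path, event: str, original: Path, temp_copy: Path) -> None:
    """Append one JSON record and force it to disk before returning."""
    record = {"event": event, "original": str(original), "temp": str(temp_copy)}
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
        fh.flush()             # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to commit the data to storage


def unfinished(journal: Path) -> dict:
    """Return {temp copy: original path} for each "start" without a "done"."""
    pending = {}
    for line in journal.read_text(encoding="utf-8").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a torn line left by a crash mid-write
        if record["event"] == "start":
            pending[record["temp"]] = record["original"]
        elif record["event"] == "done":
            pending.pop(record["temp"], None)
    return pending
```

Under these assumptions, a caller would write a `start` record before deleting the original and a `done` record after the copy back succeeds. After a crash, `unfinished` lists the temporary copies that still need to be restored.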