Cluster Operations¶
Prior Knowledge¶
LeoFS provides cluster operation features through `leofs-adm`, the LeoFS command-line interface for administration. LeoFS supports node addition and node deletion, and also offers node suspension, node resumption, and node takeover. You can use these operations once a LeoFS system is running.
Operations¶
Add a Node¶
After you launch a new LeoStorage node, LeoFS temporarily adds it to the member table of LeoManager's database, where it appears with the state `attached`. If you decide to join it to the cluster, execute the `leofs-adm rebalance` command.
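For reference, step 1 in the example below ("Launch a new LeoStorage node") is performed on the new storage server itself. A minimal sketch, assuming a package installation under `/usr/local/leofs/<version>`; the paths, node name, and manager addresses are illustrative and depend on your environment:

```bash
## Give the new node a unique name and point it at the managers
## (keys shown are from leo_storage.conf; values are illustrative)
$ vi /usr/local/leofs/1.3.3/leo_storage/etc/leo_storage.conf
#   nodename = [email protected]
#   managers = [[email protected], [email protected]]

## Start the node; it registers itself with LeoManager as `attached`
$ /usr/local/leofs/1.3.3/leo_storage/bin/leo_storage start
```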
```bash
## Example:
## 1. Launch a new LeoStorage node
## 2. Check the current state of the cluster
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:19 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:19 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:19 +0900
  S    | [email protected]     | attached     |                |                | 2017-04-18 18:20:37 +0900
  G    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:21 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 3. Execute `rebalance`
$ leofs-adm rebalance
Generating rebalance-list...
Generated rebalance-list
Distributing rebalance-list to the storage nodes
OK  25% - [email protected]
OK  50% - [email protected]
OK  75% - [email protected]
OK 100% - [email protected]
OK

## 4. Check the latest state of the cluster after rebalancing
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | ce4bece1
                previous ring-hash | 3923d007
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:19 +0900
  S    | [email protected]     | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:19 +0900
  S    | [email protected]     | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:19 +0900
  S    | [email protected]     | running      | ce4bece1       | 3923d007       | 2017-04-18 18:21:25 +0900
  G    | [email protected]     | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:21 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
```
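Rebalancing continues in the background after `rebalance` returns. One way to watch it settle and spot-check the result; the object path below is a placeholder, and the queue names printed by `mq-stats` vary by LeoFS version:

```bash
## Watch each storage node's message queues drain;
## the data migration is done once they report no remaining messages
$ leofs-adm mq-stats [email protected]

## Spot-check that an object's replicas are on the expected nodes
$ leofs-adm whereis <bucket/path/to/object>
```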
Warning: Avoid rebalancing with many nodes attached or detached at once
While a rebalance is ongoing, the more attached or detached nodes there are, the more system resources, such as network bandwidth and disk I/O, your LeoFS cluster consumes. We therefore recommend dividing one big rebalance into multiple smaller ones and issuing them one by one, as sketched below, to avoid exhausting system resources.
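A minimal sketch of that incremental approach; the node names `storage_4` through `storage_6` are hypothetical new nodes:

```bash
## Instead of attaching storage_4..storage_6 and issuing one big
## rebalance, fold the nodes in one at a time:

## Round 1: launch [email protected] (state becomes `attached`), then
$ leofs-adm rebalance
## wait for the message queues on every storage node to drain,
## e.g. by polling `leofs-adm mq-stats <storage-node>`

## Round 2: launch [email protected], then
$ leofs-adm rebalance

## Round 3: launch [email protected], then
$ leofs-adm rebalance
```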
Remove a Node¶
If you need to shrink the size of a LeoFS cluster, you can do so with the following operation flow.

- Choose the LeoStorage node to remove, whose state must be `running` or `stop`
- Execute the `leofs-adm detach` command on that node
- Finally, execute the `leofs-adm rebalance` command to start rebalancing data across the cluster
```bash
## Example:
## 1. Check the current state of the cluster
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 3923d007
                previous ring-hash | 3923d007
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  S    | [email protected]     | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  S    | [email protected]     | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  S    | [email protected]     | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  G    | [email protected]     | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:55 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 2. Remove a LeoStorage node
$ leofs-adm detach [email protected]
OK

## 3. Execute `rebalance`
$ leofs-adm rebalance
Generating rebalance-list...
Generated rebalance-list
Distributing rebalance-list to the storage nodes
OK  33% - [email protected]
OK  67% - [email protected]
OK 100% - [email protected]
OK

## 4. Check the latest state of the cluster after rebalancing
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:37 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:37 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:37 +0900
  G    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:55 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
```
Rollback a Detached Node¶
If you detached a node by mistake, you can roll it back with the operation below.

- Check the current state of the cluster and identify which nodes were detached by mistake
- Execute the `leofs-adm rollback` command on each detached node
- Execute `leofs-adm status` to confirm that the node's state is back to `running`
```bash
## Example:
## 1. Check the current state of the cluster
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.1
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
    [mdcr] total replicas per a DC | 1
   [mdcr] number of successes of R | 1
   [mdcr] number of successes of W | 1
   [mdcr] number of successes of D | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 889a8d21
                previous ring-hash | fa2ce41b
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |           node           |    state     | rack id |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | [email protected]     | detached     |         | -1             | -1             | 2018-06-22 15:52:38 +0900
  S    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:05:41 +0900
  S    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:00:14 +0900
  S    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:55:54 +0900
  G    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:47:33 +0900
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------

## 2. Roll back the detached LeoStorage node
$ leofs-adm rollback [email protected]
OK

## 3. Confirm that the node's state is back to `running`
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.1
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
    [mdcr] total replicas per a DC | 1
   [mdcr] number of successes of R | 1
   [mdcr] number of successes of W | 1
   [mdcr] number of successes of D | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 889a8d21
                previous ring-hash | fa2ce41b
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |           node           |    state     | rack id |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:52:38 +0900
  S    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:05:41 +0900
  S    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:00:14 +0900
  S    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:55:54 +0900
  G    | [email protected]     | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:47:33 +0900
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
```
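If several nodes were detached by mistake, step 2 can be scripted. A sketch that parses the `status` table; the field positions depend on your LeoFS version's output format, so verify them against your own output first:

```bash
## Roll back every node currently reported as `detached`
## (assumes the node name is the 3rd and the state the 5th
##  whitespace-separated field of each row, as in the output above)
$ leofs-adm status \
    | awk '$5 == "detached" { print $3 }' \
    | xargs -r -n1 leofs-adm rollback
```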
Take Over a Node¶
If you want a new LeoStorage node to take over a detached node, you can do so with the following operation flow.

- Execute the `leofs-adm detach` command to remove the target node from the cluster
- Launch a new node to take over the detached one (see the configuration sketch after this list)
- Finally, execute the `leofs-adm rebalance` command to start rebalancing data across the cluster
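As in the example below, the replacement node joins under a node name of its own rather than reusing the detached node's name. A sketch of the relevant `leo_storage.conf` entries on the replacement host; the node name matches the example below, while the paths and manager addresses are illustrative assumptions:

```bash
## etc/leo_storage.conf on the replacement host
#   nodename = [email protected]
#   managers = [[email protected], [email protected]]

## Start it so it joins the cluster as `attached`
$ /usr/local/leofs/1.3.3/leo_storage/bin/leo_storage start
```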
```bash
## Example:
## 1. Check the current state of the cluster (1)
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  G    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 2. Remove a LeoStorage node
$ leofs-adm detach [email protected]
OK

## 3. Launch a new LeoStorage node
## 4. Check the current state of the cluster (2)
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | detached     | d5d667a6       | d5d667a6       | 2017-04-18 18:56:32 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | attached     |                |                | 2017-04-18 18:56:47 +0900
  G    | [email protected]     | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 5. Execute `rebalance`
$ leofs-adm rebalance
Generating rebalance-list...
Generated rebalance-list
Distributing rebalance-list to the storage nodes
OK  33% - [email protected]
OK  67% - [email protected]
OK 100% - [email protected]
OK

## 6. Check the latest state of the cluster
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | c613a468
                previous ring-hash | c613a468
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:58:16 +0900
  G    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
```
Suspend a Node¶
When maintenance of a node is necessary, you can temporarily suspend the target node. A suspended node does not receive requests from LeoGateway nodes or other LeoStorage nodes, and LeoFS eventually distributes the updated state of the cluster to every node.
```bash
## Example:
## 1. Execute `suspend`
$ leofs-adm suspend [email protected]
OK

## 2. Check the latest state of the cluster
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | c613a468
                previous ring-hash | c613a468
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | suspend      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:58:16 +0900
  G    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
```
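A typical maintenance flow built around `suspend`, finished with `leofs-adm resume` as described in the next section. A minimal sketch; the install path is an assumption from a package installation, so adjust it to your environment:

```bash
## 1. Stop routing requests to the node
$ leofs-adm suspend [email protected]

## 2. Stop the node process on its host and do the maintenance work
$ ssh 10.0.0.100 /usr/local/leofs/1.3.3/leo_storage/bin/leo_storage stop
## ... replace disks, patch the OS, etc. ...

## 3. Start the node again, then let it rejoin the cluster
$ ssh 10.0.0.100 /usr/local/leofs/1.3.3/leo_storage/bin/leo_storage start
$ leofs-adm resume [email protected]
```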
Resume a Node¶
After suspending a node, once it has been restarted and is ready to rejoin the cluster, execute the `leofs-adm resume` command.
```bash
## Example:
## 1. Execute `resume`
$ leofs-adm resume [email protected]
OK

## 2. Check the latest state of the cluster
$ leofs-adm status
[System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | c613a468
                previous ring-hash | c613a468
-----------------------------------+----------

[State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 19:01:48 +0900
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:58:16 +0900
  G    | [email protected]     | running      | c613a468       | c613a468       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
```
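Finally, a quick way to assert that no node was left behind in a non-`running` state; this sketch greps the `status` table, so the exact row layout may vary by LeoFS version:

```bash
## Print any storage or gateway row whose state is not `running`;
## empty output means every node in the cluster is healthy
$ leofs-adm status | grep -E '^\s+(S|G)\s' | grep -v running
```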