
Cluster Operations

Prior Knowledge

LeoFS provides its cluster operations through leofs-adm, the LeoFS command-line tool for administration. In addition to adding and removing nodes, LeoFS supports node suspension, node restart, and node takeover. These operations are available once the LeoFS system is running.

Operations

Add a Node

After you launch a new LeoStorage node, LeoFS temporarily adds it to the member table in LeoManager's database, where it appears in the attached state. To make the node join the cluster, execute the leofs-adm rebalance command.

## Example:
## 1. Launch a new LeoStorage node

## 2. Check the current state of the cluster
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:19 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:19 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:19 +0900
  S    | [email protected]      | attached     |                |                | 2017-04-18 18:20:37 +0900
  G    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:20:21 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 3. Execute `rebalance`
$ leofs-adm rebalance
Generating rebalance-list...
Generated rebalance-list
Distributing rebalance-list to the storage nodes
OK  25% - [email protected]
OK  50% - [email protected]
OK  75% - [email protected]
OK 100% - [email protected]
OK

## 4. Check the latest state of the cluster after rebalancing
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | ce4bece1
                previous ring-hash | 3923d007
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:19 +0900
  S    | [email protected]      | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:19 +0900
  S    | [email protected]      | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:19 +0900
  S    | [email protected]      | running      | ce4bece1       | 3923d007       | 2017-04-18 18:21:25 +0900
  G    | [email protected]      | running      | ce4bece1       | 3923d007       | 2017-04-18 18:20:21 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

Warning: Avoid rebalancing with many nodes attached or detached

While a rebalance is in progress, the more nodes are attached or detached, the more system resources, such as network bandwidth and disk I/O, your LeoFS cluster can consume. We recommend dividing one big rebalance into several smaller ones and issuing them one by one to avoid exhausting system resources.
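
One way to keep each rebalance small is to check how many storage nodes are waiting in the attached or detached state before issuing the command. The following is a minimal sketch, not part of leofs-adm: the function names pending_node_count and rebalance_is_small are hypothetical, and the parsing assumes the "|"-separated node table that leofs-adm status prints in the examples on this page.

```shell
# Hypothetical helpers (not part of leofs-adm). Both read the text printed by
# `leofs-adm status` on standard input and assume the "|"-separated
# [State of Node(s)] table shown in the examples on this page.

# Print how many storage nodes (type S) are in the attached or detached state.
pending_node_count() {
  awk -F'|' '
    NF >= 3 && $1 ~ /^ *S *$/ {
      state = $3
      gsub(/ /, "", state)
      if (state == "attached" || state == "detached") n++
    }
    END { print n + 0 }'
}

# Succeed only when at most MAX (first argument) nodes are pending.
rebalance_is_small() {
  max="$1"
  count=$(pending_node_count)
  [ "$count" -le "$max" ]
}

# Example (assumes leofs-adm is on PATH):
#   leofs-adm status | rebalance_is_small 1 && leofs-adm rebalance
```

With a limit of 1, the rebalance is issued only when a single node is pending, which matches the advice to rebalance one step at a time. The right limit for a given cluster is a judgment call that depends on its spare bandwidth and disk I/O.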

Remove a Node

To shrink a LeoFS cluster, follow this operation flow:

  • Choose the LeoStorage node to remove; its state must be running or stop.
  • Then execute the leofs-adm detach command.
  • Finally, execute the leofs-adm rebalance command to start rebalancing data in the cluster.
## Example:
## 1. Check the current state of the cluster
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 3923d007
                previous ring-hash | 3923d007
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  S    | [email protected]      | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  S    | [email protected]      | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  S    | [email protected]      | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:37 +0900
  G    | [email protected]      | running      | 3923d007       | 3923d007       | 2017-04-18 18:31:55 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 2. Remove a LeoStorage node
$ leofs-adm detach [email protected]
OK

## 3. Execute `rebalance`
$ leofs-adm rebalance
Generating rebalance-list...
Generated rebalance-list
Distributing rebalance-list to the storage nodes
OK  33% - [email protected]
OK  67% - [email protected]
OK 100% - [email protected]
OK

## 4. Check the latest state of the cluster after rebalancing
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:37 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:37 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:37 +0900
  G    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:31:55 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
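
The precondition in the first step (the node must be running or stop) can be checked before detaching. The following is a minimal sketch under stated assumptions: node_state and can_detach are hypothetical helper names, the node name in the usage comment is a placeholder, and the parsing assumes the node table layout printed by leofs-adm status above.

```shell
# Hypothetical helpers (not part of leofs-adm). Both read `leofs-adm status`
# output on standard input and assume its "|"-separated node table.

# Print the state column for the node named in the first argument.
node_state() {
  awk -F'|' -v want="$1" '
    NF >= 3 {
      node = $2; state = $3
      gsub(/ /, "", node); gsub(/ /, "", state)
      if (node == want) { print state; exit }
    }'
}

# Succeed only when the node is in a state from which it may be detached.
can_detach() {
  state=$(node_state "$1")
  case "$state" in
    running|stop) return 0 ;;
    *)            return 1 ;;
  esac
}

# Example (placeholder node name, assumes leofs-adm is on PATH):
#   leofs-adm status | can_detach storage_3@127.0.0.1 \
#     && leofs-adm detach storage_3@127.0.0.1
```

A node that is missing from the table yields an empty state, so can_detach fails for it as well.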

Roll Back a Detached Node

If you detached a node by mistake, you can roll it back with the following steps:

  • Check the current state of the cluster and identify which nodes were detached by mistake.
  • Then execute the leofs-adm rollback command for each detached node.
  • Finally, execute leofs-adm status to confirm that the node state is back to running.
## Example:
## 1. Check the current state of the cluster
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.1
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 889a8d21
                previous ring-hash | fa2ce41b
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |           node           |    state     | rack id |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | [email protected]      | detached     |         | -1             | -1             | 2018-06-22 15:52:38 +0900
  S    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:05:41 +0900
  S    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:00:14 +0900
  S    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:55:54 +0900
  G    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:47:33 +0900
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------

## 2. Rollback a detached LeoStorage node
$ leofs-adm rollback [email protected]
OK

## 3. Confirm whether the node state gets back to `running`
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.4.1
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
 [mdcr] max number of joinable DCs | 2
 [mdcr] total replicas per a DC    | 1
 [mdcr] number of successes of R   | 1
 [mdcr] number of successes of W   | 1
 [mdcr] number of successes of D   | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 889a8d21
                previous ring-hash | fa2ce41b
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
 type  |           node           |    state     | rack id |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
  S    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:52:38 +0900
  S    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:05:41 +0900
  S    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 15:00:14 +0900
  S    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:55:54 +0900
  G    | [email protected]      | running      |         | 889a8d21       | fa2ce41b       | 2018-06-22 14:47:33 +0900
-------+--------------------------+--------------+---------+----------------+----------------+----------------------------
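
When several nodes are involved, it can help to list the rollback commands for review before running any of them. The following is a minimal sketch: rollback_commands is a hypothetical helper name, and the parsing assumes the node table layout shown above, where the state is the third "|"-separated column.

```shell
# Hypothetical helper (not part of leofs-adm). Reads `leofs-adm status`
# output on standard input and prints one `leofs-adm rollback` command line
# per storage node whose state is detached. It only prints the commands;
# nothing is executed.
rollback_commands() {
  awk -F'|' '
    NF >= 3 && $1 ~ /^ *S *$/ {
      node = $2; state = $3
      gsub(/ /, "", node); gsub(/ /, "", state)
      if (state == "detached") print "leofs-adm rollback " node
    }'
}

# Example (assumes leofs-adm is on PATH):
#   leofs-adm status | rollback_commands
```

Printing instead of executing is a deliberate choice here: only the nodes that were detached by mistake should be rolled back, so the operator still decides which of the listed commands to run.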

Take Over a Node

To have a new LeoStorage node take over a detached node, follow this operation flow:

  • Execute the leofs-adm detach command to remove the target node from the cluster.
  • Then launch a new node to take over the detached node.
  • Finally, execute the leofs-adm rebalance command to start rebalancing data in the cluster.
## Example:
## 1. Check the current state of the cluster (1)
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  G    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 2. Remove a LeoStorage node
$ leofs-adm detach [email protected]
OK

## 3. Launch a new LeoStorage node

## 4. Check the current state of the cluster (2)
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | d5d667a6
                previous ring-hash | d5d667a6
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | detached     | d5d667a6       | d5d667a6       | 2017-04-18 18:56:32 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | attached     |                |                | 2017-04-18 18:56:47 +0900
  G    | [email protected]      | running      | d5d667a6       | d5d667a6       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

## 5. Execute `rebalance`
$ leofs-adm rebalance
Generating rebalance-list...
Generated rebalance-list
Distributing rebalance-list to the storage nodes
OK  33% - [email protected]
OK  67% - [email protected]
OK 100% - [email protected]
OK

## 6. Check the latest state of the cluster
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | c613a468
                previous ring-hash | c613a468
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:58:16 +0900
  G    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
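
Before step 5, the status output in step 4 shows one detached node and one attached node. That condition can be checked mechanically before rebalancing. The following is a minimal sketch: takeover_ready is a hypothetical helper name, and requiring the same number of attached nodes as detached nodes is an assumption made for this example, not a rule stated by LeoFS.

```shell
# Hypothetical helper (not part of leofs-adm). Reads `leofs-adm status`
# output on standard input and succeeds when at least one storage node is
# detached and the same number of storage nodes is attached.
# The "equal counts" rule is an assumption of this sketch.
takeover_ready() {
  awk -F'|' '
    BEGIN { detached = 0; attached = 0 }
    NF >= 3 && $1 ~ /^ *S *$/ {
      state = $3
      gsub(/ /, "", state)
      if (state == "detached") detached++
      if (state == "attached") attached++
    }
    END { exit !(detached > 0 && detached == attached) }'
}

# Example (assumes leofs-adm is on PATH):
#   leofs-adm status | takeover_ready && leofs-adm rebalance
```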

Suspend a Node

When a node needs maintenance, you can suspend it temporarily. A suspended node does not receive requests from LeoGateway nodes or LeoStorage nodes. LeoFS eventually distributes the updated state of the cluster to every node.

## Example:
## 1. Execute `suspend`
$ leofs-adm suspend [email protected]
OK


## 2. Check the latest state of the cluster
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | c613a468
                previous ring-hash | c613a468
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | suspend      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:58:16 +0900
  G    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

Resume a Node

Once a suspended node has restarted and is ready to rejoin the cluster, execute the leofs-adm resume command.

## Example:
## 1. Execute `resume`
$ leofs-adm resume [email protected]
OK

## 2. Check the latest state of the cluster
$ leofs-adm status
 [System Confiuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.3.3
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | c613a468
                previous ring-hash | c613a468
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 19:01:48 +0900
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:55:35 +0900
  S    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:58:16 +0900
  G    | [email protected]      | running      | c613a468       | c613a468       | 2017-04-18 18:55:37 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------
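
After a suspend and resume cycle, a quick way to confirm that no storage node was left behind is to list every storage node whose state is not running. The following is a minimal sketch: non_running_nodes is a hypothetical helper name, and the parsing assumes the node table layout printed by leofs-adm status above.

```shell
# Hypothetical helper (not part of leofs-adm). Reads `leofs-adm status`
# output on standard input and prints "<node> <state>" for every storage
# node (type S) whose state is anything other than running.
non_running_nodes() {
  awk -F'|' '
    NF >= 3 && $1 ~ /^ *S *$/ {
      node = $2; state = $3
      gsub(/ /, "", node); gsub(/ /, "", state)
      if (state != "running") print node " " state
    }'
}

# Example (assumes leofs-adm is on PATH):
#   leofs-adm status | non_running_nodes
```

Empty output means every storage node in the table is running. While a node is suspended, it is listed with the state suspend, as in the status output of the suspend example.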