Watchdog

Since

LeoFS v1.2.2

Purpose

The watchdog mechanism monitors CPU, Erlang-IO, disk, and storage-cluster status in order to keep a node running stably. It also communicates with the MQ mechanism and the auto-compaction/compaction mechanism: their processing (consumption of MQ messages and deletion of unnecessary objects) is affected by the watchdog.

Getting Started

Prerequisites

Note

This mechanism is supported on CentOS 6.5/7.0, Ubuntu Server 14.04 LTS, FreeBSD, and SmartOS. Disk monitoring uses the iostat and df commands, so you need to install them before starting the watchdog.
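Before enabling disk monitoring, it can help to confirm that both commands are on the PATH. The following is a minimal sketch, not part of LeoFS; the function name `missing_commands` is hypothetical and only the Python standard library is used.

```python
import shutil

# Commands that disk monitoring relies on, per the note above.
REQUIRED_COMMANDS = ("iostat", "df")


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Install before enabling the disk watchdog:", ", ".join(missing))
    else:
        print("iostat and df are both available.")
```

On many Linux distributions iostat is shipped in a separate package from df, so a missing iostat is the more likely result.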

Configuration of LeoFS Gateway

Note

All watchdogs are disabled by default. Before using this mechanism, you need to turn them on.

Modify the watchdog section of leo_gateway.conf.

LeoFS Gateway’s watchdog properties:

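===================================  =============  ==============================================================================
Property                             Default value  Description
===================================  =============  ==============================================================================
rex (rpc-server)
watchdog.rex.interval                5              Watch interval (sec)
watchdog.rex.threshold_mem_capacity  33554432       Threshold memory capacity for binaries (bytes)
CPU
watchdog.cpu.is_enabled              false          Is the CPU watchdog enabled? [true|false]
watchdog.cpu.interval                5              Watch interval (sec)
watchdog.cpu.raised_error_times      3              An error is raised to subscribers when the number of errors reaches this value
watchdog.cpu.threshold_cpu_load_avg  5.0            Threshold CPU load average for 1 min/5 min
watchdog.cpu.threshold_cpu_util      100            Threshold CPU utilization (%)
===================================  =============  ==============================================================================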
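To check which watchdogs a configuration file actually turns on, a small script can scan for the is_enabled properties listed above. This is a sketch under assumptions: it assumes the file uses `key = value` lines with `#` comments, the helper names `parse_conf` and `enabled_watchdogs` are hypothetical, and the sample text is an illustrative excerpt rather than a real leo_gateway.conf.

```python
def parse_conf(text):
    """Parse 'key = value' lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def enabled_watchdogs(settings):
    """Return the sorted names of watchdogs whose is_enabled property is 'true'."""
    names = []
    for key, value in settings.items():
        parts = key.split(".")
        # Match keys of the form watchdog.<name>.is_enabled
        if len(parts) == 3 and parts[0] == "watchdog" and parts[2] == "is_enabled":
            if value.lower() == "true":
                names.append(parts[1])
    return sorted(names)


# Hypothetical excerpt (assumed syntax) with the CPU watchdog turned on.
SAMPLE = """
## CPU watchdog
watchdog.cpu.is_enabled = true
watchdog.cpu.interval = 5
watchdog.cpu.raised_error_times = 3
"""

print(enabled_watchdogs(parse_conf(SAMPLE)))  # prints ['cpu']
```

The same helpers would apply to the storage configuration, where the disk and cluster watchdogs also have an is_enabled property.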

Configuration of LeoFS Storage

Note

All watchdogs are disabled by default. Before using this mechanism, you need to turn them on.

Modify the watchdog section of leo_storage.conf.

LeoFS Storage’s watchdog properties:

===================================  =============  ==============================================================================
Property                             Default value  Description
===================================  =============  ==============================================================================
rex (rpc-server)
watchdog.rex.interval                10             Watch interval (sec)
watchdog.rex.threshold_mem_capacity  33554432       Threshold memory capacity for binaries (bytes)
CPU
watchdog.cpu.is_enabled              false          Is the CPU watchdog enabled? [true|false]
watchdog.cpu.interval                10             Watch interval (sec)
watchdog.cpu.raised_error_times      5              An error is raised to subscribers when the number of errors reaches this value
watchdog.cpu.threshold_cpu_load_avg  5.0            Threshold CPU load average for 1 min/5 min
watchdog.cpu.threshold_cpu_util      100            Threshold CPU utilization (%)
DISK
watchdog.disk.is_enabled             false          Is the disk watchdog enabled? [true|false]
watchdog.disk.interval               10             Watch interval (sec)
watchdog.disk.raised_error_times     5              An error is raised to clients when the number of errors reaches this value
watchdog.disk.threshold_disk_use     85             Threshold disk usage (capacity) (%); leo_watchdog uses the df command
watchdog.disk.threshold_disk_util    100            Threshold disk utilization (%); leo_watchdog uses the iostat command
watchdog.disk.threshold_disk_rkb     98304          Threshold disk read rate (KB/sec)
watchdog.disk.threshold_disk_wkb     98304          Threshold disk write rate (KB/sec)
watchdog.disk.target_devices         []             Target devices for checking disk utilization
Cluster
watchdog.cluster.is_enabled          false          Is the cluster watchdog enabled? [true|false]
watchdog.cluster.interval            1              Watch interval (sec)
===================================  =============  ==============================================================================
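The interplay between a threshold and raised_error_times can be illustrated with a short sketch. This is not the leo_watchdog implementation. It assumes that a sample at or above the threshold counts as one error, and that a healthy sample resets the counter; neither detail is stated in the tables above. The class name `ThresholdWatcher` is hypothetical.

```python
class ThresholdWatcher:
    """Count threshold violations and notify subscribers once enough accumulate."""

    def __init__(self, threshold, raised_error_times, subscribers=()):
        self.threshold = threshold
        self.raised_error_times = raised_error_times
        self.subscribers = list(subscribers)
        self.error_count = 0

    def observe(self, value):
        """Record one sample; return True if subscribers were notified."""
        if value < self.threshold:
            # Assumption: a healthy sample resets the error counter.
            self.error_count = 0
            return False
        # Assumption: a sample at or above the threshold counts as one error.
        self.error_count += 1
        if self.error_count >= self.raised_error_times:
            for notify in self.subscribers:
                notify(value)
            return True
        return False


# Storage defaults from the table: threshold_disk_use 85, raised_error_times 5.
alerts = []
watcher = ThresholdWatcher(threshold=85, raised_error_times=5,
                           subscribers=[alerts.append])
fired = False
for usage in [90, 91, 92, 93, 94]:
    fired = watcher.observe(usage)
print(fired, alerts)  # prints True [94]
```

With the storage defaults of a 10-second interval and raised_error_times of 5, this model would notify subscribers roughly 50 seconds after disk usage first crosses the threshold.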