LeoStorage Settings

Prior Knowledge

Note: Configuration

LeoStorage's features depend on its configuration. Once a LeoFS system is launched, you cannot modify the following LeoStorage configurations, because the data-operation algorithms strictly adhere to these settings.

Irrevocable and Attention Required Items:

Item Irrevocable? Description
LeoStorage Basic
obj_containers.path Modifiable with condition You can change the directory of the container(s), but you cannot add or remove directories. You also need to move the data files, <obj_containers.path>/avs/object and <obj_containers.path>/avs/metadata, so that they adhere to this configuration.
obj_containers.num_of_containers Yes Cannot be changed, because LeoStorage would no longer be able to retrieve objects or metadata. If you want to modify this setting in order to add disk volumes to LeoFS, follow the instructions here.
obj_containers.metadata_storage Yes As above
num_of_vnodes Yes As above
MQ
mq.backend_db Modifiable with condition All of the MQ's data is lost after changing it.
mq.num_of_mq_procs Modifiable with condition As above
Replication and Recovery object(s)
replication.rack_awareness.rack_id Yes Cannot be changed, because LeoFS would no longer be able to retrieve objects or metadata.
Other Directories Settings
queue_dir Modifiable with condition You can change the MQ's directory, but you need to move the MQ's data so that it adheres to this configuration.

Other Configurations

If you want to modify settings such as where to place leo_storage.conf or which user starts the LeoStorage process, refer to For Administrators / Settings / Environment Configuration for more information.

Configuration

LeoStorage Configurations

Item Description
LeoManager Nodes
managers Names of the LeoManager nodes. This configuration is necessary for communicating with LeoManager's master and slave nodes.

( Default: [manager_0@127.0.0.1, manager_1@127.0.0.1] )
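
In a real deployment you would replace the loopback defaults with your LeoManager hosts. A minimal sketch, assuming hypothetical addresses 192.168.0.1 and 192.168.0.2:

managers = [manager_0@192.168.0.1, manager_1@192.168.0.2]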

LeoStorage Basic
obj_containers.path Directories of object-containers

( Default: [./avs] )

obj_containers.num_of_containers The number of object containers in each directory. Since up to backend_db.eleveldb.write_buf_size * obj_containers.num_of_containers of memory can be consumed in total, take both settings into account to meet your memory footprint requirements on LeoStorage.

( Default: [8] )
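
For instance, with the default backend_db.eleveldb.write_buf_size of 62914560 bytes (60 MiB) and the default 8 containers per directory, the write buffers alone can consume up to 8 × 60 MiB = 480 MiB; and since up to two write buffers may be held in memory at a time (see Backend DB / eleveldb below), the worst case can be roughly double that.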

obj_containers.sync_mode Mode of data synchronization. There are three modes:
  • none: No explicit synchronization on every write (default)
  • periodic: Periodic synchronization, whose timing depends on obj_containers.sync_interval_in_ms
  • writethrough: Ensures that any buffers kept by the OS are written to disk on every write

( Default: none )

obj_containers.sync_interval_in_ms Interval in ms of the data synchronization

( Default: 1000, Unit: msec )
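
For illustration, a periodic-synchronization setup could look like the following sketch; the interval shown is simply the default value:

obj_containers.sync_mode = periodic
obj_containers.sync_interval_in_ms = 1000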

obj_containers.metadata_storage The metadata storage is pluggable; it supports bitcask and leveldb.

( Default: leveldb )

num_of_vnodes The total number of virtual nodes of a LeoStorage node, used for generating the distributed hash table, RING

( Default: 168 )

object_storage.is_strict_check Enables a strict check between the checksum of a metadata record and the checksum of its object.

( Default: false )

object_storage.threshold_of_slow_processing Threshold of slow processing

( Default: 1000, Unit: msec )

seeking_timeout_per_metadata Timeout of seeking metadata, per metadata record

( Default: 10, Unit: msec )

max_num_of_procs Maximum number of processes for both write and read operations

( Default: 3000 )

num_of_obj_storage_read_procs Total number of obj-storage-read processes per object container (AVS)
  • Range: [1..100]

( Default: 3 )

Watchdog
watchdog.common.loosen_control_at_safe_count When the safe count is reached (i.e., the watchdog is clear), the watchdog loosens its control

( Default: 1 )

Watchdog / REX
watchdog.rex.is_enabled Enables or disables the rex-watchdog which monitors the memory usage of Erlang's RPC component.

( Default: true )

watchdog.rex.interval An interval of executing the watchdog processing

( Default: 10, Unit: sec )

watchdog.rex.threshold_mem_capacity Threshold of memory capacity of binary for Erlang rex

( Default: 33554432, Unit: byte )

Watchdog / CPU
watchdog.cpu.is_enabled Enables or disables the CPU-watchdog which monitors both CPU load average and CPU utilization

( Default: false )

watchdog.cpu.raised_error_times Number of times an error is raised to a client

( Default: 5 )

watchdog.cpu.interval An interval of executing the watchdog processing

( Default: 10, Unit: sec )

watchdog.cpu.threshold_cpu_load_avg Threshold of CPU load average

( Default: 5.0 )

watchdog.cpu.threshold_cpu_util Threshold of CPU utilization

( Default: 100 )

Watchdog / DISK
watchdog.disk.is_enabled Enables or disables the disk-watchdog which monitors disk usage and disk utilization

( Default: false )

watchdog.disk.raised_error_times Number of times an error is raised to a client

( Default: 5 )

watchdog.disk.interval An interval of executing the watchdog processing

( Default: 10, Unit: sec )

watchdog.disk.threshold_disk_use Threshold of disk usage (%) of a target disk's capacity

( Default: 85, Unit: percent )

watchdog.disk.threshold_disk_util Threshold of Disk utilization

( Default: 90, Unit: percent )

watchdog.disk.threshold_disk_rkb Threshold of disk read KB/sec

( Default: 98304, Unit: KB )

watchdog.disk.threshold_disk_wkb Threshold of disk write KB/sec

( Default: 98304, Unit: KB )

watchdog.disk.target_devices Target devices for checking disk utilization

( Default: [] )
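
Putting the disk-watchdog items together, a configuration sketch follows; the device name sda is hypothetical, and the other values restate the documented defaults:

watchdog.disk.is_enabled = true
watchdog.disk.interval = 10
watchdog.disk.threshold_disk_use = 85
watchdog.disk.threshold_disk_util = 90
watchdog.disk.target_devices = [sda]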

Watchdog / CLUSTER
watchdog.cluster.is_enabled Enables or disables the cluster-watchdog which monitors the state of the cluster

( Default: false )

watchdog.cluster.interval An interval of executing the watchdog processing

( Default: 10 )

Watchdog / ERRORS
watchdog.error.is_enabled Enables or disables the error-watchdog which monitors the number of errors on the node

( Default: false )

watchdog.error.interval An interval of executing the watchdog processing

( Default: 60 )

watchdog.error.threshold_count Threshold of the total count of errors raised to a client

( Default: 100 )

Data Compaction
Data Compaction / Basic
compaction.limit_num_of_compaction_procs Limit on the number of processes executing data compaction in parallel

( Default: 4 )

compaction.skip_prefetch_size Prefetch size when skipping garbage

( Default: 512 )

compaction.waiting_time_regular Regular waiting time of a compaction process, per batch process

( Default: 500, Unit: msec )

compaction.waiting_time_max Maximum waiting time of a compaction process, per batch process

( Default: 3000, Unit: msec )

compaction.batch_procs_regular Total number of regular compaction batch processes

( Default: 1000 )

compaction.batch_procs_max Maximum number of compaction batch processes

( Default: 1500 )

Data Compaction / Automated Data Compaction
autonomic_op.compaction.is_enabled Enables or disables the auto-compaction

( Default: false )

autonomic_op.compaction.parallel_procs Total number of parallel processes

( Default: 1 )

autonomic_op.compaction.interval Interval between auto-compaction executions

( Default: 3600, Unit: sec )

autonomic_op.compaction.warn_active_size_ratio Warning ratio of active size

( Default: 70, Unit: percent )

autonomic_op.compaction.threshold_active_size_ratio Threshold ratio of active size. LeoStorage starts data compaction after reaching it.

( Default: 60, Unit: percent )
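
For illustration, enabling auto-compaction with the documented default values would look like this sketch:

autonomic_op.compaction.is_enabled = true
autonomic_op.compaction.parallel_procs = 1
autonomic_op.compaction.interval = 3600
autonomic_op.compaction.warn_active_size_ratio = 70
autonomic_op.compaction.threshold_active_size_ratio = 60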

MQ
mq.backend_db The MQ storage is pluggable; it supports bitcask and leveldb.

( Default: leveldb )

mq.num_of_mq_procs The number of MQ-server processes

( Default: 8 )

mq.num_of_batch_process_max Maximum number of batch processes for messages

( Default: 3000 )

mq.num_of_batch_process_regular Regular number of batch processes for messages

( Default: 1600 )

mq.interval_between_batch_procs_max Maximum value of interval between batch-procs

( Default: 3000, Unit: msec )

mq.interval_between_batch_procs_regular Regular value of interval between batch-procs

( Default: 500, Unit: msec )

Backend DB / eleveldb
backend_db.eleveldb.write_buf_size Write buffer size. Larger values increase performance, especially during bulk loads. Up to two write buffers may be held in memory at the same time, so you may wish to adjust this parameter to control memory usage. Also, a larger write buffer will result in a longer recovery time the next time the database is opened. Since up to backend_db.eleveldb.write_buf_size * obj_containers.num_of_containers of memory can be consumed in total, take both settings into account to meet your memory footprint requirements on LeoStorage.

( Default: 62914560 )

backend_db.eleveldb.max_open_files Max Open Files. Number of open files that can be used by the DB. You may need to increase this if your database has a large working set (budget one open file per 2MB of working set).

( Default: 1000 )

backend_db.eleveldb.sst_block_size The size of a data block is controlled by the SST block size. The size represents a threshold, not a fixed count. Whenever a newly created block reaches this uncompressed size, leveldb considers it full and writes the block with its metadata to disk. The number of keys contained in the block depends upon the size of the values and keys.

( Default: 4096 )
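
As an example, a memory-constrained node might trade bulk-load performance for a smaller footprint by shrinking the write buffer. A sketch only; the 33554432 (32 MiB) value is illustrative, not a recommendation:

backend_db.eleveldb.write_buf_size = 33554432
backend_db.eleveldb.max_open_files = 1000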

Replication and Recovery object(s)
replication.rack_awareness.rack_id Rack-Id for the rack-awareness replica placement feature
replication.recovery.size_of_stacked_objs Size of stacked objects. Objects are stacked and sent as one bulk object to remote nodes.

( Default: 5242880, Unit: byte )

replication.recovery.stacking_timeout Stacking timeout. A bulk object is sent to a remote node once the timeout is reached.

( Default: 1, Unit: sec )

Multi Data Center Replication / Basic
mdc_replication.size_of_stacked_objs Size of stacked objects. Objects are stacked and sent as one bulk object to a remote cluster.

( Default: 33554432, Unit: byte )

mdc_replication.stacking_timeout Stacking timeout. A bulk object is sent to a remote cluster once the timeout is reached.

( Default: 30, Unit: sec )

mdc_replication.req_timeout Request timeout between clusters

( Default: 30000, Unit: msec )

Log
log.log_level Log level:
  • 0:debug
  • 1:info
  • 2:warn
  • 3:error

( Default: 1 )

log.is_enable_access_log Enables or disables the access-log feature

( Default: false )

log.access_log_level Access log's level:
  • 0: only regular case
  • 1: includes error cases

( Default: 0 )
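
For example, to record error cases as well as regular operations in the access log, a configuration sketch would be:

log.is_enable_access_log = true
log.access_log_level = 1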

log.erlang Destination of log file(s) of Erlang's log

( Default: ./log/erlang )

log.app Destination of log file(s) of LeoStorage

( Default: ./log/app )

log.member_dir Destination of log file(s) of members of storage-cluster

( Default: ./log/ring )

log.ring_dir Destination of log file(s) of RING

( Default: ./log/ring )

log.is_enable_diagnosis_log Enables or disables the data-diagnosis log

( Default: true )

Other Directories Settings
queue_dir Directory of queue for monitoring "RING"

( Default: ./work/queue )

snmp_agent Directory of SNMP agent configuration

( Default: ./snmp/snmpa_storage_0/LEO-STORAGE )

Erlang VM Related Configurations

Item Description
nodename The node name, in the format <NAME>@<IP-ADDRESS>, which must always be unique in a LeoFS system

( Default: storage_0@127.0.0.1 )

distributed_cookie Sets the magic cookie of the node to Cookie.

- See also: Distributed Erlang

( Default: 401321b4 )

erlang.kernel_poll Kernel poll reduces LeoFS' CPU usage when it has hundreds (or more) network connections.

( Default: true )

erlang.asyc_threads The total number of Erlang async threads

( Default: 32 )

erlang.max_ports Sets the maximum number of ports.

- See also: Erlang erlang:open_port/2

( Default: 64000 )

erlang.crash_dump The output destination of an Erlang crash dump

( Default: ./log/erl_crash.dump )

erlang.max_ets_tables The maximum number of Erlang ETS tables

( Default: 256000 )

erlang.smp -smp enable and -smp start the Erlang runtime system with SMP support enabled.

( Default: enable )

erlang.schedulers.compaction_of_load Enables or disables scheduler compaction of load. If it's enabled, the Erlang VM will attempt to fully load as many scheduler threads as possible.

( Default: true )

erlang.schedulers.utilization_balancing Enables or disables scheduler utilization balancing of load. By default scheduler utilization balancing is disabled and instead scheduler compaction of load is enabled, which strives for a load distribution that causes as many scheduler threads as possible to be fully loaded (that is, not run out of work).

( Default: false )

erlang.distribution_buffer_size Sender-side network distribution buffer size (unit: KB)

( Default: 32768 )

erlang.fullsweep_after Option fullsweep_after makes it possible to specify the maximum number of generational collections before forcing a fullsweep, even if there is room on the old heap. Setting the number to zero disables the general collection algorithm, that is, all live data is copied at every garbage collection.

( Default: 0 )

erlang.secio Enables or disables eager check I/O scheduling. The flag affects when schedulers will check for I/O operations possible to execute, and when such I/O operations will execute.

( Default: true )

process_limit The maximum number of Erlang processes. Sets the maximum number of simultaneously existing processes for this system if a Number is passed as the value. The valid range for Number is [1024-134217727]

( Default: 1048576 )

Notes and Tips of the Configuration

obj_containers.path, obj_containers.num_of_containers

You can configure multiple object containers by specifying comma-separated values for obj_containers.path and obj_containers.num_of_containers.

obj_containers.path = [/var/leofs/avs/1, /var/leofs/avs/2]
obj_containers.num_of_containers = [32, 64]
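
With this layout and the default backend_db.eleveldb.write_buf_size of 62914560 bytes (60 MiB), the write buffers alone can consume up to (32 + 64) × 60 MiB ≈ 5.6 GiB, so a configuration like this presumes a machine with a correspondingly large amount of RAM.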

object_storage.is_strict_check

Unless object_storage.is_strict_check is set to true, there is a small possibility that your data could be corrupted without any warning when a bug in unexpected or unknown software damages AVS files, even if the LeoFS system runs on a filesystem like ZFS that protects both metadata and data blocks through checksums.

Configuration which can affect Load and CPU usage

mq.num_of_mq_procs can affect not only the performance and load during recover/rebalance operations but also the load while at least one node in the cluster is suspended or down. Setting mq.num_of_mq_procs to an appropriate value, based on the expected traffic and the hardware specs, is therefore very important. This section gives you a brief understanding of mq.num_of_mq_procs and how to choose the optimal value for your requirements.

  • How the mq.num_of_mq_procs setting affects system operations

    • High
      • Fast recover/rebalance time
      • High CPU usage/load on storage during recover/rebalance, and also while suspended/stopped nodes exist in the cluster
    • Low
      • Slow recover/rebalance time
      • Low CPU usage/load on storage during recover/rebalance, and also while suspended/stopped nodes exist in the cluster
  • Recommended settings (see the sketch after this list)

    • If you have enough CPU resources on your storage nodes, set it to a higher value, as long as it doesn't affect the operations coming from LeoGateway
    • If you don't, set it to a somewhat lower value, unless recovery then takes too much time
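
As a sketch only, with hypothetical values rather than recommendations, the two cases might look like:

# CPU-rich storage node: raise the consumer count (hypothetical value)
mq.num_of_mq_procs = 16

# CPU-poor storage node: lower it below the default of 8 (hypothetical value)
mq.num_of_mq_procs = 4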

For more details, please see Issue #987.

LeoStorage's MQ mechanism depends on the watchdog mechanism to reduce the costs of message consumption. The MQ dynamically updates the number of batch processes and the interval of message consumption.

Figure: Number-of-batch-processes and interval

As shown in Figure: Relationship of Watchdog and MQ, the watchdog automatically adjusts the number of batch processes between mq.num_of_batch_process_min and mq.num_of_batch_process_max; the value is increased or decreased in steps of mq.num_of_batch_process_step.

On the other hand, the interval is adjusted between mq.interval_between_batch_procs_min and mq.interval_between_batch_procs_max; the value is increased or decreased in steps of mq.interval_between_batch_procs_step.

When each value reaches its minimum, the MQ changes its status to suspending; once the node's processing cost has dropped, the MQ updates its status back to running.
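
The regular/max values documented in the MQ table above bound this adjustment. The following sketch restates the documented defaults; the *_min and *_step keys follow the naming used in the text, and their values here are assumptions:

mq.num_of_batch_process_regular = 1600
mq.num_of_batch_process_max = 3000
mq.interval_between_batch_procs_regular = 500
mq.interval_between_batch_procs_max = 3000
# Assumed keys per the text above, with illustrative values:
mq.num_of_batch_process_min = 100
mq.num_of_batch_process_step = 250
mq.interval_between_batch_procs_min = 100
mq.interval_between_batch_procs_step = 100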

Figure: Relationship of Watchdog and MQ

LeoStorage's auto-compaction mechanism also depends on the watchdog mechanism to reduce its processing costs. The auto-compaction dynamically updates the number of batch processes and the interval of seeking objects. The basic design of its relationship with the watchdog is similar to the MQ's.

Figure: Number-of-batch-processes and interval

As shown in Figure: Relationship of the watchdog and the auto-compaction, the watchdog automatically adjusts the number of batch processes between compaction.batch_procs_min and compaction.batch_procs_max; the value is increased or decreased in steps of compaction.batch_procs_step.

On the other hand, the interval is adjusted between compaction.waiting_time_min and compaction.waiting_time_max; the value is increased or decreased in steps of compaction.waiting_time_step.

When each value reaches its minimum, the auto-compaction changes its status to suspending; once the node's processing cost has dropped, the auto-compaction updates its status back to running.
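
As with the MQ, the documented regular/max pairs bound the adjustment. In this sketch the regular/max values restate the documented defaults, while the *_min and *_step keys follow the naming in the text and carry assumed, illustrative values:

compaction.batch_procs_regular = 1000
compaction.batch_procs_max = 1500
compaction.waiting_time_regular = 500
compaction.waiting_time_max = 3000
# Assumed keys per the text above, with illustrative values:
compaction.batch_procs_min = 100
compaction.batch_procs_step = 250
compaction.waiting_time_min = 100
compaction.waiting_time_step = 100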

Figure: Relationship of the watchdog and the auto-compaction