April 16, 2013

LeoFS overview

Our Motivation

../../../_images/leofs.007.jpg

We found storage problems in our company: many services depended on expensive storage products, which held all kinds of unstructured data such as images, documents, and so on.

We needed to resolve three problems:

Low ROI - Low-budget services cannot afford expensive storage.

Possibility of a SPOF - Depending on the budget, it is difficult to build a redundant structure with expensive products.

Storage expansion is difficult as data grows - An “expensive storage” product cannot easily be added to or expanded.

We needed to move away from expensive storage to something else.

Aim to ...

../../../_images/leofs.010.jpg

After much trial and error, we finally arrived at a design that satisfies our three storage requirements:

ONE huge storage - A so-called storage platform.

Non-stop storage - The storage system receives requests from many web services, which require it to be running at all times.

Specialized for the Web - All web services need to communicate easily with the storage system, so we decided to provide a REST-API over HTTP rather than FUSE. Depending on one specific storage product would mean the system definitely could not scale.

LeoFS Overview

../../../_images/leofs.013.jpg

LeoFS can store various kinds of unstructured data such as photos, documents, movies, log data, and so on. LeoFS already handles everything from small files to large files.

We aim to build a storage platform in the cloud.

../../../_images/leofs.014.jpg

To build a storage platform, we provide an S3-API, because S3-API clients already exist for many programming languages, as GUI tools, and so on. These clients can communicate with LeoFS smoothly.

With LeoFS at the center, we have been building a storage platform in our company. It needs three things:

  • High cost performance ratio
  • High reliability
  • High scalability
../../../_images/leofs.019.jpg

LeoFS consists of three components - Storage, Gateway, and Manager - all of which are built on Erlang.

Gateway handles HTTP requests and responses from clients that use the REST-API or the S3-API, and it has a built-in object-cache mechanism.

Storage handles GET, PUT, and DELETE operations on objects as well as metadata. Storage also has a replicator, a recoverer, and a queueing mechanism to keep the system running and consistent.

Manager constantly monitors the gateway nodes and storage nodes. The main things it monitors are node status and the RING’s checksum, again to keep the system running and consistent.

../../../_images/leofs.020.jpg

LeoFS’s system layout is very simple. LeoFS does not have a master server, so there is no SPOF.

LeoFS already implements an SNMP agent, so you can easily monitor it with monitoring tools such as Nagios and Zabbix.

Inside LeoFS

../../../_images/leofs.024.jpg

The components are loosely connected to one another through Erlang’s RPC, and so are the nodes inside the storage cluster.

LeoFS Gateway

../../../_images/leofs.026.jpg

Gateway consists of a stateless proxy and an object cache. Because it is stateless, you can easily add gateway nodes under high load. We chose Cowboy as Gateway’s HTTP server because we expected high performance from it.

Gateway already provides a REST-API and an S3-API, both of which are simple APIs. To route a request, Gateway looks up the RING, which is based on consistent hashing, and sends the request to the resulting storage node.
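
The RING lookup described above can be illustrated with a minimal consistent-hashing sketch. This is not LeoFS's actual implementation (which is written in Erlang); the class name `Ring`, the node names, the use of MD5, and the number of virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only as a stable, well-distributed hash (an assumption).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class Ring:
    """Toy consistent-hashing ring with virtual nodes (hypothetical names)."""

    def __init__(self, nodes, vnodes=64):
        points = []
        for node in nodes:
            # Each node is placed on the ring at `vnodes` pseudo-random points.
            for i in range(vnodes):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    def lookup(self, key: str) -> str:
        """Return the node owning `key`: the first point clockwise from its hash."""
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[idx]


ring = Ring(["storage_0", "storage_1", "storage_2"])
print(ring.lookup("bucket/photo.jpg"))
```

The useful property is that adding a node moves only the keys that now belong to the new node; every other key keeps its previous owner, which is what makes growing the storage cluster cheap.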

../../../_images/leofs.029.jpg

Also, the object-cache mechanism (a hierarchical cache) reduces traffic between Gateway and Storage.
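
As a rough illustration of how a hierarchical cache cuts traffic to Storage, here is a two-tier LRU sketch. The tier sizes, the class names `LruTier` and `CachingGateway`, and the `fetch_from_storage` callable are hypothetical; the original text does not say how LeoFS's cache tiers are organized.

```python
from collections import OrderedDict


class LruTier:
    """A fixed-capacity LRU map; one tier of the cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class CachingGateway:
    """Checks a small first tier, then a larger second tier, then storage."""

    def __init__(self, fetch_from_storage, l1_size=2, l2_size=8):
        self._fetch = fetch_from_storage  # hypothetical callable: key -> bytes
        self._l1 = LruTier(l1_size)
        self._l2 = LruTier(l2_size)
        self.storage_reads = 0  # how many requests actually reached storage

    def get(self, key):
        value = self._l1.get(key)
        if value is not None:
            return value
        value = self._l2.get(key)
        if value is None:
            value = self._fetch(key)
            self.storage_reads += 1
            self._l2.put(key, value)
        self._l1.put(key, value)  # promote to the first tier
        return value
```

Repeated reads of the same object are served from a tier, so `storage_reads` grows only on misses in both tiers.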

LeoFS Storage

../../../_images/leofs.031.jpg

“LeoFS-Storage” consists of a storage engine, an object replicator, an object repairer, message queuing, and so on. Each storage-engine worker consists of metadata and one or more object containers, each of which is a log-structured file.

A file (the raw data) and its metadata are replicated to other nodes, up to the defined number of replicas.

../../../_images/leofs.033.jpg

LeoFS’s data structure has three layers.

Metadata consists of the file name, file size, checksum, and so on. The actual object is retrieved using the file name, file size, and offset.

Needle is LeoFS’s original file format. A needle consists of metadata, the actual file, and a footer. Metadata can be recovered from a needle.

An object container consists of a super-block and any number of needles.
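
To show why a self-describing record makes metadata recoverable, here is a simplified needle-like layout. The field order, field widths, footer marker, and function names are invented for this sketch and do not reflect LeoFS's real on-disk format; the super-block is omitted.

```python
import struct
import zlib

HEADER = struct.Struct(">HII")  # key length, body size, CRC32 of body (assumed layout)
FOOTER = b"\xde\xad\xbe\xef"    # made-up end-of-needle marker


def pack_needle(key: str, body: bytes) -> bytes:
    """Serialize one record as header + key + body + footer."""
    key_bytes = key.encode("utf-8")
    header = HEADER.pack(len(key_bytes), len(body), zlib.crc32(body))
    return header + key_bytes + body + FOOTER


def scan_needles(container: bytes):
    """Walk a run of needles and yield (key, body offset, body size) for each,
    which is enough information to rebuild a lost metadata index."""
    pos = 0
    while pos < len(container):
        key_len, size, checksum = HEADER.unpack_from(container, pos)
        key_start = pos + HEADER.size
        body_start = key_start + key_len
        body_end = body_start + size
        body = container[body_start:body_end]
        footer = container[body_end:body_end + len(FOOTER)]
        if footer != FOOTER or zlib.crc32(body) != checksum:
            raise ValueError(f"corrupt needle at offset {pos}")
        yield container[key_start:body_start].decode("utf-8"), body_start, size
        pos = body_end + len(FOOTER)


data = pack_needle("a.jpg", b"hello") + pack_needle("b.txt", b"")
print(list(scan_needles(data)))
```

Because every needle carries its own key, size, and checksum, a sequential scan of the container is sufficient to reconstruct the index without any external metadata.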

  1. When retrieving an object from the storage:
    • First, the storage engine retrieves the object’s metadata from the metadata storage
    • Then it retrieves the object from the object container, using the file size and the offset within the container
  2. When inserting an object into the storage:
    • First, the storage engine inserts the metadata into the metadata storage
    • Then it appends the object to the object container
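
The two paths above can be sketched with an in-memory append-only log and a separate metadata index. This is only an illustration: `ObjectContainer`, the dict-based metadata storage, and the CRC32 checksum are assumptions, not LeoFS's actual storage engine.

```python
import io
import zlib


class ObjectContainer:
    """Append-only byte log plus a separate metadata index (hypothetical)."""

    def __init__(self):
        self._log = io.BytesIO()  # stands in for the container file
        self._metadata = {}       # stands in for the metadata storage

    def put(self, name: str, body: bytes) -> None:
        # The end of the log is where the new object will start.
        offset = self._log.seek(0, io.SEEK_END)
        # Insert the metadata first, then append the object.
        self._metadata[name] = {
            "size": len(body),
            "offset": offset,
            "checksum": zlib.crc32(body),
        }
        self._log.write(body)

    def get(self, name: str) -> bytes:
        meta = self._metadata[name]          # 1. look up the metadata
        self._log.seek(meta["offset"])       # 2. jump to the stored offset
        body = self._log.read(meta["size"])  # 3. read exactly `size` bytes
        if zlib.crc32(body) != meta["checksum"]:
            raise IOError(f"checksum mismatch for {name}")
        return body


container = ObjectContainer()
container.put("photo.jpg", b"first version")
container.put("photo.jpg", b"second version")
print(container.get("photo.jpg"))
```

Note that overwriting `photo.jpg` only appends: the first version stays in the log as unreferenced bytes until something reclaims it.
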
../../../_images/leofs.037.jpg

This mechanism has a side effect: storage engines sometimes need to remove unnecessary objects and their metadata (compaction). To lessen the impact of compaction, LeoFS uses phased compaction. We’re planning to support auto-compaction in LeoFS v0.14.2.
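
The basic idea of compaction for an append-only container can be sketched as follows. This shows a plain one-shot compaction, not LeoFS's phased compaction; the function name `compact` and the `(offset, size)` metadata shape are assumptions for illustration.

```python
def compact(log: bytes, metadata: dict) -> tuple:
    """Copy only live objects into a new log; return (new_log, new_metadata).

    `metadata` maps name -> (offset, size) for the latest version of each
    live object. Bytes not referenced by it are garbage.
    """
    new_log = bytearray()
    new_metadata = {}
    # Walk live objects in on-disk order so the copy is a sequential scan.
    for name, (offset, size) in sorted(metadata.items(), key=lambda kv: kv[1][0]):
        new_metadata[name] = (len(new_log), size)
        new_log += log[offset:offset + size]
    return bytes(new_log), new_metadata


# "a" was overwritten and "b" was deleted, so only 7 of the 14 bytes are live.
log = b"old" + b"bbbb" + b"new-a" + b"cc"
live = {"a": (7, 5), "c": (12, 2)}
new_log, new_live = compact(log, live)
print(new_log, new_live)
```

The cost is that every live byte is rewritten, which is why the impact of compaction on a running system matters.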

LeoFS Manager

../../../_images/leofs.042.jpg

Manager distributes the RING and the list of storage-cluster members. It constantly monitors node status and RING status, because LeoFS is required to provide high availability.

Manager also provides simple operation commands (suspend, resume, detach, whereis, etc.) for Gateway and Storage, and it supplies the RING to both, so Manager is what manages the RING. If Manager finds an incorrect RING, it restores the RING’s consistency. If Storage finds one, it notifies Manager. Either way, the problem is eventually resolved by Manager.
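
One way to picture checksum-based RING monitoring is the sketch below. It models a RING as a plain list of node names and uses SHA-256 as the checksum; both choices, and the function names, are assumptions made for illustration rather than a description of LeoFS internals.

```python
import hashlib


def ring_checksum(members) -> str:
    """Checksum of a RING, modelled here as a list of node names (assumption)."""
    canonical = "\n".join(sorted(members)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_inconsistent_nodes(manager_ring, reported):
    """Return the nodes whose reported checksum differs from the manager's.

    `reported` maps node name -> checksum that node computed for its own copy.
    """
    expected = ring_checksum(manager_ring)
    return sorted(node for node, checksum in reported.items() if checksum != expected)


manager_ring = ["storage_0", "storage_1", "storage_2"]
reported = {
    "gateway_0": ring_checksum(manager_ring),
    "storage_0": ring_checksum(manager_ring),
    "storage_1": ring_checksum(["storage_0", "storage_1"]),  # stale copy
}
print(find_inconsistent_nodes(manager_ring, reported))  # ['storage_1']
```

Comparing one short checksum per node is much cheaper than shipping whole RINGs around; only the nodes flagged here would need a fresh copy.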

Future Works

LeoFS-QoS

QoS provides two things:
  • First, it controls the requests from clients to LeoFS per minute: QoS refuses requests from a client once that client exceeds a defined request threshold.
  • Second, it stores LeoFS’s traffic data and lets administrators view it.
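
The per-minute request control in the first item could look something like the fixed-window limiter below. Since LeoFS-QoS is future work, everything here (the class name `RequestLimiter`, the fixed-window strategy, the injectable clock) is a hypothetical sketch rather than the planned design.

```python
import time


class RequestLimiter:
    """Fixed-window limiter: at most `limit` requests per client per window."""

    def __init__(self, limit, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._state = {}     # client -> (window index, requests seen in it)

    def allow(self, client: str) -> bool:
        window_index = int(self._clock() // self.window)
        seen_window, count = self._state.get(client, (window_index, 0))
        if seen_window != window_index:
            count = 0  # a new window has started for this client
        if count >= self.limit:
            self._state[client] = (window_index, count)
            return False  # over the threshold: refuse this request
        self._state[client] = (window_index, count + 1)
        return True


limiter = RequestLimiter(limit=2)
print([limiter.allow("client_a") for _ in range(3)])
```

Each client is counted independently, so one noisy client being refused does not affect the others.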

OpenStack Integration

So that both we and you can build a cloud platform (IaaS).

Multi-DC data replication

../../../_images/leofs.052.jpg

We expect higher scalability and higher availability.

Wrap up

../../../_images/leofs.054.jpg

We will keep improving and growing LeoFS. We will reach the three important goals - high cost-performance ratio, high reliability, and high scalability - this summer.

LeoFS is a powerful storage system. I think that once you have a chance to use it, you’ll agree. If you have any questions, please contact us.