Snap Replication File System

Tue
09/04/18

snaprfs is a daemon used to replicate files between computers on your Snap! cluster.

This is somewhat similar to what Hadoop is expected to do.

There are several capabilities as described below.

Synchronize Directory

Keep a directory synchronized by copying the newest version of a file to all the other computers that do not already have it.

This is used to keep definitions in synchronization.

snaprfs preserves each file's metadata (i.e. stat(3) information, especially the modification time) so it can detect whether a file on one computer is newer than the same file on another computer. If so, by default, the newer file overwrites the older version.
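The "newest modification time wins" rule can be sketched in a few lines. This is only an illustration under simplifying assumptions: the copies live at local paths instead of on separate computers, and the function name sync_newest is hypothetical, not part of snaprfs.

```python
import os
import shutil


def sync_newest(paths):
    """Copy the newest existing file in `paths` over every older or missing copy.

    `paths` stands in for the same file on several computers; here they are
    plain local paths (an assumption of this sketch). Returns the path used
    as the source, or None when no copy exists at all.
    """
    existing = [p for p in paths if os.path.exists(p)]
    if not existing:
        return None
    # The modification time reported by stat() decides which copy is newest.
    newest = max(existing, key=lambda p: os.stat(p).st_mtime_ns)
    newest_mtime = os.stat(newest).st_mtime_ns
    for p in paths:
        if p == newest:
            continue
        if not os.path.exists(p) or os.stat(p).st_mtime_ns < newest_mtime:
            # copy2() copies the data and then the timestamps, so the copy
            # does not look newer than its source on the next pass.
            shutil.copy2(newest, p)
    return newest
```

Copying the timestamps along with the data is what keeps the comparison stable: without it, every fresh copy would appear to be the newest version.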

It is also possible to define a specific source. Although this makes the source a single point of failure, it makes it easier to ensure that no other copy overwrites the one version that is considered the newest.

Replicate File

Some files are known to be created by one specific computer. For example, the status of a computer as generated by the Sitter is saved on the computer on which Sitter is running. That status data is then shared with the other computers on the cluster so that the status of every computer can be read from any one of them.

This method copies the file from the specific source computer to any other computer that expects a copy. Any time the file changes, a new copy happens.

Disappearing Files

Whenever the Sitter generates a computer's status, it uses the hostname as the filename for that computer's status data. This is practical because, on every system, the name of each file shows which computer it came from.

One drawback of this method is that when you change the name of a computer, the old file sticks around. There is currently no heuristic to remove such old files (this is a bug and we have a Jira issue, SNAP-414, about it). Even then, only one computer would be able to delete the file; what about the others? This is where snaprfs comes in: it is told that files removed on one system need to be removed on all systems.

Implementation

snaprfs is here in part to relieve the communicator daemon of sending files across the network. However, snaprfs uses communicatord to communicate with other daemons in order to know what work it has to do. Some of the work can also be described in configuration files.

Permissions

Since the daemon is expected to be capable of copying any files, it has to run as root.

For example, in order for this daemon to copy the SSH keys between various user accounts, it needs to be able to read and write those files.

At this point we do not see any safe way of granting permissions for certain files and not for others without a description of which users and groups are allowed under which paths, and that is a little too complicated to handle for now. Later versions may get smarter about that, though.

Messages

SENDFILE filename=...;condition=...

The SENDFILE message is used to request an instant synchronization. This means the file referenced by the SENDFILE command is replicated on the other systems at the time the event is received.

SYNCHRONIZE path=...;condition=...;descendants=...

The SYNCHRONIZE message requests that snaprfs keeps a directory synchronized between all the destination computers (i.e. computers that satisfy the condition expression.)

The path parameter must specify a full path to an existing directory. Anything else and the message fails.

If not specified, the descendants parameter defaults to "false". Setting this parameter to "true" means that all sub-directories and their files also get replicated. Sub-directories can be added and removed over time as necessary.
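As an illustration of the path and descendants rules, the following sketch lists the files a SYNCHRONIZE request would cover. The function name files_to_sync is hypothetical, and raising ValueError is only one possible way to model "the message fails".

```python
import os


def files_to_sync(path, descendants=False):
    """List the files covered by a SYNCHRONIZE request on `path`.

    The path must be a full path to an existing directory, otherwise the
    request is rejected. Without `descendants`, only files directly inside
    the directory are listed; with it, sub-directories are walked too.
    """
    if not os.path.isabs(path) or not os.path.isdir(path):
        raise ValueError(f"not a full path to an existing directory: {path!r}")
    if not descendants:
        return sorted(e.path for e in os.scandir(path) if e.is_file())
    found = []
    # os.walk() re-reads the tree on every call, so sub-directories added
    # or removed between two calls are picked up naturally.
    for root, _dirs, names in os.walk(path):
        found.extend(os.path.join(root, name) for name in names)
    return sorted(found)
```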

For removals, the date when that event happens is the one used to know whether a file should be replicated or removed.
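One way to read this rule is "the most recent event wins", whether that event is a write or a removal. The sketch below models it with a hypothetical Event record; the names and the three possible answers are assumptions made for the example, not snaprfs internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # "write" or "remove" (hypothetical names)
    timestamp: float  # seconds since the epoch when the event happened


def resolve(local: Event, remote: Event) -> str:
    """Decide what the local computer does after hearing about `remote`."""
    if remote.timestamp <= local.timestamp:
        # The local state is at least as recent: nothing to do.
        return "keep"
    # The remote event is newer: mirror it, whatever it was.
    return "delete" if remote.kind == "remove" else "copy"
```

With this reading, a file removed at time 20 on one computer is deleted on a computer whose copy dates from time 10, while a copy written at time 30 survives and gets replicated instead.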

This message remains in effect until snaprfs exits or the STOPSYNCHRONIZE command is received.

STOPSYNCHRONIZE path=...

This message cancels a previous SYNCHRONIZE message.

Which directory to stop synchronizing is determined by path=... which has to match the path of the SYNCHRONIZE message.

REMOVEFILE filename=...;condition=...

Once in a while, a file gets deleted. The deletion is replicated by deleting the file across the entire cluster.
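The four messages above share one shape: a command name followed by name=value parameters separated by semicolons. The sketch below parses that notation as it is written on this page. It is not the communicatord wire format, and it ignores quoting, so a condition that itself contains a semicolon is not handled. The paths in the usage example are made up.

```python
def parse_message(line):
    """Split 'COMMAND name=value;name=value' into (command, parameters)."""
    command, _, rest = line.strip().partition(" ")
    params = {}
    for item in filter(None, rest.split(";")):
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"parameter without a value: {item!r}")
        params[name.strip()] = value.strip()
    return command, params


def wants_descendants(params):
    """Apply the documented default: descendants is "false" when omitted."""
    return params.get("descendants", "false") == "true"


command, params = parse_message("SYNCHRONIZE path=/srv/example;descendants=true")
```

After the last line, command is "SYNCHRONIZE" and params holds the two parameters as strings.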

XML File Format

snaprfs loads the XML files found under /usr/share/snapwebsites/snaprfs/replication/*.xml

The files must have the .xml extension.

The format is as follows:

<?xml version="1.0" encoding="utf-8"?>
<snaprfs>
  <synchronize path="..." descendants="...">
    <condition>...</condition>
  </synchronize>
  <replicate path="...">
    <condition>...</condition>
    <freshness timeout="...">...patterns...</freshness>
  </replicate>
  <distribute path="..." port="..." replica="...">
  </distribute>
</snaprfs>

The <synchronize> tag is similar to the SYNCHRONIZE message. It ensures that the specified directory remains synchronized. The path must point to an existing directory. The descendants attribute can be defined in which case sub-directories get synchronized too.

The <replicate> tag allows for replication. This means that servers share files only when they are missing and only if fresh enough. This is useful to handle the snap.cgi cache. When a file gets removed on one computer, it may get replaced by a copy on another computer, but only if fresh enough. If the file does not match any freshness patterns, then it gets ignored. Most often, you will include one freshness tag with "*" as the pattern, as in:

<freshness timeout="86400">*</freshness>

which means: copy any file that is no more than 1 day old (86400 seconds).
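The freshness test can be sketched as follows. Several points are assumptions of this example: patterns are treated as shell-style globs, the first matching pattern decides, and the age of a file is measured from its modification time. The function name is_fresh is hypothetical.

```python
import fnmatch
import time


def is_fresh(filename, mtime, rules, now=None):
    """Return True if `filename` matches a freshness pattern and is young enough.

    `rules` is a list of (pattern, timeout_in_seconds) pairs, one per
    <freshness> tag. A file that matches no pattern is ignored, which this
    sketch reports as "not fresh".
    """
    now = time.time() if now is None else now
    for pattern, timeout in rules:
        if fnmatch.fnmatchcase(filename, pattern):
            return now - mtime <= timeout
    return False
```

With rules set to [("*", 86400)], a file modified 10 hours ago is copied and a file modified 2 days ago is not.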

The <distribute> tag is closer to what Hadoop does: distribute files on X number of computers for fast retrieval and replication (fail safe). The snap.cgi caches may end up using this feature some day. Files must be sent and retrieved using a TCP connection to the specified port attribute using localhost as the IP address.

The file that is kept is the newest version. We determine that using the date and time when the file gets sent to the snaprfs server. Note that the snaprfs server saves the file in its journal no matter what, then checks whether another, newer version was received while it was saving its own copy. If so, the older file gets deleted and the newer file gets replicated.

snaprfs determines where to replicate each file using the file's Murmur3 checksum, the number of servers available to snaprfs, and the number of replicas requested by the administrator. It keeps track of where each replica is saved in its metadata (this is very important in the event the user changes the parameters, i.e. adds new servers, changes the replica parameter, etc.)
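How the three inputs combine is not specified here, so the following is only one plausible scheme: hash the file content with the 32-bit x86 variant of MurmurHash3, take the hash modulo the number of servers as a starting point, and use the next servers in order for the remaining replicas. The variant, the use of the content as hash input, the function names, and the placement rule are all assumptions of this sketch.

```python
MASK = 0xFFFFFFFF


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k(k):
    k = (k * 0xCC9E2D51) & MASK
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant, in pure Python."""
    h = seed & MASK
    rounded = len(data) & ~3
    for i in range(0, rounded, 4):
        h ^= _mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK
    tail = data[rounded:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))
    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h


def place_replicas(content: bytes, servers, replicas):
    """Pick up to `replicas` distinct servers for a file (hypothetical rule)."""
    count = min(replicas, len(servers))
    start = murmur3_32(content) % len(servers) if servers else 0
    return [servers[(start + i) % len(servers)] for i in range(count)]
```

With a modulo-based rule like this one, adding a server changes the starting point of most files. That is one reason recording the actual location of each replica in the metadata matters: the old locations cannot be recomputed once the parameters change.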

Note that part of this mechanism is a duplication of what Cassandra does for us already.

The list of servers used by a <distribute> tag can't be defined in the XML file since that has nothing to do with the file itself. It has to be defined in a file under /var/lib/snapwebsites or in a configuration file under /etc/snapwebsites/snapwebsites.d/....
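To tie the XML format together, here is a minimal sketch that reads one such file with Python's standard XML parser. The sample paths, port, and replica count are made up for the example, and treating a missing descendants attribute as "false" mirrors the SYNCHRONIZE message rather than anything stated about the XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the element and attribute names come from the format.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<snaprfs>
  <synchronize path="/srv/example/definitions" descendants="true">
    <condition>file_exists("/usr/share/snapwebsites/services/snapdbproxy.service")</condition>
  </synchronize>
  <replicate path="/srv/example/cache">
    <condition>1</condition>
    <freshness timeout="86400">*</freshness>
  </replicate>
  <distribute path="/srv/example/store" port="4000" replica="3">
  </distribute>
</snaprfs>
"""


def load_rules(xml_bytes):
    """Turn one replication XML file into a list of plain dictionaries."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "snaprfs":
        raise ValueError(f"unexpected root element <{root.tag}>")
    rules = []
    for node in root:
        rule = {"type": node.tag, "path": node.get("path")}
        if node.tag == "synchronize":
            rule["descendants"] = node.get("descendants", "false") == "true"
        elif node.tag == "replicate":
            rule["freshness"] = [
                ((f.text or "").strip(), int(f.get("timeout")))
                for f in node.findall("freshness")
            ]
        elif node.tag == "distribute":
            rule["port"] = int(node.get("port"))
            rule["replica"] = int(node.get("replica"))
        else:
            raise ValueError(f"unknown tag <{node.tag}>")
        if node.tag != "distribute":
            rule["condition"] = (node.findtext("condition") or "").strip()
        rules.append(rule)
    return rules


rules = load_rules(SAMPLE)
```

Checking the root element name catches a file whose opening and closing tags disagree before any rule is applied, since the parser itself rejects mismatched tags.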

Condition

The condition parameter found in messages and XML files is a C-like expression as understood by the snapexpr command line tool. It is used to decide whether a file should be sent to a given computer or not. If the destination computer gets an event about copying or deleting a file, the condition code is run and, if the result is negative (equals 0 or false), the file is not sent to that client.

For example, files that are shared between computers running snapdbproxy should only go to computers that have that service installed. One way to determine whether a computer has snapdbproxy installed is to check whether the corresponding service file exists:

file_exists("/usr/share/snapwebsites/services/snapdbproxy.service");

This way the files do not get copied on Cassandra systems.
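The real evaluation is done by snapexpr, which understands full C-like expressions. As a stand-in, the sketch below accepts only the single form used in the example, file_exists("..."), to show how a condition gates a copy. The function name accepts is hypothetical and nothing here reflects how snapexpr is implemented.

```python
import os
import re

# Matches exactly one call: file_exists("some/path") with an optional ';'.
_FILE_EXISTS = re.compile(r'^\s*file_exists\(\s*"([^"]*)"\s*\)\s*;?\s*$')


def accepts(condition):
    """Evaluate a condition of the single form file_exists("path").

    Returns True when the destination should receive the file. Any other
    expression is rejected, because this stand-in does not parse them.
    """
    match = _FILE_EXISTS.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    return os.path.exists(match.group(1))
```

A destination would evaluate the condition locally and skip the copy or deletion whenever the result is false.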

Coverage Test Results

The library has a test suite that covers 100% of the code, making it a little more certain that it does not include too many bugs. We try to run the tests each time we create a new version to ensure that it works as expected.
