Inter-process Signalling [core]

The inter-process signalling system is a server that can be communicated with via TCP/IP streams and UDP packets. This is very similar to an RPC system, only extremely specialized for Snap!

When starting Snap! you run snapinit. That process creates the snapcommunicator process which is the signalling server. Once the server is up and running, the other processes are started.

The snapcommunicator process is a server that accepts four types of connections:

  • Quick UDP signals. In most cases these are used to ping a process so that it immediately starts a task in the background.
  • Registration TCP/IP connections. These are opened by the various processes that expect to be signaled once in a while (via UDP signals). The snapinit, snapserver, snapfirewall, and snapbackend processes make use of this connection scheme.
  • Controller TCP/IP connections. These are used to retrieve the current status of the snapcommunicator process and of the registered processes, and to send secure commands to snapcommunicator (such as a STOP or SHUTDOWN command). The snapmonitor process uses this to make sure the network is healthy. The controller mechanism can only be used with a password.
  • Inter-computer TCP/IP connections. These carry the signals between computers. For example, a plugin sending a PING to the Image backend does not need to know where the Image backend is running: the Image backend registered itself and the snapcommunicator process knows about it through this channel. There can even be multiple instances of the Image process running and snapcommunicator will know what to do about it.

TODO: The local connections should use a Unix socket instead of TCP/IP. Unix sockets can be faster as they avoid the TCP/IP overhead (less context switching, less meddling with the data being transferred.)

The idea of implementing snapcommunicator came up when we started to work with multiple websites and multiple computers. It is a lot easier if all plugins communicate with a single process, and thus one port, instead of many ports and processes. Those processes could be on separate computers that a UDP packet may not be able to reach (too many hops, firewalls, problems with sending a UDP packet over a public network, pinging computers that aren't yours...)

So instead we decided to have one central communication process. It is the only one that knows of all the processes on the current computer, and it knows which other computers need to be pinged (so each plugin does not need to have such advanced knowledge.)

The server makes use of a very simple protocol. The controller only works if you supply a valid password. The other functionality is more relaxed, allowing much faster access to all the necessary processes.

Implementation: The TCP/IP and UDP ports both are fully functional; only the "password protected" feature is not implemented and may never get implemented because we can distinguish a localhost connection and just 100% trust such connections anyway.

Caching

The inter-process signalling system should save (cache) each signal it receives in a file and then attempt to propagate it. If the propagation succeeds, the cached signal can be deleted. If a cached signal exists, then it was not yet propagated, so we can retry until we get a valid reply. This should not go on for more than 5 minutes, since any backend should verify on its own that it did not miss any signals in the last 5 minutes (although we may want to support messaging for other reasons, which should not time out like that.)
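The cache-then-propagate idea described above can be sketched as follows. This is only an illustration: it keeps the signals in memory rather than in a file, and the names SignalCache, MAX_AGE, and flush, as well as the send callback (assumed to return True on success), are hypothetical and not taken from the Snap! sources.

```python
import time

MAX_AGE = 5 * 60  # seconds; cached signals older than this are dropped


class SignalCache:
    """Keep signals until they are propagated or become too old."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._pending = []  # list of (received_at, message)

    def add(self, message):
        """Cache a signal together with the time it was received."""
        self._pending.append((self._clock(), message))

    def flush(self, send):
        """Try to propagate every cached signal; return those still cached."""
        now = self._clock()
        remaining = []
        for received_at, message in self._pending:
            if now - received_at > MAX_AGE:
                continue  # expired: backends recheck for missed work anyway
            if not send(message):
                remaining.append((received_at, message))
        self._pending = remaining
        return [message for _, message in remaining]
```

Calling flush() again later retries whatever is left, which matches the "retry until we get a valid reply" behavior, bounded by the 5 minute limit.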

Implementation: Cache is implemented for UDP signals and local TCP/IP messages.

Connections

Many connections are marked as "permanent connections". This allows daemons to automatically reconnect to snapcommunicator, and snapcommunicators to reconnect to each other, in case a connection is lost. These types of connections are heavier and they may make use of a thread (optional) to do the slow reconnection work.

Connections that are temporary and need to be really fast, as in the snap_child object, make use of a one time UDP port.

Connections that are temporary, but are to be bidirectional, will use a standard TCP/IP connection.

Implementation: Fully functional.

Network Security

The snapcommunicator server can listen to multiple types of IP and networks and each is given a variable level of security.

Implementation: We have the code to detect the type of the client's IP address. We still want to implement various security checks. At this time the security is mainly handled by the firewall.
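As an illustration of detecting the type of a client's IP address, here is a minimal sketch that uses Python's standard ipaddress module. The function name and the three category labels are assumptions made for this example; the actual snapcommunicator code may classify addresses differently.

```python
import ipaddress


def classify_client(address):
    """Return 'local', 'private', or 'public' for a client IP address."""
    ip = ipaddress.ip_address(address)
    # Test loopback first: loopback addresses also count as private.
    if ip.is_loopback:
        return "local"
    if ip.is_private:
        return "private"
    return "public"
```

A server could use the returned category to decide whether credentials or encryption are required for that connection.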

localhost / Unix socket (highly secure)

One connection is on the localhost IP address (127.0.0.1). Since only processes running on the same computer can connect to that IP address, it is considered very safe and no credentials are required in this case (because you are expected to run Snap! on computers that aren't shared between users, only you.)

Implementation: Fully functional, but we still want to add support for a Unix socket instead of a TCP connection. A Unix socket will be slightly faster; over the long term, that adds up to a very large number of cycles saved.

Private Network (semi-secure and highly secure)

Optionally you can set up a connection on one or more private networks. For example, you could define the usual private network address and mask 10.0.0.0/8 or 192.168.0.0/16.

Here you have the option to require full credentials (the private network is partially shared with others which can be the case on cloud systems) or mark the connections as highly secure.

If the private network is shared, make sure to require full encryption on those connections (i.e. use SSL, or a VPN and a tun connection, instead of a direct connection in clear.)

Implementation: We support local networks that are safe if protected by a firewall (i.e. DigitalOcean, Amazon, RackSpace, etc.), SSL encryption on all connections going between computers and in some cases within the same computer, and VPN. However, be careful when using a VPN because one computer will be the VPN server and ALL traffic will go through that single server (it is a point of failure and a bottleneck for your network.)

Remote Network (secure or insecure)

Assuming you own multiple data centers that have a specific set of IP addresses and these IP addresses are considered private to your organization, then these connections can be marked as secure. It is still highly recommended that you implement very strong firewall security between all connections, and you must require full encryption of all data traveling on such networks. This is important for one simple reason: IP addresses can easily be spoofed. MAC addresses can also be spoofed. However, it will be harder for hackers to spoof everything and penetrate your network if you use all possible protection mechanisms.

Although you may mark a remote network as secure, credentials will still be used when connecting remotely. However, higher level commands can be authorized (i.e. a full SHUTDOWN for example.)

Insecure remote networks are similar, except that certain commands can be forbidden to prevent full access to the network in case of a break-in.

Implementation: Not yet implemented. We do support SSL, but not on a per connection basis. In other words, you either have to use SSL on all connections, or no SSL at all. At some point we'll offer two connection ports: one without SSL and one with SSL.

Public Network (insecure)

There is probably no real need for such a setup, unless you have a relatively small setup with many computers in many different data centers and thus do not own one specific Internet based network.

In this case, like for remote networks, you should protect your computers with full and strong firewall settings.

The snapcommunicator is opened to the public network by using the ANY address (0.0.0.0) which means that by default anyone can connect unless you block all IP addresses in your firewall, except a few representing your allowed computers. However, as mentioned earlier, IP addresses can be spoofed so you must also have all the data traveling encrypted and full credentials must be used to finalize the connection. Plus only a limited number of commands will be accepted (mainly the necessary commands to ping the backends for letting them know work is available.)

Implementation: Not yet implemented. The Private Network connection with SSL should suffice for the Public Network. The only real reason for a full public connection to snapcommunicator is if you need to allow computers that have dynamically changing IP addresses and those are not extremely safe.

Implementation

Since a snapcommunicator may end up opening a large number of connections, we will hit all sorts of limits:

  • Maximum number of files to open in an instance of the OS
  • Maximum number of files to open in a process
  • Maximum number of ports that can be used in an instance of the OS

The maximum number of files is generally very high on Linux. It depends mainly on how much memory you have. On a machine with 8 GB of memory, you should get over 1.5 million handles without much of a problem.

However, when initiating a connection through a port, you consume a port. The total number of ports is limited to 65536 with several that are never used (such as 0 and 65535) and some that are reserved by the system (port numbers below 1024). With that in mind, you probably can use 60,000 ports per computer. So... even though you could use a really large number of sockets and other file descriptors in a single process, you will quickly reach the maximum number of ports and be prevented from creating any additional connections (note that servers can accept any number of connections, those do not use extra ports in themselves).

That being said, for snapcommunicator, we can have installations that are per rack. All communications within a rack can make use of full connections. Within one rack you should not have much more than 5,000 instances of the OS running (if that much, that's assuming you use virtualization, of course) and between racks have a different policy: use a small number of communicators (such as one per machine so it could be between 12 and 48 per rack) to connect to other racks.

Note that to quickly handle many sockets snapcommunicator uses poll().

In current Linux versions, poll() and similar calls support 16768 sockets (soft limit) with a total maximum of 65536 files opened simultaneously per computer.
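To show what waiting on many sockets with poll() looks like, here is a small sketch using Python's select.poll wrapper (available on Linux and most Unix systems, not on Windows). The helper name wait_readable is hypothetical, and this is not how snapcommunicator itself is written.

```python
import select
import socket


def wait_readable(sockets, timeout_ms):
    """Return the sockets that have data to read, using poll()."""
    poller = select.poll()
    by_fd = {}
    for sock in sockets:
        # Ask poll() to report when the socket becomes readable.
        poller.register(sock.fileno(), select.POLLIN)
        by_fd[sock.fileno()] = sock
    ready = poller.poll(timeout_ms)
    return [by_fd[fd] for fd, event in ready if event & select.POLLIN]
```

One call covers every registered socket, so a single thread can watch all connections at once.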

UDP/IP versus TCP/IP

We use the UDP/IP protocol when a process wants to very quickly send a signal to another process. For example, say you save a new image in the database and attach a script to that image. That script will eventually be executed by the images backend. To make it execute that script as soon as possible, you send a UDP/IP signal (a PING) to the images backend. This message is defined with "images" as the service and "PING" as the command. It also takes "uri" as a parameter representing the domain that just generated this PING message. The PING wakes the service as soon as possible, requesting that it check all the possible changes instead of sleeping more or going back to sleep. You can send a UDP/IP signal with the snapsignal tool:

snapsignal "images/PING uri=http://www.example.com/"
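For readers who want to see what such a signal amounts to on the wire, here is a rough Python sketch that sends one protocol line as a single UDP datagram. The function name, the host and port arguments, and the trailing newline on UDP datagrams are assumptions for illustration; check your snapcommunicator configuration for the real address and port.

```python
import socket


def send_signal(message, host, port):
    """Send one protocol line as a single UDP datagram."""
    # Assumption: the datagram carries the same newline-terminated
    # line that the TCP protocol uses.
    data = (message + "\n").encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (host, port))
```

Since UDP gives no delivery guarantee, the sender does not wait for any reply.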

The protocol used to send and receive messages is the same for UDP and TCP and is described below.

UDP/IP and TCP/IP Protocol

The protocol is one line per command (i.e. UTF-8 characters ended by one '\n'.) A line includes the following parts:

  • From server name (optional)
  • From service name (optional)
  • Service name (optional)
  • Server name (optional)
  • Command name (required)
  • Parameters (optional)

The syntax to enter those parts is:

protocol: sent-from sender command parameters
sent-from: '<' server-name ':' service-name ' '
         | <empty>
name: [a-zA-Z_][a-zA-Z0-9_]*
server-name: name
service-name: name
sender: server-name ':' service
      | service
      | <empty>
service: service-name '/'
       | '*' '/'
       | '?' '/'
       | '.' '/'
command: [A-Z_][A-Z0-9_]*
parameters: parameter_set
          | parameters ';' parameter_set
parameter_name: name
parameter_set: parameter_name '=' value
value: ANY if no '"' or ';'
     | '"' ANY '"'

Where value can include a '"', '\', '\n', and '\r' if escaped with a backslash. Note that the '\n' character (so code 0x0A) is transformed to the characters '\' and 'n' so it does not look like a newline ending a command.
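The grammar above can be turned into a small parser. The following Python sketch is one possible reading of it, not the parser used by snapcommunicator. It assumes a single space between the command and the parameters (as in the examples), accepts "*" as a destination server name, and does not validate parameter names. The names parse_message and parse_parameters are hypothetical.

```python
import re

NAME = r"[a-zA-Z_][a-zA-Z0-9_]*"
HEADER = re.compile(
    rf"(?:<(?P<from_server>{NAME}):(?P<from_service>{NAME}) )?"
    rf"(?:(?:(?P<server>{NAME}|\*):)?(?P<service>{NAME}|[*?.])/)?"
    r"(?P<command>[A-Z_][A-Z0-9_]*)"
    r"(?: (?P<params>.*))?"
)
UNESCAPE = {"n": "\n", "r": "\r"}  # any other escaped character is kept as is


def parse_parameters(text):
    """Parse 'name=value;name="quoted value"' into a dictionary."""
    params = {}
    i = 0
    while i < len(text):
        eq = text.index("=", i)  # raises ValueError if '=' is missing
        name = text[i:eq]
        i = eq + 1
        quoted = text.startswith('"', i)
        if quoted:
            i += 1
        chars = []
        while i < len(text):
            c = text[i]
            if c == "\\" and i + 1 < len(text):
                chars.append(UNESCAPE.get(text[i + 1], text[i + 1]))
                i += 2
                continue
            if (quoted and c == '"') or (not quoted and c == ";"):
                break
            chars.append(c)
            i += 1
        if quoted:
            i += 1  # skip the closing quote
        params[name] = "".join(chars)
        i += 1  # skip the ';' separator
    return params


def parse_message(line):
    """Split one protocol line into its parts; absent parts are None."""
    match = HEADER.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError("invalid message: %r" % line)
    message = match.groupdict()
    message["params"] = parse_parameters(message["params"] or "")
    return message
```

For example, parsing the line "images/PING uri=http://www.example.com/" yields the service "images", the command "PING", and one parameter named "uri".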

The sender information is optional. It is the less than sign (<), the name of the origin server, a colon (:), and the name of the service sending the message. If specified, it must be followed by a space. Both the name of the server and the name of the service must be specified.

The destination service name, found just before the command, defines which service the message is for. It may be preceded by the name of a specific server, in which case the message will be sent to that specific server. Since we send all messages to the Snap Communicator server, the server needs to know who the message is really for. Services register with the REGISTER command and that command includes their name, allowing the Snap Communicator to know where to send such messages. Other Snap Communicator servers send or receive a CONNECT command with the name of that other server. Messages for which the destination is down at the time the Snap Communicator receives them are cached for a while (WARNING: only local messages are currently cached, inter-computer messages are just lost if no destination can be found.) If the destination appears at a later time, the messages are forwarded then.

The special destination service names "*", "?", and "." can be used to broadcast a message, meaning that the message will be sent to multiple destinations. The "*" is used to broadcast a message to all services on the local computer and all remote computers in the entire cluster.

The "?" service name is used to broadcast a message to all computers within one data center.

The "." service name is used to broadcast a message to all local services currently registered.

Note that the destination server name can also be set to "*" meaning that any destination server can be selected by Snap Communicator.

Note: At this point we do not foresee the need to transmit large amounts of data, only simple commands, so one line of data is plenty. Large data is generally saved in the database or some files instead. (There is already one counterexample: the snapmanagerdaemon sends MANAGERSTATUS messages that can become pretty large.)

The commands we understand are:

ACCEPT services=<...>;heard_of=<...>;neighbors=<host>(,<host>)*;server_name=<name>

The ACCEPT command is sent in response to a CONNECT command.

The services parameter defines a comma separated list of services offered by that computer. This allows the connecting computer to know how to forward messages to those services even if they are not directly linked to this snapcommunicator. The list of local services is handed by snapinit to snapcommunicator using the SERVICES message.

The heard_of parameter defines a comma separated list of services that the snapcommunicator we are connecting to heard of, meaning that if sent a message for one of those services, it will know where to forward the message to.

The neighbors parameter defines a comma separated list of hosts that have a snapcommunicator server running. These are added to our own list of neighbors so later we can attempt to connect to those servers ourselves if we have not already done so. Note that this parameter is a limited list of neighbors as defined in this snapcommunicator configuration file. The snapcommunicators also use the GOSSIP message to tell each other about more snapcommunicators that exist in the cluster. This very list is kept relatively small to avoid sending humongous messages between instances of snapcommunicator (this is not yet implemented, though).

The server_name parameter is the name of the server that sent the ACCEPT message. It is used to know where to send a message when we receive it. This parameter is mandatory.

ACCEPT server_name=charlie;services=images,pagelist,sendmail;heard_of=snapwatchdog
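A message such as the ACCEPT example above could be assembled with a small formatter. This Python sketch is an illustration only: format_message and format_value are hypothetical names, and the choice to quote a value only when it contains a semicolon or a double quote is my reading of the grammar.

```python
def format_value(value):
    """Escape a parameter value and quote it when the grammar requires it."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    if ";" in value or '"' in value:
        return '"' + escaped + '"'
    return escaped


def format_message(command, params=None, service=None, server=None):
    """Build one protocol line (without the trailing newline)."""
    head = command
    if service is not None:
        head = service + "/" + head
        if server is not None:
            head = server + ":" + head
    if not params:
        return head
    body = ";".join(name + "=" + format_value(v) for name, v in params.items())
    return head + " " + body
```

Passing the parameters server_name, services, and heard_of in that order reproduces the ACCEPT line shown above.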

ADDTICKET object_name=<name>;key=<key>;timeout=<date>

The ADDTICKET message is used by the snaplock daemon to add a ticket to the existing list of tickets before it can grab a lock.

The object_name is the name of what is getting locked. In Snap, this is generally set to a URI of a page being worked on. This makes the lock quite specific (i.e. you may lock just one cell if you'd like to do so!)

The key is the server name and the PID of the process attempting a lock, separated by a slash. This is managed internally.

The timeout date represents the date when the attempt to obtain the lock times out. This date is often really close to 'now' (i.e. if you want to give the snaplock tool 10 seconds to obtain a lock, timeout will be now + 10 sec.)

ADDTICKET object_name=http://example.com/user/counter;key=my_server/123;timeout=145572827
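The key and timeout rules can be illustrated with a short Python sketch. It assumes that the timeout is expressed in seconds as a Unix timestamp, which the text does not state explicitly, and add_ticket_message is a hypothetical helper name.

```python
import time


def add_ticket_message(object_name, server_name, pid, wait_seconds, now=None):
    """Build an ADDTICKET line whose timeout is 'now + wait_seconds'."""
    # Assumption: dates are whole seconds since the Unix epoch.
    now = int(time.time()) if now is None else int(now)
    key = f"{server_name}/{pid}"
    timeout = now + wait_seconds
    return f"ADDTICKET object_name={object_name};key={key};timeout={timeout}"
```

With now set to 145572817 and a wait of 10 seconds, this produces the example line shown above.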

BLOCK ip=<...>;period=<...>

Request that an IP address, as specified by the ip parameter, be blocked for the amount of time specified by the period parameter.

The firewall accepts this message and reacts by adding that IP to the firewall of the computer it is running on. What gets blocked is defined in the /etc/network/iplock.conf configuration file. By default, iplock assumes that the HTTP (80) and HTTPS (443) ports are to be blocked. If you have other ports opened, such as 8888 or 8080, then you may consider locking those ports too.

The period parameter must be a string set to one of the following values:

  • hour -- blocked for 3600 seconds
  • day -- blocked for 86400 seconds
  • week -- blocked for 86400 x 7 seconds
  • month -- blocked for 86400 x 31 seconds, which month we are on is ignored, we always use 31 days
  • year -- blocked for 86400 x 366 seconds, whether the year is considered to be 365 or 366 days, we always use 366
  • forever -- blocked for 86400 x 366 x 5 seconds, so a little over 5 years, which on the Internet is close to representing forever...
  • Any other value is ignored (with the log of an error) and the block is set to 86400 seconds (i.e. one day)
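The period values listed above map to seconds as in the following Python sketch. The names PERIODS and block_duration are hypothetical; only the numbers and the fallback to one day come from the list above.

```python
import logging

DAY = 86400
PERIODS = {
    "hour": 3600,
    "day": DAY,
    "week": DAY * 7,
    "month": DAY * 31,           # always 31 days
    "year": DAY * 366,           # always 366 days
    "forever": DAY * 366 * 5,    # a little over 5 years
}


def block_duration(period):
    """Return the block duration in seconds for a BLOCK period value."""
    seconds = PERIODS.get(period)
    if seconds is None:
        # Unknown values are logged as errors and treated as one day.
        logging.error("unknown block period %r, using one day", period)
        return DAY
    return seconds
```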

Note that since iplock does not block other ports, such as port 25 (SMTP) and port 22 (SSH), you should not get those blocked at any point of time until you edit the iplock.conf file.

BLOCK ip=127.0.0.1;period=hour

COMMANDS list=<...>

A local service that registers with the snapcommunicator process may be asked about its understood (implemented) commands with the HELP command. COMMANDS is the expected response.

The list parameter is a string of commands separated by commas. At this time you cannot indicate whether certain parameters are supported, although if your process knows the command it should know how to handle the parameters too (in the long run, if parameters have to change, it should not be a big deal as long as they are optional parameters. If not optional, we may consider adding a new command instead.)

So for example, a process that understands the HELP, PING, and STOP commands would return:

COMMANDS list=HELP,PING,STOP

P.S. Note that all local services must at least understand the HELP, QUITTING, READY, STOP, and UNKNOWN commands. For services that make use of the logger, they should also understand the LOG command. Turning ON the debug in snapcommunicator can help you detect missing entries in the list parameter. Remote connections (limited to snapcommunicator daemons at this point) must understand the ACCEPT, HELP, QUITTING, STOP, and UNKNOWN commands. Again, the --debug command line option can be used to make sure that is the case.
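The check for required commands can be sketched as follows in Python. The two sets come from the paragraph above; the function name missing_commands and its remote flag are assumptions for illustration.

```python
REQUIRED_LOCAL = {"HELP", "QUITTING", "READY", "STOP", "UNKNOWN"}
REQUIRED_REMOTE = {"ACCEPT", "HELP", "QUITTING", "STOP", "UNKNOWN"}


def missing_commands(commands_list, remote=False):
    """Return the required commands absent from a COMMANDS list value."""
    understood = set(commands_list.split(","))
    required = REQUIRED_REMOTE if remote else REQUIRED_LOCAL
    return sorted(required - understood)
```

For the reply "COMMANDS list=HELP,PING,STOP", this reports QUITTING, READY, and UNKNOWN as missing for a local service.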

CONNECT types=<...>;services=<...>;heard_of=<...>;neighbors=<host>(,<host>)*

When a Snap Communicator server starts, it tries to connect to other Snap Communicator servers (called neighbors).

The CONNECT command accepts the same parameters as the ACCEPT command: types (types of server), services (list of services offered locally), heard_of (known services), and neighbors (list of explicitly defined neighbors).

The connection will be accepted with an ACCEPT command, which includes the same parameters from the other Snap Communicator.

If the connection is refused, then the REFUSE command is returned with a list of potential neighbors that this Snap Communicator can attempt to connect with. In most cases connections are refused because the instance you are trying to connect to already has a large number of connections and it does not want to have more.

CONNECT types=apache,frontend

DISCONNECT

When a Snap Communicator server that previously CONNECTed is ready to cleanly stop, it tries to disconnect from other Snap Communicator servers using this command. Note that if a connection was REFUSEd then it is not considered connected and the DISCONNECT command should not be used in this circumstance.

The DISCONNECT generally happens when a STOP command is received by a snapcommunicator server. snapcommunicator services that receive this message have to immediately remove that other snapcommunicator from their connections. This will prevent any further forwarding of messages that would otherwise be lost in the process.

DROPTICKET object_name=<name>;key=<key>

Once done with a ticket we want to drop it. This is important to release a lock or avoid grabbing the lock if you did not have it yet (i.e. you are still trying to obtain the lock, but decide to drop it early.)

The object_name is the name of what is getting locked. In Snap, this is generally set to a URI of a page being worked on. This makes the lock quite specific (i.e. you may lock just one cell if you'd like to do so!)

The key is the server name and the PID of the process attempting a lock, separated by a slash. If the ticket represented a valid lock, then the key is the lock key: the ticket number, a slash, the server name, a slash, and the PID of the process that got the lock.

DROPTICKET object_name=http://example.com/ticket;key=232/my_server/3321

GETMAXTICKET object_name=<name>;key=<key>

In order to assign a ticket number to a lock request, the snaplock process needs to retrieve the largest ticket number already in use and add one. That will be the number assigned to the new ticket. This message is used for that purpose: it asks all the running snaplock processes for their largest ticket number.

The object_name is the name of what is getting locked. In Snap, this is generally set to a URI of a page being worked on. This makes the lock quite specific (i.e. you may lock just one cell if you'd like to do so!)

The key is the server name and the PID of the process attempting a lock, separated by a slash.

GETMAXTICKET object_name=http://example.com/ticket;key=my_server/3321

GOSSIP neighbors=<comma separated list of neighbors>

When a Snap Communicator receives CONNECT, ACCEPT, REFUSE, or GOSSIP, some of the neighbors specified in those messages may be considered new to this Snap Communicator.

Such new neighbors have to be broadcast to all Snap Communicators and to do so we further send a GOSSIP message to all our neighbors.

The GOSSIP message is sent with parsimony to avoid swamping everyone with useless messages since a Snap Communicator sending that information to all its neighbors would send the exact same GOSSIP as another neighbor would also send... We will want to look into a way to coordinate this message properly once we actually implement it.

GOSSIP neighbors=10.0.0.3:4040,10.0.0.20

HELP

The Snap Communicator server sends that command to all services that REGISTER with it. The service is expected to reply with the COMMANDS command, which lists all the commands the service understands.

In the current implementation, this mechanism is used to detect and have special handling of services that understand the STATUS command. In debug mode, all commands are checked and if one is not implemented but required, snapcommunicator throws (i.e. commands such as STATUS and LOG are optional and may or may not be implemented.)

HELP

LISTTICKETS

In order to debug the snaplock tool, one wants to know whether some tickets are active. This command is used to list the currently active tickets and their status. The caller will get a TICKETLIST message as a reply.

Note: since the snaplock process is not multithreaded, the status is always exact at the time it gets queried.

LISTTICKETS

LOCK object_name=<name>;pid=<client pid>;timeout=<obtention timeout>;duration=<lock duration>

The LOCK command is used to obtain a lock. It requires the name of an object which is going to be locked.

The object_name is generally a URI which is going to be locked. The lock can be as precise as a cell, or as broad as the entire database, as long as all users agree on the same name.

The pid is the process identifier of the process asking for the lock. This is used to distinguish that specific process from any other process asking for the same lock. Note that this is really the task identifier which you obtain with gettid(), this is also referenced as the thread identifier.

The timeout parameter is the date by which the system has to obtain the lock. If that time and date passes without obtention of the lock, then the lock fails with a LOCKFAILED. If the lock is obtained by that date, then the system sends a LOCKED message.

The duration gives the amount of time the lock is kept once obtained. In other words, this is added to the date when the lock is considered obtained.

LOCK object_name=http://example.com/to/be/locked;pid=123;timeout=12363653;duration=3600
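The interplay between timeout and duration can be illustrated with a few lines of Python. This is a sketch of the rules as described, not snaplock code: lock_reply is a hypothetical name, dates are assumed to be in seconds, and the returned tuple is only an illustration (the parameters of the real LOCKED and LOCKFAILED messages are not described here).

```python
def lock_reply(timeout, duration, obtained_at=None):
    """Return (reply, expiry) given when the lock was obtained, if ever."""
    # No lock by the timeout date means the request failed.
    if obtained_at is None or obtained_at > timeout:
        return ("LOCKFAILED", None)
    # The duration is added to the date the lock was obtained.
    return ("LOCKED", obtained_at + duration)
```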

LOG

WARNING: log4cplus has a RollingFileAppender. However, that feature does NOT work properly in a multi-process logging environment. If process A opens the same log file as process B, then A and B will be fighting for the execution of the "rolling" part. There is no good / easy way to avoid that in log4cplus, so we instead use logrotate to do our rolling work (as a bonus, notice that logrotate has many more options in that area, such as compressing rolled files.) As a result we needed to have a LOG signal to be sent to all local services whenever the logrotate tool applies its rotation.

This message can be sent to ask various systems to reset their logger. This is necessary whenever logrotate kicks in and the logger should start logging in the new log file. All you have to do when you receive this event is call logging::reconfigure(), so it should be easy enough for you to take the time to implement it. If you are running a temporary process for a short period of time, the implementation of the LOG command is not required.

At this time, snapcommunicator, snapinit, and snapserver accept this message. We need to implement that functionality in all the backends.

Note that the loggingserver should work just fine with the RollingFileAppender assuming that it is the only one accessing those files (since everyone else would be using the loggingserver as the destination point.)

In this case we want to broadcast the LOG message to all local services that understand it. logrotate is asked to do so by sending the following message to snapcommunicator:

*/LOG

MAXTICKET object_name=<name>;key=<key>;ticket_id=<number>

When one of the snaplock processes sends a GETMAXTICKET command, it expects to be answered with a MAXTICKET command which includes the largest ticket number known to the sender.

The object_name is the name of what is getting locked. In Snap, this is generally set to a URI of a page being worked on. This makes the lock quite specific (i.e. you may lock just one cell if you'd like to do so!)

The key is the server name and the PID of the process attempting a lock, separated by a slash. If the ticket represented a valid lock, then the key is the lock key: the ticket number, a slash, the server name, a slash, and the PID of the process that got the lock.

The ticket_id number is the largest ticket known by the sender of the MAXTICKET message. The snaplock that sent the GETMAXTICKET is in charge of keeping the largest from all the replies.

MAXTICKET object_name=http://www.example.com/magic/ticket;key=my_server/123;ticket_id=43
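Keeping the largest ticket_id from all the MAXTICKET replies and adding one can be sketched as follows. The function name next_ticket_id and the idea of passing each reply as a dictionary of parameters are assumptions for this example.

```python
def next_ticket_id(own_max, replies):
    """Return the ticket number to assign, given MAXTICKET replies.

    own_max is the largest ticket number this snaplock already knows;
    each reply is a dict of parameters with a 'ticket_id' entry.
    """
    largest = own_max
    for params in replies:
        largest = max(largest, int(params["ticket_id"]))
    return largest + 1
```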

PING

The PING signal is understood by most backends so they can be awakened immediately to run their processing. If the backend is already running, the PING can be recorded and the process re-run immediately after the current run ends.

The PING signal is not cumulative. Sending it multiple times before it gets managed still results in a single PING signal. So if the snapcommunicator has problems forwarding a PING to a certain backend (service), it will not accumulate the messages.

sendmail/PING
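The non-cumulative behavior amounts to remembering at most one pending PING per service. A minimal Python sketch, with the hypothetical class name PendingPings:

```python
class PendingPings:
    """Remember at most one pending PING per service."""

    def __init__(self):
        self._pending = set()

    def record(self, service):
        """Note that a PING is waiting; repeats collapse into one."""
        self._pending.add(service)

    def take(self, service):
        """Return True once if a PING was pending, then clear it."""
        if service in self._pending:
            self._pending.discard(service)
            return True
        return False
```

Using a set means that three PINGs recorded for "sendmail" before it wakes up still trigger a single run.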

QUITTING

At the time the snapcommunicator receives a STOP or SHUTDOWN command, it is marked as quitting.

Since the system is used on a cluster of computers, it may still receive a few more TCP or UDP messages after the STOP or SHUTDOWN were sent and before all the other systems were told to SHUTDOWN, DISCONNECT, or STOP from this instance. It will answer to all the TCP requests with the QUITTING message. It ignores any UDP messages though.

For this reason all systems connecting to snapcommunicator must implement the QUITTING command.

QUITTING

READY

The READY response is sent by snapcommunicator whenever someone sends the REGISTER command. This way the registration is complete and the client can acknowledge the fact.

READY

REFUSE neighbors=<host>(,<host>)*

The REFUSE command is sent as an answer to the CONNECT command. The receiver is expected to close the TCP connection ASAP after receiving this response.

There are several reasons to REFUSE a connection:

  • The receiver already has too many connections (the limit is 100 by default, it could be grown to 16,000 on larger computers.)
  • The receiver sees that it just sent a CONNECT to the sender. In that case, the REFUSE message is sent to the server with the larger IP address.
  • The snapcommunicator has not received any SERVICES message from snapinit.
  • Some other resources are tight and this snapcommunicator does not want to stress the computer out too much.
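The tie-break used when two servers CONNECT to each other at the same time can be illustrated with the sketch below. It assumes the comparison is numeric rather than textual, which the text does not specify; refused_server is a hypothetical name.

```python
import ipaddress


def refused_server(address_a, address_b):
    """Return the address that receives the REFUSE in a simultaneous CONNECT."""
    # Compare numerically: as plain strings "10.0.0.9" would sort
    # after "10.0.0.20", which is not the larger address.
    return max(address_a, address_b, key=ipaddress.ip_address)
```

Both sides can evaluate this rule independently and reach the same answer, so exactly one of the two connections survives.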

The command has one parameter called neighbors, which is a comma separated list of IP:port addresses of neighbor computers that one can attempt to connect with to handle inter-computer signals. This parameter is optional (a server may not be told about any neighbors!)

The list of neighbors can be pretty large.

REFUSE neighbors=10.0.0.1,10.0.0.2,10.0.0.5

REGISTER service=<name>;version=1

The REGISTER command is the first command sent by a local process that wants to register itself as a running and available service for snapcommunicator to work with (i.e. a daemon that will be running until asked to STOP.)

The <name> in the service parameter, which is mandatory, is not the name of a Unix process but the name of the service offered by that process (i.e. the service name may be "images" when the Unix process is "snapbackend").

The REGISTER command only works from localhost IP addresses. Any other registration is automatically refused.

The version parameter, which is mandatory, of the REGISTER command is used to make sure that the protocol version of the client is compatible. This is the snap::snap_communicator::VERSION value. If the versions of snapcommunicator and the client are not equal, then the registration fails.

Once done, a process should UNREGISTER itself.

REGISTER service=images;version=1
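The registration rules (localhost only, mandatory service and version, equal versions) can be summarized in a short Python sketch. The constant VERSION is an assumed stand-in for snap::snap_communicator::VERSION, and check_register with its (accepted, reason) result is hypothetical; how snapcommunicator actually reports a failed registration is not described here.

```python
import ipaddress

VERSION = 1  # assumed stand-in for snap::snap_communicator::VERSION


def check_register(params, client_ip):
    """Return (accepted, reason) for the parameters of a REGISTER message."""
    if not ipaddress.ip_address(client_ip).is_loopback:
        return (False, "REGISTER only works from localhost")
    if not params.get("service"):
        return (False, "the service parameter is mandatory")
    if "version" not in params:
        return (False, "the version parameter is mandatory")
    if params["version"] != str(VERSION):
        return (False, "incompatible protocol version")
    return (True, "READY")
```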

SERVICES list=<services>

Whenever snapinit connects to a snapcommunicator server, it sends its list of SERVICES after it receives the READY signal.

The list parameter is a complete list of comma separated services offered on this instance of Snap! Websites.

Backends should not be duplicated anywhere since only one instance can run within a given cluster. However, all the other parts are likely to appear on many computers (i.e. snapinit, snapcommunicator, snapserver.)

TBD: this may change in the future so that any backend can be asked to run on a set of websites. Multiple instances could then run in parallel, as long as no two instances work on the same website.

SERVICES list=snapinit,snapcommunicator,images,sendmail

SHUTDOWN

Sending this message to the snapcommunicator requests that all the connected services be shut down. The snapcommunicator sends a STOP command to all the registered services. It also sends the SHUTDOWN to its snapcommunicator neighbors, in effect shutting down the entire cluster.

The SHUTDOWN command is considered to be a high security command so it can be executed only if the user sending it is connected securely.

To stop one snapcommunicator instance, use the STOP signal instead. In this case, it will send the STOP signal to all the elements directly registered with it and quit. It will not send a STOP or SHUTDOWN message to any other snapcommunicator. It will send a DISCONNECT though.

SHUTDOWN
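
The difference between SHUTDOWN and STOP can be summarized by listing which message goes to which destination. The sketch below is only a model of the description above. The plan_stop() helper is hypothetical, ignoring an insecure SHUTDOWN is an assumption, and sending to neighbors before local services follows the ordering given for STOP in this document (the ordering for SHUTDOWN is not stated).

```python
def plan_stop(command, local_services, neighbors, secure=False):
    """Return the (destination, message) pairs to send, in order."""
    if command == "SHUTDOWN":
        if not secure:
            # Assumption: a SHUTDOWN received on an insecure connection is ignored.
            return []
        to_neighbors = "SHUTDOWN"    # propagates to the entire cluster
    elif command == "STOP":
        to_neighbors = "DISCONNECT"  # other snapcommunicators keep running
    else:
        raise ValueError(f"not a stop command: {command}")
    plan = [(neighbor, to_neighbors) for neighbor in neighbors]
    plan += [(service, "STOP") for service in local_services]
    return plan


for destination, message in plan_stop("STOP", ["images", "sendmail"], ["10.0.0.2"]):
    print(destination, message)
```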

STATUS service=<name>;status=<up|down>;up_since=<date>;down_since=<date>;...

The snapwatchdog accepts this message to get the status of the snapcommunicator. Mainly this informs the snapwatchdog about who is connected and who is down. Since snapcommunicator can be told who is expected to be up on the same server, it can tell who is down that way.

The snapcommunicator sends HELP to each local service that REGISTERs. Services that implement the STATUS command are sent that command whenever the set of registered services changes in the snapcommunicator server.

STATUS service=images;status=down;up-till=123;down-since=123
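
Knowing which services are expected on a server and which ones are registered is enough to tell who is down. The following sketch shows that comparison. The status_messages() helper is hypothetical and only fills in the service and status parameters; the date parameters are left out because their exact format is not described here.

```python
def status_messages(expected, registered):
    """Build one STATUS line per expected service, in alphabetical order."""
    lines = []
    for name in sorted(expected):
        # A service that is expected but not registered is considered down.
        state = "up" if name in registered else "down"
        lines.append(f"STATUS service={name};status={state}")
    return lines


for line in status_messages({"images", "sendmail"}, {"sendmail"}):
    print(line)
```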

STOP

This signal can be sent to any service to ask it to quit.

Backend processes should be programmed to check for the STOP signal at all times. So in a loop that could take minutes to run, one should check whether the STOP signal was received. If so, the backend should cleanly break out of the loop (in such a way that unfinished work is restarted on the next run of this backend.)
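
A long-running loop that honors STOP could look like the sketch below. It assumes that whatever code receives the STOP message sets the stop_requested flag; the flag and the process_pending() helper are hypothetical names. Items are removed from the list only once they are fully handled, so the leftovers can be picked up on the next run.

```python
import threading

# Assumption: set by the code that receives the STOP message.
stop_requested = threading.Event()


def process_pending(items, handle):
    """Handle items until done or until STOP is requested; return the leftovers."""
    remaining = list(items)
    while remaining and not stop_requested.is_set():
        handle(remaining[0])
        remaining.pop(0)  # drop the item only after it was fully handled
    return remaining
```

The check happens once per item, so a single item should not take too long to handle either.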

When sent to the snapcommunicator itself (service = "snapcommunicator" or "") the STOP command is forwarded to all the services directly registered with this snapcommunicator instance after a DISCONNECT event is sent to all the other snapcommunicator processes it is connected with in the cluster. That way we can hope that the other snapcommunicator processes will not forward any more messages to us at the time we disconnect.

You may request a specific service to stop by specifying the name of that service. For example, to stop the pagelist backend you could send:

pagelist/STOP

TICKETADDED object_name=<name>;key=<key>;timeout=<date>

The TICKETADDED message is the reply to the ADDTICKET message. Once enough snaplock instances (i.e. a quorum) have registered a ticket, it is considered valid. Note that the sender is unique (the master) and is the only one which decides whether the ticket is valid.

The object_name is the name of what is getting locked. In Snap, this is generally set to a URI of a page being worked on. This makes the lock quite specific (i.e. you may lock just one cell if you'd like to do so!)

The key is the server name and the PID of the process attempting a lock, separated by a slash. This is managed internally.

TICKETADDED object_name=http://example.com/user/counter;key=my_server/123;timeout=145572827
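
The way a master could count TICKETADDED replies is sketched below. This is an assumption-laden illustration: the TicketTally class is hypothetical, and the quorum is taken to be a strict majority of the known snaplock instances, which this document does not spell out.

```python
class TicketTally:
    """Count TICKETADDED replies per ticket on the master."""

    def __init__(self, total_instances):
        self.total = total_instances
        self.replies = {}  # (object_name, key) -> set of replying instances

    def ticket_added(self, sender, object_name, key):
        """Record one reply; return True once the ticket reached quorum."""
        who = self.replies.setdefault((object_name, key), set())
        who.add(sender)  # a set, so a repeated reply is counted once
        # Assumption: quorum means a strict majority of all instances.
        return len(who) >= self.total // 2 + 1


tally = TicketTally(total_instances=3)
print(tally.ticket_added("server_a", "http://example.com/user/counter", "my_server/123"))
print(tally.ticket_added("server_b", "http://example.com/user/counter", "my_server/123"))
```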

TICKETLIST list=<info>

This command is the reply to the LISTTICKETS command. It includes the list parameter, which is the list of tickets currently defined in that snaplock instance.

The list of tickets may vary in each snaplock depending on when they started or if somehow they missed a message. You should be able to print that directly in your console.

The ticket identifier is the ticket number. Generally that number remains very small (around 1).

The object name is what is currently locked or is waiting to be locked.

The key is the server and PID of the process that requested the lock.

The timeout field shows when the ticket will time out.

TICKETLIST list=ticket id: 123  object name: \\"http://www.example.com/locked\\"  key: my_server/123  timeout 06/09/2016 18:49:25\n

UNKNOWN command=<command-name>

When you receive a command that you do not understand, you should reply with the UNKNOWN command. This way the sender has a chance to know and indicate in its logs that the command failed. This command should not be used when the command is understood but misused (i.e. invalid/unexpected parameters.)

UNKNOWN command=list
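
In a service, this rule usually ends up in the function that dispatches incoming commands. The sketch below shows one possible shape. The dispatch() helper and the handler table are hypothetical; a handler that receives invalid parameters is expected to deal with that itself rather than answer UNKNOWN.

```python
def dispatch(command, params, handlers):
    """Run the handler for command, or build the UNKNOWN reply."""
    handler = handlers.get(command)
    if handler is None:
        # The command itself is not understood: tell the sender so.
        return f"UNKNOWN command={command}"
    return handler(params)


# Hypothetical handler table for a service that only understands PING.
handlers = {"PING": lambda params: None}
print(dispatch("list", {}, handlers))
```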

UNLOCK object_name=<name>;pid=<client pid>

In most cases the UNLOCK is sent to release a lock that was obtained by a LOCK command.

The object_name is the name of what was locked.

The pid is the process identifier of the caller. To be precise, it is the task identifier: we use gettid(), which returns the thread identifier, so if you are using a thread to handle a lock it will still work as expected.

UNLOCK object_name=http://example.com/journal/20160606;pid=123
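
For a client written in Python, the closest equivalent of gettid() is threading.get_native_id() (Python 3.8 and later), which returns the kernel thread identifier of the calling thread. The sketch below builds the UNLOCK message with it. The unlock_message() helper is hypothetical and is not part of any Snap! library.

```python
import threading


def unlock_message(object_name):
    """Build an UNLOCK message identifying the calling thread."""
    # Kernel-level thread id; in the main thread of a Linux process this
    # is the same number as the process id.
    tid = threading.get_native_id()
    return f"UNLOCK object_name={object_name};pid={tid}"


print(unlock_message("http://example.com/journal/20160606"))
```

The message must be built in the same thread that obtained the lock, otherwise the identifier will not match.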

UNREGISTER service=<name>

A service is about to quit and thus it wants to be unregistered. This command is not absolutely required; it is cleaner, though, to send it before calling the exit() function. It is the opposite of the REGISTER command.

UNREGISTER service=sendmail
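
One way to make sure UNREGISTER is always sent is to pair it with REGISTER in a try/finally block, as in this sketch. The run_service() helper, the send callable (anything that writes one message to the snapcommunicator connection) and the work callable are all hypothetical.

```python
def run_service(name, send, work, version=1):
    """Register, run the service body, and always unregister on the way out."""
    send(f"REGISTER service={name};version={version}")
    try:
        work()
    finally:
        # Sent even if work() raises, so the registration never lingers.
        send(f"UNREGISTER service={name}")


sent = []
run_service("sendmail", sent.append, lambda: None)
print(sent)
```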

How a snapcommunicator server forwards a message

A message may be for the snapcommunicator itself, in which case it is worked on immediately. It is for the snapcommunicator if the service parameter of a message is not defined ("") or is set to the name "snapcommunicator".

Other messages are forwarded using the following search mechanism to know where to send the message:

  • The named service exists in the list of services registered with this snapcommunicator; forward the message to this service
  • The named service exists in the list of services as sent by the CONNECT/ACCEPT command of a remote snapcommunicator; forward the message to that snapcommunicator
  • The named service exists in the list of heard_of as sent by the CONNECT/ACCEPT command of a remote snapcommunicator; forward the message to that snapcommunicator
  • The named service cannot be found at this time; queue the message for later, if the message was a PING, remove any previous instances (there should be 0 or 1)

Messages that get queued are checked each time we get a new REGISTER or CONNECT message.
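
The search order above can be modeled with three lookup tables and a queue. The following is a simplified sketch under stated assumptions, not the real server: the Router class and its attribute names are hypothetical, connections are represented by plain strings, and messages are reduced to a (service, command) pair.

```python
class Router:
    """Model of how a snapcommunicator picks a destination for a message."""

    def __init__(self):
        self.local = {}     # service -> connection registered with this instance
        self.remote = {}    # service -> remote snapcommunicator offering it
        self.heard_of = {}  # service -> remote snapcommunicator that heard of it
        self.queue = []     # (service, command) pairs waiting for a destination

    def route(self, service, command):
        if service in ("", "snapcommunicator"):
            return ("self", None)  # handled immediately by this instance
        tables = ((self.local, "local"), (self.remote, "remote"),
                  (self.heard_of, "heard_of"))
        for table, kind in tables:
            if service in table:
                return (kind, table[service])
        if command == "PING":
            # Keep at most one pending PING per service.
            self.queue = [m for m in self.queue if m != (service, "PING")]
        self.queue.append((service, command))
        return ("queued", None)

    def flush(self):
        """Retry queued messages; call after each REGISTER or CONNECT."""
        pending, self.queue = self.queue, []
        return [self.route(service, command) for service, command in pending]
```

Messages that still have no destination during flush() are queued again by route().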

A service, when it receives a message, is likely to ignore the service parameter of the message since it expects to only receive messages directed to it. If a process represents multiple services, then it would have multiple TCP connections and receive messages on a specific connection as expected. So again, no need to check the service name in that case. In other words, it could be an empty string and the forwarding can handle that.

Snap! Services

The following is a list of Snap! Services that are offered by core and existing plugins. Additional services may be added at any time and may not be listed here. The snapcommunicator has a command line option, --info, that can be used to see the list of connections available in that snapcommunicator and other snapcommunicators connected to this snapcommunicator. It can be used to discover all the services available in a cluster.

Service Name: images
Commands: HELP, LOG, PING, QUITTING, READY, STOP, UNKNOWN
Comments: Run scripts against images whenever a new image is uploaded to the database.

Service Name: pagelist
Commands: HELP, LOG, PING, QUITTING, READY, STOP, UNKNOWN
Comments: Run list scripts to determine whether new pages are part of said list.

Service Name: sendmail
Commands: HELP, LOG, PING, QUITTING, READY, STOP, UNKNOWN
Comments: Generate emails and send them to registered users.

Service Name: snapbackend
Commands: HELP, LOG, PING, QUITTING, READY, STOP, UNKNOWN
Comments: Run various backend processes on a specified schedule (every 5 minutes by default.) This is a CRON like set of tasks. We use this tool instead of the operating system CRON because this way we can be sure we can send the tool a STOP signal when we are asked to stop the Snap! services.

Service Name: snapserver
Commands: HELP, LOG, QUITTING, READY, STOP, UNKNOWN
Comments: A front-end snapserver which handles connections from clients through Apache2 and snap.cgi.

Service Name: snapdbproxy
Commands: HELP, LOG, QUITTING, READY, STOP, UNKNOWN
Comments: This service connects to snapcommunicator as it is managed by snapinit. However, to connect to the database, you want to use the snapdbproxy port 4042. This port uses a completely different protocol which is described somewhere else (TODO!)

Service Name: snapcommunicator
Commands: ACCEPT, CONNECT, COMMANDS, DISCONNECT, GOSSIP, HELP, LOG, QUITTING, REFUSE, REGISTER, SERVICES, SHUTDOWN, STOP, UNKNOWN, UNREGISTER
Comments: A message sent to the snapcommunicator itself. May use "" as the service name. The snapcommunicator service is the brain for communication on each computer and thus it understands many more messages than any other service. In particular, it has means to connect, disconnect, and gossip with other snapcommunicator servers. Note that you may send any command to the snapcommunicator when properly marked with a service name (i.e. "sendmail/PING".) Those messages are not included in the list of messages that the snapcommunicator understands since it only forwards them to the specified service ("sendmail" in that last example).

Service Name: snapfirewall
Commands: BLOCK, HELP, LOG, QUITTING, READY, STOP, UNKNOWN
Comments: A backend system that runs on all frontend computers in order to block unwanted clients. The BLOCK event is used by systems that detect an unwanted client.

Service Name: snapwatchdog
Commands: LOG, QUITTING, READY, STATUS, STOP, UNKNOWN
Comments: The watchdog process checks the health of a computer. It checks the CPU, memory, disk usage, and various processes. It receives STATUS messages from the snapcommunicator to know more about the network health.

Known Bugs

Our current snapcommunicator server implementation is really dumb and even though we have a GOSSIP capability (and also offer neighbor information on ACCEPT, CONNECT, REFUSE) we could end up with two sets of computers that never talk to each other.

Message Sequence Chart

The following is a chart of snapinit, snapcommunicator, snapserver, various snapbackends, neighbor snapcommunicator instances (remote, on other computers), and snapsignal.

Messages between snapinit, snapcommunicators, snapserver, snapbackend, snapsignal
