Unique Numbers with Cassandra

Tue
07/24/12

When using an SQL environment, it is very easy to create a new row of data with a unique number. You use a sequence identifier and your SQL system generates the numbers for you.

With Cassandra, there is no built-in way to generate a unique number. There are facilities to count (add and subtract), but nothing that hands out unique numbers. This is actually very important to let the system run as fast as it is expected to, because the only real way to create a unique number is either to have a single computer generate those numbers (be it with an SQL database or another system) or to lock all the computers whenever a new unique number is required.

The idea of using Cassandra is to get the work done on many computers instead of just one. This means a computer can run as fast as possible and never be blocked because of another computer (not only because the other computer could be heavily in use, but also because it could be down!)

We therefore still have a problem: how to get a unique identifier on any one computer. We solved that problem in Snap! using two things:

1) Each computer must be assigned a unique name (this is a Snap! name, so it does not have to be the hostname)

2) Each computer makes use of a standalone file that includes a number that we increment each time a number is requested; that file gets locked while being updated so only one process can access it at a time; the numbers start at 1.

The process becomes very simple:

Snap! server initialization:
- Load the Snap! settings, including the unique computer name
- Check whether the counter file exists; if not, create it

The server now runs. When a plugin requires a unique number, it calls a server function that:
- Locks the counter file
- Reads the counter file
- Increments the number
- Writes the new value to the counter file
- Unlocks the counter file
- Returns the value that was just written, concatenated with the computer name read from the settings (e.g. "serverA-123")
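The steps above can be sketched in a few lines of Python. This is only an illustration, not the actual Snap! implementation: the function names, the little-endian 8-byte file layout, and the "-" separator are assumptions made for the example, and fcntl.flock() is only available on Unix-like systems.

```python
import fcntl
import os
import struct

# Assumption: the counter is stored as one 8-byte little-endian unsigned integer.
COUNTER_FORMAT = "<Q"
COUNTER_SIZE = struct.calcsize(COUNTER_FORMAT)


def init_counter_file(path):
    """Server initialization: create the counter file if it does not exist."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.close(fd)


def next_unique_id(path, computer_name, separator="-"):
    """Return the next identifier for this computer, e.g. "serverA-123"."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Lock the counter file; other processes block here until we unlock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        # Read the counter file; an empty file means no number was issued yet.
        raw = os.pread(fd, COUNTER_SIZE, 0)
        last = struct.unpack(COUNTER_FORMAT, raw)[0] if len(raw) == COUNTER_SIZE else 0
        # Increment the number, so the first value returned is 1.
        value = last + 1
        # Write the new value back at the start of the file.
        os.pwrite(fd, struct.pack(COUNTER_FORMAT, value), 0)
    finally:
        # Unlock the counter file (closing the descriptor would also release it).
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
    # Concatenate the computer name with the value that was just written.
    return f"{computer_name}{separator}{value}"
```

Calling next_unique_id("/path/to/counter", "serverA") twice on a fresh file would return "serverA-1" and then "serverA-2"; the path here is a placeholder.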

Since the counter file is and remains very small (8 bytes), the operating system will generally keep it in its cache, which makes it extremely fast to use.

The final identifier includes the computer name, which is what makes it unique across all the computers. The computer name does not have to be text: for example, it could itself be a number and the separator a period, in which case you'd get an identifier that looks like a decimal number: "123.91".
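As a small illustration of why the computer name is enough, the sketch below builds identifiers for two computers whose local counters both start at 1. The make_identifier() helper is hypothetical, and placing the name before the counter is an assumption taken from the "serverA-123" example.

```python
def make_identifier(computer_name, counter, separator="-"):
    """Join a computer name and a local counter value into one identifier."""
    return f"{computer_name}{separator}{counter}"


# Two computers, each handing out the same local counter values 1, 2 and 3.
identifiers = [
    make_identifier(name, counter)
    for name in ("serverA", "serverB")
    for counter in (1, 2, 3)
]

# No identifier repeats even though the counter values do.
all_distinct = len(set(identifiers)) == len(identifiers)

# A numeric name with a period as separator looks like a decimal number.
decimal_like = make_identifier("123", 91, separator=".")
```

The identifiers only stay distinct as long as no two computers share a name, which is why the name must be assigned explicitly.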

At this point these numbers are used by links that accept multiple destinations, that is, links with a "many" side in one of these schemes: many to one, many to many, one to many.