libQtCassandra: Main Page

libQtCassandra 1.0

Cassandra System Explained

The Cassandra System uses terminology that can easily confuse people who are used to more conventional database systems.

This library attempts to hide some of the Cassandra terminology by offering objects that seem to be a little closer to what you'd otherwise expect in a database environment.

One Cassandra server instance runs against one cluster. We kept the term cluster since it is the usual term for a set of databases. Written in C++ array syntax, the system looks like a multi-level array (libQtCassandra actually supports this syntax):

   cluster[context][table][row][column] = value;
   value = cluster[context][table][row][column];

Note that in Cassandra terms, it would look like this instead:

   cluster[keyspace][column_family][key][column] = value;
   value = cluster[keyspace][column_family][key][column];
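To make the layering concrete, here is a minimal sketch that models it with plain nested std::map containers. It is not libQtCassandra code: the type aliases, the roundTrip() helper and the context, table, row and column names are all hypothetical, chosen only to show the four subscripts in action.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory model of cluster[context][table][row][column].
// This is NOT the libQtCassandra API, only std::map nested four levels deep.
using Row     = std::map<std::string, std::string>; // column name -> value
using Table   = std::map<std::string, Row>;         // row key -> row
using Context = std::map<std::string, Table>;       // table name -> table
using Cluster = std::map<std::string, Context>;     // context name -> context

// Writes a value and reads it back through the same four subscripts
// (the context, table, row and column names are made up for the example).
std::string roundTrip(Cluster& cluster, const std::string& value)
{
    cluster["my_context"]["users"]["alexis"]["email"] = value;
    return cluster["my_context"]["users"]["alexis"]["email"];
}
```

Because each row is its own map, two rows of the same table can hold different sets of columns. Note that std::map::operator[] silently creates missing entries; that is a property of this model, not a statement about how libQtCassandra behaves on reads.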

A cluster is composed of multiple contexts, which Cassandra calls keyspaces. One context corresponds to one database. A context can be set up to replicate or not, and it manages memory caches (it includes many replication and cache parameters). We call it a context because, once a cluster connection is up, only one context can be active at a time. (If you have worked with OpenGL, this is very similar to the wglMakeCurrent() or glXMakeCurrent() calls.)

Although libQtCassandra completely hides the calls that make a context current, since it knows when one context or another needs to be current, switching between contexts can be costly. Instead, you may want to use two QCassandra objects, each with a different context.
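The cost argument can be illustrated with a small model. The sketch below assumes that making a context current is the expensive step and that a connection only switches when the requested context differs from the current one; the Connection class and both helper functions are hypothetical and do not reflect the library's internals.

```cpp
#include <cassert>
#include <string>

// Hypothetical connection model: a switch is only issued when the requested
// context differs from the current one (not libQtCassandra's real class).
class Connection
{
public:
    // Makes `context` current, counting a switch only when it actually changes.
    void useContext(const std::string& context)
    {
        if (context != current_) {
            current_ = context;
            ++switches_;
        }
    }

    int switches() const { return switches_; }

private:
    std::string current_;
    int switches_ = 0;
};

// Interleaves `n` accesses to two contexts over a single connection.
int switchesWithOneConnection(int n)
{
    Connection connection;
    for (int i = 0; i < n; ++i) {
        connection.useContext("main");
        connection.useContext("statistics");
    }
    return connection.switches();
}

// Same access pattern, but each context keeps its own connection.
int switchesWithTwoConnections(int n)
{
    Connection mainConnection;
    Connection statsConnection;
    for (int i = 0; i < n; ++i) {
        mainConnection.useContext("main");
        statsConnection.useContext("statistics");
    }
    return mainConnection.switches() + statsConnection.switches();
}
```

Alternating between two contexts on one connection pays for a switch on every access, while two connections each pay only once.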

Different contexts are useful when you want to keep statistics or other data that do not need to be read as quickly as your main data and that possibly need much less replication (e.g. consistency level ONE for writes and ALL for reads on a statistics table would work well).
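The ONE/ALL combination works because of the usual Cassandra overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas written (W) plus the number of replicas read (R) exceeds the replication factor (N). The sketch below encodes that arithmetic for three consistency levels; the Consistency enum and the helper names are hypothetical, not part of libQtCassandra.

```cpp
#include <cassert>

// Hypothetical enum covering three common Cassandra consistency levels.
enum class Consistency { One, Quorum, All };

// Number of replicas that must answer at `level` for replication factor `rf`.
int replicasRequired(Consistency level, int rf)
{
    switch (level) {
    case Consistency::One:    return 1;
    case Consistency::Quorum: return rf / 2 + 1; // majority of the replicas
    case Consistency::All:    return rf;
    }
    return rf; // unreachable; keeps compilers from warning
}

// Read and write replica sets must overlap: W + R > N.
bool readsSeeLatestWrite(Consistency write, Consistency read, int rf)
{
    return replicasRequired(write, rf) + replicasRequired(read, rf) > rf;
}
```

With a replication factor of 3, ONE for writes plus ALL for reads gives 1 + 3 > 3, so reads stay consistent while writes remain cheap, which suits write-heavy statistics.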

A context is composed of tables, which Cassandra calls column families. By default, all tables are replicated as defined by their context. However, some data may be marked as temporary with a time to live (TTL). Data with a very small TTL is likely to live only in the memory cache and never make it to disk.
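The expiration rule itself is simple, as this sketch shows. It assumes a TTL expressed in seconds and uses plain integer timestamps so the logic is easy to follow; the Cell struct and the isExpired()/readCell() helpers are hypothetical, and the sketch does not model the memory cache or the disk.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical cell with an optional time to live, in seconds.
struct Cell
{
    std::string value;
    std::int64_t writtenAt = 0;       // write time, seconds since some epoch
    std::optional<std::int64_t> ttl;  // no TTL: the cell never expires
};

// True once `now` has reached the cell's expiration time.
bool isExpired(const Cell& cell, std::int64_t now)
{
    return cell.ttl.has_value() && now >= cell.writtenAt + *cell.ttl;
}

// Returns the value while the cell is alive, std::nullopt afterwards.
std::optional<std::string> readCell(const Cell& cell, std::int64_t now)
{
    if (isExpired(cell, now)) {
        return std::nullopt;
    }
    return cell.value;
}
```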

Note that libQtCassandra lets you create table objects that do not exist in the Cassandra system. These are memory-only tables (when you quit, they're gone!) and can be used to manage run-time globals via libQtCassandra. Plain globals would certainly be more efficient (faster). However, a memory-only table is useful when you want to call a function that normally reads a Cassandra table, but in this case feed it data generated at run time.
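One way to picture this is a function written against a table abstraction that can be backed either by Cassandra or by memory. The sketch below uses a small interface for that purpose, which is a design assumption of the example (libQtCassandra itself uses its own table objects for both cases); TableInterface, MemoryTable and greeting() are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical read interface; not libQtCassandra's table class.
class TableInterface
{
public:
    virtual ~TableInterface() = default;
    virtual std::string cell(const std::string& row, const std::string& column) const = 0;
};

// Memory-only table: its content lives in this process and is lost on exit.
class MemoryTable : public TableInterface
{
public:
    void setCell(const std::string& row, const std::string& column, std::string value)
    {
        rows_[row][column] = std::move(value);
    }

    // Returns an empty string when the row or the column does not exist.
    std::string cell(const std::string& row, const std::string& column) const override
    {
        auto r = rows_.find(row);
        if (r == rows_.end()) {
            return {};
        }
        auto c = r->second.find(column);
        return c == r->second.end() ? std::string() : c->second;
    }

private:
    std::map<std::string, std::map<std::string, std::string>> rows_;
};

// Written against the interface: it does not care where the data comes from.
std::string greeting(const TableInterface& users, const std::string& user)
{
    return "Hello, " + users.cell(user, "name") + "!";
}
```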

A table is identified by a name. At this time, only QString is supported for table names. Table names may contain only letters, digits and underscores, because the name is used to build a filename. For the same reason, the length may be limited (probably OS dependent; Cassandra does not appear to document it. In any case, it is easy to keep table names short.)
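A name check following the rule above could look like this. The function name and the maxLength default are hypothetical, since the actual length limit is not documented here; the character test is written out explicitly to avoid locale-dependent classification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Accepts a non-empty name made only of ASCII letters, digits and '_'.
// `maxLength` is a hypothetical, caller-chosen cap, not a documented limit.
bool isValidTableName(const std::string& name, std::size_t maxLength = 32)
{
    if (name.empty() || name.size() > maxLength) {
        return false;
    }
    for (char ch : name) {
        const bool allowed = (ch >= 'a' && ch <= 'z')
                          || (ch >= 'A' && ch <= 'Z')
                          || (ch >= '0' && ch <= '9')
                          || ch == '_';
        if (!allowed) {
            return false;
        }
    }
    return true;
}
```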

Tables are composed of rows. Here the scheme departs from the usual SQL database, since rows are independent from each other: one row may have 10 "columns" while another has just 1. Each row is identified by what Cassandra calls a key. The key can be either a string or a binary identifier (e.g. an int64_t).

The name of a row (its key) can be typed. In most cases the default binary type is enough, assuming integers are saved in big-endian order, which is what libQtCassandra does. This matters when you want to use a Row Predicate.
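The reason for big-endian storage is that binary keys are compared byte by byte. The sketch below (hypothetical helper names) encodes the same numbers both ways: only the big-endian form preserves numeric order under a lexicographic byte comparison. It uses uint64_t on purpose; for negative int64_t values in two's complement, a plain byte comparison would place them after the positive ones.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Key = std::array<std::uint8_t, 8>;

// Most significant byte first (big endian).
Key toBigEndian(std::uint64_t v)
{
    Key out{};
    for (std::size_t i = out.size(); i-- > 0;) { // fill from the last byte back
        out[i] = static_cast<std::uint8_t>(v & 0xFF);
        v >>= 8;
    }
    return out;
}

// Least significant byte first (little endian), for comparison only.
Key toLittleEndian(std::uint64_t v)
{
    Key out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(v & 0xFF);
        v >>= 8;
    }
    return out;
}

// std::array's operator< compares the bytes lexicographically, which is how
// a binary key type orders keys.
```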

Rows are composed of cells. Cassandra calls them columns, but in practice a name/value pair is just a cell. However, some tables may define column types, in which case the cells with that name are typed and belong to that column.

A column is a name/value pair. You can change the characteristics of a column: how it gets replicated, how it is cached, and how it gets compared by a Column Predicate.

The name of a column can be a QString or binary data. It is often a QString as it looks like the name of a variable (var=<value>).
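A Column Predicate selects cells by column name, which works because the cells of a row are kept sorted by name. The following sketch assumes a simple inclusive name range with a maximum count, similar in spirit to Cassandra's slice ranges (which also offer reversed order); the Row alias and selectColumns() are hypothetical, not libQtCassandra's predicate classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>; // cells sorted by column name

// Hypothetical predicate: up to `count` column names in [start, end], sorted.
std::vector<std::string> selectColumns(const Row& row,
                                       const std::string& start,
                                       const std::string& end,
                                       std::size_t count)
{
    std::vector<std::string> names;
    for (auto it = row.lower_bound(start);
         it != row.end() && it->first <= end && names.size() < count;
         ++it) {
        names.push_back(it->first);
    }
    return names;
}
```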

The row and column names are both limited to 64KB. The value of a column is currently limited to 2GB; however, you would need a huge amount of memory (~6GB+) to handle such large values, and you would have to do it serially (i.e. only one process at a time can send that much data, or memory will quickly be exhausted and the processes will fail). We strongly advise keeping your values in the megabyte range.

By default, the QCassandra object checks the size against a much smaller limit (64MB) to prevent problems. At a later time, we may offer blob handling that saves large files by breaking them into small parts stored in the Cassandra database.
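The limits above, and the idea of splitting large values into parts, can be sketched as follows. The constants mirror the figures given in the text; the function names, the exception types and the per-part naming suggested in the comment are hypothetical, and this is not the blob feature the library may offer later.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Figures from the text; the constant names are hypothetical.
constexpr std::size_t kMaxNameSize       = 64 * 1024;        // row/column names
constexpr std::size_t kDefaultValueLimit = 64 * 1024 * 1024; // default value check

// Throws std::length_error if a cell would exceed the limits.
void checkCellSizes(const std::string& rowKey, const std::string& column,
                    const std::string& value,
                    std::size_t valueLimit = kDefaultValueLimit)
{
    if (rowKey.size() > kMaxNameSize || column.size() > kMaxNameSize) {
        throw std::length_error("row or column name larger than 64KB");
    }
    if (value.size() > valueLimit) {
        throw std::length_error("value larger than the configured limit");
    }
}

// Splits a value into parts of at most `chunkSize` bytes; each part could be
// stored in its own cell (hypothetically "data_0", "data_1", ...).
std::vector<std::string> splitIntoChunks(const std::string& value, std::size_t chunkSize)
{
    if (chunkSize == 0) {
        throw std::invalid_argument("chunkSize must be positive");
    }
    std::vector<std::string> parts;
    for (std::size_t pos = 0; pos < value.size(); pos += chunkSize) {
        parts.push_back(value.substr(pos, chunkSize));
    }
    return parts;
}
```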


Snap! Websites
An Open Source CMS System in C++
