A C++ library to access Cassandra servers
The libQtCassandra library is an advanced C++ library used to access Cassandra servers.
Unlike the basic Cassandra server interface, this C++ library gives you separate objects to handle each level of the server data: the cluster, contexts, tables, rows, and cells.
Details for developers can be found on the reference pages (100% complete Doxygen documentation of the library, including source code and working examples.)
You may also want to refer to the "Installation Instructions to get Cassandra on Ubuntu" page for details on installing Cassandra on your Ubuntu server. Note that the libQtCassandra library works under MS-Windows as well.
You have 2 requirements to compile the library:
I assume you are using Linux. Other Unix systems should be able to compile the code. As for MS-Windows, I have no clue, but I would imagine so. I'll gladly accept comments and patches (please post questions, comments, patches, and anger on SourceForge.net.)
You can download the source code from SourceForge.net.
We also offer ready to install packages on LaunchPad: Snap CPP (for Ubuntu users.)
Search for the libQtCassandra entries under References.
The documentation is also found inline in the .cpp files of the project. Corrections to the documentation are very welcome!
The following are the main features of libQtCassandra:
When connecting to a Cassandra Cluster, you should always use the proxy server. This is a small front end that lets your software connect to a Cassandra Cluster as opposed to a single Cassandra Node. The concept is pretty simple: if you connect directly to a node, the connection may fail because that specific node is down, even though the cluster as a whole is still working just fine.
The Cassandra Proxy Server resolves that problem by connecting to multiple Cassandra Nodes and maintaining statistics about the whole cluster. It can connect to additional nodes when it loses connections to existing ones. It can also load balance requests so you avoid hitting nodes that are really busy.
The concept is simple: your server connects to the proxy server, which we expect to be running on the same local network, and sends normal Cassandra requests to it. The proxy passes your requests to one of the Cassandra nodes it is already connected to and returns Cassandra's answer as is, so in effect the proxy is transparent to your process.
Adding an extra network connection can cause some slowness, but since this is a local network connection, it should still be really fast (about the time it takes to copy data buffers in memory), especially because this one connection does not require any encryption.
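To illustrate the failover idea, here is a minimal C++ sketch of how a proxy might pick the node for the next request. It assumes plain round-robin over nodes marked up or down; the names (Node, NodePool, next()) are hypothetical and are not taken from the actual proxy server, which also keeps statistics to steer requests away from busy nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a proxy's node selection: round-robin over the
// known nodes, skipping any node currently marked as down.
struct Node {
    std::string address;
    bool up = true;
};

class NodePool {
public:
    explicit NodePool(std::vector<Node> nodes) : nodes_(std::move(nodes)) {}

    void markDown(std::size_t i) { nodes_.at(i).up = false; }
    void markUp(std::size_t i)   { nodes_.at(i).up = true; }

    // Returns the next live node, or std::nullopt if every node is down.
    std::optional<std::string> next() {
        for (std::size_t tries = 0; tries < nodes_.size(); ++tries) {
            const Node& n = nodes_[cursor_];
            cursor_ = (cursor_ + 1) % nodes_.size();
            if (n.up) {
                return n.address;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<Node> nodes_;
    std::size_t cursor_ = 0;
};
```

A request only fails when the whole cluster is unreachable; a single dead node is simply skipped.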
To recompile libQtCassandra you need cmake, make, g++, and all the dependencies (Cassandra, thrift, Qt...)
The instructions to compile the library are something like:
tar xf libQtCassandra-0.5.0.tar.gz
mkdir BUILD
cd BUILD
cmake ../libQtCassandra-0.5.0
make
make install
For other targets that you can build, try "make help".
If you cd back out of the BUILD folder, you can use "make -C BUILD" to run make inside the BUILD folder.
Although you can run cmake directly in the libQtCassandra directory, if you plan to do any development work in that folder, I strongly advise creating a separate build folder so you don't mix original source files and generated files.
Please report patches and problems on the SourceForge.net page.
The latest source published on SourceForge.net (tarball) has a serious bug in the content::clearCache() function: it clears the tables from memory, but it completely loses track of all the tables, even those that still exist in Cassandra. Version 5.5 has a fix. You can find the code in the SourceForge.net project under Code (in git). We also offer a pre-compiled version for Ubuntu via LaunchPad.
Got a project now officially using the libQtCassandra library? Let us know and we'll include a link to your project/product here.
Building CXX object src/CMakeFiles/QtCassandra.dir/QCassandra.cpp.o
Cassandra.h:15:24: fatal error: TProcessor.h: No such file or directory
The thrift library needs to be compiled with C++ support. When it is configured (look at the output), it tells you which language libraries it builds. If you don't see C++ selected (...: yes), you will get this error saying that TProcessor.h cannot be found.
This happens because some dependencies are missing. Unfortunately, I do not know exactly which ones. I will update this entry as I discover them. We also intend, at some point, to detect missing dependencies and generate a clear error instead of moving forward and producing strange compilation errors.
When you start the build, cmake should run the configure script of the thrift library. At some point the output should include something like the following, which shows the thrift library version and the generators. You must have at least C++ indicated. If not, the configure script could not find your C++ compiler and thus skipped it.
thrift 0.8.0

Building code generators ..... :
Building C++ Library ......... : yes
Building C (GLib) Library .... : no
Building Java Library ........ : no
Building C# Library .......... : no
Building Python Library ...... : no
Building Ruby Library ........ : no
Building Haskell Library ..... : no
Building Perl Library ........ : no
Building PHP Library ......... : no
Building Erlang Library ...... : no
Building Go Library .......... : no
Building TZlibTransport ...... : yes
Building TNonblockingServer .. : yes
It is good to have TZlibTransport included since it can compress the data sent over the network.
You may find that your environment needs other systems to function as expected. For example, if you have heavy needs for locks or work that needs to be serialized, you may want to look into Apache ZooKeeper (a coordination service, usable from C, on which barriers, locks, and queues can be built.)
There are many solutions for locks. I think that the one most often referenced is Apache ZooKeeper, but there are other solutions depending on your needs. Search around before making a decision.
For our own needs, we use snapdb, a daemon that comes with Snap! At first we had a lock object in the libQtCassandra library, but that system does not work when orders may end up being sent to any number of Cassandra nodes. In fact, our lock mechanism opens a single connection to communicate with one specific snaplock daemon. Our implementation has very few limitations, other than the fact that one lock has to be processed by one specific snapdb. For example, we do not have a centralized lock master node: if you are running 5 instances of snapdb, all 5 participate in the locking. If one goes down, locking continues to work with zero downtime. We could not find another external lock tool with such a feature.
Right now our snapdb daemon makes use of our snapcommunicator communication system, but we plan to release a standalone version that can be used with any project.
This version uses CQL instead of thrift. However, although it works great for us, it is not as well suited to outside use as the previous version was. We are still weighing the pros and cons of publishing an official 0.6+ version. You may get the source code, though.
I found a potential problem in the lock mechanism: a crash (abort, segmentation fault) or a killed process (kill -KILL <pid>) could leave a lock in the database that would last forever, preventing further locks on the same object name. I now put a TTL on the entering key so that if such a crash occurs, the key still disappears within seconds.
Bumped copyright notice to 2016.
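The TTL fix can be modeled in a few lines. This is a hypothetical in-memory sketch, not the library's lock code: TtlLockTable stands in for the table holding the entering keys, and times are plain integer seconds passed in by the caller so the expiry is easy to follow.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the database: each "entering" key carries an
// expiry time, mimicking a Cassandra column written with a TTL.
class TtlLockTable {
public:
    // Try to take the lock for `name` at time `now` (seconds). Succeeds if
    // no live key exists; a key left behind by a crashed process stops
    // counting once its TTL has elapsed.
    bool tryLock(const std::string& name, std::int64_t now, std::int64_t ttl) {
        auto it = expires_.find(name);
        if (it != expires_.end() && it->second > now) {
            return false;  // another process holds a live key
        }
        expires_[name] = now + ttl;
        return true;
    }

    // Normal release by the process holding the lock.
    void unlock(const std::string& name) { expires_.erase(name); }

private:
    std::map<std::string, std::int64_t> expires_;  // name -> expiry time
};
```

With a real Cassandra column written with a TTL, the server removes the expired value by itself, so a crashed process needs no cleanup code at all.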
Various clean ups.
Note: the jump in version is due to the fact that I did not post source packages on SourceForge.net for a while; it is also due to our nightly build, which at first was incrementing the wrong version number.
Added support for a regular expression to filter rows as they are read from the lowest level. This is through the row predicate class. This feature is SLOW but can be useful in special cases where you do not have an index and will not be running such requests over and over again.
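A rough idea of what such a filter does, assuming a full match against the row key (the actual row predicate may match differently): every key has to be tested, which is why the feature is slow without an index. filterRowKeys() is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keep only the row keys that fully match `pattern`.
// Every row has to be examined, so the cost grows with the size of the table.
std::vector<std::string> filterRowKeys(const std::vector<std::string>& rowKeys,
                                       const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> matches;
    for (const auto& key : rowKeys) {
        if (std::regex_match(key, re)) {
            matches.push_back(key);
        }
    }
    return matches;
}
```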
Changed all shared pointers from the Qt version to the std version so that we can properly make use of weak pointers.
Made use of the controlled_vars enum capability to avoid many casts.
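Why weak pointers matter here: with std::shared_ptr, a child that refers back to its parent through another shared_ptr creates a cycle that is never freed. The sketch below uses hypothetical Context and Table structs (not the library's QCassandraContext and QCassandraTable) to show the usual pattern of owning children with shared_ptr and pointing back with weak_ptr. It requires C++17 for weak_from_this().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Context;

// Hypothetical child: refers back to its context without owning it.
struct Table {
    std::string name;
    std::weak_ptr<Context> context;  // non-owning back reference
};

// Hypothetical parent: owns its tables through shared_ptr.
struct Context : std::enable_shared_from_this<Context> {
    std::vector<std::shared_ptr<Table>> tables;

    std::shared_ptr<Table> addTable(const std::string& name) {
        auto t = std::make_shared<Table>();
        t->name = name;
        t->context = weak_from_this();  // no reference cycle
        tables.push_back(t);
        return t;
    }
};
```

When the last outside shared_ptr to the context goes away, the context is destroyed, and any table still held elsewhere sees its back reference expire instead of keeping the context alive.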
Tweaked the CMakeLists.txt to define the system headers as such.
Fixed some warnings generated by newer versions of g++.
Added a fix to clearTable() so it works as expected.
Documented the fact that QMap always sorts its keys in ascending order, even when the data is read with the Reverse flag turned on.
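The same behavior is easy to see with std::map, which, like QMap, is an ordered container. This small sketch (keysInMapOrder() is hypothetical) inserts keys in the order a reversed read might return them and shows they still come back in ascending order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Insert keys in the given read order, then list them in map order.
// An ordered map ignores the insertion order: iteration is always ascending.
std::vector<std::string> keysInMapOrder(const std::vector<std::string>& readOrder)
{
    std::map<std::string, int> cells;
    for (const auto& k : readOrder) {
        cells[k] = 0;  // the value is irrelevant here
    }
    std::vector<std::string> out;
    for (const auto& entry : cells) {
        out.push_back(entry.first);
    }
    return out;
}
```

To walk the cells from last to first, iterate the map backwards (rbegin()/rend() for std::map) rather than relying on the read order.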
Removed some debug code.
Compiled with Cassandra interface version 2.0.1 and thrift version 0.9.0.
Fixed a bug in QCassandraRow::exists(): I needed to test the return value of a function.
Created a Debian compatible changelog file.
Read consistency level can now be specified.
Added a synchronization function, which is necessary if you are working on a cluster (more than 1 node) and want to create or change the schema of a context (table/column definitions.)
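A sketch of what such a synchronization step can look like, assuming it polls the cluster until every node reports the same schema version. Thrift's describe_schema_versions() returns a map from schema version to the nodes reporting it; here that call is replaced by a caller-supplied fetch function, and waitForSchemaAgreement() is a hypothetical name, not the library's function.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Schema version -> nodes reporting that version.
using SchemaVersions = std::map<std::string, std::vector<std::string>>;

// The cluster agrees once every node reports one and the same version.
bool schemaAgreed(const SchemaVersions& versions)
{
    return versions.size() == 1;
}

// Hypothetical polling loop: `fetch` stands in for the real cluster call,
// `wait` for a sleep between attempts. Gives up after maxAttempts.
bool waitForSchemaAgreement(const std::function<SchemaVersions()>& fetch,
                            const std::function<void()>& wait,
                            int maxAttempts)
{
    for (int i = 0; i < maxAttempts; ++i) {
        if (schemaAgreed(fetch())) {
            return true;
        }
        wait();
    }
    return false;
}
```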
Updated all the tests so they also can work on a cluster of 3+ nodes.
Added a QCassandraValue test, which led me to fix the comparison operators (<, <=, >, >=).
Added support to use an index with QCassandraValue buffers (read-only right now.)
Moved the byte array data reads to the global scope.
Added Bool support to the QCassandraValue class.
Fixed the findContext() so it loads contexts first if not loaded yet.
Fixed the disconnected() so a QCassandra object can now be reused properly.
Fixed the snitch function which now returns the snitch (instead of the protocol version).
Fixed two uses of column keys that used a QString instead of a QByteArray (i.e. a null byte would inadvertently end the column key.)
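Why a string type can break a binary key: a key may contain a 0 byte, and any code path that treats the key as a NUL-terminated string stops at that byte. This small sketch uses a plain C string and std::string (standing in for QByteArray) to show the difference; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by code that treats the key as a NUL-terminated string:
// counting stops at the first 0 byte.
std::size_t terminatedLength(const char* key)
{
    return std::strlen(key);
}

// Length-aware copy (the role QByteArray plays): every byte is kept,
// including embedded 0 bytes.
std::string binaryKey(const char* data, std::size_t size)
{
    return std::string(data, size);
}
```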
Fixed the dropCell() so it doesn't attempt to read the cell first.
Fixed the CMakeLists.txt so the libQtCassandra library is linked against the thrift library (so your tools do not have to know about thrift directly.) Also removed references to the boost_system library.
Reviewed the SSL connection capability. It is still not considered to be working but the password can now be specified from your application.
Updated documentation to be more accurate and define some missing entries.
Added direct support for QUuid as row and column keys.
Added direct support for char * and wchar_t * so we do not have to first cast strings to QString everywhere.
Fixed a bug where the row key size was tested against a limit of 64535 instead of 65535.
Added a test since row and column keys cannot be empty. An error is now thrown immediately if a key is empty.
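The two key rules above can be expressed as a simple check. This is a hypothetical helper, not the library's code: it rejects empty keys and keys longer than 65535 bytes (the largest length an unsigned 16-bit value can hold), throwing standard exceptions instead of the library's own.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical validation mirroring the changelog: keys must be non-empty
// and at most 65535 bytes long.
void checkRowKey(const std::string& key)
{
    if (key.empty()) {
        throw std::invalid_argument("row key cannot be empty");
    }
    if (key.size() > 65535) {
        throw std::length_error("row key is limited to 65535 bytes");
    }
}
```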
Updated some documentation accordingly and with enhancements.
Added first_char and last_char variables (QChar) to the column predicate, which can be used to define "[nearly] all column names".
Fixed the names of two functions: setFinishColumnName() and setFinishColumnKey() are now setEndColumnName() and setEndColumnKey() respectively (as documented and so it matches the getters.)
Added support for indexes defined with columns. The column predicate now has a setIndex() function, which allows you to call readCells() repeatedly until all the columns matching the predicate have been returned (very similar to reading a large set of rows.)
Fixed a few things in the documentation.
Added support for composite columns. They already worked before, but only if you knew how to build the column key, which is actually quite complicated (okay, not that hard, but libQtCassandra is here to hide that sort of thing!) Use the compositeCell() function of your QCassandraRow objects.
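For the curious, this is roughly the work compositeCell() saves you. The sketch below assumes Cassandra's CompositeType layout: each component is written as a 2-byte big-endian length, the component bytes, and a 1-byte end-of-component marker (0 for an exact value). encodeComposite() is a hypothetical name, and it does not handle the non-zero markers used for slice bounds.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical encoder for a CompositeType column key:
// <2-byte big-endian length><component bytes><end-of-component byte>
std::string encodeComposite(const std::vector<std::string>& components)
{
    std::string out;
    for (const auto& c : components) {
        if (c.size() > 0xFFFF) {
            throw std::length_error("composite component too large");
        }
        out.push_back(static_cast<char>((c.size() >> 8) & 0xFF));
        out.push_back(static_cast<char>(c.size() & 0xFF));
        out += c;
        out.push_back('\0');  // end-of-component marker (exact match)
    }
    return out;
}
```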
Added support for counters.
Fixed several uses of keys so that 0 bytes work as expected (in getValue() and insertValue().)
Small fixes to documentation.
Fixed QCassandraTable::readRows() so it automatically updates the row predicate with the last row as the new start key. This is very important because the rows returned to you get sorted by key in the table, whereas in Cassandra they are not sorted that way at all (at least not by default when you use the RandomPartitioner, which is very likely.)
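The paging problem can be reproduced with a small mock. MockServer is hypothetical and returns rows in an arbitrary (token) order, as the RandomPartitioner does; for simplicity its start key is exclusive, whereas a real Cassandra range includes the start key. The important line is where the next start key is taken from the page as the server returned it, not from the sorted map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical server: rows come back in token order, not key order.
struct MockServer {
    std::vector<std::string> tokenOrder;

    // Up to `count` keys following `startAfter` in token order
    // (an empty startAfter means "from the beginning").
    std::vector<std::string> readPage(const std::string& startAfter,
                                      std::size_t count) const {
        auto it = tokenOrder.begin();
        if (!startAfter.empty()) {
            it = std::find(tokenOrder.begin(), tokenOrder.end(), startAfter);
            if (it != tokenOrder.end()) {
                ++it;
            }
        }
        std::vector<std::string> page;
        for (; it != tokenOrder.end() && page.size() < count; ++it) {
            page.push_back(*it);
        }
        return page;
    }
};

// Reads every row page by page into a key-sorted map (like a QMap of rows).
std::map<std::string, int> readAll(const MockServer& server, std::size_t pageSize)
{
    std::map<std::string, int> rows;
    std::string start;
    for (;;) {
        const auto page = server.readPage(start, pageSize);
        if (page.empty()) {
            break;
        }
        for (const auto& key : page) {
            rows[key] = 0;
        }
        start = page.back();  // last key in *server* order
    }
    return rows;
}
```

Taking rows.rbegin()->first (the largest key once sorted) as the next start key instead would not work, because the largest key is not necessarily the last one the server returned.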
Fixed the QCassandraContext::descriptionOption() which would create empty options when the sought option did not exist in the context.
Upgraded the version of Thrift to 0.8.0. There are some problems with the output of the thrift command line tool (some missing #include and invalid references.) I fixed the generated code as required so it compiles, and the result works as expected.
Made updates to the code so it works with version 1.1 of Cassandra. This includes proper support for the replication factor which was deprecated as a direct field in the KsDef structure. The other deprecated fields are simply ignored at this point (those are in Tables, see CfDef in interface/cassandra.thrift of Cassandra 1.1)
Fixed replicateOnWrite() which now returns the expected value.
Fixed all the context and table get...() functions so if the value is marked as unset, empty or zero is returned instead of the current value saved in the object (which may not reflect what the database is defined as.)
Added the million_rows test to ensure we can create over 1 million rows and read them back. At this time, in my environment, it often crashes the Cassandra server... Java problems?
Added functions that return the partitioner and snitch information from the cluster.
Fixed QCassandraContext::prepareContextDefinition() which would force the replication factor to 1 instead of the user defined value.
The CMakeLists.txt now properly defines the folder where the compiled thrift library lies so it can link with it in the standalone version of the library.
Fixed the size of the buffer used to save 64 bit integers.
Fixed the size of integers used to handle floating points.
Fixed the double being read as 8 bytes and somehow converted to a float instead of a double.
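As a reference for the sizes involved, here is a minimal sketch of 8-byte big-endian encoding, which is the layout Cassandra's LongType and DoubleType expect. The function names are hypothetical; the point is that a double is copied bit for bit into a 64-bit integer rather than converted through a float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Write 64 bits as 8 big-endian bytes.
std::string encodeBits(std::uint64_t v)
{
    std::string out(8, '\0');
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<char>((v >> (56 - 8 * i)) & 0xFF);
    }
    return out;
}

// Read 8 big-endian bytes back into a 64-bit accumulator.
std::uint64_t decodeBits(const std::string& in)
{
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | static_cast<unsigned char>(in.at(i));
    }
    return v;
}

std::string encodeInt64(std::int64_t value)
{
    return encodeBits(static_cast<std::uint64_t>(value));
}

std::int64_t decodeInt64(const std::string& in)
{
    return static_cast<std::int64_t>(decodeBits(in));
}

// A double goes through its raw IEEE-754 bits, never through a float.
std::string encodeDouble(double d)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double required");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return encodeBits(bits);
}

double decodeDouble(const std::string& in)
{
    const std::uint64_t bits = decodeBits(in);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```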
Fixed the test of the string set in a value to limit the UTF-8 version of the string to 64Mb (instead of the number of UCS-2 characters held by a QString.)
Enhanced the documentation about findRow() and findCell(), which do not look for a row or cell in the Cassandra system; they only check in memory!
Better support older versions of g++ (4.1 cannot properly cast the controlled variables for enumerations) -- thank you to John Griswold for reporting the problem.
Added some missing documentation.
Enhanced the cmake scripts to make it even easier (find/use Qt, Thrift) and thus I jumped to version 0.4.0 because this is a pretty major change from 0.3.x
Removed the Qt sub-folder names from #include.
Made the getValue() function return false so we can know when it fails and react accordingly.
Fixed the use of the slice predicate so that string null terminators are ignored, as they ought to be (i.e. a key can include a nul character.)
Added some try/catch blocks to avoid a certain number of perfectly legal exceptions (i.e. a missing value or column.)
Removed all unwanted files from source package using a CPACK option.
Stripped folder names from the documentation to make it smaller.
Updated all copyrights to include 2012.
Fixed the creation of a row predicate as it wasn't defining a column predicate which is necessary when we call readRows() with the default parameters on a table.
Added support for installation targets and generation of binary packages.
Added a dropContext() in the QCassandra object.
Added proper unparenting of the context and table classes.
Started to make use of the Controlled Variables (requires 1.3.0 or better.)
The following are things we intend to add at some point: