Cassandra silently failing...

As I was working on the Lock implementation in our C++ Cassandra library, I ran into a rather weird problem: the test would fail because many processes did not obtain the lock in time.

Looking into what was happening, I could see that some of the test processes would read the table and get nothing back (0 columns returned!), even though the other 6 or 7 processes had already written their information to the database and I use QUORUM as the consistency level for all accesses.
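For context on why an empty read was so surprising: in a single data center, Cassandra's QUORUM means floor(RF / 2) + 1 replicas, so when the writes and the later reads both use QUORUM, R + W > RF and every read contacts at least one replica that acknowledged the write. The minimal C++ sketch below only does that arithmetic. The function names are hypothetical, and the replication factor of 3 is an assumption, since the post does not state the keyspace settings.

```cpp
// Hypothetical helpers illustrating quorum arithmetic for a single data
// center (with several data centers, QUORUM uses the sum of their
// replication factors instead).
constexpr int quorum(int replication_factor)
{
    return replication_factor / 2 + 1;
}

// True when any set of r replicas must intersect any set of w replicas
// out of rf (pigeonhole), i.e. a read started after an acknowledged write
// reaches at least one replica that holds that write.
constexpr bool read_overlaps_write(int rf, int w, int r)
{
    return r + w > rf;
}

// Assumed replication factor of 3: QUORUM is 2, and 2 + 2 > 3.
static_assert(quorum(3) == 2, "RF=3 needs 2 replicas for QUORUM");
static_assert(read_overlaps_write(3, quorum(3), quorum(3)),
              "QUORUM reads must overlap QUORUM writes");
```

In other words, with QUORUM on both sides the "0 columns returned" result could not be explained by consistency levels alone, which is what pointed away from Cassandra's replication logic.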

After looking for a while, I finally found out that the problem was not Cassandra per se, nor the Java SDK, nor our Cassandra library. The problem was the operating system's limit on the number of files I was allowed to have open. The limit was 1024, and even after raising it to 3000 I would still get an error.

So I raised it to 32768 instead, and now it works! The limit is changed with ulimit; in bash, for example, ulimit -n 32768 applies to the current shell and to every process started from it.
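The fix above was done with ulimit in the shell. A C++ program, such as one of the library's test processes, can also read and raise its own limit with the POSIX getrlimit()/setrlimit() calls and RLIMIT_NOFILE. This is only a sketch, not how the problem was fixed here, and raise_open_file_limit() is a hypothetical helper. It affects only the calling process and the children it starts afterward, so it cannot change the limit of an already running Cassandra JVM, and an unprivileged process can only raise its soft limit up to the hard limit.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the calling process's soft limit on open file
// descriptors (RLIMIT_NOFILE) to `wanted`, capped at the hard limit, which
// an unprivileged process cannot exceed. Returns the soft limit actually in
// effect afterward, or 0 if the limit could not be queried.
rlim_t raise_open_file_limit(rlim_t wanted)
{
    struct rlimit lim;
    if(getrlimit(RLIMIT_NOFILE, &lim) != 0)
    {
        return 0;
    }

    if(lim.rlim_cur != RLIM_INFINITY && lim.rlim_cur < wanted)
    {
        rlim_t target = wanted;
        if(lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max)
        {
            target = lim.rlim_max;
        }
        lim.rlim_cur = target;
        // A failure is not fatal; the query below reports what really applies.
        (void)setrlimit(RLIMIT_NOFILE, &lim);
    }

    // Query again so the caller sees the real value, even if setrlimit failed.
    if(getrlimit(RLIMIT_NOFILE, &lim) != 0)
    {
        return 0;
    }
    return lim.rlim_cur;
}
```

Calling raise_open_file_limit(32768) at the start of a test program and logging the returned value would at least make a too-low limit visible instead of letting it fail silently.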

I found a couple of sources talking about this problem:

https://askubuntu.com/questions/181215/too-many-open-files-how-to-find-the-culprit
http://lj4newbies.blogspot.com/2007/04/too-many-open-files.html

Neither was related to Cassandra, but both talked about the "Too many open files" error that I got when attempting to look at the logs with

tail -F ...cassandra/log/system.log

It is hard to find a problem when no errors get logged about it. I still wonder where exactly it breaks, because neither Cassandra (or is it Java?) nor my library seems to be aware of any problem.
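On Linux, each entry in /proc/<pid>/fd is one open descriptor of that process, so counting those entries shows how close a process, the Cassandra JVM for instance, is getting to its limit. Below is a minimal C++17 sketch; count_open_fds() is a hypothetical helper, the approach is Linux-specific, and inspecting another user's process usually requires being that user or root.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (Linux only): count the open file descriptors of a
// process by listing /proc/<pid>/fd. Pass "self" for the calling process;
// in that case the count includes the descriptor used to read the directory.
// Returns -1 if the directory cannot be read (no such process, or no
// permission to inspect it).
long count_open_fds(const std::string & pid = "self")
{
    namespace fs = std::filesystem;
    try
    {
        long count = 0;
        for(const auto & entry : fs::directory_iterator("/proc/" + pid + "/fd"))
        {
            (void)entry; // only the number of entries matters
            ++count;
        }
        return count;
    }
    catch(const fs::filesystem_error &)
    {
        return -1;
    }
}
```

Comparing that count, for the Cassandra process's PID, against the limit reported by ulimit -n would show whether a run is about to hit the ceiling.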

Snap! Websites
An Open Source CMS System in C++
