Snap! Websites
An Open Source CMS System in C++
In order to keep our code as easy as possible to maintain and enhance, we request that all code submitted for the Snap! Server be written in a very specific way as described below. Being very consistent greatly eases reading each other's code. The most important aspect is indentation. Definitely make sure your indentation is correct! Although we encourage you to submit patches and plugins of your own making, we require that this coding standard be followed before you submit anything or it is likely that it will be rejected (we do not have the time to do the formatting for you and if we don't need your plugin...)
Yes. We'll be Nazis on the Coding Standard! And there are quite a few rules as you'll see below.
IMPORTANT NOTE: The Snap! system incorporates different libraries such as libQtCassandra and libtld. These do not follow this coding standard. The Qt based libraries try to follow the Qt coding standard as closely as possible. Other libraries are generally very small and were generally written using the K&R standard. It is close to what is defined here except for the position of curly brackets.
Your code must be properly indented. This means each time you open a curly brace, the content appears one tab further to the right.
Our tab key adds 1 to 4 spaces. This works well with all the diff tools you may encounter, contrary to using actual tab characters which vary from tool to tool and often cannot be modified.
int my_func(int param1)
{
    if(param1 == 3)
    {
        do_something();
    }
}
The vim equivalent is: set ts=4 sw=4 et
Note that at the end of each file we include the following for vim:
// vim: set ts=4 sw=4 et
This allows the vim editor to immediately know how to handle your tab key and the indentation (shift) key strokes (<< and >>).
The position of curly brackets ({ and }) must follow your indentation perfectly. The opening and closing brackets must be in the exact same column. It should be very noticeable if that is not the case. Just pay attention!
Functions not declared within a class must be written starting in the very first column. That means the very first opening curly bracket of a function is in column 1 (and yes, we assume that the very first column is column 1.) There are two reasons for this: (1) it follows the indentation mechanism very closely, and (2) it allows vim users to jump to the beginning ([[, ]]) and end ([], ][) of functions. You wouldn't believe how much time you can save with those keystrokes!
int my_func(int param1)
{
    if(param1 == 3)
    {
        do_something();
    }
}

int my_array[] =
{
    3, 4, 7, 9
};

foo my_struct[] =
{
    "string", 123, 3.0f
};

class bar : public base
{
public:
    bar()
        : base(123)
    {
    }
};
As you can see, within a class definition, we indent the entire function definition. This means the vim movement to the beginning and end of functions doesn't work. However, we can still go to the beginning and end of the class as a whole.
All our functions, classes, variable names, and namespaces are in lowercase with words separated by underscores: cool_function, nice_variable, name_vector; no Hungarian notation (pMyVar, bFlag), no CamelCase. Although we do use a Hungarian-like notation for variables (see below), we do not use it to represent a type, but instead to categorize our variables.
Static constant variables are an exception in that they replace their C #define variable definitions. Because of that, we declare them in all caps, although words are still separated by underscores: SNAP_NAME_CONTENT_BODY.
Enumerations are viewed as being very similar to C #define variables and as such are written in all caps (SNAP_NAME_CORE_RULES).
There is one exception with enumerations: if you include a maximum value for an enumeration, the word max will be written in lower case. For example:
enum lexer_error_t
{
    LEXER_ERROR_NONE,
    LEXER_ERROR_INVALID_STRING,
    LEXER_ERROR_INVALID_C_COMMENT,
    LEXER_ERROR_INVALID_NUMBER,
    LEXER_ERROR_max
};
Note: as shown below an enumeration is a type and therefore its name ends with "_t". This allows us to detect and show it as a type.
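One use for a lowercase max entry, shown here as a sketch rather than as the way Snap! does it: it gives the number of real values, so it can verify the size of a lookup table and bound-check an incoming code. The lexer_error_name() function and its table of names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <string>

enum lexer_error_t
{
    LEXER_ERROR_NONE,
    LEXER_ERROR_INVALID_STRING,
    LEXER_ERROR_INVALID_C_COMMENT,
    LEXER_ERROR_INVALID_NUMBER,
    LEXER_ERROR_max
};

// hypothetical helper: convert an error code to a readable name
std::string lexer_error_name(int const err)
{
    static char const * const g_names[] =
    {
        "none",
        "invalid string",
        "invalid C comment",
        "invalid number"
    };

    // compare the table size against LEXER_ERROR_max so that adding an
    // enumeration without adding its name fails at compile time
    static_assert(sizeof(g_names) / sizeof(g_names[0]) == static_cast<std::size_t>(LEXER_ERROR_max),
                  "one name is required per lexer_error_t value");

    // LEXER_ERROR_max itself is not a valid error code
    if(err < 0 || err >= LEXER_ERROR_max)
    {
        return "unknown";
    }
    return g_names[err];
}
```

Because LEXER_ERROR_max always comes last, it follows the enumeration as values are added and the checks need no manual update.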
We add an _t at the end of typedef declarations. This way we can simply distinguish types from variables. Make sure you never end the name of a variable with _t (or any other one letter character.)
enum name_t
{
    SNAP_NAME_LAYOUT_THEME,
    SNAP_NAME_LAYOUT_LAYOUT
};
Vim users can use the following declaration to transform all such types to a different color to quickly distinguish types:
syn match cppType "[A-Za-z_][A-Za-z0-9_]\+_t\>"
syn match cppType "\<[A-Za-z_][A-Za-z0-9_:]\+_t\>"
Although we try to avoid macros, in some cases it is either not possible to do things without a macro, or it requires a lot of extra work. For example, we use macros to transform numbers into strings and to quickly and systematically declare Snap! plugins.
Macro names are in all capitals. For example our plugins are defined with:
SNAP_PLUGIN_START(my_plugin, 0, 1)
SNAP_PLUGIN_END()
Macros are documented the same way as functions, however, they are documented in the header file where they are declared as Doxygen doesn't (yet) support pre-processor references.
We do allow ... in macros. However, if passing no parameter at all is an option, then you need to declare a special macro that does not accept parameters (e.g. see the SNAP_LISTEN0() definition.)
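As a sketch of the number-to-string case mentioned earlier, the usual two-level preprocessor idiom looks as follows. All the EXAMPLE_... names and the example_version() function are made up for this illustration; they are not the macros Snap! defines.

```cpp
#include <string>

// hypothetical names, shown only to illustrate the idiom
#define EXAMPLE_STRINGIFY_DETAIL(x)     #x
#define EXAMPLE_STRINGIFY(x)            EXAMPLE_STRINGIFY_DETAIL(x)

#define EXAMPLE_VERSION_MAJOR           1
#define EXAMPLE_VERSION_MINOR           23

// two levels are required: the outer macro expands its parameter,
// the inner macro then applies the # operator to the expanded value
std::string example_version()
{
    return EXAMPLE_STRINGIFY(EXAMPLE_VERSION_MAJOR) "." EXAMPLE_STRINGIFY(EXAMPLE_VERSION_MINOR);
}
```

Calling the inner macro directly, as in EXAMPLE_STRINGIFY_DETAIL(EXAMPLE_VERSION_MAJOR), produces the name of the macro as a string instead of its value. This is one of the cases where a macro is hard to avoid.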
Each plugin defines its own namespace inside the snap namespace.
namespace snap
{
namespace my_plugin
{

class my_plugin : public plugins::plugin
{
public:
    [...]
};

} // namespace my_plugin
} // namespace snap
This example illustrates a skeleton declaration of the plugin named my_plugin.
Notice that the closing of a namespace is always followed by a comment naming the namespace being closed.
If your class requires non-public declarations, write them inside a details namespace (3rd level). Details are NOT to be used by users of your plugin.
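A minimal sketch of the three namespace levels could look like the following. The trim_slashes() helper and the canonicalize() function are hypothetical, and the plugins::plugin base class is left out so the example compiles on its own.

```cpp
#include <string>

namespace snap
{
namespace my_plugin
{
namespace details
{

// hypothetical helper, private to the implementation of the plugin
std::string trim_slashes(std::string const & path)
{
    std::string::size_type const first(path.find_first_not_of('/'));
    if(first == std::string::npos)
    {
        // empty or only slashes
        return std::string();
    }
    std::string::size_type const last(path.find_last_not_of('/'));
    return path.substr(first, last - first + 1);
}

} // namespace details

// a real plugin derives from plugins::plugin, the base class is
// omitted in this sketch
class my_plugin
{
public:
    std::string canonicalize(std::string const & path) const;
};

std::string my_plugin::canonicalize(std::string const & path) const
{
    return details::trim_slashes(path);
}

} // namespace my_plugin
} // namespace snap
```

Users of the plugin call canonicalize(); nothing outside of the plugin is expected to reference snap::my_plugin::details.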
All the #include are at the top of the file, after the license.
The order of the #include headers is expected to be as follows:
The poison header is included last because the 3rd party headers may make use of the poisoned functions and there is nothing we can do about it.
It is common practice to separate each group of #include with an empty line.
The constructors follow a peculiar style that puts the colon and commas in the same column on the left side. This is very practical when you want to comment out an entry! Think about it... it works seamlessly! (unless you comment out the line with the colon, of course.)
class foo
{
public:
    foo(std::string const & str, std::vector<std::string> const & vec, controlled_vars::zint32_t const value)
        : f_str(str)
        , f_vec(vec)
        , f_value(value)
        //, f_name("") -- auto-init
    {
    }
};
We want all the variables of a class to appear in all the constructors. Although you may forget some and that's safe, we want to include everything including all the variable members that will automatically be initialized. Those are declared commented out followed by "-- auto-init". There are two reasons for that practice: (1) it shows you did not forget to include all the variable members, and (2) it documents the default value to the reader.
Default integers and floating points: an empty initializer for an integer or a floating point (or the Boolean type) sets the value to zero. So you may write the following and get zero in that variable member:
// DO NOT USE!
class foo
{
public:
    foo()
        : f_value()
    {
    }

private:
    int f_value;
};
This is really not clear. Instead we want you to write the initializer with the actual default value. Also, since we want to make use of controlled variables for all basic types, we would instead write:
class foo
{
public:
    foo()
        //: f_value(0) -- auto-init
    {
    }

private:
    controlled_vars::zint32_t f_value;
};
Now this, in comparison, is crystal clear.
We see several kinds of variables:
These are rare but we are not against globals. However, in most cases these will be declared as static variable members of your classes. We do not make use of, and do not plan to use, any threading capabilities so we do not require mutex protections of globals.
All variables names that represent a global, including the static variables of classes, must start with "g_". For example, the global foo would be written:
class foo
{
private:
    static int g_foo;
};
Note that a static within a function is always viewed as a global variable:
int *get_that_pointer()
{
    static int * g_ptr(new int);
    return g_ptr;
}
Vim users may want to use a declaration as follows to clearly distinguish global variables from others:
syn match ptGlobal "\<g_[A-Za-z0-9_]\+\>"
hi ptGlobal guifg=#333388
Class and structure variable members are fields, as such all of them must start with "f_". We do insist on this one because reading a function with variables that do not follow a strict scheme is very difficult. Not only that, you are much more likely to shadow variables improperly when not using a prefix.
class foo
{
public:
    foo();

private:
    int f_value;
};

struct rgb
{
    unsigned char f_red;
    unsigned char f_green;
    unsigned char f_blue;
};
Vim users may want to use a declaration as follows to clearly distinguish variable members from others:
syn match ptField
hi ptField guifg=#883333
We do not enforce (yay!) any specific syntax for parameters. They should, however, use names as descriptive as possible when appropriate (e.g. rhs is okay in an operator, a and b are okay in a function such as cmp(int a, int b);, otherwise, use long names composed of words.)
Try to keep the same name in the class declaration and in the implementation of the function. Unfortunately C++ does not enforce such... however, it is definitely not clear if you write something like this:
// .h file
class foo
{
public:
    int my_function(char * name, char * key);
};

// .cpp file
int foo::my_function(char * key, char * name)
{
    [...]
}
In this example you can see that we swapped the variable definitions. Most programmers will make use of your headers to know how to call your function. They'll get a surprise on that one, won't they?
Local variables should be declared at the location they are going to be used. We do not impose any specific naming convention. Explicit names are better in many cases, but an index named i or j is just fine. However, if all your variables are one letter names you may have to ask yourself whether your function is just a few loops doing nothing...
Note that we compile using the -Wshadow command line option. This means none of your variables can shadow another or a function member. For some classes, this means you cannot use basic variable names such as name or value.
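The sketch below shows the kind of clash this refers to, using a hypothetical page class. Whether a given compiler version reports a parameter hiding a member function varies, so take this as an illustration of the naming habit rather than a statement about a specific GCC release.

```cpp
#include <string>

class page
{
public:
    std::string const & name() const;
    void set_name(std::string const & new_name);

private:
    std::string f_name;
};

std::string const & page::name() const
{
    return f_name;
}

// a parameter called "name" would hide the name() member function
// inside this function, which is what -Wshadow is described as
// rejecting; "new_name" avoids the clash
void page::set_name(std::string const & new_name)
{
    f_name = new_name;
}
```

The f_ prefix already keeps parameters from clashing with variable members; the remaining clashes are with member functions and with variables of an enclosing scope.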
When defining a reference with the & and a pointer with *, add at least one space before / after the operator as in:
func(reference & foo, pointer * blah);
This is useful now that we place the const keyword at the right place.
The English language is often in the way of proper declarations in programming languages as the terms are often getting inverted. For example, if you declare a set of brush colors, you are likely to write:
int const RED_BRUSH = 0xFF0000;
int const GREEN_BRUSH = 0x00FF00;
int const BLUE_BRUSH = 0x0000FF;
Reading those variable names makes sense in English, but it is rather broken in programming. What is most important (common to all) should appear first and here it is BRUSH. So you should use BRUSH_RED, BRUSH_GREEN, and BRUSH_BLUE instead.
The const keyword is affected by the same drawback. In English you are more likely to say a constant integer (const int), whereas in C++ programming you want to view it the other way around so it makes sense when handling pointers and references, which are expected to be read from right to left. Hence the definition presented here: int const RED_BRUSH ....
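To make the right-to-left reading concrete, here is a small sketch. The variable names are invented for the example; only the placement of const matters.

```cpp
#include <type_traits>

int const BRUSH_RED = 0xFF0000;
int g_value = 5;

// read each declaration from right to left
int const *         g_ptr_to_const = &BRUSH_RED;    // pointer to a constant int
int * const         g_const_ptr = &g_value;         // constant pointer to an int
int const * const   g_const_both = &BRUSH_RED;      // constant pointer to a constant int

// "const int" and "int const" are two spellings of the same type
static_assert(std::is_same<const int, int const>::value,
              "same type, different spelling");
```

With const on the right, each const applies to whatever is immediately to its left, so the rule stays the same however many levels of pointers are involved.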
The variable members of classes must all auto-initialize themselves, preferably to the correct default value. Therefore, all classes that make use of basic types must define their member variables with controlled variables.
For example, a class with a 32bit integer would look like this:
class foo
{
public:
    foo();

private:
    controlled_vars::zint32_t f_value;
};
f_value is ensured to be zero on initialization of the class.
If the default value is not zero and you do not want to write your own typedef and declare your own type, then you can always use the mandatory (need init) variables. These force you to initialize the variable member in each and every one of your constructors.
controlled_vars::mint32_t f_value;
...
foo::foo()
    : f_value(123)
{
}
foo() would not compile without the f_value(123) entry because f_value was declared as a mint32_t type. The "m" stands for "mandatory" meaning that an initializer other than the empty initializer is required.
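To show why the compiler can enforce this, here is a much simplified sketch of the idea. The zero_int32 and mandatory_int32 classes are hypothetical stand-ins written for this example; they are not the controlled_vars implementation, which offers much more.

```cpp
#include <cstdint>
#include <type_traits>

namespace example
{

// simplified stand-in for a "z" type: always starts at zero
class zero_int32
{
public:
    zero_int32();
    zero_int32(std::int32_t const value);
    operator std::int32_t () const;

private:
    std::int32_t f_value;
};

zero_int32::zero_int32()
    : f_value(0)
{
}

zero_int32::zero_int32(std::int32_t const value)
    : f_value(value)
{
}

zero_int32::operator std::int32_t () const
{
    return f_value;
}

// simplified stand-in for an "m" type: there is no default constructor
// so every constructor of the owner must supply a value
class mandatory_int32
{
public:
    mandatory_int32(std::int32_t const value);
    operator std::int32_t () const;

private:
    std::int32_t f_value;
};

mandatory_int32::mandatory_int32(std::int32_t const value)
    : f_value(value)
{
}

mandatory_int32::operator std::int32_t () const
{
    return f_value;
}

static_assert(!std::is_default_constructible<mandatory_int32>::value,
              "a mandatory variable cannot be left uninitialized");

class foo
{
public:
    foo();
    std::int32_t get_total() const;

private:
    zero_int32      f_count;
    mandatory_int32 f_value;
};

foo::foo()
    //: f_count(0) -- auto-init
    : f_value(123)
{
}

std::int32_t foo::get_total() const
{
    return static_cast<std::int32_t>(f_count) + static_cast<std::int32_t>(f_value);
}

} // namespace example
```

Removing the f_value(123) line makes the constructor fail to compile, because the compiler has no default constructor to fall back on for the mandatory member.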
Every single variable member is a private variable member. You may return a reference to it if that's acceptable by your class although in most cases it is expected to be a const.
Protected and public declarations are reserved for functions, enumerations, static const variables, and sub-classes.
Very rarely do you need to declare a structure. Remember that if someone else can change your data then you have zero control over what goes in and what goes out of those variable members. You might as well write pure C code as everyone will know that no one has full control over anything... (sorry, that sounds like a rant!)
Add the necessary getters and setters and make sure that data passed to setters is valid. Note, however, that variables saved in controlled variables that include boundaries (limited variables) generally do not require additional handling as long as the controlled variables feature is turned on.
class foo
{
public:
    int foo_public_leak; // not allowed

protected:
    int foo_protected_leak; // not allowed

private:
    int foo_perfect_var;
};
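A sketch of the getter and setter pattern described above, with a hypothetical user class. Throwing std::invalid_argument on bad input is an assumption made for the example; the document does not say how Snap! reports invalid values.

```cpp
#include <stdexcept>
#include <string>

class user
{
public:
    user();

    std::string const & get_name() const;
    void set_name(std::string const & name);

private:
    std::string f_name;
};

user::user()
    //: f_name("") -- auto-init
{
}

// the variable member stays private, callers only get a const reference
std::string const & user::get_name() const
{
    return f_name;
}

// the setter is the only way in, so it is the place to validate
void user::set_name(std::string const & name)
{
    if(name.empty())
    {
        throw std::invalid_argument("a user name cannot be empty");
    }
    f_name = name;
}
```

Since nothing else can write to f_name, the rest of the class can rely on the name being either still unset or valid.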
All casting must use C++ casts. We should nearly never (if ever) see a reinterpret_cast<>() in your code. Those are used to cast from one type of pointer to another and there is generally no reason for doing such a thing. If you cannot static_cast<>() or dynamic_cast<>() your pointers then there is probably something wrong in your design.
Actually, if you write proper high level C++ code (and for a Snap! Server plugin you do not need to write low level code) then the only casts you need are dynamic_cast<>() and rarely but it happens... const_cast<>().
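As an illustration of the dynamic_cast<>() case, here is a short sketch with made-up shape classes. The double member is left as a plain type to keep the example self-contained.

```cpp
class shape
{
public:
    virtual ~shape() {}
};

class circle : public shape
{
public:
    circle(double const radius);
    double get_radius() const;

private:
    double f_radius;
};

circle::circle(double const radius)
    : f_radius(radius)
{
}

double circle::get_radius() const
{
    return f_radius;
}

class square : public shape
{
};

// return the radius when the shape is a circle, -1.0 otherwise
double radius_of(shape const * s)
{
    // dynamic_cast<>() checks the actual type at run time and
    // returns a null pointer when the object is not a circle
    circle const * c(dynamic_cast<circle const *>(s));
    if(c == nullptr)
    {
        return -1.0;
    }
    return c->get_radius();
}
```

An old style cast such as (circle const *) s would compile just the same, yet it would hand back a pointer even when the object is a square.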
We compile with the -Wold-style-cast command line option, so old style (C) casts are actually prohibited (because we also have -Werror.)
It happens that a library header that you include makes use of an old cast. In that case, use the GCC #pragma syntax that allows you to ignore these errors in those headers:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wold-style-cast"
#include <broken-header.h>
#pragma GCC diagnostic pop
Note that at this point these pragmas are often necessary for Qt headers.
There are two main cases where the code path does not follow the visual flow of the source, and which are not self-explanatory in C++ unless they are commented.
First of all, you often flow through in switches. In that case, add a comment /*FLOWTHROUGH*/. This makes it clear that the intent was indeed to have this case flow through to the next case.
switch(type)
{
case TYPE1:
    ... [do something] ...
    /*FLOWTHROUGH*/
case TYPE2:
    ... [do something] ...
    break;

case TYPE3:
    ... [do something] ...
    break;

}
The comment helps the reader. Obviously you can explain why you flow through, but in general it will be good enough to show that this was the intent.
When you call a function, you can explicitly mark it as a "never return" function. This means anything after that function will be ignored. We want you to add a NOTREACHED(); macro after every such call to (1) clearly mark that the code won't be reached and (2) if it were to happen, then we'd immediately catch the problem as the function calls abort().
For example, the snap child process calls the execute() function, which exits by calling the exit() function instead of returning:
void snap_child::run()
{
    ... [some code] ...
    execute();
    NOTREACHED();
}
If you are outside of the snap namespace, you may need to fully qualify the function: snap::NOTREACHED().
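The sketch below shows how such a marker can work. The example namespace, the die() function, and percent() are hypothetical; NOTREACHED() here is a stand-in written for the example and only follows the description above in that it calls abort().

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

namespace example
{

// stand-in for snap::NOTREACHED(), it aborts if it ever gets called
[[noreturn]] void NOTREACHED()
{
    std::abort();
}

// a "never return" function: it always leaves by throwing
[[noreturn]] void die(std::string const & message)
{
    throw std::runtime_error(message);
}

int percent(int const part, int const total)
{
    if(total > 0)
    {
        return part * 100 / total;
    }

    die("total must be positive");
    NOTREACHED();
}

} // namespace example
```

If someone later changes die() so that it can return, percent() aborts right at the marker instead of running off the end of the function.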
It is very important to view any kind of input data as tainted. This means it is always a security risk to use any data that comes from the outside without first making sure it is valid. You may assume that your current state is correct, but you cannot assume that a programmer will call your functions with only valid parameters.
A good example on a web server is HTML content. In HTML you can include JavaScripts with the use of the <script> tag. In most cases users would just want to make nice things possible like scroll a box or fade in an ad. However, black hats are in search of your bank account credentials. This can be obtained with well placed and advanced JavaScript code. This means you do not want to blindly accept HTML content on a web server. Only trusted sources can be allowed to include HTML content as is (without filtering). Other users such as completely anonymous users accessing your site must be viewed as a potential danger. Content submitted by such people can be parsed for things such as <script> data to be removed from the content.
This example shows a much higher level where content can be dangerous. At a lower level, you have similar reasons for checking parameters. As I said before, if you have a value such as a priority which can be defined between 0.0 and 1.0, then a function such as set_priority() should only allow the user to change the value to a number from 0.0 to 1.0.
void sitemapxml::url_info::set_priority(float priority)
{
    if(priority < 0.001f)
    {
        priority = 0.001f;
    }
    if(priority > 1.0f)
    {
        priority = 1.0f;
    }
    f_priority = priority;
}
Note that a decimal number such as 0.001 is a double, not a float. This means if you write an expression with that number, the whole set of values will first be converted to double, the computation done, then converted back to a float and saved in memory.
float f;
f = f * 3.0;    // convert back and forth to double
f = f * 3.0f;   // no conversion, math is done in floating point
The behavior of the compiler may vary on a simple case like this one but in many cases it is not possible for the compiler to know whether just floating points are involved. So what we're asking here is for you to add the 'f' at the end of any number that represents a float (opposed to all those numbers that represent doubles.)
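The types involved can be checked at compile time. This sketch uses decltype to ask the compiler what type each expression has; the scale() function is only there as an example of a function written with the suffix.

```cpp
#include <type_traits>

// the type of each expression, as the compiler sees it
static_assert(std::is_same<decltype(0.001), double>::value,
              "a number without suffix is a double");
static_assert(std::is_same<decltype(0.001f), float>::value,
              "the f suffix makes it a float");
static_assert(std::is_same<decltype(1.0f * 3.0), double>::value,
              "float * double is computed as a double");
static_assert(std::is_same<decltype(1.0f * 3.0f), float>::value,
              "float * float remains a float");

// all the literals carry the suffix so no double is involved
float scale(float const value)
{
    return value * 3.0f;
}
```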
Derivations are usually all written on the same line as the class. If you have a long list, then break it up with one entry per line. You are of course allowed to always use the long form.
// short form derivation
class foo : public interface

// long form derivation
class foo
    : public interface
    , protected very::long::name::details::to_be_derived
    , ...
    //, private changed_my_mind_not_needed_now
Avoid inlining functions. There are two problems with that: (1) most compilers today can inline functions that are one or two lines long whether or not they are written inline; and (2) functions tend to grow over time and programmers tend to NOT move them to the .cpp files.
The one main exception is for empty virtual destructors. Most often those are created just to make sure that the virtual table remains fully valid.
class foo
{
public:
    foo();
    virtual ~foo() {}
};
We have exceptions when we write some private sub-structures. In those we at times define the entire function body. In those cases the structures are defined in a class in your .cpp file.
We try our best to document functions that we write. Library functions (classes) are generally much more documented than tools.
All documentation goes in your .cpp files. Putting documentation in your headers completely obliterates the possibility for a programmer to see your interface at once (except in HTML in your Doxygen output, but that's not practical when I'm in my favorite editor!)
There are very few exceptions when Doxygen just cannot understand a declaration reference in your .cpp file and thus that documentation needs to be in the header no matter what. These cases are really rare.
The documentation must be complete. All the functions, types, and variables need a documentation description. When Doxygen is done, it shall not report any warning.
/** \brief My foo function does foo all day.
*
* Here I can describe my function. This is cool, it fooes well.
*
* \param[in] index The index to be used by foo.
* \param[out] result The result of the function.
* \param[in,out] status The current status, may be changed by function.
*
* \return This function returns true if the process succeeded, false otherwise.
*/
Parameters must be listed in the correct order and include the [in], [out], or [in,out] descriptions to be complete.
If the function returns a value, it must be defined in a \return parameter.
Note that you do not define the types of the parameters and return value. Doxygen extracts that information from your function declaration.
If there are important things to be done to finish up your class, use a \todo block (one per task that remains.)
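Putting these rules together, a documented function in a .cpp file could look like this sketch. The count_words() function is invented for the example; the Doxygen commands are the ones described above.

```cpp
#include <string>

/** \brief Count the number of words in a string.
 *
 * This function counts the sequences of characters other than
 * spaces found in \p text.
 *
 * \todo
 * Add support for separators other than the space character.
 *
 * \param[in] text  The text to be scanned.
 * \param[out] longest  The length of the longest word found.
 *
 * \return The number of words found in \p text.
 */
int count_words(std::string const & text, std::string::size_type & longest)
{
    int count(0);
    std::string::size_type word_length(0);
    longest = 0;

    // go one past the end so the last word gets counted too
    for(std::string::size_type i(0); i <= text.length(); ++i)
    {
        if(i < text.length() && text[i] != ' ')
        {
            ++word_length;
        }
        else if(word_length > 0)
        {
            ++count;
            if(word_length > longest)
            {
                longest = word_length;
            }
            word_length = 0;
        }
    }

    return count;
}
```

The parameters are documented in the order of the declaration, and neither their types nor the return type are repeated in the comment.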
We only accept files that include the exact same license as we offer (i.e. GPL.) All text files (headers, implementation files, XML, JavaScripts, CSS, XSLT, CMakeLists.txt, etc.) must include a reference to the license. Look at existing files and make a copy of the existing reference.
Here is an example. Obviously you want to change the copyright notice with your dates and company or individual name.
// Snap Websites Server -- <describe file briefly>
// Copyright (C) 2011-2012 Made to Order Software Corp.
//
// This program is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 2 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
We use all three of these and the \todo from Doxygen to mark the code we need to work on as time passes. Until we have more programmers, we cannot develop everything perfectly all at once, so you will see such comments in the code.
See Also: Fixme Comment