Table Fields

"domains" Table

The table includes the rules to be used to check a domain name and determine the exact website key. The key of the domain table is the domain name and TLD which is computed from the URI.

Field Name Description
core::original_rules Rules as written by the user. (text)
core::rules Rules after parsing the original rules. This is a serialized buffer of rules ready to be used by the Snap parser. (QtSerialization output)

The table has a special row named "*index*" which is used to sort the domains in alphabetical order (really "binary UTF-8" which is enough here and does not involve a locale.) This helps when editing the list of domain names using the snap-manager. It helps when an administrator wants to search for a domain by name (as long as the administrator knows the first few letters.)

This is just one extra row because we do not foresee the need to support millions of domain names on a single Snap! C++ system. Thousands will work without a problem. Also it is only used for administration (not while serving hits.)
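The prefix search that the "*index*" row makes possible can be sketched as follows. This is only an illustration: `index_row` and `find_by_prefix` are hypothetical names, and a `std::map` stands in for the sorted columns of the row. It is not the snap-manager implementation. `std::string` compares bytes as unsigned values, which gives the same "binary UTF-8" order without involving a locale.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the "*index*" row: the column names
// (domain names) are kept sorted byte by byte, i.e. in binary UTF-8 order.
using index_row = std::map<std::string, std::string>;

// Return every name of the index that starts with the given prefix.
std::vector<std::string> find_by_prefix(index_row const & index, std::string const & prefix)
{
    std::vector<std::string> result;

    // lower_bound() jumps directly to the first name that is >= prefix
    for(auto it(index.lower_bound(prefix)); it != index.end(); ++it)
    {
        if(it->first.compare(0, prefix.length(), prefix) != 0)
        {
            // we are past the last name that shares the prefix
            break;
        }
        result.push_back(it->first);
    }

    return result;
}
```

Because the names are sorted, the search only visits the names that match, which is why knowing the first few letters is enough.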

"websites" Table

The table includes the rules to be used to check a website URI and determine the exact site key.

Field Name Description
core::original_rules Rules as written by the user. (text)
core::rules Rules after parsing the original rules. This is a serialized buffer of rules ready to be used by the Snap parser. (QtSerialization output)

The table has a special row named "*index*" which is used to sort the websites in alphabetical order (really "binary UTF-8" which is enough here and does not involve a locale.) This helps when editing the list of websites using the snap-manager. It helps when an administrator wants to search for a website by name (as long as the administrator knows the first few letters.)

This is just one extra row because we do not foresee the need to support millions of website names on a single Snap! C++ system. Thousands will work without a problem. Also it is only used for administration (not while serving hits.)

The column key is defined as the name of the domain, followed by two colons (::), followed by the fully qualified website URI. For example, we use the following for our https://www.turnwatcher.com/ website:

turnwatcher.com::www.turnwatcher.com

That way you can easily find all the sub-domains defined for a given domain.
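As a minimal sketch of that key format, the two helpers below build a website key and the prefix shared by all the websites of one domain. The function names are hypothetical and are not part of the Snap! C++ API.

```cpp
#include <string>

// Build a key as "<domain>::<website>" (function name is hypothetical).
std::string website_key(std::string const & domain, std::string const & website)
{
    return domain + "::" + website;
}

// Prefix shared by the keys of all the websites defined under one domain;
// searching for this prefix lists the sub-domains of that domain.
std::string domain_prefix(std::string const & domain)
{
    return domain + "::";
}
```

For the example above, `website_key("turnwatcher.com", "www.turnwatcher.com")` gives the key shown, and every key of that domain starts with `domain_prefix("turnwatcher.com")`.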

"sites" Table

The site table includes all sorts of global values that the different plugins make use of. There is no real limit with Cassandra since we can have as many as 2 billion columns in one row. However, we obviously want to limit the number of columns. If you can, and it makes sense, serialize the data and save it in a single column instead of many. Our serialization process saves floating point values losslessly, so it is quite safe.

Field Name Description
core::last_updated The date when the website plugins were last updated.
core::last_updated::<plugin-name> The date when that specific plugin was last updated. It is important as different plugins have different thresholds.
core::plugins List of plugins used by this website.
core::plugin_threshold Time when the newest plugin was installed/modified (the actual .so file.)
core::redirect An internal redirect from this site to another. The user doesn't get a Location: header when this parameter is defined. It is not recommended, but works seamlessly.
core::site_long_name Long form of this website name.
core::site_name This website name, this one is the "normal" name (also it is the default when a long or short form doesn't exist.)
core::site_short_name Short form of this website name.
users::password::digest The name of the OpenSSL digest to use to hash passwords. If undefined, use "sha512".
sitemapxml::count This number represents the number of sitemap.xml files saved in the database for this site. The count is most often 1 as most websites don't have over 50,000 pages. The count does not exist until the backend process generates the XML sitemap data.
sitemapxml::sitemap.xml, sitemapxml::sitemap<count>.xml The actual files are saved directly here. This is a better location because it is not content that we'd otherwise show end users. The sitemap.xml file is used when just one file is necessary (sitemapxml::count is 1). The other files are used when multiple sitemaps are necessary. The count defines the number of sitemap files, starting at 1 up to <count> included. [As I've been working on the system, the fact that these are not end user content is not an argument anymore. Once I have time to move these files to the main content, I will do it. That will allow us to use the core functions that send "attachments" to clients.]
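The naming rule for the sitemap columns can be sketched as follows. The function name is hypothetical, and the sketch assumes that only the numbered columns are listed when the count is larger than 1. The text above does not say whether a sitemapxml::sitemap.xml column also exists in that case.

```cpp
#include <string>
#include <vector>

// List the column names expected for a given sitemapxml::count value.
// Assumption: with several sitemaps only the numbered columns are listed.
std::vector<std::string> sitemap_columns(int count)
{
    std::vector<std::string> names;

    if(count == 1)
    {
        // the common case: a single sitemap.xml file
        names.push_back("sitemapxml::sitemap.xml");
    }
    else
    {
        // files are numbered from 1 up to <count> included;
        // a count of zero or less produces an empty list
        for(int i(1); i <= count; ++i)
        {
            names.push_back("sitemapxml::sitemap" + std::to_string(i) + ".xml");
        }
    }

    return names;
}
```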

"content" Table

Each row in the content table represents a page (one row, one page). Many rows are not visible to anonymous visitors and do not really have a good reason to be shown anyway, as they are there only to categorize other pages.

The content table is actually used to define the current branch of the page and some global data about the page. The branch data is actually found in the data table. The content table is similar to an indirect pointer in C.

The rows have names that include <owner>. This is the name of a plugin that owns that specific revision control. By default the owner is set to "content" and removed from the key (smaller keys make it a bit faster.)

The branches are created for all languages that the system supports (which is pretty much all languages in the world.) However, the revisions are specific to a language.

Field Name Description
content::revision_control::<owner>::current_branch The current branch of this page. This value is used to find the branch data. This is an int32.
content::revision_control::<owner>::current_revision::<branch>::<language>[_<country>] The current revision of this page. This is what is shown to all anonymous visitors and most registered users. Only editors can see other revisions. This is defined as "<branch>.<revision>".
content::revision_control::<owner>::current_working_branch The current working branch of this page. This value is used to find the branch data. This is an int32. The working branch is the branch that the editor is currently working on.
content::revision_control::<owner>::current_working_revision::<branch>::<language>[_<country>] The current revision the editors are working on. This is used when a new branch is created but not made current immediately (i.e. the editor wants to have the work reviewed before official publication.) This is defined as "<language>[_<country>]/<branch>.<revision>".
content::revision_control::<owner>::last_branch The last branch created for this page. By default this is viewed as zero. When a user creates a branch, it automatically becomes 1 at first. Each new branch then increments this value by 1.
content::revision_control::<owner>::last_revision::<branch>::<language>[_<country>] The last revision created in the specified <branch> in the specified <language>. This number is used each time a new revision of the page in that <language> gets saved to the database.
content::created The date when this page was created. This is faster than trying to get the date from the first branch because branch zero (0) may not exist.
content::final If defined, this page cannot have children. This is used so the path looks logical in many cases. Specifically, this is used for pages that represent an attachment.
path::primary_owner The plugin that owns and knows how to handle (display) this data. In most cases this is "content". However, some pages are special and require a specific plugin (robots.txt and sitemap.xml are two good examples.)
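To make the revision control field names more concrete, here is a small sketch that builds the current_revision field name and the "<branch>.<revision>" value described in the table. The function names are hypothetical. The sketch assumes that the branch appears in the field name as a decimal number, and it always includes the owner (it does not model the removal of the default "content" owner from the key).

```cpp
#include <string>

// Build "content::revision_control::<owner>::current_revision::<branch>::<locale>"
// where locale is "<language>" or "<language>_<country>" (e.g. "en" or "en_US").
// Assumption: the branch is written as a decimal number.
std::string current_revision_field(std::string const & owner, int branch, std::string const & locale)
{
    return "content::revision_control::" + owner
         + "::current_revision::" + std::to_string(branch)
         + "::" + locale;
}

// Build the value saved in that field, defined as "<branch>.<revision>".
std::string revision_value(int branch, int revision)
{
    return std::to_string(branch) + "." + std::to_string(revision);
}
```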

"data" Table

A row in the data table represents:

  • A specific branch, or
  • A specific revision in a specific language.

The row specific to the branch as a whole is used because many of the parameters are only defined for a branch, not each revision of the branch. Parameters such as the page type and all the links are defined in that row.

The revision rows define the other parameters, such as the title and body of the page. Such data does not always change between revisions (i.e. you are much more likely to edit the body of a page than its title.)

Branch or Revision Field Name Description
Revision content::body The body of the page, the actual matter of the page.
Branch, Revision content::created The date when this content was created.
Branch, Revision content::modified The date when anything in this content was last modified.
Revision content::title The title of the page when displayed. In general this is what appears in the <title> HTML tag.
Branch layout::layout JavaScript used to determine the layout of the body of this content. This generally defines whether the content is one column, two columns, two columns with a small image at the top-right, one column with a large image after the first paragraph, etc.
Branch layout::theme JavaScript used to determine the theme of the page. This is used only if that very page is being displayed. Otherwise we only use the layout::layout information as defined in the box incorporating another piece of content in the main content.
Branch links::<name> Defines a link from this content to one or more other content entries. The <name> is most often specific to the module. We may consider adding an extra namespace to make sure each module has unique names for its links. (i.e. links::<namespace>::<name>)
Branch links::content::children-<server>-<unique id> List of children of this content. Any content can include a list of children. Each child will have a links::parent pointing back to the content that includes it in its children links.
Branch links::content::page_type The type of content. There is a single type for each page. For example it could be marked as a "Blog" or "Book".
Branch links::sitemapxml::include This link indicates that this page is to be included in the sitemap.xml file.
Branch links::robotstxt::noindex Request that search engines do not include this page in their index.
Branch links::robotstxt::nofollow Request that search engines do not follow links on this page.
Branch links::content::parent The parent of this content. This parameter is used to build a hierarchy of your content. Note that the same content can be given multiple hierarchies with the use of content types or other parent/children links.
Branch links::permissions::administer-<server>-<unique id> Permissions that a user needs to have to be authorized to administer this page.

"users" Table

The users table is generic for all Snap! users (i.e. one table for all the websites supported by a Snap! C++ instance.) The row key is the user email. Nothing more. The table includes the password and any other data we have about that user.

Field Name Description
users::original_email The email of the user when he registered his account. At this point there is no special reason to use that email other than, maybe, to ask the user on the phone what that email address was, for added security.
users::original_ip The original IP address of the user when he registered an account on Snap!
users::password The hashed password of the user. The hash makes use of the salt (16 bytes before and 16 bytes after the input password in UTF-8). The hash is computed using the digest as defined by the users::password::digest in the site table. The default is "sha512".
users::password::salt The salt used to hash the password. We use 32 bytes, of which 16 are prepended and 16 are appended to the password before the hash is calculated. The salt is used as is (in binary form, not changed to hex or anything like that). This adds 256 bits of random entropy to the hash, making it harder to recover the password (and especially to use a rainbow table.)
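The way the salt surrounds the password can be sketched as follows. This only builds the buffer that would be handed to the digest; the digest itself (for example "sha512" through OpenSSL) is left out so the sketch stays self-contained. The names `salt_t` and `salted_input` are hypothetical, and the sketch assumes that the first 16 bytes of the salt are the ones prepended and the last 16 the ones appended.

```cpp
#include <array>
#include <string>
#include <vector>

// 32 bytes of binary salt, used as is (not converted to hex).
using salt_t = std::array<unsigned char, 32>;

// Build the buffer to hash: 16 salt bytes + UTF-8 password + 16 salt bytes.
// Assumption: the first half of the salt goes before the password and
// the second half goes after it.
std::vector<unsigned char> salted_input(salt_t const & salt, std::string const & utf8_password)
{
    std::vector<unsigned char> buffer;
    buffer.reserve(salt.size() + utf8_password.size());

    buffer.insert(buffer.end(), salt.begin(), salt.begin() + 16);
    buffer.insert(buffer.end(), utf8_password.begin(), utf8_password.end());
    buffer.insert(buffer.end(), salt.begin() + 16, salt.end());

    // the resulting buffer is what the digest named by
    // users::password::digest would be computed over
    return buffer;
}
```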

"sessions" Table

The sessions table is used to create sessions for users accessing the website. These are mainly used in two places: logged in and non-logged in cookies, and the form session identifier.

Field Name Description
sessions::id The identifier of this session.
sessions::object_path The path of the object that generated that session identifier. This is used by blocks and users with the path to the block definition and the user account.
sessions::page_path The path of the page that generated that session identifier. This is nearly only for form identifiers since in most other cases the same identifier may appear on several different pages.
sessions::plugin_owner The name of the plugin that created this session. This could be the "users" plugin (i.e. for its log in form.)
sessions::remote_addr The remote address of the user when accessing the website at the time the session was created. This can be used for "instant" sessions (i.e. a quick form such as a log in form), but long lasting sessions cannot really make use of it because the address is likely to change for users on DHCP or dial-up connections.
sessions::time_limit The time when the session goes out of scope and cannot be used anymore. All sessions have a limit. They also include a Cassandra TTL equal to the time limit + one day (86400 seconds) so they automatically get deleted from the database. This date is used to know whether the user using the session still has permission to do so. If not, the user can generally try again, he just needs to be faster.
sessions::time_to_live The session time to live. This is a duration used to know how long the session will be available for. This is the original time to live value and is not directly useful to compute when the session goes out of scope. For that purpose, we use the timestamp instead: sessions::time_limit.
sessions::used_up This entry is not defined by default. Once defined (set to 1) it means that the form was used (POSTed) and thus it cannot be reused.
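The checks implied by sessions::time_limit and sessions::used_up can be sketched as follows. The structure and function names are hypothetical. The sketch assumes that times are Unix timestamps in seconds, that a session is still usable at exactly its time limit, and that the Cassandra TTL is a duration chosen so the columns expire one day after the time limit.

```cpp
#include <cstdint>

// Hypothetical in-memory view of two "sessions" columns.
struct session_info
{
    std::int64_t time_limit;    // sessions::time_limit, Unix timestamp (assumed)
    bool         used_up;       // true once sessions::used_up is defined
};

constexpr std::int64_t ONE_DAY = 86400; // seconds

// A session can be used when it was not yet POSTed and has not timed out.
bool session_is_usable(session_info const & session, std::int64_t now)
{
    return !session.used_up && now <= session.time_limit;
}

// TTL in seconds so the data is deleted one day after the time limit
// (assumption: the TTL is computed relative to "now").
std::int64_t cassandra_ttl(session_info const & session, std::int64_t now)
{
    return session.time_limit - now + ONE_DAY;
}
```

Keeping the data for one extra day means an expired session can still be found in the database, so it can be reported as timed out rather than as unknown.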

 

Snap! Websites
An Open Source CMS System in C++
