Links between rows

Tue
07/24/12

All the data in our system is linked in one way or another. Links are what let us organize the data in a tree and categorize it (not to say tag it).

The data starts with a Root node and everything is defined under that Root node. The linkage is done using the name of each node and the name of the category being used. A category is itself a node of content. For example, the User category is a node defined under the Root node, and a user account is defined as a child of the User category. Thus, to find users, you start from the Root, go down to the User category, and then list all the children of that category.
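The walk described above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionary, the node names other than Root and User, and the list_children() helper are all hypothetical and are not part of the links plugin.

```python
# Hypothetical in-memory model: each node name maps to the names of its children.
# The real system stores these relationships as links in the database.
children = {
    "Root": ["User", "Page"],
    "User": ["alice", "bob"],
    "Page": ["home"],
}


def list_children(path):
    """Walk down from Root along `path` and return the children of the last node."""
    node = "Root"
    for name in path:
        if name not in children.get(node, []):
            raise KeyError(f"{name!r} is not a child of {node!r}")
        node = name
    return children.get(node, [])


# Finding users: start at Root, go down to the User category, list its children.
users = list_children(["User"])
```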

Links are created in the "data" table on a per-branch basis. That is, if you create a new revision of your page, you keep the same links. If you create a new branch, we generally copy all the existing links; you can then change those links and the links of the old branch will not change.
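The per-branch behavior can be pictured with a small sketch. It assumes a hypothetical dictionary keyed by (page, branch) and a hypothetical create_branch() helper. Revisions are not modeled because they share the links of their branch.

```python
# Hypothetical store: one set of link columns per (page, branch) pair.
links = {
    ("home", 1): {"links::content::parent": "Root"},
}


def create_branch(page, old_branch, new_branch):
    """Copy the links of `old_branch` so the new branch can change them freely."""
    links[(page, new_branch)] = dict(links[(page, old_branch)])


create_branch("home", 1, 2)

# Changing a link in branch 2 leaves branch 1 untouched.
links[("home", 2)]["links::content::parent"] = "Archive"
```

The copy is what keeps the old branch stable: both branches start with the same links, but later edits only touch the branch they are made in.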

In Cassandra, each link is a column in the row of data of a branch in the data table. When calling functions from the links plugin, you create two link info objects: one for the source and one for the destination. This information is used to know the name of the link (e.g. Parent, Children, Tag, Permission) and its type: One or Many.
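As a rough picture of the two link info objects, here is a minimal sketch. The LinkInfo class, its fields, and describe_link() are assumptions made for illustration; they do not reproduce the actual interface of the links plugin.

```python
from dataclasses import dataclass
from enum import Enum


class LinkMode(Enum):
    ONE = "one"
    MANY = "many"


@dataclass(frozen=True)
class LinkInfo:
    """Hypothetical description of one side of a link."""
    name: str       # name of the link, e.g. "content::parent"
    mode: LinkMode  # One or Many
    key: str        # row that owns this side of the link


def describe_link(source, destination):
    """Return a readable summary of a source/destination pair."""
    return (f"{source.key} [{source.name}, {source.mode.value}] -> "
            f"{destination.key} [{destination.name}, {destination.mode.value}]")


# A page has ONE parent; the parent has MANY children.
source = LinkInfo("content::parent", LinkMode.ONE, "page-a")
destination = LinkInfo("content::children", LinkMode.MANY, "root")
summary = describe_link(source, destination)
```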

A link of type One uses its name as is. So the name of the column for a Parent link is defined as "links::content::parent".

A link of type Many has to use a unique identifier. This unique identifier is added to the name. Say the identifier returned is "snap-123", then the name of that link would be:

"links::content::children-snap-123"

Using the Cassandra search capability, we can then read back all the links of a type Many link using the name without the unique identifier:

"links::content::children-"

This gives us an easy way to seamlessly browse through all the links.
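The naming scheme and the prefix read can be sketched as below. This is an illustration only: the row is a plain dictionary standing in for a Cassandra row, the helper names are hypothetical, and the "-" separator before the identifier is assumed from the "links::content::children-snap-123" example.

```python
def link_column_name(name, unique_id=None):
    """Build the column name: as is for type One, with an identifier for type Many."""
    base = f"links::{name}"
    return base if unique_id is None else f"{base}-{unique_id}"


def read_many(row, name):
    """Return every column of a type Many link by matching on the name prefix."""
    prefix = f"links::{name}-"
    return {col: row[col] for col in sorted(row) if col.startswith(prefix)}


# Stand-in for one row of the data table (values are the linked rows).
row = {
    link_column_name("content::parent"): "root",
    link_column_name("content::children", "snap-123"): "page-a",
    link_column_name("content::children", "snap-124"): "page-b",
}

children_links = read_many(row, "content::children")
```

Because every Many link shares the same prefix, a single range read returns all of them without knowing any identifier in advance.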

In our current implementation, we have ONE link per column. This solves several management problems, including the management of the database between computers.

NOTES

In a later version we may want to include multiple links per column (e.g. up to 100 or so), but there are management problems with that. We have also weighed the concept of using one column for all the links, and there are problems with that concept too: one column could grow to hold millions of links, which means a HUGE column that is very slow to read and requires a large buffer to handle. For example, assuming one link takes up 100 bytes, 1 million links means 100 MB of data to transfer. If you have that many links because you have 1 million pages, then the column would be loaded each time you try to access a page on your website. The result would be a rather slow server! Since most of that 100 MB would go unused on any given access, it would also be a gigantic waste.
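The estimate in this note is simple arithmetic, spelled out here. The 100 bytes per link is the assumption stated in the note, and the function name is made up for the illustration.

```python
BYTES_PER_LINK = 100  # assumed average size of one link, as in the note


def single_column_size_mb(link_count, bytes_per_link=BYTES_PER_LINK):
    """Size in megabytes (10**6 bytes) of one column holding all the links."""
    return link_count * bytes_per_link / 1_000_000


# 1 million links at 100 bytes each.
size_mb = single_column_size_mb(1_000_000)
```

With one link per column, reading a page only has to touch the columns it needs rather than this whole buffer.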

Links are implemented.

See: Unique Numbers with Cassandra

See: Categorization feature (tags, hashtags, taxonomy) [core]