Core Features

With this table, the system can test whether a link is broken (i.e. the link can no longer be followed to an existing website.)
- Note that links on pages that are not currently published should either not be checked, or be checked just once. Unpublished means not visible to regular visitors (e.g. the page could be marked as spam rather than just unpublished.)
- Find out whether we can detect when a YouTube video (or a video on a similar service) gets removed. Can we get a notification? Does the page return a 404 even though it still shows that black video screen?
- When a link is considered broken, we can use the Redirection feature to send users to an error page so they do not land on a page with broken links. When someone hits that error page, we could also send a message to the site administrator.
- A broken link could be marked as such and thus made non-clickable. However, automatic tests often fail because the destination views us as hackers and refuses to answer properly, and we would then wrongly mark the link as non-clickable. So we need an interface to mark links as good whether or not they can be checked by this feature.
- A link that (at least) seems to work may point to a website that otherwise redirects search engines. This is a black hat technique used to get links from your site to pages that redirect search engines to another website. We need a way to reliably check whether such links exist.
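
The rules above (skip unpublished pages, treat failed checks as broken, let an administrator force a link to be good) could be combined as in the following minimal sketch. The LinkRecord fields, the state names, and the convention that a status of 0 means the request failed are assumptions made for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkRecord:
    """Hypothetical row of the links table."""
    url: str
    last_status: Optional[int] = None  # None: never checked; 0: request failed
    page_published: bool = True        # is the page holding the link public?
    forced_good: bool = False          # set manually by an administrator


def classify(link: LinkRecord) -> str:
    """Return the state to show for a link: good, redirect, broken, ..."""
    if link.forced_good:
        # The manual override wins, whatever the automatic check says.
        return "good"
    if not link.page_published:
        # Links on unpublished pages are not checked in this sketch.
        return "skipped"
    if link.last_status is None:
        return "unchecked"
    if 200 <= link.last_status < 300:
        return "good"
    if 300 <= link.last_status < 400:
        return "redirect"
    # Failed requests (0), 4xx, 5xx and anything unexpected.
    return "broken"
```

A destination that refuses automated requests (for example by answering 403) would be classified as broken here until an administrator sets forced_good.
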
Read the destination page and with that data:
- Set the link title="..." attribute using the content of the page <h1> tag
- Automatically mark the link as rel="external" if that is the case, or rel="nofollow" if that link is external and the site owner does not want to give juice away
- Add classes depending on the destination (e.g. it could be as simple as transforming the destination domain name into a class name)
- Add an icon depending on the destination data format (e.g. a PDF icon for .pdf files, etc.)
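
The attributes listed above could be derived as in this rough sketch. The function name, the ICONS table, the "link-" class prefix, and the choice to emit both "external" and "nofollow" together are assumptions; fetching the destination page to read its <h1> is left out, so the title is passed in.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Hypothetical mapping from file extension to an icon class name.
ICONS = {".pdf": "icon-pdf", ".zip": "icon-archive"}


def link_attributes(href, site_host, h1_text=None, allow_follow=True):
    """Compute extra attributes for an <a> tag pointing to href."""
    parts = urlsplit(href)
    host = (parts.hostname or "").lower()
    # A link without a host, or with our own host, is local.
    external = bool(host) and host != site_host.lower()

    attrs = {}
    if h1_text:
        attrs["title"] = h1_text.strip()

    rel = []
    if external:
        rel.append("external")
        if not allow_follow:
            rel.append("nofollow")
    if rel:
        attrs["rel"] = " ".join(rel)

    classes = []
    if host:
        # Turn the domain name into a class name: www.example.org -> link-www-example-org
        classes.append("link-" + re.sub(r"[^a-z0-9]+", "-", host).strip("-"))
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in ICONS:
        classes.append(ICONS[extension])
    if classes:
        attrs["class"] = " ".join(classes)
    return attrs
```
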
We can display the list of all the links (publicly or not)
- On a per page basis
- On a per book or other grouping basis
- For the entire website
The user can manage the links (eventually "inline"), for example:
- Mark a link as non-clickable (e.g. example.com)
- Mark a link with rel="nofollow" (e.g. a site to which you do not want to send SEO juice)
- Mark links as rel="author", rel="tag", rel="external"
- Add a class to a given link
- Add the title to links that are missing it
- Make one title work for all the instances of the same link
- Manage a white list (external links that do not get the rel="nofollow" attribute)
- Manage an orange list (external links that always receive a rel="nofollow" attribute)
- Manage a black list (external links that are 100% banned so get removed from the site)
- etc.
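
The white, orange, and black lists above could be applied as in this small sketch. The function name, the returned action names, and the precedence used when a host appears in several lists (black, then orange, then white) are assumptions.

```python
def list_action(host, white=frozenset(), orange=frozenset(),
                black=frozenset(), nofollow_by_default=True):
    """Decide what to do with an external link to the given host."""
    host = host.lower()
    if host in black:
        return "remove"      # banned: the link gets removed from the site
    if host in orange:
        return "nofollow"    # always receives rel="nofollow"
    if host in white:
        return "follow"      # never receives rel="nofollow"
    # Hosts in no list follow the site-wide default.
    return "nofollow" if nofollow_by_default else "follow"
```
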
End user advanced features on Links include:
- QR code, the link opens a pop-up with its QR code
- In case a link sends the user to a different website, the user is warned first (e.g. a pop-up appears saying "Warning: you are leaving website blah and you'll be on your own on the wild Internet!")

What are Those Links About?

In order to mark a link as this or that, one can add a rel="..." attribute. This is valid for <link ...> and <a ...> tags. Indicating relations can often be a great help to search engines. For example, a link to the home page can be marked as rel="home". This is particularly useful if the home page is not at "/". Similarly, the author of a page can be indicated with an author link as in rel="author". Further, an author can indicate that an external page is their own page with the rel="me" indicator.

Editing of links will include the necessary dropdown to select the type of link the user adds. Also, links created by code should all have a rel="..." attribute when they are to appear on public pages. There are many already assigned rel values that we can use.

Source: http://microformats.org/wiki/existing-rel-values

Local Links always Local

Our software should be smart enough to create local links even if the user entered a full URL. That way we can optimize the pages (e.g. instead of href="http://www.example.com/foo/bar/page.html" we could maybe just have "page".)

Note: this is certainly very good in general. However, when generating things such as the Feed feature [core] (Atom, RSS 2.0, etc.) we need to have absolute links to make sure they work as expected.
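
A minimal sketch of both directions, assuming a known set of host names for the site. The function names are hypothetical, and this version only produces root-relative paths; shortening further (down to "page") would need the current page location and is not attempted here.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def to_local(href, site_hosts):
    """Strip the scheme and host from a URL that points to our own site."""
    parts = urlsplit(href)
    host = (parts.hostname or "").lower()
    if parts.scheme in ("http", "https") and host in site_hosts:
        # Keep only the path, query string and fragment.
        return urlunsplit(("", "", parts.path or "/", parts.query, parts.fragment))
    return href  # external or already local: leave untouched


def to_absolute(href, base):
    """Make a link absolute again, e.g. when generating an Atom or RSS feed."""
    return urljoin(base, href)
```
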

Local links, when clearly created as such, can make use of the title of the destination page, as I do on Drupal with the {node:126 link} syntax (using '[' and ']' instead of '{' and '}'). This way, when a destination page title changes, any other page with a link to that page can be updated with the background filter mechanism.

Such tokens generate Dependencies between pages.
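
As an illustration, a filter could expand such tokens and collect the resulting dependencies in one pass. This is only a sketch: the pages dictionary (page identifier to path and title) stands in for the real content store, and unknown identifiers are simply left as-is.

```python
import html
import re

# Matches tokens such as [node:126 link]
TOKEN = re.compile(r"\[node:(\d+) link\]")


def expand_links(text, pages):
    """Replace link tokens by anchors; return the new text and the dependencies."""
    dependencies = set()

    def replace(match):
        node_id = int(match.group(1))
        page = pages.get(node_id)
        if page is None:
            return match.group(0)  # unknown page: keep the token untouched
        dependencies.add(node_id)
        return '<a href="%s">%s</a>' % (
            html.escape(page["path"], quote=True),
            html.escape(page["title"]),
        )

    return TOKEN.sub(replace, text), dependencies
```

When the title of a page changes, the background filter would only need to regenerate the pages whose dependency set includes that page.
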

Local Links that Break

Standard Local Links

In some situations, a local link will break because the destination page moves (gets renamed). In that case, the source anchors should be fixed so they link to the new page. (The content backend process can do that work as required whenever something changes. Assuming we have back links for all pages, we know which pages get marked as modified on such an event.)

Deleted Pages

A local link to a page that gets deleted may require a little help from the editor of the website. Such links get added to the list of broken links.

Special Local Links

There are cases when we should not even have to bother with marking a local link to a newly deleted page as a broken link. This happens for a link to a product in an invoice. If the product gets deleted, then it is gone and thus the link is dead. Period. Such links should be marked by the e-Commerce feature plugin as "automatically remove on delete".

Navigation Menus

The user is given the possibility to organize their content so as to form a menu. The pages have next/previous/child/parent references that can be used to build a structured menu.

However, we need to make sure that the same link can still be used in two different menus. The primary feature is to list all the links, once each.

Automatic URLs

It is possible to write a parser to transform what looks like a URL into an active, clickable URL. (The parser can also manage email addresses and, as such, transform them using JavaScript, etc.)

This feature must skip all example.extension URLs since http://www.example.com/ does not exist (so it does not need to become clickable.) The user should be able to add other exceptions.
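
A minimal sketch of such a parser for plain text input, limited to http and https URLs. The regular expression, the function names, and the rule used to recognize example.extension hosts (the second to last label is "example") are assumptions; a production parser would need to handle many more cases.

```python
import html
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"']+")


def is_exception(host, extra=()):
    """True for example.com, www.example.org, ... and user defined hosts."""
    labels = host.lower().split(".")
    if len(labels) >= 2 and labels[-2] == "example":
        return True
    return host.lower() in extra


def autolink(text, extra_exceptions=()):
    """Escape plain text and turn the URLs it contains into anchors."""
    out = []
    pos = 0
    for match in URL_RE.finditer(text):
        # Punctuation at the end of a URL most likely belongs to the sentence.
        url = match.group(0).rstrip(".,;:!?)")
        out.append(html.escape(text[pos:match.start()]))
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            host = ""  # malformed URL: leave it as plain text
        if host and not is_exception(host, extra_exceptions):
            out.append('<a href="%s">%s</a>' % (html.escape(url, quote=True),
                                                 html.escape(url)))
        else:
            out.append(html.escape(url))
        pos = match.start() + len(url)
    out.append(html.escape(text[pos:]))
    return "".join(out)
```
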

The code to transform URLs may be part of the glossary code which also creates links to other pages and possibly websites.

This process can occur at the time the content of a page is saved (i.e. once.)

Also, we can have this process happen while the user edits their text (i.e. the editor is now the smart one!)

Finally, we want the opposite behavior. Someone may be copying a document from another page that has a link using the example.extension type of URL, and those should be unlinked because they are not real links. However, the user could still choose to keep a <span> tag with a class that makes the text look like a link, but not clickable... (i.e. the class can set up the cursor and colors as if that was a link, and we could even add a title, maybe using something like overlib.)

Per Page and Type Settings

The feature should allow a user to mark a page so the filter does not get applied.

Similarly, it should be possible to mark a whole set of pages, as defined by their type or a tag, so the filter is ignored on all of them.

We may want to check out what the default should be (i.e. we could offer the opposite: mark what you want to be parsed and not what should not be parsed.)

Automatic Extensions

As we find the HTTP URLs, we can also detect any other URL, whatever the protocol. For that we should have a list of accepted protocols.

We can also transform email addresses into links (mailto:someone@example.com). And with the server and JavaScript, we can scramble email addresses so that hackers have a much harder time stealing them.
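
One possible way to scramble an address on the server side is sketched below. The scheme (reverse the address, then Base64 encode it), the function names, and the span markup are assumptions; the unscramble function shows in Python what the JavaScript side would have to do before rebuilding the mailto: link.

```python
import base64
import html


def scramble(address):
    """Return a token that does not look like an email address."""
    reversed_address = address[::-1]
    return base64.b64encode(reversed_address.encode("utf-8")).decode("ascii")


def unscramble(token):
    """Inverse of scramble(); mirrors what the client script would do."""
    return base64.b64decode(token).decode("utf-8")[::-1]


def email_placeholder(address):
    """HTML sent to the browser instead of a plain mailto: link."""
    return '<span class="scrambled-email" data-email="%s">[email hidden]</span>' % (
        html.escape(scramble(address), quote=True),
    )
```

This only raises the bar for simple harvesters that scan pages for address patterns; it is not encryption.
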

Link Shortening (shorturl plugin)

We offer a link shortener as well. The plugin is called shorturl.

This is used to generate a short URL (shorter than what the site offers by default) to access a page.

The short URL can be generated on the site itself using the /s path and a counter that counts in base 36.
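
For illustration, a counter can be turned into a base 36 path as follows. The function names and the choice of lowercase letters are assumptions; only the /s path and the base 36 counter come from the description above.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number):
    """Convert a non-negative integer to its base 36 representation."""
    if number < 0:
        raise ValueError("the counter cannot be negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def short_path(counter):
    """Path of the short URL for the given counter value."""
    return "/s/" + to_base36(counter)
```

With this encoding, four characters after /s/ are enough for more than 1.6 million pages (36 to the power of 4 is 1,679,616).
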

The plugin offers (or will offer) the use of an external shortener tool such as TinyURL and similar websites.

Note that there is no need to create more than one short URL for a page, nor is it a good idea. So we offer a selection of shorteners that we support, but we do not use more than one per page (that being said, you may change at any time and old short URLs are not lost in that case.) The reason for the single short URL is search engines: to verify the short URL, search engines are given the short URL in a link in the page and/or the HTTP header. Either way, we can only offer one such link.

The link is saved in the Cassandra database and integrated in the HTML HEAD using the LINK tag with the rel type shortlink.
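
The two ways of advertising the short URL could look like this sketch. The function names and the sample URL in the usage note are hypothetical; the header form follows the usual Link header syntax.

```python
import html


def shortlink_tag(short_url):
    """LINK tag to place in the HTML HEAD of the page."""
    return '<link rel="shortlink" href="%s"/>' % html.escape(short_url, quote=True)


def shortlink_header(short_url):
    """Equivalent HTTP response header, as a (name, value) pair."""
    return ("Link", '<%s>; rel="shortlink"' % short_url)
```

For example, shortlink_tag("http://www.example.com/s/1a") yields <link rel="shortlink" href="http://www.example.com/s/1a"/>.
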

The shorturl plugin allows for certain pages to not receive a short URL. For example, all pages that are private (not accessible publicly) are automatically ignored by the plugin. Similarly, some pages such as the Terms & Conditions probably do not need to have short URLs attached. Certain lists and intermediate pages probably do not need short versions either. The module offers a signal which is used to know whether a short link should be created or not. At times, it can be a lot faster to test a path than permissions (i.e. any data under /ignore should not have a short URL.)
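
The idea of testing the path before the permissions can be sketched as follows. The function name, the is_public callback, and the default list of ignored prefixes are assumptions that stand in for the plugin signal.

```python
def allow_short_url(path, is_public, ignored_prefixes=("/ignore",)):
    """Decide whether a page should receive a short URL.

    is_public is a callable so the (possibly slow) permission check
    only runs when the cheap path test did not already say no.
    """
    for prefix in ignored_prefixes:
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return bool(is_public(path))
```
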

The plugin also counts the number of times a shorturl is followed. It uses our statistics tools for the purpose of counting only what we consider valid clicks (i.e. eliminate robots.)

Google "shortlink" proposal: https://code.google.com/archive/p/shortlink/ (It looks like this one is the current official version for Google to be happy.)

The "shorturl" proposal: https://sites.google.com/a/snaplog.com/wiki/short_url

A French shortener: https://lc.cx/api

Link Redirect

URLs to external websites should be marked as "no follow" (see also the Per Page/Type Settings). On the other hand, if we want to count the number of clicks, all the links should go back to us anyway, and then we can emit a 302. (With AJAX, we could send a POST to our server to count the click and then go directly to the destination, avoiding the 302. It should not make much difference, except that the 302 can be used to better control the destination, especially if the destination disappears or becomes infected: we can then tell the user what happened.)

These redirections can also be handy to make changes in one place instead of many (i.e. if 100 pages have a link to ABC and that was moved to XYZ, only the redirect needs to be changed.)
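
A framework-free sketch of such a redirect handler follows. The function name, the destinations table, the click counter, and the 404 answer used when a destination is missing or flagged are all assumptions.

```python
from collections import Counter


def follow_link(link_id, destinations, clicks, unavailable=frozenset()):
    """Count a click and return (status, headers, body) for the response."""
    destination = destinations.get(link_id)
    if destination is None or destination in unavailable:
        # Destination gone or flagged: explain instead of redirecting.
        return 404, {}, "Sorry, the destination of this link is no longer available."
    clicks[link_id] += 1
    return 302, {"Location": destination}, ""


clicks = Counter()
destinations = {"abc": "http://other.test/ABC"}
status, headers, body = follow_link("abc", destinations, clicks)
```

Since pages only reference the link identifier, moving ABC to XYZ means changing one entry in destinations.
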

Inline Link Management

Administrators can be given a dropdown whenever they hover over a link. That dropdown can then be used to administer links without having to edit the page.

Links should be shown with a warning (redirect) or error (broken) so the user can quickly see that there is a problem. The page should have a fixed warning window that tells the user that there are problems and gives them buttons/links to go to the location of concern. This is not specific to links since other things could be shown there (e.g. a bad word detected in a page, a missing resource such as an image, etc.)

This dropdown could include different types of information as follows:

- Broken link, link goes to a 404 or equivalent
- Link goes to a 301, 302...
- Link is missing a title
- Link is missing an access key
- Mark link as Follow or No Follow
- Clicks should be counted
- Add to a list (white, orange, black...)
- Mark link as internal even if external (i.e. avoid the "warning" dialog of external links.)
- Define the link target
- Mark the link to open in a new window/tab (for end users this is not a "target")

Link Spam

See also the Spam extension: for links to other websites that are expected to be reciprocal in some way, we may want to flag a link as bad/ugly/unwanted and quickly warn the administrator about such links.

See: Anti-Spam feature

Inline Anchor Management

It is quite annoying to not be able to create anchors on the fly and then link to them.

The system should allow us to add anchors on headers (H1, H2, etc.) and save the link in some form of temporary buffer (cookie?) then pretty much automatically create a link using that anchor (after all we have all the info: title, page URI, anchor name...)
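
A minimal sketch of how the anchor name and the matching link could be generated from the information we already have. The function names, the naming rule (lowercase, dashes), and the numeric suffix used to avoid duplicates are assumptions.

```python
import html
import re


def anchor_name(header_text, existing=()):
    """Build an anchor name from a header title, unique within the page."""
    base = re.sub(r"[^a-z0-9]+", "-", header_text.lower()).strip("-") or "section"
    name = base
    suffix = 2
    while name in existing:
        name = "%s-%d" % (base, suffix)
        suffix += 1
    return name


def anchor_link(page_uri, header_text, name):
    """Link to the anchor, using the header title as the link text."""
    return '<a href="%s#%s">%s</a>' % (
        html.escape(page_uri, quote=True),
        name,
        html.escape(header_text),
    )
```
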

If the anchor already exists on the header, then a Copy button should appear (along with the usual Edit and possibly in this case a Delete.)

Link Remover

On the other hand, it can at times be useful to remove all the links in a page, even those entered using the editor (e.g. the author lost the permission to include anchors... but some existed in older posts.)

A link can be completely removed (the whole anchor <a ...>...</a>) or only its tags can be removed (the <a ...> and </a> tags are removed, but the text in between is kept.)
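
Both modes can be sketched with the standard Python HTML parser. The class and function names are assumptions, and this version targets HTML fragments: declarations and processing instructions are dropped, and end tags are written back in lowercase.

```python
from html.parser import HTMLParser


class LinkRemover(HTMLParser):
    """Copy HTML to self.out, dropping <a> tags (and optionally their content)."""

    def __init__(self, keep_text=True):
        super().__init__(convert_charrefs=False)
        self.keep_text = keep_text
        self.depth = 0   # how many <a> elements we are currently inside
        self.out = []

    def _emit(self, raw):
        if self.depth == 0 or self.keep_text:
            self.out.append(raw)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1
        else:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            self.depth = max(0, self.depth - 1)
        else:
            self._emit("</%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        if tag != "a":
            self._emit(self.get_starttag_text())

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit("&%s;" % name)

    def handle_charref(self, name):
        self._emit("&#%s;" % name)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)


def remove_links(html_text, keep_text=True):
    """Remove anchors; keep_text=False also removes what they contain."""
    parser = LinkRemover(keep_text)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```
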

See: Leaving HTML behind

Development Status

Administration Menu

We now have an administration menu that can be built by assigning specific tags to pages. It still requires some help so we can group the items. Also, we want support for user-defined menus (lists of manually entered links.)