# Leaving HTML behind

## Introduction

In itself, HTML is an output format. What we really want to work with is XML. This gives us a clean way of handling all sorts of problems that HTML does not handle well (at least not without tricks such as adding support for "tags" like [this-is-a-tag], or comments such as the <!--break--> used by Drupal to define the summary/teaser break).

The format we want to use is XML for a simple reason: all written documents will, in some way, fit easily in an XML document. The following are ideas of tags we could use to edit and manage our advanced documents. We should look around for existing schemas instead of creating our own, since I'm sure some people have already fixed some of the shortsightedness of HTML.

To transform the document into HTML (or XHTML), use an XSLT transformation script with layout (template) files. Similarly, we could transform the output into PDF.

### The <book> tag

The <book> tag can be used to write pages inside a book. A page can then include chapters, sections, sub-sections...

The concept of having a book tag may be a bit far-fetched because a book should certainly span many pages rather than hold one entire book in one single document (or that would be a really big XML document...)

### The <page> tag

The concept of a page in a document is not available in HTML. When you write HTML you write ONE page which may go at length. One simple concept is to break down that long HTML page into several smaller pages that the user can visit using links such as Part 1, 2, 3.

The page tag includes content.

Note that such a tag generates formatting problems with other tags such as the <column> tag since the user may want a <column> tag to apply to page 1, but not 2 and 3 and vice versa. So the page concept may need to use some automatic computation instead of an actual page tag that the user has to place at the right location (although we could ultimately offer both.)

### The <column> tag

The <column> tag is used to define columns for the data it contains. This converts well in a Mozilla (Firefox, SeaMonkey) or KHTML/WebKit (Konqueror, Chrome, Safari) browser. For others such as IE it won't work as is. We can either create the columns in the backend, with the possibility that the columns do not all have the same height, or create the columns in the browser using JavaScript.

### The <chapter>, <section>, <subsection>, <division> tags

When creating documentation, it can be really practical to have it broken down into different section levels. We could certainly have any number of levels. Six levels would match HTML well (i.e. H1 to H6).

Note that the <page> tag can be viewed as the top level, although if you want to write a book, we may want to have a top level named <book> that's composed of <page>'s.
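As a rough illustration of the mapping to H1 through H6, the sketch below walks nested section tags and assigns each one a heading level from its nesting depth, clamped to 6. The `title` attribute and the `headings` helper are assumptions made for this example; nothing above defines how a section carries its title.

```python
import xml.etree.ElementTree as ET

# Tags treated as section levels; the `title` attribute is a hypothetical
# convention used only for this illustration.
SECTION_TAGS = {"chapter", "section", "subsection", "division"}


def headings(element, depth=1):
    """Yield (level, title) pairs, clamping the level to HTML's H1..H6."""
    for child in element:
        if child.tag in SECTION_TAGS:
            yield min(depth, 6), child.get("title", "")
            yield from headings(child, depth + 1)
        else:
            # Non-section tags do not add a heading level.
            yield from headings(child, depth)


doc = ET.fromstring(
    '<page>'
    '<chapter title="Intro">'
    '<section title="Goals"><subsection title="Scope"/></section>'
    '</chapter>'
    '<chapter title="Usage"/>'
    '</page>'
)
result = list(headings(doc))
```

Using the nesting depth rather than the tag name means any number of levels can be written, and only the output is limited to six.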

### The <box>, <frame> tags

The <box> tag can be viewed as the HTML <div> tag and is used to create a box of content, often to place such a box as a float.

The <frame> tag is used as a field set with a label. As in other systems, we can offer to collapse frames so that we can put more content on a single page and go from one frame to the next.

### The <p>, <quote>, <cite>, <pre>, <samp>, <info>, <warning>, ... tags

A paragraph of text, similar to HTML. It should include support for features such as indentation, margins, centering... so we can create blockquotes, quotes, citations, warnings and similar blocks.

We may want to just have a <p> tag and a type attribute instead of many different tags that give us a paragraph with a different look. This could translate to a class in HTML or the use of a specific tag (i.e. <quote> could make use of the <blockquote> tag, but the warning tag would probably use <p class="warning"> and the warning class defined in some CSS file includes the necessary data to make the warning paragraph look as promised in the current theme.)
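The idea of one paragraph tag with a type could be sketched as a simple lookup table. Only the `quote` and `warning` entries follow the text above; the other entries, the `PARAGRAPH_TYPES` name, and the `render_paragraph` helper are assumptions for illustration.

```python
from html import escape

# Hypothetical mapping from a paragraph type to (HTML tag, CSS class).
# Only "quote" -> <blockquote> and "warning" -> <p class="warning"> come
# from the text; the other entries are assumptions.
PARAGRAPH_TYPES = {
    "p": ("p", None),
    "quote": ("blockquote", None),
    "pre": ("pre", None),
    "info": ("p", "info"),
    "warning": ("p", "warning"),
    "error": ("p", "error"),
}


def render_paragraph(kind, text):
    """Render one paragraph of the given type as an HTML string."""
    # Unknown types fall back to a <p> carrying the type as its class.
    tag, css_class = PARAGRAPH_TYPES.get(kind, ("p", kind))
    attr = f' class="{css_class}"' if css_class else ""
    return f"<{tag}{attr}>{escape(text)}</{tag}>"
```

For instance, `render_paragraph("warning", "Hot & sharp")` gives a `<p class="warning">` element with the ampersand escaped, and the theme's CSS decides how it looks.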

List of possible formats:

• <p>
• <quote> (<blockquote>)
• <cite>
• <pre>
• <samp>
• <center>
• <help>, <info>, <warning>, <error> -- show a box with an icon and the given text
• <geshi> -- apply a geshi like filter on the code inside the tag, offer to select a language (i.e. C, C++, JavaScript, HTML, etc.)
• [itex] -- apply MathJax, jsmath or alike (although... we may want to do it on the server if possible, make use of the ml DTD and make it inline instead of a paragraph/block.)

### The <list>, <item> tags

Define a list with items. These are often used in documents. Note that we do not mean menu here since these are just lists appearing in user documents.

### The <link> tag

The anchor in HTML is used for several purposes so we probably want to use a different naming convention. Plus, instead of just an href, we should have ways to reference existing elements in a page, in a book, in a website...

The anchor as in <a name="..."> is not defined with a link. Creating an anchor just means adding a unique identifier to a tag such as a paragraph tag (<p id="come-here">.) Assuming we can define these identifiers with ease, a link can then reference such an identifier, which is later converted into an href with an anchor ("#come-here" at the end of the corresponding page URL.)

Internal references could be the name of the page, the page number, a random page (?!), etc. The interesting aspect with this mechanism is:

• Page A has a link to Page B
• Page B title is "Cool Page"
• Page B is changed to "Old Cool Page"
• Page C is created and named "Cool Page" (the new "Cool Page")
• Now Page A points to Page C

The <link> tag should be capable of pointing to Page B even after it changed its name, but remember the former name so if a new page is created with that name, the link changes to that new page instead.
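The Page A / Page B / Page C scenario above could be modeled roughly as follows. The `Page` and `Link` classes and the `resolve` function are hypothetical names for this sketch; the only rule taken from the text is that a page currently carrying the remembered name wins over the original target.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: str
    title: str


@dataclass
class Link:
    target_id: str   # page the link pointed to when it was created
    name: str        # title the target had at that time


def resolve(link, pages):
    """Return the id of the page the link should currently point to."""
    # A page that currently carries the remembered title wins...
    for page in pages.values():
        if page.title == link.name:
            return page.page_id
    # ...otherwise keep following the original target, even if renamed.
    return link.target_id


pages = {"B": Page("B", "Cool Page")}
link = Link(target_id="B", name="Cool Page")
first = resolve(link, pages)               # Page A -> Page B

pages["B"].title = "Old Cool Page"
after_rename = resolve(link, pages)        # still Page B

pages["C"] = Page("C", "Cool Page")
after_new_page = resolve(link, pages)      # now Page C
```

Storing both the target identifier and the name is what lets the link survive a rename and still follow the name when a new page takes it over.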

A link can be marked for appearing under the article instead of being made a link in place. It should be possible to do so on a page basis too.

A link can accept an icon; for example, external links can be assigned a little "external link icon". The same goes for mailto:... links and any other type of icon we can think of (i.e. legal, book, printer, text, PDF, etc.)

The broken link feature should be capable of offering the user the new link in case it is a 301. If the user doesn't want to change the link (because the old one looks nicer or is a shortened link), then the link tag should hold that information.

Links can be given a disclaimer and clicks on them can be tracked. However, the user should be able to remove such capability on a per link basis (for example, the page being visited may already be a disclaimer in itself in which case following the link would not require someone to again accept to follow the link or not.)

External links may be pinged (in case the website allows us to do that, of course.) The user should be able to turn off the feature on a per link basis. Also we need to save the information about which page was pinged when and with success or not.

Links can be transformed into a QR Code. The only problem is, where do we show the QR Code of a link that's inline? (in a margin? or have a small icon that when hovered shows the code?)

### The basic formatting tags (<b>, <i>, <u>, <em>, <strong>, etc.)

We can support all the inline HTML formatting tags as is. This would include the following:

• <b>
• <br>
• <i>
• <u>
• <strong>
• <em>
• <s> or <strike>
• <q>
• <ins>
• <del>
• <big>
• <small>
• <code>
• <sub>
• <sup>
• <tt>
• <font>
• <hr>
• <vr> (vertical bar for column breaks? or just <columns>?)
• <tooltip> (add tooltip like dropdown, possibly using the <abbr>, <a> or overlib like feature)

However, in HTML some tags were removed (or at least marked as deprecated) such as the <font> tag. What we may want to do is use <span> tags with styles instead of any specialized tags. (TBD)

### The <table> tag

This is a quite complicated tag, but we certainly want to support it! The HTML definitions should work for us.

### The <footnote>, <ref>, <sidenote> tags

In documents you often want to add a note or a reference to another website (a source). These tags can be used to that effect.

Note these tags could make use of the <link> tag with a special type so it knows to add them in the footer of the current page.

The <sidenote> is like a footnote but it is shown off to the side, in the margin (usually right margin.)

### The <insert> tag

The <insert> tag is used to reference another piece of content and insert some of the available information from that other piece of content (i.e. title, teaser, creation date, etc.)

The <insert> tag should offer a wide variety of features:

• Show any one field from the referenced content (title, teaser, author, date, etc.)
• Show a list of matching pages (the list can be random or very specific.)
• Insert some dynamic content (i.e. today's date, some counter, etc.)
• Display another document inline (IFRAME, imported HTML, Doxygen output, etc.)

Also we may want to break down those features using different tags (the list of pages could be generated using the <list> tag, the date could use a <date> tag, etc.)

Such a feature could then be used to create statistics pages: although only shown to administrators, such a page could be a "regular" page inserting all sorts of counters from the website.

### The <toc>, <index> tags

Longer documents may be easier to navigate with a table of contents and an index, especially if the document spans multiple pages. The <toc> tag is used to position the table of contents or index within a page. Both the index and the table of contents are otherwise automatically generated as required. We also want to show blocks for these features (i.e. show the <toc> from another page.)

It should also be possible to use the <toc> tag with a list of pages so that way we can create a complete table of contents or index of a website, or a website section (i.e. a specific book.)

The <index> tag is used to mark words that should be indexed (i.e. <index>table of contents</index>.) This way we can really give the end user a way to index exactly what he/she wants. A word marked with <index> can also be considered a tag, so you can use that to tag your pages (i.e. if you talked about websites, then an instance of the word website should be marked for indexation, and that can also become a tag for that page, meaning that you can find a dynamic page such as /tag/website that lists all the pages that have "website" in them.) We'd have to work on the order of the elements on such dynamic pages.
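A minimal sketch of how marked words could feed such tag pages, assuming each page is available as an XML string keyed by its path. The `build_tag_index` function, the sample paths, and the lowercasing of terms are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tag_index(pages):
    """Map each <index> term (lowercased) to the sorted paths that use it."""
    index = defaultdict(set)
    for path, xml_source in pages.items():
        root = ET.fromstring(xml_source)
        for node in root.iter("index"):
            term = "".join(node.itertext()).strip().lower()
            if term:
                index[term].add(path)
    return {term: sorted(paths) for term, paths in index.items()}


pages = {
    "/about": "<page><p>We build <index>website</index> tools.</p></page>",
    "/docs": "<page><p>A <index>Website</index> has a "
             "<index>table of contents</index>.</p></page>",
}
tag_index = build_tag_index(pages)
```

Here the pages are sorted by path only to keep the result stable; the real ordering on a dynamic tag page is still an open question.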

### The <img>, <video>, <audio> tags

Insert other media in the page. Video and audio could use Flash or HTML 5.x.

These tags may include features that are not available in HTML.

For example, the image could have features such as crop, resize, advanced borders, filters, flips, etc. Images also support areas (clicking areas.)

The video and audio tags would use Flash, and at this point that Flash animation is what you get.

### The <verbatim> tag

It should be possible to directly insert content in a specific format (i.e. HTML, PDF, XML, etc.) The <verbatim> tag defines the language and thus whether it is compatible with the current output (i.e. you could have a Facebook page that people should Like, etc.)

The content itself must be saved in a <![CDATA[...]]> section to avoid problems.
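To illustrate why a CDATA section avoids problems, the sketch below parses a verbatim block whose content holds raw markup and an unescaped ampersand. The `lang` attribute name is an assumption; the text only says that the tag defines the language.

```python
import xml.etree.ElementTree as ET

# The `lang` attribute name is hypothetical; the text only says that the
# tag "defines the language".
source = '<verbatim lang="html"><![CDATA[<b>bold</b> & more]]></verbatim>'

node = ET.fromstring(source)
language = node.get("lang")
raw = node.text  # the CDATA content comes back untouched, as plain text
```

Without the CDATA wrapper, the same content would either be parsed as child elements or rejected because of the bare ampersand.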

### The automatic tags

Some features can be automated. For example, a user may write HTML. This is an abbreviation (HyperText Markup Language) and as such it should be tagged with the <abbr> tag in HTML. This should be completely automatic (although a user may choose to turn off the feature and have their own dictionary of abbreviations.)
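A minimal sketch of such automatic tagging, assuming the input is plain text and the dictionary is a simple mapping. The `ABBREVIATIONS` table and the `tag_abbreviations` function are hypothetical names for this example.

```python
import re
from html import escape

# A tiny, hypothetical dictionary of abbreviations; a real one would be
# configurable per user, as the text suggests.
ABBREVIATIONS = {
    "HTML": "HyperText Markup Language",
    "XML": "Extensible Markup Language",
}


def tag_abbreviations(text, dictionary=ABBREVIATIONS):
    """Wrap whole-word occurrences of known abbreviations in <abbr> tags."""
    if not dictionary:
        return text  # feature turned off or empty dictionary
    words = "|".join(map(re.escape, dictionary))
    pattern = re.compile(r"\b(" + words + r")\b")

    def wrap(match):
        word = match.group(1)
        return f'<abbr title="{escape(dictionary[word])}">{word}</abbr>'

    return pattern.sub(wrap, text)
```

The word boundaries keep "XHTML" from being tagged as "HTML". Passing an empty dictionary stands in for the user turning the feature off.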

The <abbr> is only one example. There are several HTML tags we can work with:

• <abbr>
• <acronym>
• <dfn>
• <kbd>
• <var>
• <a> -- on URLs we discover in the source and have not been "banned" or black listed--this may actually be done in the editor and not automatically?

The corresponding tag can be anything, really, but the idea is to generate something similar to a glossary, an index, a summary... all sorts of table of contents.

### The <search> tag

Add words and sentences that make the search return this page even if those words and sentences do not directly appear in the page (i.e. synonyms, etc.)

This is probably a separate field managed by the search feature though.

## Form

The other form elements are reserved for the Form layouts which are not available to end users. However, we define them here for now:

### Attributes for all Widgets

All widgets accept a set of attributes and sub-tags that are defined here.

#### The <label> sub-tag

The label of the field. In most cases, this represents the HTML label.

A label can be shown in front of or behind a widget. Also, we can define a different label depending on whether the widget is or is not selected, hovered, focused, or a mix of those states.

#### The <description> sub-tag

A description for this field. This is most often used as a help message. It may be shown below the field or under a question mark button with a sub-window. It may also be shown in a help rectangle on the side.

### The <hidden> tag

Define data that is to remain hidden. This data is sent to the client, but it remains hidden. Note that using the HTML <input type="hidden"> tag is the simplest way to get the data back to us. Other means require JavaScript to be enabled.

#### Form Session identifier

All forms receive an auto-generated hidden tag with a random value used to protect forms against cross-site request forgeries. This ensures that forms we receive are forms we sent in the first place. The auto-generated value of the hidden tag represents the session identifier of the form.
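The general technique can be sketched as follows, using a plain dictionary as a stand-in for the server-side session storage. The function names, the storage layout, and the one-use-per-token policy are assumptions; this is not a description of the actual implementation.

```python
import hmac
import secrets


def new_form_token(session_store, form_name):
    """Create a random token, remember it server side, and return it."""
    token = secrets.token_urlsafe(32)
    session_store[form_name] = token
    return token


def check_form_token(session_store, form_name, submitted):
    """Return True only if the submitted token matches the one we issued."""
    expected = session_store.pop(form_name, None)  # one use per token
    if expected is None or submitted is None:
        return False
    # Constant-time comparison on bytes avoids leaking timing information.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The returned token would be placed in the hidden tag when the form is generated, and `check_form_token` would run before any other validation of the POST data.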

### The <fieldset> tag

It is possible to group widgets together. The fieldset is used for that purpose. A fieldset can include a label, a description, and any number of other widgets.

Fieldsets can be opened and closed using JavaScript.

### The <textfield> tag

The text field allows end users to enter text in a text box. The tag includes a set of attributes that define the field. The data between the opening and closing of the tag is part of the text field content (empty, a system default, or whatever the user previously entered in that field.)

• rows--number of rows the textfield should use. If 1, then the HTML <input type="text"> is used; if more than 1, then the textarea is used.
• cols--number of columns the textfield should use. If the number of rows is 1, then this is the size attribute of the input tag.
• maxlength--maximum size of the data entered in the field. The textarea does not enforce a maximum in all browsers (not without JavaScript.) The maximum is always checked by the server on submission though.
• filter--what the user can type in the text field, the default being anything they want.
• password--if defined, then the content is to be shown as dots or stars (hidden). Note that rows must be 1 when this flag is set.
• read-only--mark the field as a read-only field, the end user won't be able to edit it.

The filter requires JavaScript to check each keystroke from the user and ensure validity.
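The attribute rules above could translate to HTML roughly as in this sketch. The `render_textfield` and `accept_length` functions, the `name` parameter, and the default of 20 columns are assumptions for illustration; the filter attribute is left out since it needs JavaScript.

```python
from html import escape


def render_textfield(name, value="", rows=1, cols=20, maxlength=None,
                     password=False, read_only=False):
    """Render a <textfield> as an HTML <input> or <textarea> string."""
    if password and rows != 1:
        raise ValueError("password fields must have rows == 1")
    attrs = [f'name="{escape(name)}"']
    if maxlength is not None:
        attrs.append(f'maxlength="{maxlength}"')
    if read_only:
        attrs.append('readonly="readonly"')
    if rows == 1:
        # One row: a plain input, where cols becomes the size attribute.
        kind = "password" if password else "text"
        attrs += [f'type="{kind}"', f'size="{cols}"',
                  f'value="{escape(value)}"']
        return f"<input {' '.join(attrs)}/>"
    # More than one row: a textarea holding the content between its tags.
    attrs += [f'rows="{rows}"', f'cols="{cols}"']
    return f"<textarea {' '.join(attrs)}>{escape(value)}</textarea>"


def accept_length(value, maxlength):
    """Server-side length check, applied regardless of browser behavior."""
    return maxlength is None or len(value) <= maxlength
```

The separate `accept_length` check mirrors the point that the maximum is always verified by the server on submission, whatever the browser enforces.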

### The <select> and <option> tags

The select tag is used to create checkboxes, radio buttons, drop-down, multiselect or single select lists. The attributes define the different modes and how the flag is rendered.

• minselect--minimum number of flags to select; until that many are selected, the user cannot submit.
• maxselect--maximum number of flags that can be selected; if the user tries to select more, we can either unselect another entry or prevent the newer selection.
• mode--define the type of flags as: checkbox, radio, dropdown, or list. Note that this flag is ignored if the only way to represent the flags is to use a list anyway.

When minselect is zero and maxselect is 1, then we must use a single select list that you can unselect. However, at some point we will offer a dropdown with a non-mandatory "Please Select" option (or maybe a &nbsp; entry.)

When minselect and maxselect are both 1, then we can use radios, a drop-down list, or a single select list that you cannot unselect.

In all other cases we can have checkboxes or a list.

Note that maxselect cannot be less than 1.
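The rules above can be summarized in a small function. The names `allowed_modes` and `effective_mode` are hypothetical. Rejecting a minselect that is negative or larger than maxselect, and falling back to a list whenever the requested mode is not possible, are assumptions that go slightly beyond what is stated here.

```python
def allowed_modes(minselect, maxselect):
    """Return the renderings that can represent the given constraints."""
    if maxselect < 1:
        raise ValueError("maxselect cannot be less than 1")
    if minselect < 0 or minselect > maxselect:
        # Assumption: such combinations can never be satisfied.
        raise ValueError("minselect must be between 0 and maxselect")
    if maxselect == 1:
        if minselect == 0:
            # Only a single select list can be unselected.
            return ["list"]
        return ["radio", "dropdown", "list"]
    return ["checkbox", "list"]


def effective_mode(requested, minselect, maxselect):
    """Honor the requested mode when possible, otherwise use a list."""
    modes = allowed_modes(minselect, maxselect)
    return requested if requested in modes else "list"
```

With these rules, `effective_mode("radio", 0, 1)` falls back to a list because radio buttons cannot be unselected, while `effective_mode("radio", 1, 1)` keeps the radio buttons.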

The options one can select are defined by the <option> tags defined within the <select> tag.

For example, radio button flags defining a Yes and No choice would be defined as:

<select minselect="1" maxselect="1" mode="radio">
<option selected="selected">Yes</option>
<option>No</option>
</select>

### The <date> tag

Let the user enter a date using a calendar widget. This requires JavaScript.

One can use a <textfield> with a format attribute to let the user enter text instead.

We may also want to look into a way to tie two such fields together so the user can choose to enter a date either way.

### The <file> tag

Let the user upload a file. We also want to look into a way to support drag and drop because (1) most everyone now supports it; and (2) the Browse button is an annoyance (especially since only one file can be selected at a time; yes, there are ways with Flash, but that doesn't work right on many computers...)

Files are expected to be included in the POST data as attachments.

If possible, we want to define limits on a per widget and form basis on the maximum size (byte wise) and number of files that can be uploaded in this manner.
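One possible shape for such limits is sketched below, with one set of limits per widget and one for the whole form. The `UploadLimits` class, the `check_uploads` function, and the error strings are all hypothetical; sizes are taken as already-measured byte counts.

```python
from dataclasses import dataclass


@dataclass
class UploadLimits:
    max_files: int
    max_bytes: int   # total size allowed, in bytes


def check_uploads(widget_sizes, widget_limits, form_limits):
    """Return a list of error strings; an empty list means acceptable.

    widget_sizes maps a widget name to the byte sizes of its uploaded files.
    """
    errors = []
    total_files = total_bytes = 0
    for name, sizes in widget_sizes.items():
        total_files += len(sizes)
        total_bytes += sum(sizes)
        limits = widget_limits.get(name)
        if limits is None:
            continue  # no per-widget limit; the form limit still applies
        if len(sizes) > limits.max_files:
            errors.append(f"{name}: too many files")
        if sum(sizes) > limits.max_bytes:
            errors.append(f"{name}: files too large")
    if total_files > form_limits.max_files:
        errors.append("form: too many files")
    if total_bytes > form_limits.max_bytes:
        errors.append("form: files too large")
    return errors
```

A check like this would run on the server once the POST data arrives, since nothing on the client side can be trusted to enforce the limits.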

### The <button> tag

A form button to submit the form. One special button is the submit button: it is selected automatically when a user hits the return key in the form.

Each button can send the POST data to a different URI. In that case we still need to have a generic way to verify the POST data once it reaches the server.

Fields that have specific requirements can be used to prevent users from clicking a button. Assuming we have JavaScript available, buttons can be made unclickable until all required fields are entered.

### The <data> tag

A form may come with some server-side-only data. This can be saved in this tag for in-memory access. This is very much a TBD because it seems that we can just use the database and the session (see the <hidden> tag for more information about the session identifier of forms.)

Such data is often useful since the process stops while on the client's computer and restarts later once the client submits the form, at which point we may want to use the data to validate the form contents and save it along with the submission results.

### Development Status

As it stands, the system supports widgets for forms. The implementation, however, uses the tag <widget> instead of specific tags for each widget. This way we parse a large number of things in a similar way without having to expect each tag to accept the same parameters (although HTML does that...)

The status is "development" because we plan to do quite a bit more work in that area such as letting people edit pages using our own XML (because browsers have an XSLT feature so we do not have to use HTML in a browser.)

Snap! Websites
An Open Source CMS System in C++