Contents Statistics feature

Many parameters define how a page behaves. These include how fast a page loads, which directly depends on how large the page is, how many pages need to be loaded, whether we can make use of the browser cache, etc.

We want to collect these statistics and let owners know that the information is available, so that they can optimize their data over time.

Statistics of interest:

  • Size of the resulting HTML page
  • Size of each included block of content (i.e. "boxes")
  • Count of allowed / forbidden boxes (this will require a per user counter?)
  • Size of each included image, CSS file, and JavaScript file [1]
  • Time to generate the HTML page
  • Number of times the different parts were accessed
  • 10 most repeated words and their density [2]
  • Whether all images have an ALT and TITLE attribute
  • Check the ALT of the first image for your keywords
  • Ways to indicate the website, page type, page keywords so the system can signal pages that are not up to snuff
  • Use of different tags such as the header tags, bold, italic (i.e. if too much text is one way or the other, generate a warning, not enough keywords in bold, etc.)
  • Duplicated content (that definitely needs to be computed by a backend)
  • Limit on the number of links (Google's old recommendation was about 100 links, or 1 link per 1Kb of HTML). This is rather complicated to support if you think of all the links on the page, since we have boxes with menus, RSS feeds, latest posts, etc. which all have links too! We may have to handle that one using a JS that parses the whole page and not just the part being edited.
  • Various Third Party Data and Ranking (i.e. Google Page Rank). These have to be done with well-behaved backends that save the information in the page for later reference in the statistics. This allows us to check such information at a slow rate (Google allows about 100 such hits/day if I'm correct...), and other third parties will certainly impose some limits too.
  • Statistics should be available to people who do email marketing so extreme dynamism can be added to their system (see Email feature [core] for more things we want to include in our statistics)
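As an illustration of two of the simpler items in the list above (the size of the resulting HTML page and whether all images have an ALT and TITLE attribute), here is a minimal Python sketch. It is not part of the feature itself: the names `ImageAuditor` and `page_statistics` are hypothetical, and it assumes the generated HTML is available as a string.

```python
from html.parser import HTMLParser


class ImageAuditor(HTMLParser):
    """Collect <img> tags that lack a non-empty ALT or TITLE attribute."""

    def __init__(self):
        super().__init__()
        self.image_count = 0
        self.missing = []  # list of (src, [names of missing attributes])

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.image_count += 1
        attributes = dict(attrs)
        # An attribute that is absent, valueless, or blank counts as missing.
        absent = [name for name in ("alt", "title")
                  if not (attributes.get(name) or "").strip()]
        if absent:
            self.missing.append((attributes.get("src", ""), absent))


def page_statistics(html):
    """Return a small dictionary of statistics for one generated page."""
    auditor = ImageAuditor()
    auditor.feed(html)
    auditor.close()
    return {
        "html_bytes": len(html.encode("utf-8")),  # size once encoded as UTF-8
        "image_count": auditor.image_count,
        "images_missing_attributes": auditor.missing,
    }
```

The result is a plain dictionary so that a backend could save it alongside the page for later reference; how it would actually be stored is left open here.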

Testing Keyword Effectiveness on Google

Using the Google API, it is possible to do a search for each page of a website.

Since the author can provide keywords for their pages, we can determine how that applies locally, and then use the Google API a few days later to see where the page appears in Google, all fully automated. The plug-in can repeat the search on a set of active pages (we'd have to check Google's policy...)

As a side effect, the search results include the ranking information which we can then save locally and display to the users.
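The bookkeeping side of this idea can be sketched as follows. This is only a sketch under assumptions: it does not call any Google API, it assumes the search has already been run and that its results are available as an ordered list of URLs, and the names `find_rank` and `record_rank` are hypothetical.

```python
from datetime import date


def find_rank(result_urls, page_url):
    """Return the 1-based position of page_url in result_urls, or None."""
    target = page_url.rstrip("/")
    for position, url in enumerate(result_urls, start=1):
        # Ignore a trailing slash so /page and /page/ compare equal.
        if url.rstrip("/") == target:
            return position
    return None


def record_rank(history, page_url, keyword, result_urls, when=None):
    """Append one ranking observation to the in-memory history list."""
    entry = {
        "page": page_url,
        "keyword": keyword,
        "rank": find_rank(result_urls, page_url),  # None if not found
        "date": (when or date.today()).isoformat(),
    }
    history.append(entry)
    return entry
```

Keeping one dated entry per search would let us show users how the position of a page changes over time, while the number of searches per day stays under whatever limit the third party imposes.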

See: (at this point I'm not exactly sure which one we have to use, probably the Google Custom Search)


From my own experience, the number of links has rather little bearing on your rank as long as the links are not there purely for spam. However, one important point about links: if you have very few of them on a page with a very high rank, then the juice passed down by each link is much higher...

Therefore it is important to have a way to gather all the links, present them to the editor, and properly inform said editor about this fact.
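Gathering the links could look something like the Python sketch below. It assumes the whole generated page (boxes, menus, feeds included) is available as one HTML string, and it reads the "100 links" and "1 link per 1Kb" figures mentioned earlier as two separate checks, which is only one possible interpretation. The names `LinkCollector` and `link_report` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def link_report(html, max_links=100, bytes_per_link=1024):
    """Summarize the links of a page for display to the editor."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    count = len(collector.links)
    size = len(html.encode("utf-8"))
    return {
        "links": collector.links,
        "count": count,
        # More links than the fixed limit (assumed default: 100).
        "over_fixed_limit": count > max_links,
        # More than one link per bytes_per_link bytes of HTML.
        "over_size_limit": count * bytes_per_link > size,
    }
```

Both limits are parameters because the right thresholds are debatable; the report only flags the page and leaves the decision to the editor.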

  • [1] We also need an overview of these files across all pages, because if the same set of files is loaded on all pages, we can eventually merge them into one file so as to make better use of the user's browser cache.
  • [2] This is for SEO purposes. The SEO keywords should have a density between 2% and 4% -- much more and it's a red flag! It would be very practical to have a list of all our pages and see where we have mishaps.
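The density check from the second note can be sketched in Python as shown below. The sketch assumes the page text has already been stripped of HTML, treats a keyword as a single word, and uses the 2% to 4% band from the note as default thresholds. The names `word_densities` and `keyword_status` are hypothetical.

```python
import re
from collections import Counter


def _words(text):
    """Split plain text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def word_densities(text, top=10):
    """Return (word, count, density) for the most repeated words."""
    words = _words(text)
    if not words:
        return []
    total = len(words)
    return [(word, count, count / total)
            for word, count in Counter(words).most_common(top)]


def keyword_status(text, keyword, low=0.02, high=0.04):
    """Return (density, status) where status is 'low', 'ok', or 'high'."""
    words = _words(text)
    if not words:
        return 0.0, "low"
    density = words.count(keyword.lower()) / len(words)
    if density < low:
        return density, "low"
    if density > high:
        return density, "high"
    return density, "ok"
```

Running `keyword_status` over every page and listing the ones that are not "ok" would give the kind of site-wide overview described in the note.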

Snap! Websites
An Open Source CMS System in C++
