Snap! Websites
An Open Source CMS System in C++
For some non-obvious reason, spam bots keep posting random, usually unreadable messages (gibberish made of random letters and numbers) to the public forms they find on the Internet.
It seems to me that, even from their own point of view, it would be far more effective to post plausible messages and not waste their chance from the start.
Yet many of these bots are quite unsophisticated and can be stopped from posting with a few tricks, as follows:
See also: Anti-Spam feature
You can include a form identifier in all your public forms. It may not sound like much, but by itself it is very effective against most bots. For some reason, bots read the form once and then send POSTs that do not include a valid identifier, because they reuse the same one over and over again.
There is one drawback with this technique. The identifier cannot be cached; otherwise all users, bots included, get the same number for a long period of time. It is still a good idea to verify that you did generate that form at some point. Otherwise bots will send you all sorts of randomly generated forms...
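As an illustration of the idea, here is a minimal sketch of generating and verifying such an identifier. It is not how Snap! Websites implements the feature: the names `SECRET_KEY`, `MAX_AGE_SECONDS`, `make_form_identifier` and `verify_form_identifier` are hypothetical, and the sketch assumes that an HMAC-signed, time-limited value is an acceptable form identifier and that form names never contain a colon.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server-side secret; a real site would load it from its configuration.
SECRET_KEY = secrets.token_bytes(32)
# Assumed lifetime of an identifier, in seconds.
MAX_AGE_SECONDS = 3600


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_form_identifier(form_name: str, now: Optional[float] = None) -> str:
    """Return a fresh identifier to embed in one rendering of a form."""
    issued = int(time.time() if now is None else now)
    nonce = secrets.token_hex(8)  # makes every rendering unique
    payload = f"{form_name}:{issued}:{nonce}"
    return f"{payload}:{_sign(payload)}"


def verify_form_identifier(form_name: str, identifier: str,
                           now: Optional[float] = None) -> bool:
    """Check that we generated this identifier, for this form, recently."""
    parts = identifier.split(":")
    if len(parts) != 4:
        return False
    name, issued, nonce, signature = parts
    expected = _sign(f"{name}:{issued}:{nonce}")
    # Compare as bytes so that non-ASCII input cannot raise an exception.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    if name != form_name:
        return False
    # The signature matched, so `issued` is a number we wrote ourselves.
    age = (time.time() if now is None else now) - int(issued)
    return 0 <= age <= MAX_AGE_SECONDS
```

The identifier returned by `make_form_identifier("comment")` would go in a hidden input of the form, and the POST handler would reject the submission when `verify_form_identifier("comment", posted_value)` returns `False`. Note that this sketch only limits how long an identifier can be replayed; it does not make it single use.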
Using CSS, one can hide a field. Having a mechanism that generates a hidden field, with or without data, has a very interesting side effect. The vast majority of bots will not go as far as determining whether a field is hidden (and in any case it is not easy to read a big pile of CSS and work that out). So the field is hidden from your normal users (assuming that the CSS is loaded properly) but visible to bots. What will bots do? Replace the value of the hidden field with something else. At that point you know that a bot sent the POST, because that field cannot be changed by one of your regular users.
To strengthen this feature, more than one hidden field can be included in a single form.
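The server-side check for such fields can be very small. The following is only a sketch under assumptions: the field names `website` and `confirm_code`, the value `k3p9`, and the function `looks_like_bot` are invented for the example, and a missing field is treated as suspicious because a browser still submits inputs that are merely hidden with CSS.

```python
from typing import Mapping

# Hypothetical hidden fields: name -> value written in the form when it is rendered.
HONEYPOT_FIELDS = {
    "website": "",            # rendered empty, hidden with CSS
    "confirm_code": "k3p9",   # rendered with data, hidden with CSS
}


def looks_like_bot(posted: Mapping[str, str]) -> bool:
    """Return True when any hidden field is missing or was modified."""
    for name, expected in HONEYPOT_FIELDS.items():
        if posted.get(name) != expected:
            return True
    return False
```

A regular visitor never sees these inputs, so the values come back untouched and `looks_like_bot` returns `False`; a bot that fills in every field it finds changes at least one of them.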
CAPTCHAs are systems that detect whether whoever is posting is a human. Most people do not like them and, more often than not, they are hard to answer. Our system may offer the feature at some point, but not as a default! We generally want to try many other methods first.
The obvious drawback of the JavaScript methods is that they do not work on systems that have JavaScript turned off. A site may simply require that end users have JavaScript enabled; those who do not just never get the form.
One way to greatly strengthen the form identifier is to have a script request it at the time the form is used. The request can happen when the form is shown, when the user clicks in one of the fields, or just before the form is POSTed.
The form's submit button can be replaced by an anchor and a small JavaScript function. That function is the one that prepares the form and sends it.
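On the server, the endpoint answering that script request could hand out short-lived, single-use identifiers. The sketch below is one possible shape for it, not the Snap! Websites implementation: `FormIdentifierStore`, `IDENTIFIER_TTL` and the in-memory dictionary are assumptions made for the example (a real site would likely keep the identifiers in its database or session store).

```python
import secrets
import time
from typing import Dict, Optional

# Assumed lifetime, in seconds, between the script's request and the POST.
IDENTIFIER_TTL = 600


class FormIdentifierStore:
    """Hypothetical in-memory store of identifiers issued to scripts."""

    def __init__(self) -> None:
        self._issued: Dict[str, float] = {}

    def issue(self, now: Optional[float] = None) -> str:
        """Called when the page script asks for an identifier."""
        identifier = secrets.token_urlsafe(16)
        self._issued[identifier] = time.time() if now is None else now
        return identifier

    def redeem(self, identifier: str, now: Optional[float] = None) -> bool:
        """Called when the form is POSTed; each identifier works only once."""
        issued_at = self._issued.pop(identifier, None)
        if issued_at is None:
            return False  # never issued, or already used
        current = time.time() if now is None else now
        return current - issued_at <= IDENTIFIER_TTL
```

Because `redeem` removes the identifier, a bot that replays a captured POST gets rejected the second time, and a bot that never runs the script has no identifier to send at all.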
The form preparation can include many features such as:
In practice, the use of JavaScript stops bots completely until they learn the script. However, if you randomize the variable names in the script (something that some viruses do), it becomes really hard for bots to get around this protection.
The form could even be loaded using JavaScript. In other words, without JavaScript, the form does not appear in the HTML source at all. Since a bot is not going to execute a script that involved (with AJAX queries and possibly decryption of the data sent...), it does not get the form in the first place and therefore never knows that a form exists on that page.
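To make the randomization concrete, here is a rough sketch of a server generating the submit function with different names on every page load. Everything in it is illustrative: `random_name`, `build_submit_script` and the shape of the generated JavaScript are assumptions for the example, not code from Snap! Websites.

```python
import json
import secrets
import string

_ALPHABET = string.ascii_letters + string.digits


def random_name(length: int = 10) -> str:
    """Return a random JavaScript identifier; the leading underscore
    keeps it from ever matching a reserved word."""
    return "_" + "".join(secrets.choice(_ALPHABET) for _ in range(length))


# Doubled braces are literal braces in the generated JavaScript.
SCRIPT_TEMPLATE = """\
function {func}() {{
  var {form} = document.getElementById({form_id});
  {form}.elements[{field}].value = {identifier};
  {form}.submit();
}}
"""


def build_submit_script(form_id: str, field: str, identifier: str):
    """Return (function_name, script) with freshly randomized names."""
    func = random_name()
    script = SCRIPT_TEMPLATE.format(
        func=func,
        form=random_name(),
        # json.dumps produces properly quoted JavaScript string literals.
        form_id=json.dumps(form_id),
        field=json.dumps(field),
        identifier=json.dumps(identifier),
    )
    return func, script
```

The page would embed the returned script and point the anchor that replaces the submit button at the returned function name. A bot that memorized the names from one page load finds different ones on the next.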