Website Forms and Spam

For some non-obvious reason, spam bots post totally random, usually unreadable messages (gibberish made of random letters and numbers) to the public forms they find on the Internet.

Even from their own point of view, it seems to me that it would be a lot more effective to post meaningful messages rather than blow their chance from the start.

Still, many of these bots are quite unsophisticated and can be kept from posting by using a few tricks, described below.

See also: Anti-Spam feature

Form Identifier

You can include a form identifier in all your public forms. It may not sound like much, but by itself it is very effective against most bots. For some reason, bots read the form once and then send POSTs that do not include a valid identifier (because they reuse the same one over and over again).

There is one drawback with this technique. The page holding the identifier cannot be cached; otherwise all users, including the bots, receive the same number for a long period of time. It is still a good idea to verify that you did indeed generate that form at some point. Otherwise, bots will send you all sorts of randomly generated forms.
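As a minimal sketch of the idea, assuming a server that remembers which identifiers it handed out, the check could look like the following. The names `new_form_identifier`, `accept_post`, and the one-hour lifetime are illustrative assumptions, not part of any existing system.

```python
import secrets
import time

# Hypothetical in-memory store: identifier -> time it was issued.
# A real site would more likely keep this in a session or database table.
_issued_forms: dict[str, float] = {}

MAX_AGE_SECONDS = 3600  # assumed lifetime of a form identifier


def new_form_identifier() -> str:
    """Create a fresh, unguessable identifier for one rendering of a form."""
    identifier = secrets.token_urlsafe(16)
    _issued_forms[identifier] = time.time()
    return identifier


def accept_post(identifier: str) -> bool:
    """Accept a POST only if we issued this identifier recently, and only once."""
    issued_at = _issued_forms.pop(identifier, None)
    if issued_at is None:
        return False  # never issued by us, or already used
    return time.time() - issued_at <= MAX_AGE_SECONDS
```

Because each identifier is removed when it is first used, a bot replaying the same identifier over and over is rejected from its second POST on.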

Hidden Fields

Using CSS, one can hide a field. Having a mechanism that generates such a hidden field, with or without data, has a very interesting side effect. The vast majority of bots will not go as far as determining whether a field is hidden (and in any case it is not that easy to read a big pile of CSS and work that out). So the field is hidden from your normal users (assuming that the CSS loads properly) but visible to bots. What will bots do? Replace the value of the hidden field with something else. At that point you know that a bot sent the POST, because that field could not have been changed by one of your regular users.

To strengthen this feature, more than one hidden field can be included in a single form.
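The server-side check for such fields can be sketched as follows. The field names `website` and `fax_number` and their values are made up for the example; the only assumption is that the server knows which value it rendered in each hidden field.

```python
# Hypothetical hidden fields mapped to the value the server rendered in them.
# An empty string means the field was rendered empty.
HONEYPOT_FIELDS = {
    "website": "",
    "fax_number": "000",
}


def looks_like_bot(post_data: dict[str, str]) -> bool:
    """Return True when any hidden field is missing or was modified."""
    for name, expected in HONEYPOT_FIELDS.items():
        # A browser sends a CSS-hidden field back untouched, so anything
        # other than the rendered value points to a bot.
        if post_data.get(name) != expected:
            return True
    return False
```

For instance, `looks_like_bot({"website": "http://spam.example", "fax_number": "000"})` returns `True`, while a POST carrying both fields unchanged returns `False`.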

CAPTCHA

CAPTCHAs are systems that detect whether the one posting is a human. Most people do not like them and, more often than not, they are hard to answer. Our system may offer the feature at some point, but not as a default! We generally want to try many other methods first.

JavaScript

The methods in this section rely on JavaScript. The obvious drawback is that they do not work on systems that have JavaScript turned off. With these methods, end users either have JavaScript enabled or do not get the form at all.

Query Form Authorization

One way to greatly strengthen the Form Identifier is to have a script request the form identifier at the time the form is used. The request can happen when the form is shown, when the user clicks in one of the fields, or just before the form is POSTed.
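One possible server side for such a request is sketched below, assuming the identifier is a signed, short-lived token so that nothing needs to be stored between the request and the POST. The functions `issue_token` and `verify_token`, the secret key, and the ten-minute window are all assumptions made for the illustration.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # assumed per-server secret
MAX_AGE_SECONDS = 600                 # assumed validity window


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(form_name: str, now: Optional[float] = None) -> str:
    """Called by the (hypothetical) endpoint that the page script queries."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{form_name}:{issued_at}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, form_name: str, now: Optional[float] = None) -> bool:
    """Check the signature, the form name, and the age of the token."""
    try:
        name, issued_at, signature = token.rsplit(":", 2)
        age = (time.time() if now is None else now) - int(issued_at)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{name}:{issued_at}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # not signed by this server
    return name == form_name and 0 <= age <= MAX_AGE_SECONDS
```

The later the script asks for the token (for example just before the POST), the shorter the validity window can be, which leaves a bot less time to reuse it.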

JavaScript Button

The form's submit button can be replaced by an anchor and a small JavaScript function. The JavaScript function is the one that prepares the form and sends it.

The form preparation can include many features such as:

  • Do nothing more than forward the data to the server; simple, but it still means that no default Submit button is available
  • Do a simple computation and save the result in a field before sending the POST; as with the hidden field, the value can simply be hard coded in the JavaScript, but if it is not found in the right hidden field, the server rejects the POST
  • Compute a checksum of the data (the checksum computation can vary by sending a different script with each form; the form identifier can then be used to determine the checksum algorithm so it can be verified on the other end)
  • Encrypt the data before sending it; this requires a public and private key pair; the server must be able to decrypt the data and find the form identifier as well as some other values (e.g. field names) as expected
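The checksum variant in the list above could be verified on the server roughly as follows. This is only a sketch: the two checksum algorithms, the way the form identifier selects one of them, and the canonical field encoding are assumptions, and the script sent with the form would have to implement the same choices.

```python
import hashlib
import zlib


# Hypothetical checksum variants; the page script must compute the same one.
def _crc32(data: bytes) -> str:
    return format(zlib.crc32(data), "08x")


def _sha256_prefix(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()[:16]


ALGORITHMS = [_crc32, _sha256_prefix]


def _canonical(fields: dict[str, str]) -> bytes:
    # Sort the field names so client and server checksum identical bytes.
    return "&".join(f"{k}={fields[k]}" for k in sorted(fields)).encode("utf-8")


def algorithm_for(form_identifier: str):
    """Pick the checksum algorithm from the form identifier."""
    # hashlib is used instead of hash(), which differs between processes.
    index = hashlib.sha256(form_identifier.encode()).digest()[0] % len(ALGORITHMS)
    return ALGORITHMS[index]


def expected_checksum(form_identifier: str, fields: dict[str, str]) -> str:
    return algorithm_for(form_identifier)(_canonical(fields))


def checksum_is_valid(form_identifier: str, fields: dict[str, str],
                      submitted: str) -> bool:
    """Reject the POST when the submitted checksum does not match the data."""
    return submitted == expected_checksum(form_identifier, fields)
```

A bot that fills in the fields without running the script sends either no checksum or a stale one, so `checksum_is_valid` returns `False` for its POST.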

In practice, the use of JavaScript stops bots that do not execute scripts, at least until they learn the script. However, if you randomize the variables in the script (something that some viruses do), it becomes really hard for bots to get around this defense.
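As an illustration of the randomization, the server could generate the script with fresh names and a fresh computation for every form it serves. The template, the function `build_script`, and the sum used as the "simple computation" are assumptions chosen for the sketch; the field name is expected to come from the server, not from user input.

```python
import secrets
import string


def random_identifier(length: int = 8) -> str:
    """Return a random name that is a valid JavaScript identifier."""
    alphabet = string.ascii_letters + string.digits
    # The leading underscore avoids digits first and JavaScript reserved words.
    return "_" + "".join(secrets.choice(alphabet) for _ in range(length - 1))


# Hypothetical script: computes a sum, stores it in a field, submits the form.
SCRIPT_TEMPLATE = (
    "function {func}(form) {{\n"
    "  var {var} = {a} + {b};\n"
    "  form.elements['{field}'].value = {var};\n"
    "  form.submit();\n"
    "}}\n"
)


def build_script(field: str) -> tuple[str, str, int]:
    """Return (script text, function name, answer the server should expect)."""
    a = secrets.randbelow(1000)
    b = secrets.randbelow(1000)
    func = random_identifier()
    script = SCRIPT_TEMPLATE.format(
        func=func, var=random_identifier(), a=a, b=b, field=field
    )
    return script, func, a + b
```

The server would keep the expected answer alongside the form identifier and reject any POST whose field holds a different value.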

JavaScript Load

The form could actually be loaded using JavaScript. In other words, if you don't have JavaScript, the form does not appear at all in the HTML source. Since the bot is not going to execute a script this involved (with AJAX queries and possibly decryption of the data sent), the bot doesn't get the form in the first place and therefore does not even know that a form exists on that page.
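A rough sketch of the two responses involved is shown below, assuming a separate endpoint that returns the form markup. The URL `/form-fragment`, the `/contact` action, and the field names are hypothetical.

```python
import html

# Hypothetical URL of the endpoint that returns the form markup.
FORM_ENDPOINT = "/form-fragment"


def render_page() -> str:
    """Page as first served: a placeholder and a loader script, no form tag."""
    return (
        '<div id="form-holder"></div>\n'
        "<script>\n"
        f"fetch({FORM_ENDPOINT!r})\n"
        "  .then(function (r) { return r.text(); })\n"
        "  .then(function (t) {\n"
        "    document.getElementById('form-holder').innerHTML = t;\n"
        "  });\n"
        "</script>\n"
    )


def render_form_fragment(form_identifier: str) -> str:
    """Markup returned by the endpoint once the script asks for it."""
    safe_id = html.escape(form_identifier, quote=True)
    return (
        '<form method="post" action="/contact">\n'
        f'  <input type="hidden" name="form_id" value="{safe_id}">\n'
        '  <textarea name="message"></textarea>\n'
        '  <button type="submit">Send</button>\n'
        "</form>\n"
    )
```

A bot that only reads the HTML returned by `render_page` finds no form element to fill in; only a client that runs the script ever sees the output of `render_form_fragment`.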

Snap! Websites
An Open Source CMS System in C++
