Unless you've been living in a cave for the past few years (and we mean a cave without internet access), you're probably aware of "bots", which are scripts that run constantly, roaming around the web looking for forms to fill out. They hit every form on every page they stumble across, and they do this for one of two purposes: spreading spam or, worse, hijacking your site through some sort of exploit. They don't care what kind of form they find; they'll happily fill it out and submit it in the hope that it's vulnerable. If you have a web site with a form on it, you've almost certainly been the target of a bot. If you haven't, you will be…but there are things you can do to help stop the bots.
One of the main reasons bots exist is to spread spam. If you have a contact form on your site, it's a safe bet that sooner or later you'll get spam offering "male enhancement pills", offers to liberate 47 bazillion dollars from a Nigerian prince, pirated software, unsolicited stock tips, "personal" messages from lonely Russian brides, etc etc etc. In a similar vein, bots will eagerly sign up on any forum they can find so they can leave links on their personal profile page, hoping to increase their Google ranking. And of course, they don't discriminate between forums and blogs- they're only too happy to leave hundreds or thousands of comments on your blog pointing to all of the useful services mentioned above (pills, dating, porn, money scams, etc).
Having bots spamming your inbox or forum is bad enough, but even worse is when a bot manages to find a vulnerability in the code that processes the form. Most bots these days are dual-purpose bots- they spread spam but will also try out various exploits on your forms to see if they can hack into your site (usually through poor handling of input data). The exploit might be a SQL injection, a data overflow attempt, a directory traversal technique, or any number of tried-and-true methods of gaining access to your server. Or the bot might try them all, one after another- after all, it's got all the time in the world to spend hacking your site. Its patience is endless and since it's not human, it can work day and night, banging away on your forms. An aggressive bot can bring your site down just by hitting your search form 10,000 times, mindlessly trying out its library of possible exploits. Ever wonder why some forums like vBulletin or SMF enforce a time limit between searches? It's to limit the load when a bot finds the form, that's why.
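To make the search time limit concrete, here is a minimal sketch of that kind of flood control. It is not how vBulletin or SMF actually do it; the class name, the 15-second interval, and the idea of keying on a client ID (an IP address or session ID, say) are all assumptions made for illustration.

```python
import time
from typing import Dict, Optional


class SearchThrottle:
    """Allow each client one search every `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval: float = 15.0) -> None:
        self.min_interval = min_interval
        # Maps a client ID to the time of its last accepted search.
        self._last_search: Dict[str, float] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # `now` can be passed in for testing; otherwise use a monotonic clock.
        current = time.monotonic() if now is None else now
        last = self._last_search.get(client_id)
        if last is not None and current - last < self.min_interval:
            return False  # too soon: refuse without running the search
        self._last_search[client_id] = current
        return True


throttle = SearchThrottle(min_interval=15.0)
print(throttle.allow("203.0.113.7", now=0.0))  # first search from this client
print(throttle.allow("203.0.113.7", now=5.0))  # second search 5 seconds later
```

A real site would also need to expire old entries so the table doesn't grow forever, and would probably keep it somewhere shared rather than in one process's memory.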
So, what can you do to stop bots from hammering your site? All kinds of things, as it turns out. Be warned, though- it's an arms race, and the worthless scumbags (er, clever programmers) who write these bots are constantly improving them.
A classic line of defense is the CAPTCHA. These are the malformed letters that you have to type into a form field to verify that you're a human and not a bot. The classic graphical CAPTCHA worked well for a number of years, but after a while bots started incorporating Optical Character Recognition (OCR) libraries and the graphical CAPTCHA was doomed. Google used a fairly effective graphical CAPTCHA for a long time, but they admit that it's of little value now. The bots have cracked it and routinely have their way with it like a drunken prom date.
A new type of CAPTCHA was developed, commonly called "pick the kitty". It would display a bunch of images of cats and dogs and horses and weasels and whatever. The idea was that you, the human, were supposed to click on the picture of the cat to allow the form to be submitted. Bots couldn't tell a cat from a dog (or a weasel) and would fail. This was a nice idea, but if there are 5 images and one of them is a cat, that means that, statistically, a bot clicking at random would succeed in submitting the form 20% of the time. All the bot-owners had to do was crank up the number of times the bot tried the form, and sooner or later they'd get through. (Try a form just 10 times and you'll get through twice on average; the odds of getting through at least once are close to 90%.)
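The arithmetic behind the pick-the-kitty odds is easy to check. This is a small sketch, assuming every attempt is an independent one-in-five random guess; the function name is made up for the example.

```python
from math import comb


def prob_at_least(k: int, attempts: int, p: float) -> float:
    """Probability of at least k successes in `attempts` independent tries."""
    # Add up the binomial probabilities of 0..k-1 successes, then take the complement.
    below = sum(
        comb(attempts, i) * p**i * (1 - p) ** (attempts - i) for i in range(k)
    )
    return 1 - below


p = 1 / 5       # one cat among five images
attempts = 10   # how many times the bot tries the form

print(f"expected successes: {attempts * p:.1f}")
print(f"P(at least 1): {prob_at_least(1, attempts, p):.3f}")
print(f"P(at least 2): {prob_at_least(2, attempts, p):.3f}")
```

Run as written, it puts the chance of at least two successes in 10 tries at about 62%. Bump `attempts` up to 50 or 100 and the odds of the bot being kept out entirely become vanishingly small.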
Another form of CAPTCHA was the "add these numbers together" type. You have to add "3 + 5" or "9 + 2" and then put the correct answer into a field on the form. If the answer was right the form would submit. Unfortunately, bots quickly evolved to the point where they could detect this kind of form and do the math- even if the numbers were displayed as graphic images (the bots would use their OCR capability to read the numbers, fill in the form, and submit it).
And so it goes. New CAPTCHAs are developed and the bots are tweaked to defeat them.
There are some CAPTCHAs (and some techniques) that haven't been cracked yet. One relies on some clever CSS to confuse bots. The CAPTCHA presents an easy-to-read word for humans, but shows a tangled mess of HTML to the bot. Bots apparently can't parse it, and it's obscure enough that maybe no one will devote the time it takes to crack it.
Another technique is checking the time it takes to fill out and submit a form. A bot will do it in under a second, much faster than a human can. A timestamp is placed in the form each time the page is viewed, and another timestamp is generated when the form is submitted. If the two timestamps are too close together (say, under 30 seconds or so), it's probably a bot filling out the form and not a human.
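Here is one way the timestamp check might look. Treat it as a sketch under some assumptions: the timestamp travels in a hidden form field, and it's signed with a server-side secret so a bot can't simply rewrite it (the signing is an addition for the example, as are the names `SECRET_KEY`, `issue_token`, and `submitted_too_fast`).

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical server-side secret
MIN_SECONDS = 30           # threshold from the text; tune it for your form


def issue_token(now: Optional[float] = None) -> str:
    """Return 'timestamp:signature' to embed in a hidden form field."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"


def submitted_too_fast(token: str, now: Optional[float] = None) -> bool:
    """True if the token is malformed or forged, or the form came back too quickly."""
    try:
        ts, sig = token.split(":", 1)
        expected = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig.encode(), expected.encode()):
            return True  # timestamp was tampered with: treat as a bot
        issued = int(ts)
    except ValueError:
        return True  # missing separator or non-numeric timestamp
    current = time.time() if now is None else now
    return current - issued < MIN_SECONDS


token = issue_token(now=1000)               # page viewed at t=1000
print(submitted_too_fast(token, now=1001))  # submitted one second later
print(submitted_too_fast(token, now=1045))  # submitted 45 seconds later
```

Thirty seconds is fine for a long signup form but would trip up humans on a two-field login, so the threshold is a judgment call.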
You can also place a field in your forms that's hidden by CSS. The field is invisible to a human (and may be non-selectable as well), but the bot parsing the HTML of the page sees it and fills it in. (Bots routinely fill in every field in a form, whether the data in the field makes sense or not.) When the form is submitted the hidden field is checked- if there is data in it then it's taken as a pretty sure sign that it's a bot filling out the form and not a human.
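The server-side half of the hidden-field trick is about as simple as code gets. A minimal sketch, assuming the submitted form arrives as a dictionary of field names and values; the field name `website` is a made-up example, chosen because it looks tempting to a bot.

```python
from typing import Mapping

HONEYPOT_FIELD = "website"  # hypothetical field name, hidden from humans via CSS


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """True if the hidden field came back with anything in it."""
    # A human never sees the field, so it should be empty (or absent entirely).
    return bool(form.get(HONEYPOT_FIELD, "").strip())


human = {"name": "Ann", "message": "Hello!", "website": ""}
bot = {"name": "xyz", "message": "Buy pills", "website": "http://spam.example"}

print(looks_like_bot(human))
print(looks_like_bot(bot))
```

One thing to watch: browser autofill and some screen readers can still reach a CSS-hidden field, so it may be kinder to flag these submissions for review than to throw them away outright.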
You can also use a third-party service like reCAPTCHA (a hosted CAPTCHA) or Akismet (a spam filter), but be aware that they are somewhat vulnerable to bots as well. Some bot-writing teams are reputedly "devoted" to breaking the reCAPTCHA system, and according to some reports they have succeeded.
Bot Screening Services
These are probably the only long-term solution, but they have their limitations too. A bot-screening service works by checking the email and IP address used when a form is submitted. Sites like BotScout and StopForumSpam keep databases of known bot IPs and email addresses. Using a small bit of code, your contact form or forum makes an API call to BotScout, for example, and BotScout returns a score indicating whether or not the IP and/or email address is a known bot. Based on this you either accept or reject the form submission. These services can be very effective, but bots using brand-new IPs or email addresses may slip through before they're caught and added to the database. Personally I like the BotScout.com service and use it on several of my own sites with good results. Your mileage may vary. [Full Disclosure: I know the guys at BotScout and have dated most of their sisters and/or daughters.]
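The accept-or-reject step might look something like the sketch below. It deliberately does not show the real BotScout or StopForumSpam API: the `lookup` callable is a stand-in that, in practice, would wrap the HTTP call to whichever service you use, and the score, threshold, and in-memory "database" here are all assumptions for illustration.

```python
from typing import Callable

# A lookup takes (ip, email) and returns a score: 0 means "never seen",
# higher means more matches. The real request/response format depends on
# the service and is not shown here.
Lookup = Callable[[str, str], int]


def accept_submission(
    ip: str,
    email: str,
    lookup: Lookup,
    threshold: int = 1,
    fail_open: bool = True,
) -> bool:
    """Return True to accept the form, False to reject it."""
    try:
        score = lookup(ip, email)
    except Exception:
        # Service down or unreachable: fail_open=True lets the submission through.
        return fail_open
    return score < threshold


# Stand-in for the screening service, using reserved example addresses.
KNOWN_BAD_IPS = {"203.0.113.7"}
KNOWN_BAD_EMAILS = {"spammer@example.com"}


def fake_lookup(ip: str, email: str) -> int:
    return int(ip in KNOWN_BAD_IPS) + int(email in KNOWN_BAD_EMAILS)


print(accept_submission("198.51.100.1", "ann@example.com", fake_lookup))
print(accept_submission("203.0.113.7", "ann@example.com", fake_lookup))
```

Whether to fail open or closed when the service is down is your call: failing open lets some spam through during an outage, while failing closed locks out real people.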
In The End
It's an arms race out there and the bad guys are winning. The fact is that bots are not going away, and bot activity is increasing radically each year. Learn to deal with them and plan on changing your tactics to keep up with them. Check out the different CAPTCHAs and services until you find one or more that works for you.
I foresee a day when most, if not all, signups to things like forums are done on a "register and send me an email" basis. Some forums have even started charging a dollar via PayPal to sign up, refunding the dollar once they verify that you're for real. Most spammers would never dream of spending a dollar just to gain access to a forum for the purposes of spamming, so this practice may become widespread. Either that, or we should pass legislation that applies the death penalty to spammers and bot operators.