If you don't get the reference to the title of this post, maybe this will help:
This probably made you laugh, but the "Little Bobby Tables" problem illustrates a real-world exploit that is used by hackers and malicious users every single day. The formal name for this kind of exploit is "SQL Injection", and it's spectacularly successful against web sites and applications that don't screen or sanitize their input data. Like yours, perhaps.
SQL Injection works inserting (or "injecting") additional queries or code into your application. It does this by putting malicious code into a form field (or sometimes through an uploaded file). Your application grabs the form field, starts processing the data, and is then tricked into executing the malicious code. If someone can get your application to execute code they pass in, the sky's the limit as far as what they can do to your server or application. They can serve up "warez" or child pornography, sell penis pills, and host fake PayPal sites. They can use your server to attack other servers, send spam, infect users who visit your site, and so on.
One of the beauties of SQL Injection is that since SQL syntax is pretty much the same across different brands of databases, the same exploit can often be run successfully against Oracle, mySQL, MSSQL, PostgreSQL, and so on. Isn't that great?
It's beyond the scope of this blog to cover every possible form of SQL Injection, but we'll cover a typical one and give you guidelines on preventing injection exploits.
A Classic Exploit Example
Let's use a very simple SQL statement that selects a user name from a table. The (poorly-written) application doesn't do any screening of the input characters. In this case, the user has entered the name "john" into the form field on the page. The application grabs the name "john" from the form field and then uses it in a SQL query like this:
SELECT * FROM users WHERE user_name = 'john';
Nothing special about that, right? But what if we added something to the end of "john", as in the example below (shown underlined). What if, for example, we typed "john OR 'a' = 'a'" ? The application would grab the data in the form field and use it like this:
SELECT * FROM users WHERE user_name = 'john' OR 'a'='a';
If this code was used in an login system it would probably force the selection of a valid username because the evaluation of 't'='t' is always true. That's fairly benign, but what about if we went a step further and typed something else into the form field…something malicious?
SELECT * FROM users WHERE user_name = 'john' ;DROP TABLE users;
Whoops. Submit the form and the 'users' table disappears into the Big Wastebasket In The Sky.
That's the basic idea behind SQL Injection, but as with most things, there's quite a bit more to it than that. Wikipedia has a pretty good page on SQL Injection concepts; if you're unfamiliar with SQL Injection it would be well worth your time to check it out. Malicious data can be obfuscated (hidden) in innumerable ways such as octal encoding, using tabs or Base-64 encoding to fool injection filters, using deprecated HTML attributes to sneak in bad data, along with many other clever techniques.
Preventing SQL Injection
The theory behind preventing SQL Injection is blissfully simple:
Don't allow malicious code or inappropriate characters to be passed in to your application.
In other words, "sanitize your input data". Never, ever trust the network.
In practice it's usually a matter of checking for appropriate for the data types in the submitted data, and rejecting it or cleaning it if "naughty" characters are found.
You keep saying "characters", isn't it enough to just make sure words like "DROP" and "SELECT" and "UPDATE" aren't allowed to be passed in and processed?
No. Words like "DROP" and "SELECT" and "UPDATE" are perfectly normal words that may be found in legitimate user-submitted content. For example:
"Bob says that we're going to drop the XYZ Widget line; we should select a day to get rid of them and then update the inventory records."
What you want to do is
- screen out or escape characters that aren't appropriate for the data type (e.g. there's no need to have backticks or semi-colons in a submitted telephone number), or
- transform potentially malicious characters so they're safe to handle (e.g. transform HTML brackets into their respective HTML entities).
In other words, if the incoming data should only contain numbers, only allow numbers. If it's only supposed to contain alphanumeric data, only allow alphanumeric data. In addition to validating the kind of data, rigorously enforce limits on the length of data. If someone is submitting a 10-digit telephone number, should you accept a string that's 5,000 characters long?
For screening data, work from the principle of what should be allowed in rather than what should be kept out- you'll find it's a lot shorter list of items to manage. And for screening characters, there's no better tool than Regular Expressions, or "regexp", which we'll cover in the next article.
For a simple RegExp cheat sheet, visit this page:
For PHP coders, we suggest using something like this to clean your incoming data: JetScripts Data Sanitizer and XSS Cleaner