Can You Hack Your Own Site? A Look at Some Essential Security Considerations

Twice a month, we revisit some of our readers’ favorite posts from throughout the history of Nettuts+. This tutorial was first published in July, 2008.

Version one goes gold! Visitors are landing from every corner of the globe. You know there are likely to be a few teething problems; I mean, this is 1.0.0.0… all those zeroes are meant to allow us a little grace, right?

Maybe that dastardly style sheet just won’t cascade elegantly on browser X. An incomplete comment chucks out some broken mark-up. Maybe you should have persisted those database connections after all. Hey, we all overlook things in the excitement of getting our first version running. But how many of these oversights can we happily stomach, and how many might leave a bitter taste in our mouths and, more painfully, in our client’s?

This article walks through the brainstorming stage of planning for what is, in this instance, a hypothetical user-centric web application.

Although you won’t be left with a complete project, nor a market-ready framework, my hope is that each of you, when faced with future workloads, may muse on the better practices described. So, without further ado… Are you sitting comfortably?


The Example

Our client has asked us to incorporate a book review system into an existing site. The site already has user accounts, and allows anonymous commentary.

After a quick chat with the client, we have the following specification to implement, and only twenty-four hours to do it:

Note: The client’s server is running PHP5, and MySQL – but these details are not critical to understanding the bugbears outlined in this article.


The Processes:

Our client has given us a PHP include to gain access to the database:

We don’t actually need the source to this file to use it. In fact, had the client merely told us where it lived we could have used it with an include statement and the $db variable.

On to authorization… within the datatable schema we are concerned with the following column names:

  • username, varchar(128) – stored as plain text.
  • password, varchar(128) – stored as plain text.

Given that we’re working against the clock… let’s write a PHP function as quickly as we can that we can re-use to authenticate our users:


$_REQUEST Variables

In the code above you will notice I’ve highlighted an area amber, and an area red.

Why did I highlight the not-so-dangerous $_REQUEST variables?

Although this doesn’t expose any real danger, what it does allow for is a lax approach when it comes to client side code. PHP has three arrays that most of us use to get our posted data from users, and more often than not we might be tempted to use $_REQUEST. This array conveniently gives our PHP access to the POST and GET variables, but herein lies a potential hang-up…

Consider the following scenario. You write your client-side code to use POST requests, but you hand over the project while you grab a break, and when you get back, your sidekick has written a couple of GET requests into the project. Everything runs okay – but it shouldn’t.

A little while later, an unsuspecting user types an external link into a comment box, and before you know it, that external site has a dozen username/password combinations in its referrer log.

By referencing $_POST instead of $_REQUEST, any request that sends its data by GET simply fails during testing, so we can’t accidentally publish working code that exposes credentials in a URL.
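As a rough illustration of the point above, here is a minimal sketch of reading the login fields from the POST body only. The helper name readLoginFields is hypothetical, and the field names are assumed to mirror the username and password columns described earlier; this is not the article's original listing.

```php
<?php
// Hypothetical helper: accept credentials only from the POST body.
// Field names 'username' and 'password' are assumed to match the schema above.
function readLoginFields(array $post): ?array
{
    if (!isset($post['username'], $post['password'])) {
        return null; // a request that sent its data by GET ends up here
    }
    if (!is_string($post['username']) || !is_string($post['password'])) {
        return null; // arrays such as username[]=x are rejected as well
    }
    return [
        'username' => $post['username'],
        'password' => $post['password'],
    ];
}

// In a real script this would be called as readLoginFields($_POST).
$fromPost = readLoginFields(['username' => 'alice', 'password' => 's3cret']);
$fromGet  = readLoginFields([]); // what $_POST looks like for a GET request
```

Because the function never looks at $_GET or $_REQUEST, a form that was switched to GET by mistake stops working straight away, which is exactly the early warning we want.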

The same principle applies to session identifiers. If you find you’re writing session variables into URLs, you’re either doing something wrong or you have a very good reason to do so.


SQL Injection

Referring again to the PHP code, the red highlighted line might have leaped out at some of you. For those who didn’t spot the problem, I’ll give you an example; see if something strikes you as risky.

This image makes clear the flaw in embedding variables directly into SQL statements. Although it can’t be said exactly what control a malicious user could have, it is guaranteed that if you use this method to string together an SQL statement, your server is barely protected. The example above is dangerous enough on a read-only account; the power of a read/write connection is limited only by your imagination.
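To make the flaw concrete in text form, here is a small sketch of what happens when input is dropped straight into the statement. The function name buildLoginQueryUnsafe and the table name users are assumptions for illustration only; nothing is sent to a database, we just look at the resulting SQL text.

```php
<?php
// Deliberately UNSAFE: user input is embedded straight into the SQL text.
// The table name `users` is an assumption; the columns come from the schema above.
function buildLoginQueryUnsafe(string $username, string $password): string
{
    return "SELECT username FROM users WHERE username = '$username' AND password = '$password'";
}

// A crafted password closes the quote and appends a condition that is always true.
$sql = buildLoginQueryUnsafe('admin', "' OR '1'='1");
echo $sql, PHP_EOL;
// SELECT username FROM users WHERE username = 'admin' AND password = '' OR '1'='1'
```

Since AND binds more tightly than OR, the WHERE clause now reads as (username matches AND password is empty) OR true, so every row qualifies.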

Protecting against SQL injection is actually quite easy. Let’s first look at the case of quote enclosed string variables:

The quickest solution is to strip the enclosure characters or escape them. Since PHP 4.3.0, the function mysql_real_escape_string has been available to cleanse incoming strings (note that the whole mysql_ extension was later deprecated and then removed in PHP 7). The function takes the raw string, plus an optional connection link, and returns the string with the volatile characters escaped. However, mysql_real_escape_string doesn’t escape every character that acts as a control character in SQL… the highlighted elements in the image below show the techniques I use to sanitise String, Number and Boolean values.

The first highlight, the line that sets $string_b uses a PHP function called addcslashes. This function has been part of PHP since version 4, and as is written in the above example, is my preferred method for SQL string health and safety.

A wealth of information is available in the PHP documentation, but I’ll briefly explain what addcslashes does and how it differs from mysql_real_escape_string.

From the diagram above you can see that mysql_real_escape_string doesn’t add slashes to the (%) percent character.

The % is used in SQL LIKE clauses, as well as a few others. It behaves as a wildcard and not a literal character. So it should be escaped by a preceding backslash character in any cases where string literals make up an SQL statement.

The second parameter I pass to addcslashes, shown in bold in the image, is the character group PHP will add slashes for. In most cases, it will split the string you provide into characters, and then operate on each. It is worth noting that this character group can also be given a range of characters, although that is beyond the scope of this article. In the scenarios we’re discussing, we can write alphanumeric characters literally, e.g. “abcd1234”, and all other characters as either their C-style literal “\r\n\t” or their ASCII index “\x0A\x0D\x09”.
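The addcslashes approach described above might look something like the following sketch. The helper name escapeForSql and the exact character list are assumptions, not the author's original listing; the underscore is included because, like %, it is a wildcard in LIKE patterns. Bear in mind that the backslash in front of % and _ is only consumed inside LIKE patterns, and that prepared statements (as several readers suggest in the comments) avoid hand-escaping altogether.

```php
<?php
// Characters that get a backslash prefix: backslash, single quote, double quote,
// and the LIKE wildcards % and _. This list is an assumption for illustration.
function escapeForSql(string $raw): string
{
    return addcslashes($raw, "\\'\"%_");
}

echo escapeForSql("O'Reilly"), PHP_EOL; // O\'Reilly
echo escapeForSql("50% off"), PHP_EOL;  // 50\% off
```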

The next highlight makes our number values safe for SQL statements.

This time we don’t want to escape anything, we just want to have nothing but a valid numerical value – be it an integer or floating point.

You might have noticed line 10, and perhaps wondered what its purpose was. A few years ago, I worked on a call centre logging system that used variable += 0; to ensure numerical values. Why this was done, I cannot honestly say… unless that was how we did it prior to PHP 4?! Maybe somebody reading can shed some light on the subject. Other than that, if you come across a line like that in the wild, as I did, you’ll know what it’s trying to do.

Moving forward then; lines 11 and 12 are all we need to prepare our numerical input values for SQL. I should say, had the input string $number_i contained any non-numerical characters in front of (to the left of) the numerical ones, our values $number_a, $number_b and $number_c would all equal 0.

We’ll use floatval to clean our input numbers; PHP only prints decimal places when they exist in the input value – so printing them into an SQL statement won’t cause any errors if no decimal was in the input. As long as our server code is safe, we can leave the more finicky validating to our client side code.
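A quick sketch of that behaviour, using made-up variable names rather than the ones in the original listing:

```php
<?php
// Coerce raw input to a number before it goes anywhere near a query.
$clean    = floatval("42");    // whole number, no decimals in the input
$decimal  = floatval("3.50");  // trailing zero is dropped
$trailing = floatval("12abc"); // parsing stops at the first non-numeric character
$leading  = floatval("abc12"); // nothing numeric at the start, so the result is 0

// Casting to string shows what would be printed into the SQL text.
echo (string) $clean, ' ', (string) $decimal, ' ', (string) $trailing, ' ', (string) $leading, PHP_EOL;
// 42 3.5 12 0
```

Whatever the visitor typed, the value that reaches the statement is a plain number, so there is nothing left to escape.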

Before we move on to a final listing for our PHP, we’ll glance at the final code highlight, the Boolean boxing.

Much like its C++ equivalent, a Boolean in PHP behaves as an integer in arithmetic. As in, True + True = Two. There are countless ways to translate an input string to a Boolean type, my personal favourite being: does the lower case string contain the word true?

You may all have your own preferred methods: does the input string explicitly equal “true”, or is the input string “1”, etcetera… what is important is that the value coming in, whatever it might look like, is represented by a Boolean (or integer) before we use it.
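The “contains the word true” test could be sketched like this. The helper name toBool and the column name is_public are assumptions for illustration.

```php
<?php
// Hypothetical helper: true when the lower-cased input contains the word "true".
function toBool(string $input): bool
{
    return strpos(strtolower($input), 'true') !== false;
}

$flag = toBool('TRUE');

// Only ever 0 or 1 reaches the SQL text; the column name is an assumption.
$sqlFragment = 'is_public = ' . ($flag ? 1 : 0);
```

Anything that doesn’t contain “true”, including malicious input, collapses to false, so the raw string itself never makes it into the statement.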

My personal philosophy is simply: if X is true or false, then X is a Boolean. I’ll blissfully write all the code I might need to review later with Booleans and not short, int, tinyint or anything that isn’t Boolean. What happens on the metal isn’t my concern, so what it looks like to a human is far more important.

So, as with numbers and strings, our Booleans are guaranteed safe from the moment we pull them into our script. Moreover our hygienic code doesn’t need additional lines.


Processing HTML

Now that we have protected our SQL from injections, and we’ve made certain only a POST login can affably work with our script, we are ready to implement our review submission feature.

Our client wants to allow review-enabled users to format their contributions as regular HTML. This would seem straightforward enough, but we also know that email addresses are ten a penny, and bookstore accounts are created programmatically, so in the best interests of everyone we’ll make sure only the tags we approve get through.

Deciding how we check the incoming review might seem daunting. The HTML specification has a rather wholesome array of tags, many of which we’re happy to allow.

As longwinded as the task might seem, I eagerly advise everyone: choose what to allow, and never what to deny. Browser and server mark-up languages all adhere to XML-like structuring, so we can base our code on the fundamental fact that executable code must be surrounded by, or be part of, angle bracketed tags.

Granted, there are several ways we can achieve the same result. For this article I will describe one possible regular expression pipeline:

These regular expressions won’t produce a flawless output, but in the majority of cases – they should do a near elegant job.

Let’s take a look at the regular expression we’ll be using in our PHP. You’ll notice two arrays have been declared. $safelist_review and $safelist_comment – this is so we can use the same functions to validate reviews and later, comments:

…and here is the main function that we will call to sanitise the review and comment data:

The input parameters, I have highlighted red and blue. $input is the raw data as submitted by the user and $list is a reference to the expression array; $safelist_review or $safelist_comment depending of course on which type of submission we wish to validate.

The function returns the reformatted version of the submitted data; any tags that don’t pass any of the regular expressions in our chosen list are converted to their HTML-encoded equivalents. In the simplest terms, that turns < and > into &lt; and &gt;. Other characters are modified too, but none of these really pose a security threat to our client or the users.
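Since the original listings were images, here is a minimal sketch of how such a function might be put together. The allow-list patterns are assumptions, and preg_replace_callback stands in for the article’s cleanWhitespace and getTags helpers, which are not reproduced here. Only the names $safelist_review and replaceBadTags are taken from the article and its comments.

```php
<?php
// Assumed allow-list: each entry is a regular expression a whole tag must match.
$safelist_review = [
    '/^<\/?(?:b|i|em|strong|p)>$/i',
    '/^<br\s*\/?>$/i',
    '/^<a href="(?:\/|#|https?:\/\/)[^"]*">$/i',
    '/^<\/a>$/i',
];

function replaceBadTags(string $input, array $list): string
{
    // Visit every angle-bracketed tag; text between tags is left untouched.
    return preg_replace_callback('/<[^>]*>/', function (array $m) use ($list): string {
        foreach ($list as $pattern) {
            if (preg_match($pattern, $m[0]) === 1) {
                return $m[0]; // tag is on the allow-list, keep it as-is
            }
        }
        // Anything else is shown as text instead of being interpreted as mark-up.
        return htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
    }, $input);
}

echo replaceBadTags('<b>Great</b> read<script>alert(1)</script>', $safelist_review), PHP_EOL;
// <b>Great</b> read&lt;script&gt;alert(1)&lt;/script&gt;
```

Note the limits of this sketch: it only inspects complete tags, so a stray < with no closing > passes through as-is, and a tag is kept only if it matches a pattern exactly, attributes and all.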

Note: The functions cleanWhitespace and getTags are included in the article’s source files.

You’d be correct to assume all we have really done is helped preserve the aesthetics of our site’s pages, and not done everything to protect the user’s security. There still remains a rather enormous security hole: JavaScript injection.

This particular flaw could be fixed by a few more regular expressions, and/or modifications to the ones we are already using. Our anchor regular expression only allows “/…”, “h…” and “#…” values as the href attribute, which is really only an example of a solution. Browsers across the board understand a huge variety of script-visible attributes, such as onClick, onLoad and so forth.

We have in essence created a thorny problem for ourselves. We wanted to allow HTML, but now we have a near endless list of keywords to strip. There is of course, a less than perfect – but quite quickly written way to do this:
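One quick and admittedly imperfect possibility, sketched here under my own assumptions rather than taken from the original listing, is to strip every attribute whose name begins with “on” from each tag. The helper name stripEventHandlers is hypothetical.

```php
<?php
// Hypothetical helper: drop on* attributes (onclick, onload, ...) from every tag.
function stripEventHandlers(string $html): string
{
    return preg_replace_callback('/<[^>]+>/', function (array $m): string {
        // Matches: whitespace, "on" plus word characters, "=", then a quoted or bare value.
        return preg_replace(
            '/\son\w+\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
            '',
            $m[0]
        );
    }, $html);
}

echo stripEventHandlers('<img src="image.jpg" onload="alert(1)">'), PHP_EOL;
// <img src="image.jpg">
```

This does nothing about javascript: URLs in href or src, and a > inside an attribute value will confuse the tag matcher, so treat it as a stopgap that complements the allow-list rather than replacing it.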

On reflection you’d be absolutely justified in asking, “Why didn’t we just use BBCode or Textile or…?”

Myself, if I were dealing with mark-up processing, I might even go for XML walking. After all the incoming data should be valid XML.

However, this article is not meant to teach us how to regex, how to PHP or how to write anything in one particular language. The rationale behind it simply being, don’t leave any doors ajar.

So let’s finish off then, with a quick review of what we’ve looked at:

Admittedly, this article hasn’t equipped you with any off the shelf project. A primary purpose of my writing was not to scare away the designers who code, or nitpick the work of coders anywhere, but to encourage everyone to author robust code from the get-go. That said, I do plan to revisit certain elements of this article in more detail later.

Until then, safe coding!

  • http://www.csscount.com Mark Abucayon

    very cool, I will study this one later. Good Job

  • http://maiconweb.com Maicon

    This is great! I have a overdose of useful information. I wait for more articles like this!

  • http://www.rtraction.com David Millar

    We recently came across an interesting attack and we’ve posted a link to your article, the sample attack, the solution and some suggested tips.

    Please see the article at:
    http://www.rtraction.com/blog/devit/sql-injection-hack-using-cast.html

  • http://www.besthostingtop10.com Craig

    Excellent article, hopefully will push myself to check more thoroughly instead of subconsciously skipping the parts of my site that are a bit sketchy!

  • http://hereinthehive.com Dan Donald

    Great post! There are so many attack vectors that can be used now, it’s great to see more practical measures you can work into your code.

    Keep it up!

  • http://enhance.qd-creative.co.uk James

    A very interesting read… thank you! :)

  • http://www.codingbanter.com Raj

    very nice tutorial. definitely useful read. will try it out. Thanx Ben.

  • http://91media.de Lucas

    definitely a great article… but like two guys said before: it’s not a good idea to store passwords in plain text… I can highly recommend encrypting them using md5 or base64

    • http://humanbagel.com Human_Bagel

      Base64 won’t help much.
      I recommend whirlpool or ripemd160 via the hash() function.

      If you are using an old version of PHP, use sha1(); it’s a little better than md5.

      • http://creditorwatch.com.au Dale Hurley

        MD5 can even be reverse engineered. The best thing to do is some sort of one-way scramble, e.g.

        function password_scramble($password)
        {
        // strrev() is PHP's built-in string reversal function
        return md5(strrev(md5(strrev($password))));
        }

        While it is a little extra processing, it is not that much, and it does mean the MD5 reverse lookups (http://tools.benramsey.com/md5/) won’t be able to find a match, especially if the user password is 12345678.

  • http://www.neoterik.com.au Matthew Prasinov

    Thanks, some interesting stuff here :)

  • http://www.antalika.com konan

    A really good post!

    We are using mysql_pdo which automatically protects against all these mysql injections.

  • http://test Manisha

    Thanks for this great source of information.

  • http://eneza.wordpress.com Eneza

    Just the right post for me………………… SCOUR the net for this!!!
    Thanks

  • http://zootoo.com Jeremy

    As far as safe character matching, you might like some RegEx magic for some of those. Namely, your code, as such, would match the following:

    This is poorly-formed HTML

    It would not match:
    This is well-formed HTML

    The first fails on two points, mismatched tags and is not a valid tag. To correct this, you can use a backreference inside the search itself taking:

    /[^<]*/i

    to:
    /.*/i

  • http://Zootoo.com Jeremy

    Well… um.
    <h1>Poorly-formed.</h7>

    And
    <h3><span>Well-formed.</span><h3>

    The regex changes [0-9] to be [1-6] in the first tag expression, changes the middle to allow anything, and adds a backreference to the number matched in the first tag like this:
    </h\\1>

    Now, let’s see if that worked. If not, I give up.

  • http://rogeriopvl.com rogeriopvl

    I liked the article in general, but I can’t agree with you, when you recommend pconnect like it’s the holy grail of scalability.

    pconnect is completely unnecessary in most situations. It brings unnecessary problems (yes, scalability problems too!) and needs a well configured apache server.

  • Sean J

    Wonderful article.

    I completely agree with exposing more information regarding security in the public. Your explanations and screenshots are great. However I think its time for you to update it as well. Many readers that are either green or lazy will strictly take your word for it as explained and launch sites with only these elements. You allude to even more information by saying that you aren’t teaching the respective technologies, but perhaps you can add another slide saying what would be necessary on the server side as well. Albeit this includes far better recommendations than many other tutorials or scripts that are out.

    Keep it up you bring serious value to the NetTuts site with this type of information.

  • http://www.watchfamilyguy.us harry

    I really like this tut. I think at the end it should say “and restart”, just because you can never be foolproof.

  • Tom M

    You really shouldn’t advocate the use of mysql_pconnect.

    Here’s a quote from various users of php.net:
    “* When you lock a table, normally it is unlocked when the connection closes, but since persistent connections do not close, any tables you accidentally leave locked will remain locked, and the only way to unlock them is to wait for the connection to timeout or kill the process. The same locking problem occurs with transactions. (See comments below on 23-Apr-2002 & 12-Jul-2003)

    * Normally temporary tables are dropped when the connection closes, but since persistent connections do not close, temporary tables aren’t so temporary. If you do not explicitly drop temporary tables when you are done, that table will already exist for a new client reusing the same connection. The same problem occurs with setting session variables. (See comments below on 19-Nov-2004 & 07-Aug-2006)

    * If PHP and MySQL are on the same server or local network, the connection time may be negligible, in which case there is no advantage to persistent connections.

    * Apache does not work well with persistent connections. When it receives a request from a new client, instead of using one of the available children which already has a persistent connection open, it tends to spawn a new child, which must then open a new database connection. This causes excess processes which are just sleeping, wasting resources, and causing errors when you reach your maximum connections, plus it defeats any benefit of persistent connections. (See comments below on 03-Feb-2004, and the footnote at http://devzone.zend.com/node/view/id/686#fn1)”

    Yes there are developers that litter their code with mysql_connects, and I agree that this is bad practice. However the solution is not to use mysql_pconnect, the mysql_connect link should be in a good position to be used across the entire code.

  • http://codenchips.com Dean

    Great article, some points I wasn’t aware of and will try to implement on my apps!

  • http://www.illuminatikarate.com George

    Nicely written tutorial, great visuals. One thing –

    In the replaceBadTags function you’ve named your flag variable $clean.

    IMHO, this is ambiguous – I would expect $clean === true to mean the tag is already clean (i.e. ‘safe’), whereas by $clean === true you mean that cleaning is required. To avoid confusion, I would instead use $tagIsSafe defaulted to false, set it to true if it matches a whitelist tag, and encode it if $tagIsSafe !== true.

  • http://www.3fay.com 3faycom

    very nice tutorial. definitely useful read. will try it out.

  • jay

    great article! Thanks a lot :)

  • http://humanbagel.com Human_Bagel

    Wonderful, but your safe html function is far from safe.
    This function leaves open large numbers of XSS injections.

    For example, the img tag can look like <img src=”javascript:alert(‘xss’)”>

    Also, an onload attribute can be a problem: <img src=”image.jpg” onload=”alert(‘xss’)”>

    The link has the same problem: <a href=”javascript:alert(‘xss’)”>CLCK ME!</a>

    or <a href=”#” onmouseover=”alert(‘xss’)”>CLICK ME!</a>

    Sadly, client side XSS is a very difficult problem.

    I made a function to fix this here, http://humanbagel.com/opencode.php
    Under XSS Protection.

    It is not exactly perfect, as no XSS filter can be, but it’s a pretty nice fix, and I plan to upgrade it to a white list at some point.

    Right now, it allows all “non dangerous” HTML and any “non dangerous” strings, so its fairly secure, in that, I have not yet been able to XSS it myself.

  • http://na Jonli

    Good Article.I like it.
    Cheers

  • http://myfacefriends.com Myfacefriends

    Thanks for this wonderful tuts!

  • http://www.dev-hq.co.uk Joe

    Nice Security Tips! :P

  • mohammed yaghi

    really nice tots

  • http://www.split5.com SplitFive

    Very nice tutorial. I loved it.

  • felipe

    What happen with the source??? I cant download it :S

  • http://www.bestpetsuppliesguide.com/ Alex

    Wow, the sql injection is a very nice trick, I should try this too…thank you guys

  • http://in-my-cloud.com Jas

    Great article as I think more people need to be aware of these things.

    I would like to suggest an alternative method in dealing with SQL filtering and sanitation. The use of MySQL stored procedures, PDO and PHP’s filter_var() and sprintf() provide for stricter control of dynamic SQL search, update and insert queries.

    First a simple stored procedure along with a simple table structure:

    DROP TABLE IF EXISTS `resources`;
    CREATE TABLE IF NOT EXISTS `resources` (
    `id` int(255) NOT NULL AUTO_INCREMENT,
    `resource` varchar(128) NOT NULL,
    `name` varchar(128) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `resource` (`resource`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=0;

    Next create a stored procedure to perform lookups. The primary reasons for this is to hand off the meat and potatoes of the searching from the application layer over to the database abstraction layer, and to use strict definitions of procedure arguments.

    DELIMITER $$

    DROP PROCEDURE IF EXISTS AddResource$$
    CREATE PROCEDURE AddResource(IN `id` INT(255), IN `obj` VARCHAR(128), IN `common_name` VARCHAR(128))
    COMMENT 'Add resource object'
    BEGIN
    INSERT INTO `resources` (`id`, `resource`, `name`) VALUES (id, obj, common_name) ON DUPLICATE KEY UPDATE `resource`=obj, `name`=common_name;
    END$$

    DELIMITER ;

    Stored procedures work just like a function would within your application layer, the syntax is a little different is all. As you can see in the code above a new stored procedure is created which accepts three arguments and simply creates a new database record.

    Next we need to establish the connection. This example uses the PDO functionality available with PHP5 MySQL-PDO extension.

    The first section I am illustrating here are re-usable functions which if needed could be placed within a class for further extensibility.

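    function establish($configuration)
    {
        try {
            // Pass the credentials as well; the DSN only carries host and database.
            $dbconn = new PDO(createdsn($configuration), $configuration['username'], $configuration['password']);
        } catch(PDOException $e) {
            return false;
        }
        return $dbconn;
    }

    // Visibility keywords (private/public) only apply inside a class,
    // so these stand-alone functions are declared without them.
    function createdsn($config)
    {
        return 'mysql:host='.$config['hostname'].';dbname='.$config['database'];
    }

    function query($db, $query)
    {
        $query = $db->prepare($query);
        try {
            $query->execute();
            return $query->fetchAll(PDO::FETCH_ASSOC);
        } catch(PDOException $e) {
            return $e->getMessage();
        }
    }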

    Three functions, establish(), createdsn() and query(). Now here is an example of using these three function to format the necessary DSN information for the connection, create a connection, then use the sprintf() & filter_var() functions to formulate a valid and safe string.

    $db = establish(array('hostname'=>$hostname, 'username'=>$username, 'password'=>$password, 'database'=>$database));

    // Note: FILTER_SANITIZE_MAGIC_QUOTES was removed in PHP 8;
    // FILTER_SANITIZE_ADD_SLASHES is its replacement on newer versions.
    $sql = sprintf('CALL AddResource("%d", "%s", "%s")',
    filter_var($_POST['id'], FILTER_SANITIZE_NUMBER_INT),
    filter_var(sha1($_POST['name']), FILTER_SANITIZE_MAGIC_QUOTES),
    filter_var($_POST['name'], FILTER_SANITIZE_MAGIC_QUOTES));
    $a = query($db, $sql);

    That’s it. Might be a bit extreme to some but unknowns are zero-day on the black market.

  • http://mokshasolutions.com Moksha

    it was great to read it. thanks

  • http://butenas.com Ignas

    I must say – thanks for this article. Skimmed now, but will read everything in detail later, because I think it’s very useful stuff. Thanks!

  • http://techbrij.com Brij

    Nice Post!!!
    The best way to stop SQL injection in PHP, I believe, is to use prepared statements, and it’s very easy using PDO.

  • http://www.amazing-themes.com David

    Looks like a very useful tut, wanted to learn more more about SQL injection.

  • http://gruztec.ru/ Gruz

    very nice tutorial useful read,thanks

  • http://citroenboom.com Citroenboom

    Aren’t security issues among the reasons why we tend to use frameworks (CakePHP, CodeIgniter) or full fledged CMS’s (Drupal, WordPress) when possible?

    • http://creditorwatch.com.au Dale Hurley

      1. Not all projects need a complex framework or CMS. I have build some quick and simple web apps which a framework would add to much un-needed complexity

      2. We all need to be aware of the security holes. CMS and frameworks provide us with starting points and sometimes the in-built solutions fail to meet our needs. I have seen plug-ins for WordPress which completely ignore the $wpdb object.

      • http://createmy.com.au Dale Hurley

        built not build*

  • http://thecybertramp.com Cybertramp

    Good article, nice set of reminders for people who are already familiar (and I see the nitpicking started already …) and a gentle introduction for beginners.

    Thanks

  • http://www.artgrafi.com Mehmet

    Thanks for this great tutorial..