Using AWS S3 to Power Your Digital World

Tutorial Details
  • Topic: Amazon Web Service S3
  • Difficulty: Intermediate
  • Estimated Completion Time: 30 minutes

As a designer, web developer, and techie-geek, I need a versatile and robust data storage solution that I can afford and can use without learning a new language. So far, I’ve found only one service that handles the large majority of my needs: the Amazon Web Services Simple Storage Service (AWS S3). This article covers how I use it.


AWS S3

AWS S3 is Amazon’s cloud storage solution. It’s versatile, reliable, fast, and scalable to fit almost anyone’s needs. Of course, with a service that sounds this great, you would expect it to be expensive, but it’s actually the most affordable storage solution I’ve found on the web, considering the features you get.

Amazon Web Services S3

AWS S3 is intended for developers, but thanks to some great tools, it’s easy enough for just about anyone to use. Before I get into how I use AWS S3, I want to mention that this storage solution doesn’t use the traditional file structure of folders/files, etc. Instead AWS S3 uses “buckets” in which you store objects. The tools I use make AWS S3 appear to be a normal file system with the exception of “buckets”. Think of a bucket as a separate hard drive where you’ll store your files. You might also want to read the Amazon S3 page on Wikipedia. So let’s get on with how I use AWS S3.
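To make the bucket idea concrete, here is a minimal Python sketch of how a tool can present a flat list of object keys as if it were folders and files. It is not how Jungle Disk or any other tool mentioned here is actually implemented; the function name `list_folder` and the sample keys are made up for illustration.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Return (subfolders, files) that sit directly under `prefix`.

    A bucket only holds a flat list of keys; "folders" are simply the
    shared key prefixes that end at the next delimiter.
    """
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # a key identical to the prefix is just a placeholder
        if delimiter in rest:
            # Everything up to the next delimiter acts like a subfolder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            files.append(key)
    return sorted(folders), sorted(files)


# Hypothetical keys stored in one bucket (illustrative names only).
keys = [
    "red-rock-panorama.jpg",
    "wp-content/uploads/header.png",
    "wp-content/uploads/logo.png",
    "wp-content/themes/style.css",
]

print(list_folder(keys))
# (['wp-content/'], ['red-rock-panorama.jpg'])
print(list_folder(keys, "wp-content/"))
# (['wp-content/themes/', 'wp-content/uploads/'], [])
```

The point of the sketch is only that nothing in the bucket is a real directory: the "folder" view is computed from the key names.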


AWS S3 + Jungle Disk

I probably use Jungle Disk the most often because it makes it easy to use and manage my AWS S3 buckets, perform automated backups and centralize my data for access anywhere, at any time. When you use Jungle Disk with your AWS S3 account, you decide which of your individual buckets Jungle Disk can mount as a network drive. Then, you have drag-and-drop access to your AWS S3 files! Jungle Disk also encrypts your files, so they’re safe and secure.

Jungle Disk

Jungle Disk has plenty of options for bucket management, automatic backups, encryption, bandwidth limiting, and even more. It also has a monitoring tool to view and manage transfers in progress. It typically runs in the background, but it comes in very handy when you would like to take action on something or just watch what’s going on.

Jungle Disk settings

If you’re worried about cross-platform compatibility, don’t be! Jungle Disk has versions of their software for 32- and 64-bit Windows, Linux and Mac. They even have a version that you can run from a USB flash drive on all three platforms for quick access to your files from anywhere.

Jungle Disk download

Of course, if you forget your flash drive, they also have web access to your files. If you work with other people who need access to your files, Jungle Disk can do that, too. They have multi-user options to make accessing AWS S3 buckets very easy for several people.

Jungle Disk users

So, we have cross-platform cloud storage that’s drag-and-drop easy and that we can access anywhere with tons of great options. What else do we need?


AWS S3 as a “CDN” or Public File Access

Most of you probably have blogs or websites that you have hosted on a web server you pay for. As we all know, quality web hosting isn’t cheap, especially when it comes to storage space. I don’t want to use my expensive web server storage for images and other file downloads and I especially don’t want to bog down my web server with file requests from visitors when there’s a better way to do it.

S3Fox for Firefox

S3Fox is a Firefox add-on that lets you manage your AWS S3 buckets and files. Why do we need S3Fox when we could use Jungle Disk? S3Fox does a few things Jungle Disk wasn’t intended for, such as managing CloudFront distributions, which we’ll get into later. I’ve set up a bucket called “files.jremick.com”, which I plan on using to host images and files for my blog as well as for other websites and random purposes.

S3Fox

Then I set up a CNAME on my web server directing “files” and “www.files” to “files.jremick.com.s3.amazonaws.com.”, which allows me to use the subdomain “http://files.jremick.com” to access files I’ve placed in the “files.jremick.com” bucket for public viewing. The other two are used by CloudFront, which we’ll get into later.

S3 Cname
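The relationship between the bucket name, the CNAME target, and the public URL can be sketched in a few lines of Python. This is only an illustration: the helper names `cname_target` and `public_url` are hypothetical, and the sketch assumes the bucket is named exactly like the subdomain, which is what makes the CNAME approach work.

```python
from urllib.parse import quote

S3_ENDPOINT = "s3.amazonaws.com"


def cname_target(bucket):
    """DNS name the CNAME record should point at (note the trailing dot)."""
    return f"{bucket}.{S3_ENDPOINT}."


def public_url(bucket, key, use_cname=True):
    """Public URL of an object, via the CNAME or via the S3 hostname."""
    host = bucket if use_cname else f"{bucket}.{S3_ENDPOINT}"
    # quote() percent-encodes spaces and similar characters but keeps "/".
    return f"http://{host}/{quote(key)}"


print(cname_target("files.jremick.com"))
# files.jremick.com.s3.amazonaws.com.
print(public_url("files.jremick.com", "red-rock-panorama.jpg"))
# http://files.jremick.com/red-rock-panorama.jpg
print(public_url("files.jremick.com", "red-rock-panorama.jpg", use_cname=False))
# http://files.jremick.com.s3.amazonaws.com/red-rock-panorama.jpg
```

Both URLs reach the same object; the CNAME version simply hides the Amazon hostname.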

So now we have an easy way to access files at http://files.jremick.com. We could use it as a sort of “CDN” (even though it wouldn’t be a true CDN) or we could just use it to provide file downloads that won’t bog down our web server. If you’re wondering, yes, you can view and download the panorama image from my S3 account and no, I’m not worried about bandwidth because it’s super cheap! :-) You can find it here: http://files.jremick.com/red-rock-panorama.jpg. Did you notice the “wp-content” directory? Familiar eh? On to using AWS S3 with WordPress!

S3Fox files

AWS S3 plugin for WordPress

The AWS S3 plugin for WordPress is one of my favorite plugins for WordPress because it lets me use my AWS S3 account to host media for my blog rather than my expensive web server. Of course I could do this manually if I wanted but the plugin integrates this functionality with WordPress so I can upload files without leaving my WordPress control panel.

AWS S3 and WordPress

You might be wondering why this is beneficial. Well, for starters, images and other media loaded from your AWS S3 account will likely load faster simply because you’re using Amazon’s servers rather than your own (possibly puny) server. Also, your web server won’t be bogged down serving these media files on top of your regular PHP/HTML files.

Your website will also load faster for most people because most browsers limit the number of parallel downloads from a single domain. If you host your images on your AWS S3 account, they are served from a secondary domain, so browsers can load more files at the same time. See Maximizing Parallel Downloads in the Carpool Lane for more information.
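As a rough illustration of the parallel-download argument, the Python sketch below estimates how many sequential "rounds" of downloads a page needs. It rests on simplifying assumptions that are mine, not measurements: every file takes the same time to download, and the browser allows 2 connections per hostname (the real limit varies by browser). The host www.example.com and the image names are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def download_rounds(urls, per_host_limit=2):
    """Estimate how many sequential rounds a browser needs to fetch `urls`.

    Hosts download in parallel with each other, so the slowest host
    (the one with the most files) determines the total.
    """
    if not urls:
        return 0
    per_host = Counter(urlsplit(url).hostname for url in urls)
    return max(math.ceil(count / per_host_limit) for count in per_host.values())


# Hypothetical page with eight images, all served by the blog's own host...
same_host = [f"http://www.example.com/wp-content/img{i}.jpg" for i in range(8)]
# ...and the same page with half of them moved to the S3-backed subdomain.
split_hosts = same_host[:4] + [
    f"http://files.jremick.com/wp-content/img{i}.jpg" for i in range(4, 8)
]

print(download_rounds(same_host))    # 4
print(download_rounds(split_hosts))  # 2
```

Under these assumptions, splitting the files across two hostnames halves the number of rounds, which is the effect described above.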


AWS S3 + CloudFront

OK, so I’ve covered how I use AWS S3 for networked storage as well as for my websites and reducing the load on my web server. If you run a high traffic website (which I don’t) or you’re just a nerd (like me) and want things to run as fast as possible then you’ll want to check out Amazon CloudFront as well.

Amazon CloudFront

Earlier in the article I put “AWS S3 as a ‘CDN’ or Public File Access” with CDN in quotes. The reason I did that is because AWS S3 is NOT a true CDN. A CDN is a Content Delivery Network that delivers your files from a distribution of servers around the world. Visitors get access to your files from the fastest resource available (usually the closest server). AWS S3 only has a few data centers around the world and your data will most likely be in one location making it far from a CDN.

If you want the best speed for visitors across the globe, you’ll want to use a real CDN like CloudFront. Thankfully Amazon has made it super easy to use these services together. I’ve already signed up for CloudFront and now I just need to configure it using S3Fox.

CloudFront distribution

Simply right click on the bucket you want distributed to Amazon’s CloudFront and click “Manage Distributions”. From here you can configure your CloudFront distribution. You’ll be assigned a unique domain for the distribution; “d1i7xb2p8w9276.cloudfront.net” is what this distribution has been assigned.

I’ve also used “cdn.jremick.com” as the CNAME for this distribution so I can access the files at http://cdn.jremick.com. You’ll see the status as “InProgress” until the distribution has been deployed, at which point the status changes to “Deployed”.

CloudFront distribution

Then I set up the CNAME on my web server.

CloudFront CNAME

Now when files are requested at http://cdn.jremick.com, they are served by the CloudFront servers, which pull the files from my AWS S3 account and cache them for subsequent requests.

There are some disadvantages to CloudFront (and other true CDNs) though. Once a file has been cached on the CloudFront servers, it won’t be requested from your AWS S3 account again until the cached copy expires or is dropped from the cache. That means you’ll need to version your files (filename_v1.css, filename_v2.css, etc.) so your users actually see your changes right away. It’s a great service but it really is intended more for high traffic purposes. In most situations for average people with blogs, AWS S3 will do just fine. I will be using CloudFront to host JavaScript, CSS and other static files though, just because I’m a nerd and I want performance! :-)
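The file versioning mentioned above is easy to automate. Here is a minimal Python sketch with two hypothetical helpers: `versioned_name` produces the filename_v2.css style used in this article, while `hashed_name` is a common alternative (my addition, not something the article uses) that derives the version from the file's content so the name changes only when the file does.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, version):
    """Insert a manual version number before the file extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}_v{version}{p.suffix}"))


def hashed_name(path, content, length=8):
    """Insert a short hash of the file's bytes before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


print(versioned_name("filename.css", 2))   # filename_v2.css
print(versioned_name("css/site.css", 3))   # css/site_v3.css
```

Either way, the HTML has to reference the new name, so the CDN sees a URL it has never cached and fetches the fresh file from the bucket.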


AWS S3 + S3Sync = Automated Offsite Server Backups

I’m a worry wart when it comes to losing data. My web server hosts around 20 accounts for other people and it’s very important to make sure all that data is backed up, safe and secure. That’s where S3Sync comes in. I can use it to automatically back up my web server to a specified AWS S3 bucket.

Here, I’ve jumped into Transmit (FTP for Mac with AWS S3 support) and logged into my AWS S3 account. I’m looking at my “servintbackups” bucket which shows the different backup folders. Each night the backups are updated automatically on my AWS S3 account.

servintbackups

If you would like to do this as well check out these tutorials.
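For a feel of what a nightly sync has to decide, here is a minimal Python sketch of the "which files need uploading?" step. It is not S3Sync's actual implementation. It assumes you already have a dictionary mapping each remote key to its MD5 checksum (for simple, non-multipart uploads, S3 reports this as the object's ETag); the function names are hypothetical and the upload itself is left out.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_uploads(local_root, remote_checksums):
    """Return the keys that are missing remotely or whose content differs."""
    to_upload = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Object keys always use "/" regardless of the local OS.
            key = os.path.relpath(full_path, local_root).replace(os.sep, "/")
            if remote_checksums.get(key) != md5_of_file(full_path):
                to_upload.append(key)
    return sorted(to_upload)
```

Calling `plan_uploads("/path/to/backups", remote_checksums)` returns only new or changed files, which is why subsequent nightly runs transfer far less data than the first one.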


Conclusion

Using AWS S3 and a variety of tools I’ve managed to get a lot for a little.

  • Centralized file access in the cloud, anywhere, on any platform.
  • Automated backups for desktop and server computers.
  • Web access to your files.
  • Media hosting outside of your web server to reduce load and speed things up.
  • Easy-to-set-up “CDN” and/or file access for users.
  • Easy-to-set-up true CDN with CloudFront.

As I said earlier, AWS S3 is built for developers, so if I do need to use it for even more solutions, the opportunity is there.

As great as AWS S3 is, it may not fit the bill for every problem you have. For instance, AWS S3 servers don’t gzip files, and backing up 200GB of data (like an iTunes library) would cost $30 per month vs. $5 or $10 per month on other services. AWS S3 is just one of the tools I use among many.
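The storage comparison above is simple arithmetic, sketched here in Python. The per-GB price is not quoted in this article; it is derived from the $30 for 200GB figure, and real bills also include request and transfer charges that this sketch ignores. The function names are hypothetical.

```python
def monthly_storage_cost(gigabytes, price_per_gb):
    """Storage-only cost per month in dollars."""
    return round(gigabytes * price_per_gb, 2)


def break_even_gb(flat_fee, price_per_gb):
    """Amount of data at which pay-per-GB storage costs as much as a flat fee."""
    return round(flat_fee / price_per_gb, 1)


# Assumption: price implied by the article's own numbers ($30 for 200GB).
PRICE_PER_GB = 30 / 200

print(monthly_storage_cost(200, PRICE_PER_GB))  # 30.0
print(break_even_gb(5, PRICE_PER_GB))           # 33.3
print(break_even_gb(10, PRICE_PER_GB))          # 66.7
```

In other words, at this implied rate AWS S3 is the cheaper option for storage alone only below roughly 33GB (against a $5 plan) or 67GB (against a $10 plan).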

Do you use AWS S3? Or do you prefer another similar solution? Tell us about it in the comments!

This article was originally posted on the ThemeForest blog. We are currently porting over some of the more popular articles to Nettuts+.

Tags: CMS
  • http://laroouse.com esranull

    Very useful post, thanks a lot.

  • http://loneplacebo.com Tony

    Jeeeeez! Net Tuts Plus must be spying on me! I just set up my Amazon AWS account yesterday and was trying to figure things out. Keep up the great mind-reading skills!

    • http://www.jeffrey-way.com Jeffrey Way

      We don’t like to call it spying….but yes.

    • http://creditorwatch.com.au Dale Hurley

      I think they were reading my thoughts. On Friday I was searching for an article just like this.

  • http://www.charlestreece.com/ Charles

    Interesting post, will give it a try. Thanks

  • http://www.diigital.com Mike

    “As great as AWS S3 is, it may not fit the bill for every problem you have. For instance, AWS S3 servers don’t gzip files “

    For JS and CSS files it’s probably worth gzipping them yourself before uploading, then setting a Content-Encoding: gzip header on S3 so that the files work. You could also set a Cache-Control: max-age header so they are cached client side too.

    For CloudFront, it’s probably worth considering how much traffic your files will get. Because the CF servers don’t permanently store the files (they only have room for the most often/recently requested and purge older files) they get re-fetched from S3 when a given edge location no longer has the file. So if your site doesn’t get enough traffic to keep those files on each of the CF locations there will probably be a delay for users as CF must get the files from S3 before it can serve them to the users. Requesting direct from S3 is probably faster in this case.

  • Jeroen

    Nice article! I was wondering: sometimes websites where you bought something to download give you a ticket that only you can use (obviously). The ticket leads you to an S3 site where you download the actual stuff you bought.

    In other words, how can I be selective in what people download from my S3?

    Thanks!

  • http://www.electrictoolbox.com/ Chris Hope

    This post couldn’t have come at a better time for me! I’ve got 35GB of photos and videos I was about to move from my old backup solution (rsync to server online) to Amazon.

  • http://www.aquadonis.ch Nicolas

    S3 Hub is a pretty good client for the Mac if anyone is looking for one. http://s3hub.com/ But of course today most of the FTP clients on the Mac support S3 as well. Using it together with WP is becoming popular to take loading time off your own server and speed up the overall site. Not sure if it makes a big difference though for the average user but it’s certainly a great way to make a backup of your data.

  • http://www.siteoptimo.com/blog Pieter

    I have two doubts about Amazon S3:
    -Speed: is it faster than a local host/IP? If you’re targeting a very local market in Europe (say Belgium/Holland), I’d think it’s faster to load not through Ireland but through a local server.

    -Flexibility: is it still as easy to get away from it (i.e. when it doesn’t have the wanted effect/cost)? I see a lot of changes being made (CNames, permanent links in the code,…).

  • Keith McLaughlin

    Are you sure you mentioned “AWS S3” enough times in the article? :P Good article nonetheless. Thanks.

  • http://www.ferdychristant.com Ferdy

    Good article, I would like to add the following:

    - Be sure to consider the use of headers for expiration and other purposes. This can drastically reduce your bill (you pay per 1,000 requests made).

    - Please note that S3 currently has no bill cap controls, meaning that if you put a public Amazon S3 URL out there, somebody could launch a DoS on it and bankrupt you. One way around this is to use temp URLs, check the S3 forum.

  • Jenn

    Awesome article. I believe that Cyberduck FTP can access that as well (which I have been wanting to try).

  • http://www.damnsemicolon.com Skye

    Good post. I was just looking into S3+CloudFront. Anyone have any comments on it vs LimeLight?

  • http://itspice.net vijay

    Awesome tutorial ! Thanks for sharing ! I think the plugins for FF and WP are the gems and can ease your effort.

  • http://daltonrooney.com/wordpress/ Dalton

    W3TotalCache is a very nice WordPress caching module which has full support for AWS S3 & Cloudfront. It can automatically upload your theme files and other assets to S3, (gzipping them in the process) and change all of your asset URLs to point to the files on the CDN. This is in addition to all of the other amazing things it can do. A must have if you’re using S3 and WordPress.

    • http://brianegan.com Brian Egan

      Too true, W3TotalCache is THE coolest effing thing on earth for WordPress Performance.

  • http://www.webmaster-source.com redwall_hp

    I use S3 to automatically back up my VPS. It works great, and it’s very cheap. I pay under $1 each month to store all of my files and MySQL dumps. If I ever get back into podcasting, I will probably use S3/CloudFront to distribute them.

    • http://blog.karo.or.id tommy

      How much space and bandwidth do you use every month? :) I’m interested in using S3 to back up my server.

      • http://www.webmaster-source.com redwall_hp

        Not sure about the precise numbers, but I’m backing up a few gigabytes per day.

  • marcogrich

    AWS S3 is a great service, but it is not a true CDN.

  • http://www.densepixel.com densePIXEL

    Great article, exactly what I have been looking for. Planning on signing up for the new Free tier and starting to integrate it into some of my projects. There’s even an S3 CodeIgniter library, so hopefully it should be pretty easy to get started.

  • Antony de Navarro

    Hi great article. I was looking for a little advice, I find amazon’s pricing slightly confusing and wanted to see whether my calculations were roughly correct from people who have used the service.
    Particularly from web designers who may have thousands of small files (php, html etc.)

    Hypothetical – I have 160,000+ files for my clients and 40+ GB of client data and just want to back it up, not use it as a pseudo-CDN.

    So First month/ Initial backup
    $1.60 for 160,000 PUT requests ($0.01 per 1,000)
    $5.60 for 40GB storage ($0.14 per GB)
    $4 for 40GB transfer in ($0.10 per GB)
    Total – $11.20

    Subsequent months, where fewer files change, let’s say 1,000 files and 100MB (using Cloud Berry or Jungle Disk)
    $0.01 for 1,000 PUT requests ($0.01 per 1,000)
    $5.60 for 40GB storage ($0.14 per GB)
    $0.01 for 100MB transfer in ($0.10 per GB)
    Total – $5.62

    Are these figures approximately accurate?
    Am I missing any hidden charges? (eg: requesting whether files are synced)

    Does anyone have experience with BackBlaze or Mozy?
    Many thanks to anyone who has read this far ;-)

  • http://www.fzilla.com Angel Grablev

    And soon you will be able to do everything those apps do much more easily, all web-based, with fzilla.com!

  • http://cloud.blaisdell2.com/ Ron

    With the new “default object support” you can now set a default file to be displayed if someone decides to try and call your CF site directly. Makes it nice to redirect them back to the main site, instead of seeing the CF XML response.

    BrowserMob is a great way to test the load of the CDN and see how it is working worldwide. And as has already been said, W3 Total Cache is a must-have for folks who want to really improve the delivery times of their WordPress sites.

  • http://findingwhy.com/ Joel D Canfield

    The S3 WordPress plugin has at least two fatal flaws:

    1. Deleting files from WordPress does not delete them from S3. Maintaining two separate systems is fraught with peril.
    2. No differentiation between blogs; if I have 24 different blogs (yes, mine and my clients’) then it all dumps into the same folder path. True, not visible on the WordPress side, but in S3, it’s a mess.

    And I was so hopeful.

  • cedonio

    I have a config file in my web application that defines an upload dir within my web server, i.e.
    ‘upload_dir’ => ‘upload/’

    I want to be able to use S3 as the upload/ dir for my application. How do I go about using it as a mounted drive? Any direction will be greatly appreciated.