Caching, YQL, and Regular Expressions

In today’s tutorial, we’re going to mix a handful of technologies. First, we’ll review how to implement a simple form of text file caching with PHP. To illustrate this technique, we’ll use the wonderful YQL to query Twitter’s search API for a list of tweets that contain the string “nettuts.” Finally, we’ll experiment with PHP’s regular expression capabilities, and turn all Twitter usernames and URLs into clickable links.


Step 1: YQL

Before we jump into our caching exercise, we need some data to work with. The excellent YQL platform makes the process of fetching data from a variety of APIs a cinch.

Think of YQL as an API for APIs.

For this demo, we’re going to query Twitter’s search API. While a Twitter table isn’t natively integrated into YQL, the platform also offers Community Tables, which anyone can contribute. As such, there’s a table for practically any social networking site you can think of.

*The table we require is called “twitter.search.”*

When you click on that community table, it’ll automatically generate a sample query for you, which will look similar to: select * from twitter.search where q='volcano'. To rewrite this query, and direct the API to return all tweets which instead reference Nettuts+, we only need to replace volcano with nettuts. Isn’t that incredibly awesome?

Yes – YQL rocks.

How to Use YQL with PHP

There are a variety of methods which allow us to query the YQL API with PHP. For this tutorial, we’ll use the ever-helpful file_get_contents.

On Yahoo’s YQL Console page, you’ll find a REST query path at the very bottom.

However, that string is static, and is already urlencoded, which makes it rather difficult to work with. Instead, with PHP, let’s manually construct the path.

The first step is to declare our SQL (sort of) query.

$yql = "select * from twitter.search where q='nettuts'";

Next, we set up the path to YQL.

$query = "http://query.yahooapis.com/v1/public/yql?q=";
$query .= urlencode($yql);
$query .= "&format=json&env=store://datatables.org/alltableswithkeys";

All queries which are sent through YQL will begin with the string, http://query.yahooapis.com/v1/public/yql?q=.

Next, we need to append our specific query, which we stored in the variable, $yql; however, we must be sure to urlencode this variable. We then request a JSON response with &format=json. Finally, because we’re using a community table, we must also append &env=store://datatables.org/alltableswithkeys to the path.
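
If you expect to run more than one query, the path construction above can be wrapped in a small helper. This is only a sketch: the function name build_yql_url and its $useCommunityTables parameter are made up for illustration, and the endpoint is simply the one quoted in this tutorial.

```php
<?php
// Hypothetical helper (not part of YQL or PHP): builds the full request
// path for a YQL query, following the three steps described above.
function build_yql_url($yql, $useCommunityTables = true)
{
    $url  = 'http://query.yahooapis.com/v1/public/yql?q=';
    $url .= urlencode($yql);     // spaces, quotes, and * must be encoded
    $url .= '&format=json';      // request the response as JSON

    if ($useCommunityTables) {
        // Needed for community tables such as twitter.search
        $url .= '&env=store://datatables.org/alltableswithkeys';
    }

    return $url;
}

$query = build_yql_url("select * from twitter.search where q='nettuts'");
echo $query;
```

Only the query itself is passed through urlencode; the rest of the path is appended as-is, exactly as in the hand-written version.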

Get Contents

At this point, the variable, $query, contains our full YQL path. If we pass this variable as a parameter to PHP’s file_get_contents, the response body will be returned as a string (or false, if the request fails).

$tweets = file_get_contents($query, true);

To verify this, you can print_r($tweets). If the request was successful, when you view the page in your browser, a huge blob of text will be displayed!

We’ve only one problem at this point: $tweets is one large string, when we need it to be treated as JSON. Otherwise, we don’t have a way to filter through it. Luckily, we can use json_decode, which will accept a JSON string, and transform it into an object that we can then work with.

$tweets = json_decode($tweets);
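
To see what json_decode hands back, here is a minimal sketch that runs without touching the network. The $sample string is invented for this example; it only imitates the nesting that we work with in this tutorial, not a real response.

```php
<?php
// Made-up sample, shaped like the nesting used in this tutorial.
$sample = '{"query":{"results":{"results":[{"text":"Hello from @nettuts"}]}}}';

$decoded = json_decode($sample);

// json_decode returns null when the string is not valid JSON,
// so it is worth checking before digging into the object.
if ($decoded === null) {
    echo 'Could not decode the response.';
} else {
    $first = $decoded->query->results->results[0];
    echo $first->text;
}
```

If you would rather work with associative arrays than objects, pass true as the second parameter: json_decode($sample, true).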

Displaying the Tweet

Let’s now display the tweets within an unordered list.

The YQL Console provides a helpful tree-view, which we can use to filter and determine precisely what we want to extract from the feed. Refer here for an example.

We can now decipher that, to filter through all of the specific tweets, we must access $tweets->query->results->results.

<ul>

<?php
foreach($tweets->query->results->results as $item) {
   echo '<li>' . $item->text . '</li>';
}
?>

</ul>

Now that each object in the array is represented, via $item, we can access any of the properties that we require. This isn’t specifically a Twitter tutorial; so, we’ll keep things simple and display only the text.

Step 2: Caching

It seems a bit silly to repeatedly query a third-party API with every page refresh. It stands to reason that, if this were on your personal site, it’d probably only need to be updated once every few hours or so.

To remedy this, we can write the returned results from the YQL query to a text file. This is much easier than you might think!

We begin by creating a new folder within the root of our project, called cache. Next, we declare a path to the cached file, regardless of whether or not it’s been created.

$cache = dirname(__FILE__) . '/cache/nettutsTweets.txt';

We can target the path to the directory of the current file by using dirname(__FILE__).

Notice that we haven’t yet created this nettutsTweets.txt file. PHP will do that for us! Just below the point at which we called file_get_contents, we open that file, and specify writing privileges, represented by the w in the second parameter of fopen. (The b that follows it forces binary mode, so the data is written exactly as it was received.)

...
$tweets = file_get_contents($query, true);

// cache data
$cachefile = fopen($cache, 'wb');
fwrite($cachefile, $tweets);
fclose($cachefile);

PHP makes the process of working with the file system as simple, and readable, as possible. We’ll go over this code line by line.

Open the $cache file, and provide us with writing privileges. If the file does not exist, PHP will create the file.

$cachefile = fopen($cache, 'wb');

Write the contents of the $tweets variable (the JSON string returned from YQL) to this file.

fwrite($cachefile, $tweets);

We’re done; so close the file.

fclose($cachefile);
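
As an aside, PHP also offers file_put_contents, which performs the open, write, and close steps in a single call. The sketch below is an alternative, not a replacement for the code above; it writes to the system temp directory (an assumption made so that the example runs anywhere), and $tweets holds stand-in data.

```php
<?php
// Assumption for this sketch: use the system temp directory so the example
// runs anywhere. In the tutorial, this would be the cache folder.
$cache  = sys_get_temp_dir() . '/nettutsTweets-example.txt';
$tweets = '{"example":"data"}'; // stand-in for the string returned by YQL

// Equivalent to fopen() + fwrite() + fclose(). LOCK_EX asks for an
// exclusive lock, so two requests do not write at the same moment.
$bytes = file_put_contents($cache, $tweets, LOCK_EX);

if ($bytes === false) {
    echo 'Could not write the cache file.';
} else {
    echo "Wrote $bytes bytes to $cache";
}
```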

Checking the Time

Now, with the code in its current form, we’re still querying YQL with each page load. Not to worry; that’s easily fixed with a quick if statement. Wrap the code that queries YQL in an if statement.

if ( /* we'll fill in this condition next */ ) {
   $yql = "select * from twitter.search where q='nettuts'";
   // ...build $query, fetch the tweets, and write the cache, as before
}

Before we continue, let’s first figure out exactly when our script should fetch new results from YQL. It should do so when either of the following conditions is true:

  1. The “nettutsTweets.txt” file does not exist.
  2. The file is older than, say, three hours. (Feel free to adjust this interval how you wish.)

To accomplish the first check, PHP provides the file_exists function, which accepts a single parameter, which points to the file. In this case, however, we’re determining, not if the file exists, but if it does not exist. Prefixing a bang to the function call will take care of this.

if ( !file_exists($cache) )

The second part is a bit more difficult. How can we determine whether a file was last updated over three hours ago? It sounds confusing, but it’s truthfully not. To do so, we use the filemtime (file modification time) function. This function accepts a single parameter, which points to the file. The function will then return the time at which the file was last modified, as a Unix timestamp (the number of seconds since January 1, 1970): exactly what we need!

“The filemtime returns the time when the data blocks of a file were being written to, that is, the time when the content of the file was changed.”

To determine whether this returned data is older than the current time, minus three hours, we use time() - 10800. 10800 is equal to the number of seconds in three hours (60x60x3).

if ( filemtime($cache) < ( time() - 10800 ) )
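
To watch the time check in action without waiting three hours, you can backdate a file with touch. This is a throwaway sketch; the temp-file path is an assumption made only for the demonstration.

```php
<?php
// Temp-file path used only for this demonstration.
$cache  = sys_get_temp_dir() . '/cache-age-example.txt';
$maxAge = 3 * 60 * 60; // 10800 seconds

// Create the file and pretend it was last modified four hours ago.
touch($cache, time() - 4 * 60 * 60);
clearstatcache(); // PHP caches file stats; reset them before re-reading

$isStale = filemtime($cache) < (time() - $maxAge);
var_dump($isStale);

// Set the modification time back to "now" and check again.
touch($cache);
clearstatcache();

$isStillStale = filemtime($cache) < (time() - $maxAge);
var_dump($isStillStale);
```

The call to clearstatcache matters only when the same file is checked more than once in a single request, because PHP remembers the result of filemtime for the duration of the script.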

Combining the Two Checks

Rather than writing two if statements, we can combine the two, and save a couple lines of code in the process.

if ( !file_exists($cache) || filemtime($cache) < ( time() - 10800 ) )

Excellent. However, what happens if the file does exist, and the last updated time was only a few moments ago? In that case, we should grab the contents of the local, cached file, rather than querying YQL.

else {
   // We already have local cache. Let's use that instead.
   $tweets = file_get_contents($cache);
}

Here's our final code:

<?php

$cache = dirname(__FILE__) . '/cache/nettutsTweets.txt';

if ( !file_exists($cache) || filemtime($cache) < ( time() - 10800 ) ) {
   $yql = "select * from twitter.search where q='nettuts'";
   $query = "http://query.yahooapis.com/v1/public/yql?q=";
   $query .= urlencode($yql);
   $query .= "&format=json&env=store://datatables.org/alltableswithkeys";

   $tweets = file_get_contents($query, true);

   // cache data
   $cachefile = fopen($cache, 'wb');
   fwrite($cachefile, $tweets);
   fclose($cachefile);
} else {
   // We already have local cache. Let's use that instead.
   $tweets = file_get_contents($cache);
}
$tweets = json_decode($tweets);
?>
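
One case that the code above does not cover: if the request to YQL fails, file_get_contents returns false, and that is what ends up in the cache file. The following sketch shows one possible way to guard against it. The helper name get_cached and its parameters are hypothetical, and the two closures stand in for the real request so that the example runs offline.

```php
<?php
// Hypothetical helper: returns the cached data if it is fresh; otherwise
// calls $fetch, a callable that returns a string, or false on failure.
function get_cached($cache, $maxAge, $fetch)
{
    clearstatcache();
    $exists = file_exists($cache);

    if ($exists && filemtime($cache) >= (time() - $maxAge)) {
        return file_get_contents($cache); // cache is still fresh
    }

    $data = call_user_func($fetch);

    if ($data === false) {
        // The request failed: fall back to the stale copy if there is one,
        // rather than overwriting a good cache file with nothing.
        return $exists ? file_get_contents($cache) : false;
    }

    file_put_contents($cache, $data, LOCK_EX);
    return $data;
}

// Demo with stand-in fetchers. In the tutorial, the callable would wrap
// file_get_contents($query).
$cache = sys_get_temp_dir() . '/get-cached-example.txt';
@unlink($cache); // start from a clean slate for the demo

$first  = get_cached($cache, 10800, function () { return '{"n":1}'; });
$second = get_cached($cache, 10800, function () { return '{"n":2}'; });

echo $first, "\n", $second, "\n";
```

The second call never runs its fetcher, because the file written by the first call is only moments old.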

Step 3: Regular Expressions

Refer here for an online regular expression tester.

At this point, with minimal effort, we have a fast method that allows us to display tweets on our page. With that said, there's one tiny problem that we should remedy. None of the links or usernames in these tweets are...links! Nobody wants to manually copy a link and paste it into their address bar. To compensate, let's use regular expressions to search and replace.

PHP provides a variety of helpful functions to work with regular expressions. In this particular case, we require the preg_replace function.

preg_replace performs a regular expression search and replace on a passed string.

There are two types of string that we need to convert.

  1. @usernames - These should be converted to twitter.com/username
  2. urls - Urls should be converted to anchor tags.

Generally, to use the preg_replace function, you must pass three parameters:

  1. The regular expression to search for
  2. The string to replace it with
  3. The string that we're actually searching

The preg_replace function optionally accepts two arrays as its first two parameters. This allows you to pass multiple patterns, each with its own replacement, to the function.

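$patterns = array(
   // $2 puts back the whitespace that marked the end of the url
   '/(http:\/\/.+?)(\s|$)/i' => '<a href="$1">$1</a>$2',
   // no backslash is needed before / in a replacement string
   '/@([\w\d_-]+)/' => '<a href="http://twitter.com/$1">@$1</a>'
);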

Yikes!

Regular expressions are easily the most scary aspect of web development, especially when you have little experience with them. How could anyone possibly interpret a huge jumble of code?

As you'll find, it's honestly not too difficult to wrap your head around. Mostly, the process involves memorizing a handful of characters. For instance:

  • . - Any character
  • \w - Any alphanumeric character or underscore (word characters)
  • \d - Any digit
  • + - Match one or more of the preceding character or group.
  • \s - Match a whitespace character (space, tab, or line break)
  • [] - A character class which matches one of any of the characters within. For example, [\w\d] will match a single word character, or a digit. To specify more than one, use the plus sign: [\w\d]+.
  • ^ - The beginning of the string
  • $ - The end of a string
  • () - Capturing group. Wrap parens around the values that you want to capture, for later.
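
A quick way to get comfortable with these characters is to try them with preg_match, which returns 1 when a pattern matches and 0 when it does not. The sample strings below are made up for this sketch.

```php
<?php
// preg_match returns 1 when the pattern matches and 0 when it does not.
$allDigits  = preg_match('/^\d+$/', '10800');        // ^ and $ anchor both ends
$notDigits  = preg_match('/^\d+$/', '108a0');        // "a" is not a digit
$hasSpace   = preg_match('/\s/', 'three hours');     // one whitespace character
$wordOrDash = preg_match('/^[\w-]+$/', 'net-tuts_plus'); // character class

// Capturing group: the text matched inside the parens lands in $matches[1].
preg_match('/@(\w+)/', 'Thanks, @nettuts!', $matches);

var_dump($allDigits, $notDigits, $hasSpace, $wordOrDash, $matches[1]);
```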

Wrapping Urls in Anchor Tags

Refer back to our $patterns array.

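$patterns = array(
   // $2 puts back the whitespace that marked the end of the url
   '/(http:\/\/.+?)(\s|$)/i' => '<a href="$1">$1</a>$2',
   // no backslash is needed before / in a replacement string
   '/@([\w\d_-]+)/' => '<a href="http://twitter.com/$1">@$1</a>'
);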

The first item in the array searches for any string that begins with http://. When one is found, we can be fairly certain that it's supposed to be a link. The lazy .+? then matches as few characters as possible, so the match stops at the first space (\s), or at the end of the string ($), whichever comes first.

Each array key is a pattern, and the value paired with that key is what we want to replace any matches with. In this case, we wrap the value within an anchor tag.

You can reference values captured in parens by using $1, $2, etc.
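
Here is the url pattern on its own, applied to a made-up sentence. Note the $2 at the end of the replacement: the second group matched the space that followed the url, and without $2 that space would be swallowed by the replacement.

```php
<?php
$text = 'Read http://net.tutsplus.com today';

// $1 is the url; $2 is the whitespace (or the end of the string)
// that followed it.
$linked = preg_replace(
    '/(http:\/\/.+?)(\s|$)/i',
    '<a href="$1">$1</a>$2',
    $text
);

echo $linked;
```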

Directing @Usernames to Twitter

We should also wrap @Usernames in anchor tags which link to the user's profile on Twitter. This little ditty will do the trick.

'/@([\w\d_-]+)/' => '<a href="http://twitter.com/$1">@$1</a>'

To convert this expression to every-day speech:

  1. First search for the @ symbol
  2. Next look for one or more (+) of either a letter, number, underscore, or a dash. We'll wrap this match in parens, so that we can later reference the value with $1.
  3. Replace any matches with an anchor tag, which links to Twitter's website.
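
The three steps above, applied to a made-up tweet. Only the pattern needs its delimiters escaped; the replacement is an ordinary string, so the slashes in the url are written plainly.

```php
<?php
$text = 'Thanks @envato and @nettuts!';

$linked = preg_replace(
    '/@([\w\d_-]+)/',                          // @, then one or more allowed characters
    '<a href="http://twitter.com/$1">@$1</a>', // $1 is the captured username
    $text
);

echo $linked;
```

Since \w already covers letters, digits, and the underscore, the character class could also be written as [\w-] with the same result.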

Array Keys and Array Values

Now that we're ready to call the preg_replace function, how can we separate the array keys from the array values? We do so with the appropriately named, array_keys and array_values functions.

  • array_keys : Creates a new array that contains only the keys of the passed array.
  • array_values : Creates a new array that contains only the values of the passed array.

This is precisely what we need!

$item->text = preg_replace(array_keys($patterns), array_values($patterns), $item->text);

And with that, the final code for the foreach statement:

<?php
foreach($tweets->query->results->results as $item) {
   $patterns = array(
      '/(http:\/\/.+?)(\s|$)/i' => '<a href="$1">$1</a>$2',
      '/@([\w\d_-]+)/' => '<a href="http://twitter.com/$1">@$1</a>'
   );

   $item->text = preg_replace(array_keys($patterns), array_values($patterns), $item->text);

   echo '<li>' . $item->text . '</li>';
}
?>
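
If you would like to try the search and replace without a live feed, the sketch below wraps it in a function and runs it against invented sample data. The name linkify_tweet is hypothetical, and the $items array merely stands in for $tweets->query->results->results.

```php
<?php
// Hypothetical helper wrapping the search and replace from this step.
function linkify_tweet($text)
{
    $patterns = array(
        '/(http:\/\/.+?)(\s|$)/i' => '<a href="$1">$1</a>$2',
        '/@([\w\d_-]+)/'          => '<a href="http://twitter.com/$1">@$1</a>',
    );

    // The patterns are applied in order: urls first, then usernames.
    return preg_replace(array_keys($patterns), array_values($patterns), $text);
}

// Made-up sample data standing in for the decoded YQL results.
$items = array(
    (object) array('text' => 'New tut from @nettuts: http://net.tutsplus.com'),
    (object) array('text' => 'No links here'),
);

foreach ($items as $item) {
    echo '<li>' . linkify_tweet($item->text) . '</li>' . "\n";
}
```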

View your updated script in the browser, and, now, all links and usernames should be wrapped within anchor tags, as expected.

One thing worth noting is the fact that we've retrieved a large Twitter feed, via YQL, though we've only made use of each tweet's text. In these cases, you should modify your YQL query to select only the desired properties; for instance, select text from twitter.search where q='nettuts'. This will lower the file size substantially.


Conclusion

Okie dokie; that'll do it for today! We reviewed how to use the YQL API, how to implement simple text file caching with PHP, and how to work with regular expressions.

I hope you enjoyed this tutorial, and, if it was a bit over your head, don't forget that Premium members have access to an exclusive video version of this tutorial, if you're more of a visual learner!

  • http://bit.ly/cLZXGi Julian

    Looking good there! Keep up the great content Nettuts!

  • http://laroouse.com esranull

    wooow ver ygood post thanks a lot

  • http://blog.cmff.de Funkmaster Flow

    Nice one.
    Maybe I’ll try it for one of my next projects.

    Why do you use “file_get_contents()” instead of using curl?

    • http://www.jeffrey-way.com Jeffrey Way
      Author

      No specific reason. It’s just a very fast method. :)

  • joomlachamp

    Thank you Tuts+!!!

  • http://parenting.pl/ Marcin

    I watched the video version – excellent tutorial – good pace and easy to follow, and quite useful at the same time. I’ve promised myself on several occasions to learn more about YQL – looks like a good opportunity.

    One comment though – it’s rather confusing when you make like a 30-second pause near the end of the video and one can only hear you breathing in the background :-). It was just the cursor blinking in the video that reassured me that the video was actually still playing :-).

    • http://www.jeffrey-way.com Jeffrey Way
      Author

      Hey Marcin – Hmm – I usually edit out any pauses (especially if it’s 30s). I’m rewatching the end right now to see if I missed a clip. Haven’t found it yet. :) When I find it, I’ll update the video.

      • http://net.tutsplus.com Jeffrey Way

        That’s odd – just rewatched the second half, and didn’t find a 30 second pause.

  • http://www.how-to-asp.net Ryan

    Great tut, thanks!

  • http://www.satya-weblog.com Satya Prakash

    Good guide for a free plugin creation

  • http://sql-plsql.blogspot.com Sachin

    awesome, never seen & very useful content on YQL & PHP…thanks

  • alwaro

    You are always surprising me with interesting articles!!! good work!!.. thanks

  • http://www.tutorialepc.info/ tutoriale calculator

    I never knew you could do such great things with twitter. Actually i didn’t knew YQL :)
    Thanks!

  • Hirvine

    Good tutor. I wanted to add a comment about custom delimiters.
    You are using / as delimiter in your regular expression. If you were to use the / in your pattern, I would recommend to use a delimiter you don’t use. That would prevent escaping and thus makes your pattern more readable.

    Use for instance ` as your delimiter. So the character / no longer requires escaping
    '`(http://.+?)(\s|$)`i' => '<a href="$1">$1</a>'

    Instead of
    '/(http:\/\/.+?)(\s|$)/i' => '<a href="$1">$1</a>'

  • Naresh

    Awesome Tutorial! You are the best when explaining new technologies..

  • http://funkall.com Dale Hurley

    I really like tuts like this one in which you use multiple technologies and present a real world example. As said before YQL looks so promising and I cant wait to get a free weekend to play with it.

  • Jaime

    Why not use Memcached or APC or something like that for caching?

    Anyway, nice post

  • http://joeyreed.com Joey Reed

    Until now I thought caching was half Voodoo. Thanks for making it so easy to understand!

  • DJK

    The “YQL’s Console Page” link seems to revert to the image that’s below it, not sure if this is intentional though.

    Nice tutorial Jeffrey, YQL seems like an amazing tool to use, just hope I find the time/reason to do so.

    Thanks.