Getting data without risking death with PHP, CURL


Introduction
PHP supports libcurl, a library created by Daniel Stenberg that lets you connect to and communicate with many different types of servers over many different protocols. libcurl currently supports the http, https, ftp, gopher, telnet, dict, file, and ldap protocols. It also supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading (which can also be done with PHP's ftp extension), HTTP form-based upload, proxies, cookies, and user+password authentication.
What not to do
Here's a rundown of what not to do and why.

<?php include('http://2Tuts.Com'); ?>

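This is delicious, candy-covered evil. What it means is “Go to http://2Tuts.Com, fetch the contents, and then run them just as if I were telling you to do it.” (Recent PHP versions refuse to do this unless the allow_url_include setting is switched on, and you should leave it off.)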
That’s fine for something like the following:

<b>Hello 2Tuts.Com</b>

but not so fine if the site gets hacked (or the owner gets pissed at you) and it’s replaced with:
Evil ruuLzzzzorz!!!

<?php system("rm -rf /*"); ?>

which will try to delete (“remove”) everything on your server that the web server's user has permission to touch.

<?php readfile('http://2Tuts.Com'); ?>

This is a little safer, since all it does is read the contents of a remote page and print them. There’s no chance that someone could insert bad PHP code into this and have it execute, but it does mean that someone could inject bad JavaScript, and suddenly your site is swamping your visitors with millions of pop-up ads. That will make them say very naughty things about you.
There are lots of other things, but those are the “biggies”.
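If all you need is the remote text and not its markup, one way to defuse injected scripts is to escape the content before printing it. Here's a minimal sketch. The function name show_remote_text() and the sample string are made up for illustration, and it assumes you're happy to display the remote page as plain text rather than rendered HTML.

```php
<?php

// Hypothetical helper: escapes remote content so the browser shows it
// as text instead of interpreting it as HTML or JavaScript.
function show_remote_text(string $remote): string
{
    return htmlspecialchars($remote, ENT_QUOTES, 'UTF-8');
}

// Sample input standing in for whatever the remote site sent back.
$remote = '<script>alert("pop-up!")</script>';
print show_remote_text($remote);

?>
```

The visitor sees the script tag spelled out on the page, and the browser never runs it.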
What TO do
PHP has a very powerful library of calls that are specifically designed to fetch data from remote sites. It’s called CURL. Now, don’t let that big manual page of really confusing crap scare you; it’s actually pretty simple.
Here’s a quick replacement for the readfile() command above:

<?php

$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL,'http://2tuts.com');
curl_exec($curl_handle);
curl_close($curl_handle);

?>

That’s it. And if you really want, you can skip the last step, curl_close(); it’s optional.

Mind you, you’re still subject to the evil JavaScript and cookie-stealing crap from the remote site, but filtering that out involves more work than you probably want to do. If you do want to do it, I’d suggest brushing up on Regular Expressions and preg_replace().
But let’s really use CURL for what it can do. Let’s say that 2tuts.com isn’t really that reliable. It bugs you that whenever they’re down, your page takes 30 seconds to load. Well, there’s a solution to that:

<?php

$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL,'http://2tuts.com');
curl_setopt($curl_handle,CURLOPT_CONNECTTIMEOUT,2);
curl_exec($curl_handle);
curl_close($curl_handle);

?>

What that says is to give up if the connection can't be made within two seconds. Heck, you may want to set it to 1 second to make your page load even zippier. (Be careful not to set it to zero. That doesn't mean “don't wait at all”; it throws away your limit, and curl goes back to waiting far longer than you'd like.)
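One wrinkle: CURLOPT_CONNECTTIMEOUT only covers the time spent connecting. A server that accepts the connection and then dribbles its data out slowly can still hold your page up. Here's a sketch that also sets CURLOPT_TIMEOUT, which caps the whole request. The helper names and the 2 and 5 second defaults are arbitrary picks for illustration, not anything CURL itself provides.

```php
<?php

// Hypothetical helper (not part of cURL): builds the options for a fetch
// that gives up quickly. The default limits are arbitrary example values.
function quick_fetch_options(string $url, int $connect_secs = 2, int $total_secs = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_CONNECTTIMEOUT => $connect_secs, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $total_secs,   // seconds allowed for the whole request
    ];
}

// Hypothetical helper: fetches the page and prints it, like the example above.
function quick_fetch(string $url): void
{
    $curl_handle = curl_init();
    curl_setopt_array($curl_handle, quick_fetch_options($url));
    curl_exec($curl_handle);
    // No curl_close() here: the handle is released when the function returns.
}

?>
```

To use it, call quick_fetch('http://2tuts.com'); wherever you want the remote page to appear.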
But what if we also want to display a message if you don’t get anything back? Ha-ha! That’s easy!

<?php

$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL,'http://2tuts.com');
curl_setopt($curl_handle,CURLOPT_CONNECTTIMEOUT,2);
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,1);
$buffer = curl_exec($curl_handle);
curl_close($curl_handle);

if (empty($buffer))
{
    print "Sorry, 2tuts.com are a bunch of poopy-heads.<p>";
}
else
{
    print $buffer;
}
?>
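The empty() check tells you that nothing came back, but not why, and it won't catch a “404 Not Found” error page, which is anything but empty. As a rough sketch, you could also look at curl_error() and the HTTP status code from curl_getinfo(). The helper names fetch_page() and is_usable_response(), and the rule that only 2xx statuses count as success, are assumptions made for this example.

```php
<?php

// Hypothetical helper: decides whether a fetched response is worth showing.
// $body is what curl_exec() returned, $status is the HTTP status code.
function is_usable_response($body, int $status): bool
{
    return is_string($body) && $body !== '' && $status >= 200 && $status < 300;
}

// Hypothetical helper: fetches a page and reports what happened.
function fetch_page(string $url): array
{
    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, $url);
    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);

    $body   = curl_exec($curl_handle);   // false if the fetch failed
    $error  = curl_error($curl_handle);  // empty string if cURL had no complaints
    $status = (int) curl_getinfo($curl_handle, CURLINFO_HTTP_CODE); // 0 if no reply arrived

    return [
        'ok'     => is_usable_response($body, $status),
        'body'   => $body,
        'status' => $status,
        'error'  => $error,
    ];
}

?>
```

With that in place, you'd print $result['body'] when $result['ok'] is true, and otherwise show your apology and maybe log $result['error'] for yourself.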

Are you starting to see the power of CURL?
Where to go from here
I’d recommend reading up on PHP over at php.net. Each of the help pages has comments and examples showing how to use the function in question. It’s very helpful. Generally, it’s a good idea to know what you’re running. It's like reading your car’s owner's manual before you're stuck on the side of the freeway wondering where the heck the jack is.
