load() Function for PHP - Fetch URL Content

I recently had to write a small script that fetches an XML file from the web. All I had to do was download a given URL and read its contents. To my great surprise, I found that downloading the file with my jx Ajax library was much easier than doing it with PHP.

PHP makes this very easy by including functions like file_get_contents() that have URL support. This code will get you the contents of a URL.

$contents = file_get_contents('http://example.com/rss.xml');

Unfortunately, URL support in the file functions is considered a security risk, and many servers have disabled this feature in PHP (the allow_url_fopen setting). It is also not the most efficient way to fetch a URL. And a plain call like the one above cannot submit data using the POST method; that requires setting up a stream context first.
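Before relying on file_get_contents() for URLs, it helps to check whether the server allows it. The following is a minimal sketch, assuming the feature is controlled by the allow_url_fopen setting; the helper name url_fopen_enabled() is made up for this example and is not part of the script.

```php
<?php
// Hypothetical helper (not part of load()): reports whether PHP's
// URL-aware file functions are enabled on this server.
function url_fopen_enabled() {
    // ini_get() returns the setting as a string such as "1", "0", "On" or "".
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (url_fopen_enabled()) {
    echo "file_get_contents() can fetch URLs on this server\n";
} else {
    echo "URL support is disabled; use Curl or fsockopen instead\n";
}
```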

Other Options - curl and fsockopen

PHP provides two other ways to fetch a URL: Curl and fsockopen. But using either of them means writing a lot more code.
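To give an idea of the extra code involved, here is roughly what a bare Curl fetch looks like. It is only a sketch: the function name fetch_with_curl() and the 30 second timeout are choices made for this example, and it does none of the header handling that load() does.

```php
<?php
// Hypothetical helper for illustration; this is not the load() function.
function fetch_with_curl($url) {
    if (!function_exists('curl_init')) {
        return false; // The Curl extension is not installed.
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the body instead of printing it.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow redirects.
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // Give up after 30 seconds.
    $body = curl_exec($ch);                         // Returns false if the fetch fails.
    return $body; // The handle is released when $ch goes out of scope.
}
```

Calling fetch_with_curl('http://example.com/rss.xml') would then return the page contents as a string, or false on failure.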

load()

So I decided to create my own function that makes it much easier.

Features

Options

The first argument of this function is the URL to be fetched. The second argument is an associative array. This is an optional argument. The following values are supported in this array.

return_info
    Possible values: true/false
    If this is true, the function will return an associative array rather than just a string. The array will contain 3 elements:
    headers
        An associative array containing all the headers returned by the server.
    body
        A string: the contents of the URL.
    info
        Some information about the fetch. This is the result returned by the 'curl_getinfo()' function. Supported only with Curl.
method
    Possible values: post/get
    Specifies the HTTP method to be used.
modified_since
    If this option is set, the 'If-Modified-Since' header will be sent. This makes sure that the URL is fetched only if it was modified since the given date.
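The modified_since value is passed through strtotime() and gmdate() in the code further down before it goes into the header. Here is a small sketch of that formatting step on its own; the helper name if_modified_since_header() is invented for this example.

```php
<?php
// Hypothetical helper: formats a date string the same way load() does
// when it builds the If-Modified-Since header.
function if_modified_since_header($date) {
    return 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s \G\M\T', strtotime($date));
}

echo if_modified_since_header('2007-06-18 13:56:22 UTC'), "\n";
// Prints: If-Modified-Since: Mon, 18 Jun 2007 13:56:22 GMT
```

Any date string that strtotime() understands should work as the option value, for example array('modified_since' => '2007-06-18 13:56:22 UTC').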

Examples

The code to fetch the contents of a URL will look like this...

$contents = load('http://example.com/rss.xml');

Simple, no? This will just return the contents of the URL. If you need to do more complex stuff, just use the second argument to pass more options...

$options = array(
	'return_info'	=> true,
	'method'		=> 'post'
);
$result = load('http://www.bin-co.com/rss.xml.php?section=2',$options);
print_r($result);

The output will be like this...

Array
(
    [headers] => Array
        (
            [Date] => Mon, 18 Jun 2007 13:56:22 GMT
            [Server] => Apache/2.0.54 (Unix) PHP/4.4.7 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2 SVN/1.4.2
            [X-Powered-By] => PHP/5.2.2
            [Expires] => Thu, 19 Nov 1981 08:52:00 GMT
            [Cache-Control] => no-store, no-cache, must-revalidate, post-check=0, pre-check=0
            [Pragma] => no-cache
            [Set-Cookie] => PHPSESSID=85g9n1i320ao08kp5tmmneohm1; path=/
            [Last-Modified] => Tue, 30 Nov 1999 00:00:00 GMT
            [Vary] => Accept-Encoding
            [Transfer-Encoding] => chunked
            [Content-Type] => text/xml
        )
    [body] => ... Contents of the Page ...
    [info] => Array
        (
            [url] => http://www.bin-co.com/rss.xml.php?section=2
            [content_type] => text/xml
            [http_code] => 200
            [header_size] => 501
            [request_size] => 146
            [filetime] => -1
            [ssl_verify_result] => 0
            [redirect_count] => 0
            [total_time] => 1.113792
            [namelookup_time] => 0.180019
            [connect_time] => 0.467973
            [pretransfer_time] => 0.468035
            [size_upload] => 0
            [size_download] => 2274
            [speed_download] => 2041
            [speed_upload] => 0
            [download_content_length] => 0
            [upload_content_length] => 0
            [starttransfer_time] => 0.826031
            [redirect_time] => 0
        )
)
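Once you have the array, you will usually want to check the status code and content type before using the body. A minimal sketch of such a check follows, assuming the array shape shown above; is_xml_response() is a made-up helper, and the sample array is a trimmed-down copy of the output so the code runs without a network call.

```php
<?php
// Hypothetical helper: decides whether a load() result looks like a usable XML response.
function is_xml_response($result) {
    $code = isset($result['info']['http_code']) ? $result['info']['http_code'] : 0;
    $type = isset($result['headers']['Content-Type']) ? $result['headers']['Content-Type'] : '';
    return $code == 200 && strpos($type, 'xml') !== false;
}

// Trimmed-down copy of the array printed above.
$result = array(
    'headers' => array('Content-Type' => 'text/xml'),
    'body'    => '... Contents of the Page ...',
    'info'    => array('http_code' => 200),
);

var_dump(is_xml_response($result)); // bool(true)
```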

Code

<?php
/**
 * See http://www.bin-co.com/php/scripts/load/
 * Version : 1.00.A
 */
function load($url,$options=array('method'=>'get','return_info'=>false)) {
    //Fill in defaults for any option the caller left out.
    $options = array_merge(array('method' => 'get', 'return_info' => false), $options);

    $url_parts = parse_url($url);
    if(!isset($url_parts['path'])) $url_parts['path'] = '/'; //A URL like http://example.com has no path part
    $info = array(//Currently only supported by curl.
        'http_code'    => 200
    );
    $response = '';

    $send_header = array(
        'Accept' => 'text/*',
        'User-Agent' => 'BinGet/1.00.A (http://www.bin-co.com/php/scripts/load/)'
    );

    ///////////////////////////// Curl /////////////////////////////////////
    //If curl is available, use curl to get the data.
    if(function_exists("curl_init")
                and (!(isset($options['use']) and $options['use'] == 'fsocketopen'))) { //Don't use curl if the options specifically ask for fsocketopen
        if(isset($options['method']) and $options['method'] == 'post') {
            $page = $url_parts['scheme'] . '://' . $url_parts['host'] . $url_parts['path'];
        } else {
            $page = $url;
        }

        $ch = curl_init($url_parts['host']);

        curl_setopt($ch, CURLOPT_URL, $page);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //Just return the data - not print the whole thing.
        curl_setopt($ch, CURLOPT_HEADER, true); //We need the headers
        curl_setopt($ch, CURLOPT_NOBODY, false); //The content - if true, will not download the contents
        if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $url_parts['query']);
        }
        //Set the headers our spider sends
        curl_setopt($ch, CURLOPT_USERAGENT, $send_header['User-Agent']); //The Name of the UserAgent we will be using ;)
        $custom_headers = array("Accept: " . $send_header['Accept']);
        if(isset($options['modified_since']))
            array_push($custom_headers,"If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',strtotime($options['modified_since'])));
        //HTTP Basic Authorization support - appended so that the headers above are not replaced
        if(isset($url_parts['user']) and isset($url_parts['pass']))
            array_push($custom_headers,"Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']));
        curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);

        curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt"); //If ever needed...
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); //Note: this turns off SSL certificate verification

        $response = curl_exec($ch);
        if($response === false) $response = ''; //curl_exec() returns false if the fetch fails
        $info = curl_getinfo($ch); //Some information on the fetch
        curl_close($ch);

    //////////////////////////////////////////// FSockOpen //////////////////////////////
    } else { //If there is no curl, use fsocketopen
        if(isset($url_parts['query'])) {
            if(isset($options['method']) and $options['method'] == 'post')
                $page = $url_parts['path'];
            else
                $page = $url_parts['path'] . '?' . $url_parts['query'];
        } else {
            $page = $url_parts['path'];
        }

        $fp = fsockopen($url_parts['host'], 80, $errno, $errstr, 30);
        if ($fp) {
            $out = '';
            if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
                $out .= "POST $page HTTP/1.0\r\n"; //HTTP/1.0 here too, so the server does not reply with chunked encoding
            } else {
                $out .= "GET $page HTTP/1.0\r\n"; //HTTP/1.0 is much easier to handle than HTTP/1.1
            }
            $out .= "Host: $url_parts[host]\r\n";
            $out .= "Accept: $send_header[Accept]\r\n";
            $out .= "User-Agent: {$send_header['User-Agent']}\r\n";
            if(isset($options['modified_since']))
                $out .= "If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',strtotime($options['modified_since'])) ."\r\n";

            $out .= "Connection: Close\r\n";

            //HTTP Basic Authorization support
            if(isset($url_parts['user']) and isset($url_parts['pass'])) {
                $out .= "Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']) . "\r\n";
            }

            //If the request is post - pass the data in a special way.
            if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
                $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
                $out .= 'Content-Length: ' . strlen($url_parts['query']) . "\r\n";
                $out .= "\r\n" . $url_parts['query'];
            }
            $out .= "\r\n";

            fwrite($fp, $out);
            while (!feof($fp)) {
                $response .= fgets($fp, 128);
            }
            fclose($fp);
        }
    }

    //Get the headers in an associative array
    $headers = array();

    if($info['http_code'] == 404) {
        $body = "";
        $headers['Status'] = 404;
    } else {
        //Separate header and content
        $separator_position = strpos($response,"\r\n\r\n");
        if($separator_position === false) { //No header block found - treat everything as the body
            $header_text = '';
            $body = $response;
        } else {
            $header_text = substr($response,0,$separator_position);
            $body = substr($response,$separator_position+4);
        }

        foreach(explode("\n",$header_text) as $line) {
            $parts = explode(": ",$line,2); //The limit of 2 keeps header values that contain ': ' intact
            if(count($parts) == 2) $headers[$parts[0]] = chop($parts[1]);
        }
    }

    if($options['return_info']) return array('headers' => $headers, 'body' => $body, 'info' => $info);
    return $body;
}
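
The last part of the function, where the raw response is split into headers and body, is easier to follow on a sample string. The following is a simplified sketch of that step, not the function itself; split_http_response() and the sample response are made up for this example.

```php
<?php
// Hypothetical helper mirroring the parsing step at the end of load().
function split_http_response($response) {
    $headers = array();
    // Headers and body are separated by an empty line.
    $pos = strpos($response, "\r\n\r\n");
    if ($pos === false) {
        return array('headers' => $headers, 'body' => $response);
    }
    foreach (explode("\r\n", substr($response, 0, $pos)) as $line) {
        // The status line has no ': ' in it, so it is skipped here.
        $parts = explode(': ', $line, 2);
        if (count($parts) == 2) {
            $headers[$parts[0]] = $parts[1];
        }
    }
    return array('headers' => $headers, 'body' => substr($response, $pos + 4));
}

$sample = "HTTP/1.0 200 OK\r\nContent-Type: text/xml\r\nServer: Apache\r\n\r\n<rss></rss>";
print_r(split_http_response($sample));
```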


License

BSD License

Comments

Anonymous at 27 Oct, 2007 12:14
Thanks ! the script is easy to use
What is a URL? at 23 Nov, 2007 03:45
Does this retrieve all the HTML of the URL in question? If so, then how does it differ from file()?
Binny V A at 25 Nov, 2007 06:22
In some servers, file() cannot fetch an URL. There is an option in PHP to disable it. Since it is considered to be a security threat, many admins have disabled it. In such cases, you can use this function.
Anonymous at 12 Jan, 2008 06:24
wow! Straight forward programming, good documentation ... a pleasure to use =)
Anonymous at 20 Feb, 2008 09:39
Awesome function. Had to change some small stuff thou cause i got some weird numbers in my $body output.

$body = substr($response,$separator_position+4);
i hade to change to:
$body = substr($response,$separator_position+9);

and then a
$body = substr($body,0,-5);
cause there was a 0 at the end.

otherwise awesome job! =)

-gob
Ashkan at 01 Mar, 2008 01:37
great! thanks a lot. it make me faster doing my works!
Anonymous at 05 Mar, 2008 11:34
hi
how can i save the contents of the url or download it using this script
i need to download it and save it like HTML file ..

example : when you clicl file > save as .. the page will saved in HTML file Whit Images ..
Brad at 15 Mar, 2008 02:53
And once again simply awesome - thanks!
Anonymous at 16 Mar, 2008 06:35
what would be the code if i want to fetch full wvm path from zdsahre.net?
Anonymous at 18 Mar, 2008 11:45
thank you very very much, it solves all my problems, simply..