load() Function for PHP - Fetch URL Content
I recently had to write a small script that fetches an XML file from the web. All I had to do was download a given URL and read its contents. To my great surprise, I found that downloading the file using my jx Ajax library was much easier than doing it with PHP.
PHP makes this very easy by including functions like file_get_contents() that have URL support. This code will get you the contents of a URL.
$contents = file_get_contents('http://example.com/rss.xml');
Unfortunately, URL support in the file functions is a security risk (for example, a remote file passed to include() can end up being executed as PHP code), so many servers have disabled this feature in PHP. It is also not the most efficient way to fetch a URL, and a plain call like this one cannot submit data using the POST method.
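Whether the file_get_contents() call works at all depends on the allow_url_fopen setting in php.ini. Here is a minimal sketch of a check for it. The helper name can_fetch_with_fopen() is made up for this example and is not part of the script on this page.

```php
<?php
// Hypothetical helper (not part of load()): reports whether PHP's URL
// wrappers are enabled, i.e. whether file_get_contents() may open URLs.
function can_fetch_with_fopen() {
    // ini_get() returns the setting as a string ("1", "0", "On", ...) or false.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (can_fetch_with_fopen()) {
    // It is reasonable to try: file_get_contents('http://example.com/rss.xml')
    echo "URL wrappers are enabled\n";
} else {
    echo "URL wrappers are disabled - use Curl or fsockopen instead\n";
}
```

If the check fails, the only options left are the ones described in the next section.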
Other Options - curl and fsockopen
PHP provides two other ways to fetch a URL - Curl and Fsockopen. But using them means writing a lot more code.
load()
So I decided to create my own function that makes it much easier.
Features
- Easy to use.
- Supports Get and Post methods.
- Supports HTTP Basic Authentication - this will work - http://binny:password@example.com/
- Supports both Curl and Fsockopen. Tries to use Curl - if it is not available, uses fsockopen.
- Secure URL(https) supported with Curl
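The Basic Authentication support works by reading the user name and password out of the URL and sending them as an Authorization header. Here is a minimal sketch of that step, following what load() does internally. The helper name basic_auth_header() is made up for this example.

```php
<?php
// Hypothetical helper: builds the Basic Authentication header from the
// credentials embedded in a URL, or returns null when there are none.
function basic_auth_header($url) {
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['user'], $parts['pass'])) {
        return null;
    }
    // Basic auth is just "user:password", base64 encoded.
    return 'Authorization: Basic ' . base64_encode($parts['user'] . ':' . $parts['pass']);
}

echo basic_auth_header('http://binny:password@example.com/'), "\n";
```

Keep in mind that base64 is an encoding, not encryption, so the password is only protected when the URL is fetched over https.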
Options
The first argument of this function is the URL to be fetched. The second argument is an associative array. This is an optional argument. The following values are supported in this array.
- return_info
- Possible values - true/false
If this is true, the function will return an associative array rather than just a string. The array will contain 3 elements...
- headers
- An associative array containing all the headers returned by the server.
- body
- A string - the contents of the URL.
- info
- Some information about the fetch. This is the result returned by the 'curl_getinfo()' function. Supported only with Curl.
- method
- Possible values - post/get
Specifies the method to be used.
- modified_since
- If this option is set, the 'If-Modified-Since' header will be sent. The URL will then be fetched only if it was modified after the given date.
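The value given in modified_since is parsed with strtotime() and sent in the HTTP date format. Here is a minimal sketch of that conversion, using the same gmdate() format string as load(). The helper name if_modified_since_header() is made up for this example.

```php
<?php
// Hypothetical helper mirroring how load() formats the header value.
function if_modified_since_header($when) {
    $timestamp = strtotime($when);
    if ($timestamp === false) {
        return null; // Unparseable date - send no conditional header.
    }
    // HTTP dates are always expressed in GMT.
    return 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s \G\M\T', $timestamp);
}

echo if_modified_since_header('2007-06-18 13:56:22 UTC'), "\n";
// If-Modified-Since: Mon, 18 Jun 2007 13:56:22 GMT
```

A server that honours this header answers with a 304 status and an empty body when nothing has changed, so check the http_code in the returned info before using the body.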
Examples
The code to fetch the contents of a URL will look like this...
$contents = load('http://example.com/rss.xml');
Simple, no? This will just return the contents of the URL. If you need to do more complex stuff, just use the second argument to pass more options...
$options = array(
'return_info' => true,
'method' => 'post'
);
$result = load('http://www.bin-co.com/rss.xml.php?section=2',$options);
print_r($result);
The output will be like this...
Array
(
[headers] => Array
(
[Date] => Mon, 18 Jun 2007 13:56:22 GMT
[Server] => Apache/2.0.54 (Unix) PHP/4.4.7 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2 SVN/1.4.2
[X-Powered-By] => PHP/5.2.2
[Expires] => Thu, 19 Nov 1981 08:52:00 GMT
[Cache-Control] => no-store, no-cache, must-revalidate, post-check=0, pre-check=0
[Pragma] => no-cache
[Set-Cookie] => PHPSESSID=85g9n1i320ao08kp5tmmneohm1; path=/
[Last-Modified] => Tue, 30 Nov 1999 00:00:00 GMT
[Vary] => Accept-Encoding
[Transfer-Encoding] => chunked
[Content-Type] => text/xml
)
[body] => ... Contents of the Page ...
[info] => Array
(
[url] => http://www.bin-co.com/rss.xml.php?section=2
[content_type] => text/xml
[http_code] => 200
[header_size] => 501
[request_size] => 146
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 1.113792
[namelookup_time] => 0.180019
[connect_time] => 0.467973
[pretransfer_time] => 0.468035
[size_upload] => 0
[size_download] => 2274
[speed_download] => 2041
[speed_upload] => 0
[download_content_length] => 0
[upload_content_length] => 0
[starttransfer_time] => 0.826031
[redirect_time] => 0
)
)
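The function source also reads a post_data option that is not in the list above. It accepts either a ready-made string or an associative array, and an array is turned into a URL-encoded string before it is sent. Here is a minimal sketch of that encoding step, using the same loop as load(). The helper name encode_post_data() is made up for this example.

```php
<?php
// Hypothetical helper: the same encoding loop load() applies to an
// array passed as 'post_data'. Only the values are URL-encoded.
function encode_post_data(array $fields) {
    $pairs = array();
    foreach ($fields as $key => $value) {
        $pairs[] = "$key=" . urlencode($value);
    }
    return implode('&', $pairs);
}

$body = encode_post_data(array('cmd' => 'calculate', 'score' => '42 points'));
echo $body, "\n"; // cmd=calculate&score=42+points
```

PHP's built-in http_build_query() does a similar job and also encodes the keys, which matters if a field name contains special characters.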
Code
<?php
/**
* See http://www.bin-co.com/php/scripts/load/
* Version : 2.00.A
*/
function load($url,$options=array()) {
$default_options = array(
'method' => 'get',
'return_info' => false,
'return_body' => true,
'cache' => false,
'referer' => '',
'headers' => array(),
'session' => false,
'session_close' => false,
);
// Sets the default options.
foreach($default_options as $opt=>$value) {
if(!isset($options[$opt])) $options[$opt] = $value;
}
$url_parts = parse_url($url);
$ch = false;
$info = array(//Currently only supported by curl.
'http_code' => 200
);
$response = '';
$send_header = array(
'Accept' => 'text/*',
'User-Agent' => 'BinGet/1.00.A (http://www.bin-co.com/php/scripts/load/)'
) + $options['headers']; // Add custom headers provided by the user.
if($options['cache']) {
$cache_folder = '/tmp/php-load-function/';
if(isset($options['cache_folder'])) $cache_folder = $options['cache_folder'];
if(!file_exists($cache_folder)) {
$old_umask = umask(0); // Or the folder will not get write permission for everybody.
mkdir($cache_folder, 0777);
umask($old_umask);
}
$cache_file_name = md5($url) . '.cache';
$cache_file = rtrim($cache_folder, '/') . '/' . $cache_file_name; //Don't change the variable name - used at the end of the function.
if(file_exists($cache_file)) { // Cached file exists - return that.
$response = file_get_contents($cache_file);
//Separate header and content
$headers = array();
$separator_position = strpos($response,"\r\n\r\n");
$header_text = substr($response,0,$separator_position);
$body = substr($response,$separator_position+4);
foreach(explode("\n",$header_text) as $line) {
$parts = explode(": ",$line,2); //Limit of 2 - header values may contain ': ' themselves.
if(count($parts) == 2) $headers[$parts[0]] = chop($parts[1]);
}
$headers['cached'] = true;
if(!$options['return_info']) return $body;
else return array('headers' => $headers, 'body' => $body, 'info' => array('cached'=>true));
}
}
///////////////////////////// Curl /////////////////////////////////////
//If curl is available, use curl to get the data.
if(function_exists("curl_init")
and (!(isset($options['use']) and $options['use'] == 'fsocketopen'))) { //Don't use curl if it is specifically stated to use fsocketopen in the options
if(isset($options['post_data'])) { //There is an option to specify some data to be posted.
$page = $url;
$options['method'] = 'post';
if(is_array($options['post_data'])) { //The data is in array format.
$post_data = array();
foreach($options['post_data'] as $key=>$value) {
$post_data[] = "$key=" . urlencode($value);
}
$url_parts['query'] = implode('&', $post_data);
} else { //Its a string
$url_parts['query'] = $options['post_data'];
}
} else {
if(isset($options['method']) and $options['method'] == 'post') {
$page = $url_parts['scheme'] . '://' . $url_parts['host'] . $url_parts['path'];
} else {
$page = $url;
}
}
if($options['session'] and isset($GLOBALS['_binget_curl_session'])) $ch = $GLOBALS['_binget_curl_session']; //Session is stored in a global variable
else $ch = curl_init($url_parts['host']);
curl_setopt($ch, CURLOPT_URL, $page) or die("Invalid cURL Handle Resource");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //Just return the data - not print the whole thing.
curl_setopt($ch, CURLOPT_HEADER, true); //We need the headers
curl_setopt($ch, CURLOPT_NOBODY, !($options['return_body'])); //The content - if true, will not download the contents. There is a ! operation - don't remove it.
if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $url_parts['query']);
}
//Set the headers our spiders sends
curl_setopt($ch, CURLOPT_USERAGENT, $send_header['User-Agent']); //The Name of the UserAgent we will be using ;)
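$custom_headers = array();
foreach($send_header as $name=>$value) { //Accept and any custom headers provided by the user.
if($name != 'User-Agent') $custom_headers[] = "$name: $value"; //User-Agent is already set above.
}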
if(isset($options['modified_since']))
array_push($custom_headers,"If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',strtotime($options['modified_since'])));
curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);
if($options['referer']) curl_setopt($ch, CURLOPT_REFERER, $options['referer']);
curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/binget-cookie.txt"); //If ever needed...
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
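curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); //Turns off SSL certificate verification, which leaves https fetches open to interception - remove this line where a CA bundle is available.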
if(isset($url_parts['user']) and isset($url_parts['pass'])) {
$custom_headers[] = "Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']); //Append - don't overwrite the headers set above.
curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);
}
$response = curl_exec($ch);
$info = curl_getinfo($ch); //Some information on the fetch
if($options['session'] and !$options['session_close']) $GLOBALS['_binget_curl_session'] = $ch; //Dont close the curl session. We may need it later - save it to a global variable
else curl_close($ch); //If the session option is not set, close the session.
//////////////////////////////////////////// FSockOpen //////////////////////////////
} else { //If there is no curl, use fsocketopen - but keep in mind that most advanced features will be lost with this approach.
if(isset($url_parts['query'])) {
if(isset($options['method']) and $options['method'] == 'post')
$page = $url_parts['path'];
else
$page = $url_parts['path'] . '?' . $url_parts['query'];
} else {
$page = $url_parts['path'];
}
if(!isset($url_parts['port'])) $url_parts['port'] = 80;
$fp = fsockopen($url_parts['host'], $url_parts['port'], $errno, $errstr, 30);
if ($fp) {
$out = '';
if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
$out .= "POST $page HTTP/1.0\r\n"; //HTTP/1.0 here too - a 1.1 server may send a chunked reply, which this code does not decode.
} else {
$out .= "GET $page HTTP/1.0\r\n"; //HTTP/1.0 is much easier to handle than HTTP/1.1
}
$out .= "Host: $url_parts[host]\r\n";
foreach($send_header as $name=>$value) {
$out .= "$name: $value\r\n"; //Accept, User-Agent and any custom headers provided by the user.
}
if(isset($options['modified_since']))
$out .= "If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',strtotime($options['modified_since'])) ."\r\n";
$out .= "Connection: Close\r\n";
//HTTP Basic Authorization support
if(isset($url_parts['user']) and isset($url_parts['pass'])) {
$out .= "Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']) . "\r\n";
}
//If the request is post - pass the data in the request body.
if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
$out .= "Content-Type: application/x-www-form-urlencoded\r\n";
$out .= 'Content-Length: ' . strlen($url_parts['query']) . "\r\n";
$out .= "\r\n" . $url_parts['query'];
} else {
$out .= "\r\n"; //A blank line ends the headers.
}
fwrite($fp, $out);
while (!feof($fp)) {
$response .= fgets($fp, 128);
}
fclose($fp);
}
}
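//Get the headers in an associative array
$headers = array();
if($info['http_code'] == 404) {
$body = "";
$headers['Status'] = 404;
} else {
//Separate header and content
if(!isset($info['header_size'])) { //fsockopen gives no header size - the headers end at the first blank line.
$separator_position = strpos($response, "\r\n\r\n");
$info['header_size'] = ($separator_position === false) ? strlen($response) : $separator_position + 4;
}
$header_text = substr($response, 0, $info['header_size']);
$body = substr($response, $info['header_size']);
foreach(explode("\n",$header_text) as $line) {
$parts = explode(": ",$line,2); //Limit of 2 - header values may contain ': ' themselves.
if(count($parts) == 2) $headers[$parts[0]] = chop($parts[1]);
}
}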
if(isset($cache_file)) { //Should we cache the URL?
file_put_contents($cache_file, $response);
}
if($options['return_info']) return array('headers' => $headers, 'body' => $body, 'info' => $info, 'curl_handle'=>$ch);
return $body;
}
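The header and body separation at the end of load() can also be done without a known header size, by splitting the raw response at the first blank line. Here is a minimal, standalone sketch of that technique. The helper name split_http_response() is made up for this example, and it assumes a single header block (no redirects) and a body that is not chunked.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into headers and body
// at the first blank line, for cases where no header size is known.
function split_http_response($raw) {
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        // No blank line found - treat everything as body.
        return array('headers' => array(), 'body' => $raw);
    }
    $headers = array();
    foreach (explode("\r\n", substr($raw, 0, $pos)) as $line) {
        // Limit of 2 keeps values that themselves contain ": " intact.
        // The status line has no ": " and is skipped.
        $parts = explode(': ', $line, 2);
        if (count($parts) == 2) {
            $headers[$parts[0]] = trim($parts[1]);
        }
    }
    return array('headers' => $headers, 'body' => substr($raw, $pos + 4));
}

$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/xml\r\nX-Note: a: b\r\n\r\n<rss/>";
$result = split_http_response($raw);
echo $result['headers']['Content-Type'], "\n"; // text/xml
echo $result['body'], "\n"; // <rss/>
```

Like load() itself, this keeps only the last value when a header such as Set-Cookie appears more than once.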
License
BSD License

Comments
$body = substr($response,$separator_position+4);
I had to change this to:
$body = substr($response,$separator_position+9);
and then a
$body = substr($body,0,-5);
because there was a 0 at the end.
otherwise awesome job! =)
-gob
How can I save the contents of the URL, or download it, using this script?
I need to download it and save it as an HTML file.
Example: when you click File > Save As, the page is saved as an HTML file with images.
Do I need to make any changes?
If I need to call it for that, what should I pass to $options?
Thanks and really useful.
One thing it doesn't handle, is fetching code on different ports. I'm trying to pull a file on port 8080.
Any ideas?
include('http://remotesite.com/spamming_php_file.txt'); - it will be executed as PHP code if this feature is turned on.
I have one question. I would like to submit a form on another server.
The form on the other server's page looks something like this:
<form id="calculate_score" action="" method="post">
<input type="hidden" name="cmd" value="calculate" />
<input type="text" id="month_payment" name="score" class="input" value="" />
<input type="image" src="/dsg/sl/calculate.gif" class="button" />
</form>
Is it possible to send data with the load function (post method), so I could enter the values manually and just retrieve the result?
Thnx.
Thanks
---
$fp = fsockopen($url_parts['host'], 80, $errno, $errstr, 30);
---
to
---
if(!isset($url_parts['port'])) $url_parts['port'] = 80;
$fp = fsockopen($url_parts['host'], $url_parts['port'], $errno, $errstr, 30);
---
Great work!
I have one note:
If the request gets redirected, the new URL's headers show up as plain text at the top of $result['body'] instead of in $result['headers']. $result['info'] is OK, but not complete (Content-Length, Date, etc. are missing).
Example:
$result = load('http://google.com',$options); //redirect to www.google.com
print_r($result);
Sorry for my bad English.
The patch allows fsockopen to connect to a https:// URL. It also calculates the header_size so the header/body splitting/parsing code works.
http://yasmar.net/load_fix_fsockopen.diff
Fix the sending of headers (while I didn't test it, the curl code didn't seem right, the fsockopen code didn't handle it at all).
http://yasmar.net/fix_sending_headers.diff
Allow receiving multiple headers with the same name (eg. Set-Cookie: appear multiple times when multiple cookies are being set). This leaves the $headers['header'] behavior intact and simply changes the value from a string to an array of strings.
http://yasmar.net/multiple_headers_receive.diff
Retrieve the HTTP response code when using fsockopen.
http://yasmar.net/fsockopen_http_response.diff
Send multiple headers with the same name (a mirror implementation of the receiving multiple headers patch).
http://yasmar.net/multiple_headers_send.diff
http://yasmar.net/load_fsockopen_non_empty_line.diff
http://yasmar.net/load_fsockopen_headers_end.diff
can someone tell me how to use this script
please
I am trying to fetch news from Yahoo to my website.
I have this working well, retrieving text. However, I have been trying to also use it for retrieving a PNG from a remote server and then saving this to my local server and it doesn't work. I can get it to work using feof and feof really well, but it doesn't work on all servers which is why I am trying to use this.
Could someone please provide me with a sample of what to do to get this to work please? It saves the data, but isn't being recognised as a PNG file. I guess the data is corrupt for some reason.
Oh, and a1291762, is there any chance you can link to the entire updated script rather than just diffs?
Thanks
My code does not actually write out the file at any point. It simply fetches and then serves to the browser (effectively acting as a proxy). The code looks like this.
# Fetch the attachment (needs cookies)
$attachment = load($this->source_url, array('return_info' => true,
'headers' => array('Cookie' => session()->cookie())));
# Serve up the headers that I got (at least some of them will be important)
foreach ($attachment['headers'] as $header => $data) {
# Any text/* types get served as text/plain.
# I HATE it when the browser doesn't let me view TEXT files!
if ($header == 'Content-Type' && preg_match('/^text\//', $data)) {
header('Content-Type: text/plain');
continue;
}
if (!is_array($data)) {
$data = array($data);
}
foreach ($data as $value) {
header($header.': '.$value, false); # false = don't replace an earlier header with the same name
}
}
# Now serve up the contents of the file.
echo $attachment['body'];
The full script I have is here: http://yasmar.net/load.php. Note that this is not quite original + patches. There is a raw POST hack I stuck in so I could upload files to a remote server but it's only for the fsockopen case. I was planning on making it work with curl before posting as a patch.
http://yasmar.net/fsockopen_post_data.diff
This adds support for multipart-encoded POSTs (cURL and fsockopen). This allows you to upload files.
http://yasmar.net/multipart_post.diff
This makes use of two helper functions (sourced from around the internets). It was designed around fsockopen's requirements and retrofitted to curl. fsockopen needs more data than curl does so the 'multipart_data' field you are expected to fill out has more options than you might expect (the same input is used so that the 'use curl or fall back to fsockopen' logic can still work).
Anyway, here is an example of how to upload with this new feature (in this case, attaching a file to a JIRA task).
$url = 'https://'.JIRA_HOST.'/secure/AttachFile.jspa';
$upload = array('filename.1' => array('filename' => $file_name, 'type' => $mime_type, 'binary' => true, 'contents' => $file_contents),
'filename.2' => array('filename' => '', 'type' => 'application/octet-stream'),
'filename.3' => array('filename' => '', 'type' => 'application/octet-stream'),
'comment' => array(),
'commentLevel' => array(),
'id' => array('contents' => $id),
'Attach' => array('contents' => 'Attach'));
load($url, array('headers' => array('Cookie' => $cookies),
'method' => 'post', 'multipart_data' => $upload));
Just for Matt, the whole enchilada can be found here.
http://yasmar.net/load.php.txt
Note that I'm not interested in maintaining a fork or anything so don't ask me for support. This was tested with both cURL and fsockopen but only for the case presented here (attaching a file to a JIRA task).
A great help
Thanks a lot.
cheers