load() Function for PHP - Fetch URL Content

I recently had to develop a small script that fetches an XML file from the web. All I had to do was download a given URL and read its contents. To my great surprise, I found that downloading the file using my jx Ajax library was much easier than doing it with PHP.

PHP makes this very easy by including functions like file_get_contents() that have URL support. This code will get you the contents of a URL.

$contents = file_get_contents('http://example.com/rss.xml');

Unfortunately, treating URLs like local files is considered a security risk, and many servers have disabled this feature in PHP (the allow_url_fopen setting). It is also not the most efficient way to fetch a URL. And submitting data with the POST method takes extra work with this function: you have to build a stream context by hand.
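For what it's worth, file_get_contents() can send a POST request when it is given a stream context. The sketch below shows one way to build that context. The helper name build_post_context_options() and the form fields are made up for illustration, and the request itself is left commented out because it needs allow_url_fopen and network access.

```php
<?php
// Hypothetical helper: builds the stream-context options for a form POST.
function build_post_context_options(array $fields) {
    $body = http_build_query($fields); // URL-encodes the fields as key=value pairs.
    return array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $body,
        ),
    );
}

// Example fields (assumed, not from the article).
$context_options = build_post_context_options(array('section' => 2, 'q' => 'php load'));
$context = stream_context_create($context_options);

// The actual request (needs allow_url_fopen and network access):
// $contents = file_get_contents('http://example.com/rss.xml', false, $context);
```

This works only where URL support is enabled, so it does not help on servers that have switched the feature off.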

Other Options - curl and fsockopen

PHP provides two other methods to fetch a URL: cURL and fsockopen(). But using them means writing a lot more code.
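To give an idea of the extra code involved, here is a minimal sketch of a plain GET request with the cURL extension. The function name fetch_with_curl() and the 30-second timeout are assumptions for this example. It is not the code used inside load().

```php
<?php
// Minimal sketch of a GET request with the cURL extension.
// fetch_with_curl() is a hypothetical name, not part of load().
function fetch_with_curl($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the body instead of printing it.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow redirects.
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // Give up after 30 seconds (assumed value).
    $body = curl_exec($ch);
    $error = ($body === false) ? curl_error($ch) : null;
    curl_close($ch);
    if ($body === false) {
        trigger_error("cURL fetch failed: $error", E_USER_WARNING);
    }
    return $body; // String on success, false on failure.
}

// Usage (needs the cURL extension and network access):
// $contents = fetch_with_curl('http://example.com/rss.xml');
```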

load()

So I decided to create my own function that makes this much easier.

Features

Options

The first argument of this function is the URL to be fetched. The second argument is an optional associative array of options. The following values are supported in this array.

return_info
Possible values - true/false
If this is true, the function will return an associative array rather than just a string. The array will contain 3 elements...
    headers
    An associative array containing all the headers returned by the server.
    body
    A string - the contents of the URL.
    info
    Some information about the fetch. This is the result returned by the 'curl_getinfo()' function. Supported only with Curl.
method
Possible values - post/get
Specifies the method to be used.
modified_since
If this option is set, the 'If-Modified-Since' header will be sent, so that the content is fetched only if it was modified after the given time.
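As a rough illustration of the modified_since option, the sketch below formats a date the same way the function's code does for the 'If-Modified-Since' header. The helper name if_modified_since_header() and the sample date are assumptions made for this example. The commented-out call assumes load() is defined as in the Code section.

```php
<?php
// Formats a date for the If-Modified-Since header, using the same
// gmdate() pattern that appears in the load() source.
// if_modified_since_header() is a hypothetical helper for illustration.
function if_modified_since_header($when) {
    return 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s \G\M\T', strtotime($when));
}

echo if_modified_since_header('2007-06-18 13:56:22 UTC'), "\n";

// Passing the option to load() (needs network access):
// $result = load('http://example.com/rss.xml', array(
//     'return_info'    => true,
//     'modified_since' => '2007-06-18 13:56:22 UTC',
// ));
```

Any date string that strtotime() understands should work as the option value.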

Examples

The code to fetch the contents of a URL will look like this...

$contents = load('http://example.com/rss.xml');

Simple, no? This will just return the contents of the URL. If you need to do more complex stuff, just use the second argument to pass more options...

$options = array(
	'return_info'	=> true,
	'method'		=> 'post'
);
$result = load('http://www.bin-co.com/rss.xml.php?section=2',$options);
print_r($result);

The output will be like this...

Array
(
    [headers] => Array
        (
            [Date] => Mon, 18 Jun 2007 13:56:22 GMT
            [Server] => Apache/2.0.54 (Unix) PHP/4.4.7 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2 SVN/1.4.2
            [X-Powered-By] => PHP/5.2.2
            [Expires] => Thu, 19 Nov 1981 08:52:00 GMT
            [Cache-Control] => no-store, no-cache, must-revalidate, post-check=0, pre-check=0
            [Pragma] => no-cache
            [Set-Cookie] => PHPSESSID=85g9n1i320ao08kp5tmmneohm1; path=/
            [Last-Modified] => Tue, 30 Nov 1999 00:00:00 GMT
            [Vary] => Accept-Encoding
            [Transfer-Encoding] => chunked
            [Content-Type] => text/xml
        )
	[body] => ... Contents of the Page ...
	[info] => Array
        (
            [url] => http://www.bin-co.com/rss.xml.php?section=2
            [content_type] => text/xml
            [http_code] => 200
            [header_size] => 501
            [request_size] => 146
            [filetime] => -1
            [ssl_verify_result] => 0
            [redirect_count] => 0
            [total_time] => 1.113792
            [namelookup_time] => 0.180019
            [connect_time] => 0.467973
            [pretransfer_time] => 0.468035
            [size_upload] => 0
            [size_download] => 2274
            [speed_download] => 2041
            [speed_upload] => 0
            [download_content_length] => 0
            [upload_content_length] => 0
            [starttransfer_time] => 0.826031
            [redirect_time] => 0
        )
)
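Once you have this array, you will usually want to check the status code before using the body. The sketch below is one way to do that. describe_result() is a hypothetical helper, and the sample array is hand-built with the same keys as the output above rather than fetched from a server.

```php
<?php
// Summarises a result array shaped like the return_info output.
// describe_result() is a hypothetical helper, not part of load().
function describe_result(array $result) {
    $code = isset($result['info']['http_code']) ? $result['info']['http_code'] : 0;
    $type = isset($result['headers']['Content-Type']) ? $result['headers']['Content-Type'] : 'unknown';
    return sprintf('HTTP %d, %s, %d bytes', $code, $type, strlen($result['body']));
}

// A hand-built sample with the same keys as the real output.
$sample = array(
    'headers' => array('Content-Type' => 'text/xml'),
    'body'    => '<rss></rss>',
    'info'    => array('http_code' => 200),
);
echo describe_result($sample), "\n";
```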

Code

<?php
/**
 * Link: http://www.bin-co.com/php/scripts/load/
 * Version : 3.00.A
 */
function load($url,$options=array()) {
    $default_options = array(
        'method'        => 'get',
        'post_data'     => false,
        'return_info'   => false,
        'return_body'   => true,
        'cache'         => false,
        'referer'       => '',
        'headers'       => array(),
        'session'       => false,
        'session_close' => false,
    );
    // Sets the default options.
    foreach($default_options as $opt=>$value) {
        if(!isset($options[$opt])) $options[$opt] = $value;
    }

    $url_parts = parse_url($url);
    $ch = false;
    $info = array( //Currently only supported by curl.
        'http_code'    => 200
    );
    $response = '';

    $send_header = array(
        'Accept' => 'text/*',
        'User-Agent' => 'BinGet/1.00.A (http://www.bin-co.com/php/scripts/load/)'
    ) + $options['headers']; // Add custom headers provided by the user.
    
    if($options['cache']) {
        // Note: joinPath() is a helper that is not defined in this listing.
        $cache_folder = joinPath(sys_get_temp_dir(), 'php-load-function');
        if(isset($options['cache_folder'])) $cache_folder = $options['cache_folder'];
        if(!file_exists($cache_folder)) {
            $old_umask = umask(0); // Or the folder will not get write permission for everybody.
            mkdir($cache_folder, 0777);
            umask($old_umask);
        }

        $cache_file_name = md5($url) . '.cache';
        $cache_file = joinPath($cache_folder, $cache_file_name); //Don't change the variable name - used at the end of the function.

        if(file_exists($cache_file)) { // Cached file exists - return that.
            $response = file_get_contents($cache_file);

            //Separate header and content
            $separator_position = strpos($response,"\r\n\r\n");
            $header_text = substr($response,0,$separator_position);
            $body = substr($response,$separator_position+4);

            foreach(explode("\n",$header_text) as $line) {
                $parts = explode(": ",$line);
                if(count($parts) == 2) $headers[$parts[0]] = chop($parts[1]);
            }
            $headers['cached'] = true;

            if(!$options['return_info']) return $body;
            else return array('headers' => $headers, 'body' => $body, 'info' => array('cached'=>true));
        }
    }

    // 'post_data' defaults to false, so test its value: isset() would always be true here.
    if($options['post_data']) { //There is an option to specify some data to be posted.
        $options['method'] = 'post';

        if(is_array($options['post_data'])) { //The data is in array format.
            $post_data = array();
            foreach($options['post_data'] as $key=>$value) {
                $post_data[] = "$key=" . urlencode($value);
            }
            $url_parts['query'] = implode('&', $post_data);
        } else { //It's a string
            $url_parts['query'] = $options['post_data'];
        }
    } elseif(isset($options['multipart_data'])) { //There is an option to specify some data to be posted.
        $options['method'] = 'post';
        $url_parts['query'] = $options['multipart_data'];
        /*
            This array consists of a name-indexed set of options.
            For example,
            'name' => array('option' => value)
            Available options are:
            filename: the name to report when uploading a file.
            type: the mime type of the file being uploaded (not used with curl).
            binary: a flag to tell the other end that the file is being uploaded in binary mode (not used with curl).
            contents: the file contents. More efficient for fsockopen if you already have the file contents.
            fromfile: the file to upload. More efficient for curl if you don't have the file contents.

            Note the name of the file specified with fromfile overrides filename when using curl.
         */
    }

    
    ///////////////////////////// Curl /////////////////////////////////////
    //If curl is available, use curl to get the data.
    if(function_exists("curl_init")
                and (!(isset($options['use']) and $options['use'] == 'fsocketopen'))) { //Don't use curl if it is specifically stated to use fsocketopen in the options

        // 'post_data' defaults to false, so test its value rather than using isset().
        if($options['post_data']) { //There is an option to specify some data to be posted.
            $page = $url;
            $options['method'] = 'post';

            if(is_array($options['post_data'])) { //The data is in array format.
                $post_data = array();
                foreach($options['post_data'] as $key=>$value) {
                    $post_data[] = "$key=" . urlencode($value);
                }
                $url_parts['query'] = implode('&', $post_data);
            } else { //It's a string
                $url_parts['query'] = $options['post_data'];
            }
        } else {
            if(isset($options['method']) and $options['method'] == 'post') {
                $page = $url_parts['scheme'] . '://' . $url_parts['host'] . $url_parts['path'];
            } else {
                $page = $url;
            }
        }

        if($options['session'] and isset($GLOBALS['_binget_curl_session'])) $ch = $GLOBALS['_binget_curl_session']; //Session is stored in a global variable
        else $ch = curl_init($url_parts['host']);

        curl_setopt($ch, CURLOPT_URL, $page) or die("Invalid cURL Handle Resource");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //Just return the data - not print the whole thing.
        curl_setopt($ch, CURLOPT_HEADER, true); //We need the headers
        curl_setopt($ch, CURLOPT_NOBODY, !($options['return_body'])); //The content - if true, will not download the contents. There is a ! operation - don't remove it.
        $tmpdir = NULL; //This acts as a flag for us to clean up temp files
        if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
            curl_setopt($ch, CURLOPT_POST, true);
            if(is_array($url_parts['query'])) {
                //multipart form data (eg. file upload)
                $postdata = array();
                foreach ($url_parts['query'] as $name => $data) {
                    if (isset($data['contents']) && isset($data['filename'])) {
                        if (!isset($tmpdir)) { //If the temporary folder is not specified - and we want to upload a file, create a temp folder.
                            //  :TODO:
                            $dir = sys_get_temp_dir();
                            $prefix = 'load';

                            if (substr($dir, -1) != '/') $dir .= '/';
                            do {
                                $path = $dir . $prefix . mt_rand(0, 9999999);
                            } while (!mkdir($path, 0700)); // The listing passed an undefined $mode here; 0700 is an assumed value.

                            $tmpdir = $path;
                        }
                        $tmpfile = $tmpdir.'/'.$data['filename'];
                        file_put_contents($tmpfile, $data['contents']);
                        $data['fromfile'] = $tmpfile;
                    }
                    if (isset($data['fromfile'])) {
                        // Not sure how to pass mime type and/or the 'use binary' flag
                        $postdata[$name] = '@'.$data['fromfile'];
                    } elseif (isset($data['contents'])) {
                        $postdata[$name] = $data['contents'];
                    } else {
                        $postdata[$name] = '';
                    }
                }
                curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
            } else {
                curl_setopt($ch, CURLOPT_POSTFIELDS, $url_parts['query']);
            }
        }

        
//Set the headers our spiders sends
        
curl_setopt($chCURLOPT_USERAGENT$send_header['User-Agent']); //The Name of the UserAgent we will be using ;)
        
$custom_headers = array("Accept: " $send_header['Accept'] );
        if(isset(
$options['modified_since']))
            
array_push($custom_headers,"If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',strtotime($options['modified_since'])));
        
curl_setopt($chCURLOPT_HTTPHEADER$custom_headers);
        if(
$options['referer']) curl_setopt($chCURLOPT_REFERER$options['referer']);

        
curl_setopt($chCURLOPT_COOKIEJAR"/tmp/binget-cookie.txt"); //If ever needed...
        
curl_setopt($chCURLOPT_FOLLOWLOCATIONtrue);
        
curl_setopt($chCURLOPT_MAXREDIRS5);
        
curl_setopt($chCURLOPT_SSL_VERIFYPEERfalse);

        
$custom_headers = array();
        unset(
$send_header['User-Agent']); // Already done (above)
        
foreach ($send_header as $name => $value) {
            if (
is_array($value)) {
                foreach (
$value as $item) {
                    
$custom_headers[] = "$name: $item";
                }
            } else {
                
$custom_headers[] = "$name: $value";
            }
        }
        if(isset(
$url_parts['user']) and isset($url_parts['pass'])) {
            
$custom_headers[] = "Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']);
        }
        
curl_setopt($chCURLOPT_HTTPHEADER$custom_headers);

        
$response curl_exec($ch);

        if(isset(
$tmpdir)) {
            
//rmdirr($tmpdir); //Cleanup any temporary files :TODO:
        
}

        
$info curl_getinfo($ch); //Some information on the fetch
        
        
if($options['session'] and !$options['session_close']) $GLOBALS['_binget_curl_session'] = $ch//Dont close the curl session. We may need it later - save it to a global variable
        
else curl_close($ch);  //If the session option is not set, close the session.

    //////////////////////////////////////////// FSockOpen //////////////////////////////
    } else { //If there is no curl, use fsocketopen - but keep in mind that most advanced features will be lost with this approach.

        if(!isset($url_parts['query']) || (isset($options['method']) and $options['method'] == 'post'))
            $page = $url_parts['path'];
        else
            $page = $url_parts['path'] . '?' . $url_parts['query'];

        if(!isset($url_parts['port'])) $url_parts['port'] = ($url_parts['scheme'] == 'https' ? 443 : 80);
        $host = ($url_parts['scheme'] == 'https' ? 'ssl://' : '').$url_parts['host'];
        $fp = fsockopen($host, $url_parts['port'], $errno, $errstr, 30);
        if ($fp) {
            $out = '';
            if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
                $out .= "POST $page HTTP/1.1\r\n";
            } else {
                $out .= "GET $page HTTP/1.0\r\n"; //HTTP/1.0 is much easier to handle than HTTP/1.1
            }
            $out .= "Host: $url_parts[host]\r\n";
            foreach ($send_header as $name => $value) {
                if (is_array($value)) {
                    foreach ($value as $item) {
                        $out .= "$name: $item\r\n";
                    }
                } else {
                    $out .= "$name: $value\r\n";
                }
            }
            $out .= "Connection: Close\r\n";

            //HTTP Basic Authorization support
            if(isset($url_parts['user']) and isset($url_parts['pass'])) {
                $out .= "Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']) . "\r\n";
            }

            
            //If the request is post - pass the data in a special way.
            if(isset($options['method']) and $options['method'] == 'post') {
                if(is_array($url_parts['query'])) {
                    //multipart form data (eg. file upload)

                    // Make a random (hopefully unique) identifier for the boundary
                    srand((double)microtime()*1000000);
                    $boundary = "---------------------------".substr(md5(rand(0,32000)),0,10);

                    $postdata = array();
                    $postdata[] = '--'.$boundary;
                    foreach ($url_parts['query'] as $name => $data) {
                        $disposition = 'Content-Disposition: form-data; name="'.$name.'"';
                        if (isset($data['filename'])) {
                            $disposition .= '; filename="'.$data['filename'].'"';
                        }
                        $postdata[] = $disposition;
                        if (isset($data['type'])) {
                            $postdata[] = 'Content-Type: '.$data['type'];
                        }
                        if (isset($data['binary']) && $data['binary']) {
                            $postdata[] = 'Content-Transfer-Encoding: binary';
                        } else {
                            $postdata[] = '';
                        }
                        if (isset($data['fromfile'])) {
                            $data['contents'] = file_get_contents($data['fromfile']);
                        }
                        if (isset($data['contents'])) {
                            $postdata[] = $data['contents'];
                        } else {
                            $postdata[] = '';
                        }
                        $postdata[] = '--'.$boundary;
                    }
                    $postdata = implode("\r\n", $postdata)."\r\n";
                    $length = strlen($postdata);
                    $postdata = 'Content-Type: multipart/form-data; boundary='.$boundary."\r\n".
                                'Content-Length: '.$length."\r\n".
                                "\r\n".
                                $postdata;

                    $out .= $postdata;
                } else {
                    $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
                    $out .= 'Content-Length: ' . strlen($url_parts['query']) . "\r\n";
                    $out .= "\r\n" . $url_parts['query'];
                }
            }
            $out .= "\r\n";

            fwrite($fp, $out);
            while (!feof($fp)) {
                $response .= fgets($fp, 128);
            }
            fclose($fp);
        }
    }

    
//Get the headers in an associative array
    
$headers = array();

    if(
$info['http_code'] == 404) {
        
$body "";
        
$headers['Status'] = 404;
    } else {
        
//Seperate header and content
        
$header_text substr($response0$info['header_size']);
        
$body substr($response$info['header_size']);
        
        foreach(
explode("\n",$header_text) as $line) {
            
$parts explode(": ",$line);
            if(
count($parts) == 2) {
                if (isset(
$headers[$parts[0]])) {
                    if (
is_array($headers[$parts[0]])) $headers[$parts[0]][] = chop($parts[1]);
                    else 
$headers[$parts[0]] = array($headers[$parts[0]], chop($parts[1]));
                } else {
                    
$headers[$parts[0]] = chop($parts[1]);
                }
            }
        }

    }
    
    if(isset(
$cache_file)) { //Should we cache the URL?
        
file_put_contents($cache_file$response);
    }

    if(
$options['return_info']) return array('headers' => $headers'body' => $body'info' => $info'curl_handle'=>$ch);
    return 
$body;
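
The function calls a joinPath() helper when the 'cache' option is on, but that helper is not defined in the listing above. If you do not have one, a stand-in along these lines should be enough. This is only a sketch of what such a helper might look like, not the original implementation.

```php
<?php
// Hypothetical stand-in for the joinPath() helper that load() calls.
// Joins path parts with the directory separator, avoiding doubled slashes.
if (!function_exists('joinPath')) {
    function joinPath() {
        $parts = array();
        foreach (func_get_args() as $i => $raw) {
            // Keep a leading slash on the first part; trim separators elsewhere.
            $part = ($i == 0) ? rtrim($raw, '/\\') : trim($raw, '/\\');
            if ($part !== '' || ($i == 0 && $raw !== '')) $parts[] = $part;
        }
        return implode(DIRECTORY_SEPARATOR, $parts);
    }
}

echo joinPath('/tmp/', 'php-load-function', 'abc.cache'), "\n";
```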

License

BSD License

Comments

Anonymous at 27 Oct, 2007 12:14
Thanks ! the script is easy to use
What is a URL? at 23 Nov, 2007 03:45
Does this retrieve all the HTML of the URL in question? If so, then how does it differ from file()?
Binny V A at 25 Nov, 2007 06:22
In some servers, file() cannot fetch an URL. There is an option in PHP to disable it. Since it is considered to be a security threat, many admins have disabled it. In such cases, you can use this function.
Anonymous at 12 Jan, 2008 06:24
wow! Straight forward programming, good documentation ... a pleasure to use =)
Anonymous at 20 Feb, 2008 09:39
Awesome function. Had to change some small stuff thou cause i got some weird numbers in my $body output.

$body = substr($response,$separator_position+4);
i hade to change to:
$body = substr($response,$separator_position+9);

and then a
$body = substr($body,0,-5);
cause there was a 0 at the end.

otherwise awesome job! =)

-gob
Ashkan at 01 Mar, 2008 01:37
great! thanks a lot. it make me faster doing my works!
Anonymous at 05 Mar, 2008 11:34
hi
how can i save the contents of the url or download it using this script
i need to download it and save it like HTML file ..

example : when you clicl file > save as .. the page will saved in HTML file Whit Images ..
Brad at 15 Mar, 2008 02:53
And once again simply awesome - thanks!
Anonymous at 16 Mar, 2008 06:35
what would be the code if i want to fetch full wvm path from zdsahre.net?
Anonymous at 18 Mar, 2008 11:45
thank you very very much, it solves all my problems, simply..
Stumbled upon at 05 Jun, 2008 05:34
This is just excellent, exactly what I needed. Thanks for sharing!
Tim at 05 Jun, 2008 12:18
Very nice, thx.
Anonymous at 02 Jul, 2008 06:42
great snippet...consider encapsulating it in a class for easy consumption.
arun dwivedi at 07 Jul, 2008 01:46
hi arun
arun dwivedi at 07 Jul, 2008 09:19
how can use rollback with php function
dheen at 15 Jul, 2008 09:17
i just copied the above coding...and i was running it but codings were printed in that page...pls help me..
do i need to make any change..?
dheen at 15 Jul, 2008 10:00
what does $option do in the first line...do i need to call the function load from outside..
if i need to call that what i should pass to the $options..
Adrian at 21 Jul, 2008 12:47
Great script.

Thanks and really useful.

One thing it doesn't handle, is fetching code on different ports. I'm trying to pull a file on port 8080.

Any ideas?
Mark at 29 Jul, 2008 03:16
Thanks for posting this!
Roshan Bhattarai at 12 Sep, 2008 04:10
hey bini......thanks for sharing such a great script.....cheers
Anonymous at 12 Sep, 2008 04:32
Why is file_get_contents a security threat?
Binny V A at 12 Sep, 2008 12:10
The handling of url as a local file is a security threat - not that function. For example, an attacker could add the code ...
include('http://remotesite.com/spamming_php_file.txt'); - it will be executed as PHP code if this feature is turned on.
Anonymous at 14 Jan, 2009 04:08
Could you elaborate? Does this script pose a security threat as well? Is there a risk of injection attacks? -Thanks!
tozo at 19 Sep, 2008 12:58
Hi.

I have one question. I would like to execute form on anothere server.
For on other servers pages looks something like this :

<form id="calculate_score" action="" method="post">
<input type="hidden" name="cmd" value="calculate" />
<input type="text" id="month_payment" name="score" class="input" value="" />
<input type="image" src="/dsg/sl/calculate.gif" class="button" />
</form>

Is it posible to send data with load function (post method) so I could enter values manualy and just retreive the result.

Thnx.
Anonymous at 28 Sep, 2008 11:59
very good piece of code! works like a charm! thank you
cartagena at 13 Oct, 2008 08:29
Perfect,,,, a great alternative to file_get_contents.... That's why i love PHP.
Thanks
Anonymous at 06 Nov, 2008 05:04
Great work!
Anonymous at 19 Nov, 2008 10:45
Thank you. Saved me probably hours of programming to get some JSON stream data on my webpage
rahul at 08 Dec, 2008 02:37
I HAVE ONE QUESTION:SUPPOSE I AM RUNNING A PARTICULAR PAGE ON ONE SERVER AND ANOTHER ON ANOTHER SERVER,IF I WANT TO GO FROM ONE PAGE OF ONE SERVER TO OTHER PAGE THAT IS RUNNING ON ANOTHER SERVER HOW CAN I MAINTAIN THE SESSION
Paul Tarjan at 23 Dec, 2008 11:02
To deal with other ports, change the line:
---
$fp = fsockopen($url_parts['host'], 80, $errno, $errstr, 30);
---

to

---
if(!isset($url_parts['port'])) $url_parts['port'] = 80;

$fp = fsockopen($url_parts['host'], $url_parts['port'], $errno, $errstr, 30);
---
Jani at 29 Jan, 2009 10:14
Hi!

Great work!
I have one note:
if I use it and has to redirect it, the new url's header in $results['body'] of top as plain text , and not in $results['headers']. But $results['info'] is OK, but not full (Content-Length, Date, etc...).
Example:
$result = load('http://google.com',$options); //redirect to www.google.com
print_r($results);

Sorry my bad english.
Binny V A at 31 Jan, 2009 05:21
I have update the script to the latest version - it has fixed this issue. And added a lot of new features too.
Florian at 26 Feb, 2009 02:28
Thanks, this is great, and easy!
Anonymous at 18 Mar, 2009 12:04
WOW ... AMAZING FUNCTION. Great work !!!
a1291762 at 14 Apr, 2009 08:25
I tried to use this and found out that without cURL it's a bit limited. I have corrected some of the limitations.

The patch allows fsockopen to connect to a https:// URL. It also calculates the header_size so the header/body splitting/parsing code works.

http://yasmar.net/load_fix_fsockopen.diff
a1291762 at 16 Apr, 2009 09:27
Some more patches that I've made...

Fix the sending of headers (while I didn't test it, the curl code didn't seem right, the fsockopen code didn't handle it at all).
http://yasmar.net/fix_sending_headers.diff

Allow receiving multiple headers with the same name (eg. Set-Cookie: appear multiple times when multiple cookies are being set). This leaves the $headers['header'] behavior intact and simply changes the value from a string to an array of strings.
http://yasmar.net/multiple_headers_receive.diff

Retrieve the HTTP response code when using fsockopen.
http://yasmar.net/fsockopen_http_response.diff

Send multiple headers with the same name (a mirror implementation of the receiving multiple headers patch).
http://yasmar.net/multiple_headers_send.diff
Binny V A at 22 Apr, 2009 11:37
Thanks! I'll review the code and add these patches into function. Thanks for sharing.
a1291762 at 22 Apr, 2009 08:56
I'm now retrieving an attachment and the end of the headers is not an empty line but a line with a single piece of white space. This patch updates the fsockopen code to check for 'only whitespace' instead of 'empty line' when finding the end of the headers.

http://yasmar.net/load_fsockopen_non_empty_line.diff
a1291762 at 23 Apr, 2009 12:10
Oops. Turns out that last patch was an incorrect fix for a parsing bug. Here's a patch (on top of that patch) that corrects the situation. Embarrassingly, this logic was already there (up in the curl bit). With the non_empty_line patch you can download text attachments. With this patch on top, you can also download binary attachments.

http://yasmar.net/load_fsockopen_headers_end.diff
Desi at 24 Apr, 2009 08:26
alienlinkz@gmail.com
can someone tell me how to use this script
please
I am trying to fetch new from yahoo to my website.
Matt at 28 Apr, 2009 01:27
Hi there

I have this working well, retrieving text. However, I have been trying to also use it for retrieving a PNG from a remote server and then saving this to my local server and it doesn't work. I can get it to work using feof and feof really well, but it doesn't work on alls servers which is why I am trying to use this.

Could someone please provide me with a sample of what to do to get this to work please? It saves the data, but isn't being recognised as a PNG file. I guess the data is corrupt for some reason.

Oh, and a1291762, is there any chance you can link to the entire updated script rather than just diffs?

Thanks
Matt at 28 Apr, 2009 01:30
Sorry, I meant fopen and fwrite.
a1291762 at 29 Apr, 2009 12:22
Matt, perhaps the file is being "converted" when writing (ie. automatic line ending conversion).

My code does not actually write out the file at any point. It simply fetches and then serves to the browser (effectively acting as a proxy). The code looks like this.

# Fetch the attachment (needs cookies)
$attachment = load($this->source_url, array('return_info' => true,
'headers' => array('Cookie' => session()->cookie())));

# Serve up the headers that I got (at least some of them will be important)
foreach ($attachment['headers'] as $header => $data) {
# Any text/* types get served as text/plain.
# I HATE it when the browser doesn't let me view TEXT files!
if ($header == 'Content-Type' && preg_match('/^text\//', $data)) {
header('Content-Type: text/plain');
continue;
}
if (!is_array($data)) {
$data = array($data);
}
foreach ($data as $value) {
header($header.': '.$value.'
');
}
}

# Now serve up the contents of the file.
echo $attachment['body'];


The full script I have is here: http://yasmar.net/load.php. Note that this is not quite original + patches. There is a raw POST hack I stuck in so I could upload files to a remote server but it's only for the fsockopen case. I was planning on making it work with curl before posting as a patch.
a1291762 at 30 Apr, 2009 12:18
This patch makes the fsockopen case able to use the post_data feature (no sense in keeping it curl-only right?).
http://yasmar.net/fsockopen_post_data.diff

This adds support for multipart-encoded POSTs (cURL and fsockopen). This allows you to upload files.
http://yasmar.net/multipart_post.diff

This makes use of two helper functions (sourced from around the internets). It was designed around fsockopen's requirements and retrofitted to curl. fsockopen needs more data than curl does so the 'multipart_data' field you are expected to fill out has more options than you might expect (the same input is used so that the 'use curl or fall back to fsockopen' logic can still work).

Anyway, here is an example of how to upload with this new feature (in this case, attaching a file to a JIRA task).

$url = 'https://'.JIRA_HOST.'/secure/AttachFile.jspa';
$upload = array('filename.1' => array('filename' => $file_name, 'type' => $mime_type, 'binary' => true, 'contents' => $file_contents),
'filename.2' => array('filename' => '', 'type' => 'application/octet-stream'),
'filename.3' => array('filename' => '', 'type' => 'application/octet-stream'),
'comment' => array(),
'commentLevel' => array(),
'id' => array('contents' => $id),
'Attach' => array('contents' => 'Attach'));
load($url, array('headers' => array('Cookie' => $cookies),
'method' => 'post', 'multipart_data' => $upload));

Just for Matt, the whole enchilada can be found here.
http://yasmar.net/load.php.txt

Note that I'm not interested in maintaining a fork or anything so don't ask me for support. This was tested with both cURL and fsockopen but only for the case presented here (attaching a file to a JIRA task).
Anonymous at 04 May, 2009 03:10
I've tried the above code exactly but got an error when I used it like this:

$options = array(
'return_info' => true,
'method' => 'post'
);
$result = load('http://www.sigmaforex.com',$options);
print_r($result);

the error was:
400 Bad Request
Your browser sent a request that this server could not understand:

(none)HTTP/1.0 (port 80)

can anyone help me with this issue please?


prem ypi at 16 May, 2009 01:48
This is really safe method to use now. Fetching using the default php function has few security flaws. May be you should try to commit your piece of code in the default fetch (or have separate function as 'safe_load' etc.
jaswant tak at 25 May, 2009 01:22
Very nice mate,

A great help

Thanks a lot.

cheers
WiserX at 23 Jul, 2009 10:28
Really a very useful job.

Thanks dude.

Plz try to write a code to fetch content from PDF files in the form of HTML.
Malgor at 09 Aug, 2009 06:10
Brilliant bit of code, and just what I was looking for - however, I have a bit of a problem with it.

It seems that if the url I am requesting is unavailable (timing out), then the whole page hangs, and eventually gives me a 500 error :(

Anyone know of a simple way to set a time-out on this script, and just return an error/string ("page unavailable") if the timeout is reached?

TIA
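
A minimal sketch of one way to approach the time-out question above, independent of load() itself. The function name load_with_timeout and the fallback string are made up for illustration. CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT are standard cURL options, and 'timeout' is a standard HTTP stream context option used here when cURL is not installed.

```php
<?php
// Hypothetical helper, not part of load(): fetch a URL but give up after
// $timeout_seconds and return $fallback instead of hanging the page.
function load_with_timeout($url, $timeout_seconds = 5, $fallback = 'page unavailable') {
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);               // return the body as a string
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout_seconds);   // limit for connecting
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout_seconds);          // limit for the whole request
        $body = curl_exec($ch);                                       // false on failure or time-out
    } else {
        // No cURL: fall back to a stream context with a read time-out.
        $context = stream_context_create(array('http' => array('timeout' => $timeout_seconds)));
        $body = @file_get_contents($url, false, $context);            // false on failure
    }
    return ($body === false) ? $fallback : $body;
}
```

Calling load_with_timeout('http://example.com/rss.xml', 3) would then return either the page contents or the string 'page unavailable' after roughly three seconds.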
lordgrunt at 26 Aug, 2009 02:54
This works great on my local server, with all-default PHP settings, but on a free hosting server (with their PHP config) it causes a really weird error:
ERROR
The requested URL could not be retrieved
While trying to retrieve the URL: www.lordgrunt.myplus.org/?
The following error was encountered:
* Zero Sized Reply
Squid did not receive any data for this request.
preet at 10 Sep, 2009 10:27
I need to create a PHP class that simulates the behaviour of a Java Properties class. The methods below should be present:
void load(String propertiesFileName) - reads a property file (key and element pairs) from the input file name and loads the key/value pairs into a PHP array.
String getProperty(String key) - gets the value of the key set up in the properties file
void setProperty(String key, String value) - sets the value of the key set up in the properties file
Please help me out.
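
A rough sketch of the class described above, unrelated to the load() function of this post. It is a simplification and an assumption about what is wanted: it handles "key=value" and "key:value" lines and skips comment lines starting with # or !, but it does not implement the escape sequences or line continuations of the real Java format.

```php
<?php
// Simplified stand-in for java.util.Properties (illustrative only).
class Properties {
    private $values = array();

    // Read "key=value" / "key:value" pairs from a file into the internal array.
    public function load($propertiesFileName) {
        $lines = file($propertiesFileName, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            throw new RuntimeException("Cannot read $propertiesFileName");
        }
        foreach ($lines as $line) {
            $line = trim($line);
            // Skip blank lines and comments.
            if ($line === '' || $line[0] === '#' || $line[0] === '!') {
                continue;
            }
            // Split on the first '=' or ':' only, so values may contain them.
            $parts = preg_split('/\s*[=:]\s*/', $line, 2);
            $this->values[$parts[0]] = isset($parts[1]) ? $parts[1] : '';
        }
    }

    // Return the value for $key, or null when the key is not set.
    public function getProperty($key) {
        return isset($this->values[$key]) ? $this->values[$key] : null;
    }

    // Set (or overwrite) the value for $key in memory.
    public function setProperty($key, $value) {
        $this->values[$key] = $value;
    }
}
```

Note that setProperty() here only changes the in-memory array; writing the values back to the file would need a separate method.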
 
Soundpin at 23 Sep, 2009 10:37
'headers' => array('Cookie' => session()->cookie());
How do I pass cookies? Please help.

Your function helped me a lot, thanks.
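
A possible answer, sketched under assumptions: the upload example earlier in these comments passes cookies as a single string under 'headers' => array('Cookie' => ...), so the sketch below only builds that string from an array. The helper name build_cookie_header is made up for illustration, and whether your copy of load() accepts the 'headers' option should be checked against its source.

```php
<?php
// Hypothetical helper: turn array('name' => 'value', ...) into the value of
// a Cookie request header, e.g. "PHPSESSID=abc123; lang=en".
function build_cookie_header(array $cookies) {
    $pairs = array();
    foreach ($cookies as $name => $value) {
        // Encode the value so spaces and semicolons cannot break the header.
        $pairs[] = $name . '=' . rawurlencode($value);
    }
    return implode('; ', $pairs);
}

// Assumed usage with load(), mirroring the upload example above:
// $result = load($url, array('headers' => array(
//     'Cookie' => build_cookie_header(array('PHPSESSID' => session_id())))));
```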
Anonymous at 16 Oct, 2009 06:43
Hi nice script. However the following

"Also, it is impossible to submit data using the POST method using this function."

is not true. Please have a look at the twitter script:


fabien.potencier.org/article/20/tweeting-from-php

Cheers!
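
For reference, a small sketch of the technique this comment refers to: file_get_contents() can send a POST request when it is given a stream context. The helper name build_post_context and the example URL are assumptions for illustration; stream_context_create(), http_build_query() and the 'http' context options are standard PHP. The approach still depends on allow_url_fopen being enabled on the server.

```php
<?php
// Hypothetical helper: build a stream context that makes file_get_contents()
// send $fields as a form-encoded POST body.
function build_post_context(array $fields) {
    $body = http_build_query($fields);   // e.g. "a=1&b=two+words"
    return stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $body,
        ),
    ));
}

// Usage (needs allow_url_fopen and network access, so it is left commented out):
// $reply = file_get_contents('http://example.com/form.php', false,
//                            build_post_context(array('status' => 'hello world')));
```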
Der_Doc at 17 Oct, 2009 11:37
Hi,
I use the script on PHP 5.1.2 and it works perfectly.
But on 5.2.11 the script does not work. Does anybody know what the problem is?

An email to me would be very nice.

MfG
Der_Doc
Anonymous at 12 Nov, 2009 03:26
One issue as I see it...

If I load

www.something.com/somedir/page.php

from one of my sites into

www.mysite.com/otherpage.php

and if

www.something.com/somedir/page.php

has a link on it that is pointing to

"../someotherpage.php"

THEN if I click on that link while in

www.mysite.com/otherpage.php

I would get taken to

www.mysite.com/someotherpage.php

which does not exist.

Is there a way to have the script account for directory levels if links are not FULL urls in the loaded document?

Other than that, and the inability to correctly load JScript elements, this is one GREAT script.

An actual usage EXAMPLE would be great too.

Thanks
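
A sketch of how relative links like the one described above could be resolved against the URL the page was loaded from. This is not part of load(); the function name resolve_url is made up, and the logic is a simplified version of standard relative-reference resolution (it ignores query strings and fragments in the base URL). Each href in the fetched HTML would have to be rewritten with it before the page is displayed.

```php
<?php
// Hypothetical helper: turn a link found in a fetched page into a full URL,
// using the URL of that page ($base) as the starting point.
function resolve_url($base, $relative) {
    // Already absolute (has a scheme such as http:): nothing to do.
    if (parse_url($relative, PHP_URL_SCHEME) !== null) {
        return $relative;
    }
    $b = parse_url($base);
    // Protocol-relative link ("//host/path"): reuse the base scheme.
    if (substr($relative, 0, 2) === '//') {
        return $b['scheme'] . ':' . $relative;
    }
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $base_path = isset($b['path']) ? $b['path'] : '/';
    if ($relative !== '' && $relative[0] === '/') {
        $path = $relative;                                   // root-relative link
    } else {
        // Drop the file name of the base page, keep its directory.
        $path = substr($base_path, 0, strrpos($base_path, '/') + 1) . $relative;
    }
    // Collapse "." and ".." segments.
    $out = array();
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return $root . '/' . ltrim(implode('/', $out), '/');
}
```

With the URLs from the comment, resolve_url('http://www.something.com/somedir/page.php', '../someotherpage.php') gives http://www.something.com/someotherpage.php rather than a path on www.mysite.com.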
Anonymous at 12 Nov, 2009 03:28
Oh, the JScript issue is very evident when loading pages that have embedded videos in them (embed codes are often JScript).
Solicitatie hulp at 17 Nov, 2009 04:19
This is a great post. I was looking for this for a long time.

Thank you very much!

Regards,
Mark
Anonymous at 04 Dec, 2009 02:50
Thank you so much for a great script, just what I was looking for!! You saved me hours of coding. Many thanks again!
rajat at 04 Dec, 2009 10:09
Wow! What a script.
Chris Smith at 07 Dec, 2009 07:52
This is an excellent script! Just started playing around with it! Anywhere to donate?

How do I remove the headers off the top of it?

--
Chris
Binny V A at 04 Jan, 2010 04:29
Headers off the top? I'm not sure what you mean.
Nacho at 14 Dec, 2009 06:48
Thanks man, this function is really useful :)
Anonymous at 12 Jan, 2010 02:57

getting this error:

"PHP Notice: Undefined index: header_size"

any idea?

Thanks

Pippo
Anonymous at 12 Jan, 2010 03:09

Really quick and dirty, but this got rid of the warning for me...

$info = array(//Currently only supported by curl.
'http_code' => 200,
'header_size' => ''
);

Anonymous at 15 Jan, 2010 11:47
I kept getting the HTTP 200 OK information at the beginning of the string. I just used strpos to remove it though.
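
A minimal sketch of that strpos approach, assuming the returned string is a raw HTTP response in which the headers and the body are separated by the first blank line (CRLF CRLF). The function name split_http_response is made up for illustration. Responses that carry more than one header block (for example after a "100 Continue") would need the split applied again.

```php
<?php
// Hypothetical helper: split a raw HTTP response into array(headers, body).
function split_http_response($raw) {
    $pos = strpos($raw, "\r\n\r\n");          // end of the header block
    if ($pos === false) {
        return array('', $raw);               // no headers found: treat it all as body
    }
    return array(substr($raw, 0, $pos), substr($raw, $pos + 4));
}
```

list($headers, $body) = split_http_response($response); then leaves only the page contents in $body.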
Hassan at 28 Jan, 2010 08:48
great! thanks a lot. nice work!!!!!
Math at 09 Feb, 2010 05:51
Thanks a lot, I was searching for this code for a long time. I need to get selected content from a web page using curl. How can it be done?
Atasözleri at 20 Feb, 2010 02:22
Thank you man, this is useful!

I am using it on my web site and it works stably.
Billy at 10 Mar, 2010 12:55
Hi!
I have a problem getting the script to work as it should. I keep getting errors about header size and content length. I would really like to use this script though, since it would solve all my problems, so any ideas are welcome!
Muhameti at 12 Mar, 2010 11:13
Hi, I have a problem.

I upload my file to the server and create two kinds of link: a browser link (browseUrl) and a ServerUrl pointing to where the file is uploaded. I save both links in my database.

When a user types the browseUrl in Internet Explorer, the application checks in the db whether the file exists on the server. If it does, it gets the serverUrl with the function getFile(browseUrl) and saves it in a variable, for example:

$file = getFile($browseUrl);

When I print the variable $file, it prints the path, for example: c:/xampp/htdocs/sample/upload/6c1f0e4810dad6db8bc01dc10baeb283/passwords.txt

But when I use this PHP code to download the file:

$file = getFile($browseUrl);
$speed = 512; // KB sent per one-second tick
//First, see if the file exists
if (!is_file($file))
{
die("404 File not found!");
}
//Gather relevant info about file
$filename = basename($file);
$file_extension = strtolower(substr(strrchr($filename,"."),1));
// $ctype was never set in the original; a generic binary type is assumed here
$ctype = 'application/octet-stream';

// Begin writing headers
header("Cache-Control: public");
header("Content-Type: $ctype");

// if your filename contains underscores, replace them with spaces
$filespaces = str_replace("_", " ", $filename);

// quote the filename so names containing spaces survive
$header='Content-Disposition: attachment; filename="'.$filespaces.'"';
header($header);
header("Accept-Ranges: bytes");

$size = filesize($file);
// default to the whole file so both variables are always defined
$seek_start = 0;
$seek_end = $size - 1;
// check if http_range is sent by browser (or download manager)
if(isset($_SERVER['HTTP_RANGE'])) {
// if yes, download missing part

$seek_range = substr($_SERVER['HTTP_RANGE'] , 6);
$range = explode( '-', $seek_range);
if($range[0] > 0) { $seek_start = intval($range[0]); }
if(isset($range[1]) && $range[1] > 0) { $seek_end = intval($range[1]); }

header("HTTP/1.1 206 Partial Content");
header("Content-Length: " . ($seek_end - $seek_start + 1));
header("Content-Range: bytes $seek_start-$seek_end/$size");
} else {
header("Content-Length: $size");
}
//open the file
$fp = fopen($file,"rb");

//seek to start of missing part
fseek($fp,$seek_start);

//start buffered download, sending no more than the requested range
$remaining = $seek_end - $seek_start + 1;
while($remaining > 0 && !feof($fp)) {
//reset time limit for big files
set_time_limit(0);
$chunk = fread($fp, min(1024*$speed, $remaining));
if ($chunk === false) { break; }
print($chunk);
$remaining -= strlen($chunk);
flush();
sleep(1);
}
fclose($fp);
exit;

I get the error "404 File not found!". In reality the file exists at that path, because if I set the variable

$file = 'c:/xampp/htdocs/sample/upload/6c1f0e4810dad6db8bc01dc10baeb283/passwords.txt';

I can download the file without a problem, but if I get the link from the db I get the error "404 File not found!".

Help me!
Where is the problem?

Sorry for my bad English.