The feed at http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss
is missing all item elements when downloaded with curl. Otherwise the feed appears OK (no encoding problems, valid XML).
- drupal_http_request() behaves the same as the curl library.
- curl on the command line behaves the same as using curl with PHP.
- PHP stream wrappers work.
- wget on the command line works.
Environment:
- PHP 5.2.6
- curl 7.19.5
- Mac OS X 10.5.8
Some examples:
1.) Doesn't work - curl on the command line: feed has no items
curl --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss"
2.) Works - wget on the command line
wget --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss"
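3.) Doesn't work - curl PHP library: feed has no items, though the character encoding appears OK.
<?php
// Use the same User-Agent string as the other examples.
$headers = array(
  'User-Agent: Drupal (+http://drupal.org/)',
);
// The Arabic query term is passed as raw UTF-8 bytes, not percent-encoded.
$request = curl_init('http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss');
curl_setopt($request, CURLOPT_HTTPHEADER, $headers);
// Make curl_exec() return the response body instead of printing it.
curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($request);
curl_close($request);
// Serve the response as RSS with a UTF-8 charset.
header('Content-Type: application/rss+xml; charset=utf-8');
print $data;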
4.) Works - same feed as in 3.), downloaded with stream_get_contents().
<?php
// The HTTP stream wrapper takes its User-Agent from this ini setting.
ini_set('user_agent', 'Drupal (+http://drupal.org/)');
$handle = fopen('http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss', 'rb');
$contents = stream_get_contents($handle);
fclose($handle);
header('Content-Type: application/rss+xml; charset=utf-8');
print $contents;
5.) Works - same code as in 3.), but with a different query (one without special characters?).
<?php
$headers = array(
  'User-Agent: Drupal (+http://drupal.org/)',
);
// This query term is plain ASCII, so no encoding question arises.
$request = curl_init('http://news.google.com/news?pz=1&cf=all&ned=en_pk&hl=en&q=drupal&cf=all&output=rss');
curl_setopt($request, CURLOPT_HTTPHEADER, $headers);
curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($request);
curl_close($request);
header('Content-Type: application/rss+xml; charset=utf-8');
print $data;
(View the PHP examples in a browser, served from your web server.)
Comments
Comment #1
Anonymous (not verified) commented:
More diagnosis:
The cURL output omits the search terms from the title tag; it simply says "Google News".
The wget output includes them: "Google News - Syria".
I suspect that something somewhere along the way can't handle the change in text direction.
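To compare the curl and wget downloads more precisely than by eye, a small check like the following could report the channel title and the number of item elements in each response. This is a minimal sketch assuming PHP's DOM extension is available; rss_summary() is a hypothetical helper, not part of Drupal or SimplePie.

```php
<?php
/**
 * Hypothetical helper: summarise an RSS 2.0 document as its channel title
 * and item count, or return FALSE if the input cannot be parsed.
 */
function rss_summary($xml) {
  // curl_exec() returns FALSE on failure; treat that and '' as unparseable.
  if (!is_string($xml) || $xml === '') {
    return FALSE;
  }
  $doc = new DOMDocument();
  // Suppress parser warnings; a FALSE return signals malformed XML.
  if (!@$doc->loadXML($xml)) {
    return FALSE;
  }
  $xpath = new DOMXPath($doc);
  $titles = $xpath->query('/rss/channel/title');
  return array(
    'title' => $titles->length ? $titles->item(0)->textContent : NULL,
    'items' => $xpath->query('/rss/channel/item')->length,
  );
}
```

Calling rss_summary($data) on the output of example 3 and rss_summary($contents) on the output of example 4 would show whether the title and item count differ between the two downloads.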
Comment #2
Anonymous (not verified) commented:
Meanwhile, if you percent-encode the URL (ahem...):
http://news.google.com/news?pz=1&hl=ar&q=%D8%B3%D9%88%D8%B1%D9%8A%D8%A7&...
cURL works.
curl --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=%D8%B3%D9%88%D8%B1%D9%8A%D8%A7&..."
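On the PHP side, the same fix means percent-encoding each query value before handing the URL to curl_init(). The sketch below assumes the script file is saved as UTF-8, so rawurlencode() sees the UTF-8 bytes of the Arabic term; build_encoded_url() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: build a URL whose query keys and values are
 * percent-encoded (RFC 3986 style) with rawurlencode().
 */
function build_encoded_url($base, array $params) {
  $pairs = array();
  foreach ($params as $key => $value) {
    $pairs[] = rawurlencode($key) . '=' . rawurlencode($value);
  }
  return $base . '?' . implode('&', $pairs);
}

// The feed URL from example 3, with the query term encoded as UTF-8 bytes.
$url = build_encoded_url('http://news.google.com/news', array(
  'pz' => '1',
  'hl' => 'ar',
  'q' => 'سوريا',
  'cf' => 'all',
  'output' => 'rss',
));
```

The resulting $url can be passed to curl_init() in place of the raw URL in example 3. On PHP 5.4 and later, http_build_query($params, '', '&', PHP_QUERY_RFC3986) should produce the same query string, but that constant is not available on the PHP 5.2.6 used here.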
Comment #3
alex_b commented:
#2 ;-) I hear you. So this seems to be a URL encoding issue after all? Are wget and PHP streams just more lenient about the input they accept?
Comment #4
alex_b commented:
SimplePie's replace_invalid_with_pct_encoding() may be worth pillaging to address this issue.
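As an illustration of that approach, the sketch below percent-encodes every byte that is not allowed unencoded in a URI while leaving existing %XX escapes intact, so an already-encoded URL passes through unchanged. It is not SimplePie's actual code; pct_encode_invalid() is a hypothetical name, and the allowed set is assumed to be the RFC 3986 unreserved plus reserved characters.

```php
<?php
/**
 * Hypothetical sketch: percent-encode bytes that may not appear unencoded
 * in a URI, keeping valid %XX escapes as they are.
 */
function pct_encode_invalid($url) {
  // RFC 3986 unreserved and reserved characters, allowed as-is.
  $allowed = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
    . "-._~:/?#[]@!$&'()*+,;=";
  $hexdigits = '0123456789abcdefABCDEF';
  $out = '';
  $len = strlen($url);
  // Work byte by byte, so multibyte UTF-8 characters become one %XX per byte.
  for ($i = 0; $i < $len; $i++) {
    $c = $url[$i];
    if ($c === '%') {
      // Keep '%' only when it already starts a valid %XX escape.
      $hex = (string) substr($url, $i + 1, 2);
      $out .= (strlen($hex) === 2 && strspn($hex, $hexdigits) === 2) ? '%' : '%25';
    }
    elseif (strpos($allowed, $c) !== FALSE) {
      $out .= $c;
    }
    else {
      $out .= sprintf('%%%02X', ord($c));
    }
  }
  return $out;
}
```

Applied to the feed URL from the original report, this should yield the encoded form that works with curl in #2, and applying it a second time leaves the result unchanged.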
Comment #5
kenorb commented:
Closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.