http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss

is missing all <item> elements when downloaded with curl. Otherwise the feed appears OK (no encoding problems, valid XML).

  • drupal_http_request() fails the same way the curl library does.
  • curl on the command line fails the same way as curl used from PHP.
  • PHP stream wrappers work.
  • wget on the command line works.

PHP 5.2.6
curl 7.19.5
Mac OS X 10.5.8

Some examples:

1.) Doesn't work - curl on command line: feed has no items

curl --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss"

2.) Works - wget on command line

wget --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss"

3.) Doesn't work - curl PHP library: feed has no items, character encoding appears OK.

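<?php
// Use Drupal's User-Agent string.
$headers = array(
  'User-Agent: Drupal (+http://drupal.org/)',
);
// The Arabic search term is passed unencoded, exactly as in 1.).
$request = curl_init('http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss');
curl_setopt($request, CURLOPT_HTTPHEADER, $headers);
// Return the response body instead of printing it directly.
curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($request);
curl_close($request);

header('Content-Type: application/rss+xml; charset=utf-8');
print $data;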

4.) Works - same feed as in 3.) downloaded with stream_get_contents().

<?php
// PHP's http:// stream wrapper sends this value as the User-Agent header.
ini_set('user_agent', 'Drupal (+http://drupal.org/)');
$handle = fopen('http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss', 'rb');
$contents = stream_get_contents($handle);
fclose($handle);

header('Content-Type: application/rss+xml; charset=utf-8');
print $contents;

5.) Works - same code as in 3.), but with a different query (no special characters?).

<?php
// Use Drupal's User-Agent string.
$headers = array(
  'User-Agent: Drupal (+http://drupal.org/)',
);
// Plain ASCII query this time.
$request = curl_init('http://news.google.com/news?pz=1&cf=all&ned=en_pk&hl=en&q=drupal&cf=all&output=rss');
curl_setopt($request, CURLOPT_HTTPHEADER, $headers);
// Return the response body instead of printing it directly.
curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($request);
curl_close($request);

header('Content-Type: application/rss+xml; charset=utf-8');
print $data;

(View the PHP examples in a browser, served from your web server.)
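
One visible difference between 3.) and 5.) is that the failing URL contains the Arabic search term as raw UTF-8 bytes, while the working URL is plain ASCII. A minimal PHP sketch to check a URL for such unencoded bytes. url_has_unencoded_bytes() is a hypothetical helper, and the sketch assumes the script file is saved as UTF-8:

```php
<?php
// Hypothetical helper (not part of Drupal or PHP): returns TRUE if the URL
// contains any byte outside printable ASCII (0x21-0x7E), e.g. raw UTF-8
// bytes or spaces. RFC 3986 requires such bytes to be percent-encoded.
function url_has_unencoded_bytes($url) {
  // No /u modifier: the pattern is matched byte by byte.
  return preg_match('/[^\x21-\x7E]/', $url) === 1;
}

// URLs copied from examples 3.) and 5.) above.
$failing = 'http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss';
$working = 'http://news.google.com/news?pz=1&cf=all&ned=en_pk&hl=en&q=drupal&cf=all&output=rss';

var_dump(url_has_unencoded_bytes($failing)); // bool(true)
var_dump(url_has_unencoded_bytes($working)); // bool(false)
```

This only flags the difference between the two requests; it does not by itself prove that the unencoded bytes cause the missing items.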

Comments

Anonymous:

More diagnosis:

The cURL output omits the search term from the <title> tag; it simply says "Google News":

		<title> - أخبار Google</title>

The wget output, by contrast, includes it; the title reads "Syria - Google News":

		<title>سوريا - أخبار Google</title>

I suspect something along the way can't handle the change in text direction (Arabic is written right to left).

alex_b:

#2 ;-) I hear you. So this seems to be a URL encoding issue after all? Are wget and PHP streams smarter about accepting input?
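
If it is a URL encoding issue, one way to test that is to percent-encode the query string before handing the URL to curl. A sketch, assuming encoding is the cause; build_feed_url() and fetch_feed() are hypothetical names, and the query parameters are copied from the failing URL:

```php
<?php
// Builds the feed URL with http_build_query(), which percent-encodes each
// value, so the Arabic term is sent as %D8%B3... instead of raw UTF-8 bytes.
function build_feed_url($query, $lang = 'ar') {
  $params = array(
    'pz' => 1,
    'hl' => $lang,
    'q' => $query,
    'cf' => 'all',
    'output' => 'rss',
  );
  // Pass '&' explicitly so the arg_separator.output ini setting is ignored.
  return 'http://news.google.com/news?' . http_build_query($params, '', '&');
}

// Same request as in example 3.), but taking an already-encoded URL.
function fetch_feed($url) {
  $request = curl_init($url);
  curl_setopt($request, CURLOPT_HTTPHEADER, array('User-Agent: Drupal (+http://drupal.org/)'));
  curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);
  // curl_exec() returns FALSE on transport errors.
  $data = curl_exec($request);
  curl_close($request);
  return $data;
}

print build_feed_url('سوريا') . "\n";
// http://news.google.com/news?pz=1&hl=ar&q=%D8%B3%D9%88%D8%B1%D9%8A%D8%A7&cf=all&output=rss
```

Passing the result to fetch_feed() would repeat test 3.) with an encoded URL; whether Google then returns the items has not been verified here.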

alex_b:

SimplePie's replace_invalid_with_pct_encoding() may be worth pillaging for addressing this issue.
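
A rough sketch of that idea in plain PHP. This is not SimplePie's implementation: pct_encode_invalid() is a hypothetical name, it assumes UTF-8 input, and the anonymous function requires PHP 5.3 or later (the report above uses 5.2.6):

```php
<?php
// Hypothetical sketch in the spirit of SimplePie's
// replace_invalid_with_pct_encoding(); NOT SimplePie's actual code.
// Every byte that is not an RFC 3986 unreserved or reserved character
// (and not '%') is replaced by its %XX escape. Existing escapes are kept.
function pct_encode_invalid($url) {
  return preg_replace_callback(
    // No /u modifier: multi-byte UTF-8 characters are encoded byte by byte.
    '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/',
    function ($matches) {
      return sprintf('%%%02X', ord($matches[0]));
    },
    $url
  );
}

print pct_encode_invalid('http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss') . "\n";
// http://news.google.com/news?pz=1&hl=ar&q=%D8%B3%D9%88%D8%B1%D9%8A%D8%A7&cf=all&output=rss
```

Unlike http_build_query(), this works on a complete URL, so it could, for example, be applied just before the URL is handed to drupal_http_request(). It leaves a stray '%' that is not followed by two hex digits untouched, which a more thorough version would also need to handle.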

kenorb:

Status: Active » Closed (outdated)

Closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.