Closed (outdated)
Project: Feeds
Version: 6.x-1.x-dev
Component: Code
Priority: Normal
Category: Bug report
Assigned: Unassigned
Reporter:
Created: 19 Jan 2010 at 18:57 UTC
Updated: 4 Mar 2016 at 21:35 UTC
The feed at
http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss
is missing all item elements when downloaded with curl. The feed otherwise appears OK (no encoding problems, valid XML).
PHP 5.2.6
curl 7.19.5
Mac OS X 10.5.8
Some examples:

Command line, with curl and with wget:

curl --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss"
wget --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss"

PHP, using the cURL extension:

// Send the Drupal User-Agent header with the request.
$headers = array(
  'User-Agent: Drupal (+http://drupal.org/)',
);
$request = curl_init('http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss');
curl_setopt($request, CURLOPT_HTTPHEADER, $headers);
// Return the response body instead of printing it directly.
curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($request);
curl_close($request);
header('Content-Type: application/rss+xml; charset=utf-8');
print $data;

PHP, using the HTTP stream wrapper:

// The stream wrapper takes its User-Agent from the user_agent ini setting.
ini_set('user_agent', 'Drupal (+http://drupal.org/)');
$handle = fopen('http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss', 'rb');
$contents = stream_get_contents($handle);
fclose($handle);
header('Content-Type: application/rss+xml; charset=utf-8');
print $contents;

PHP, using the cURL extension with an ASCII-only query (q=drupal):

$headers = array(
  'User-Agent: Drupal (+http://drupal.org/)',
);
$request = curl_init('http://news.google.com/news?pz=1&cf=all&ned=en_pk&hl=en&q=drupal&cf=all&output=rss');
curl_setopt($request, CURLOPT_HTTPHEADER, $headers);
curl_setopt($request, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($request);
curl_close($request);
header('Content-Type: application/rss+xml; charset=utf-8');
print $data;

(To view the PHP examples, serve them from your web server and open them in a browser.)
Comments
Comment #1
Anonymous (not verified) commented
More diagnosis:
The cURL output omits the search terms in the title tag: it simply says Google News.
The wget output includes them: it says Google News - Syria.
I suspect that somebody somewhere can't handle the change in text direction.
Comment #2
Anonymous (not verified) commented
Meanwhile, if you percent-encode the URL (ahem...):
http://news.google.com/news?pz=1&hl=ar&q=%D8%B3%D9%88%D8%B1%D9%8A%D8%A7&...
cURL works.
curl --user-agent "Drupal (+http://drupal.org/)" "http://news.google.com/news?pz=1&hl=ar&q=%D8%B3%D9%88%D8%B1%D9%8A%D8%A7&..."
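A minimal PHP sketch of the same workaround, assuming the search term is UTF-8 and is the only non-ASCII part of the URL. The function name example_feed_url() is made up for illustration; rawurlencode() is PHP's built-in percent-encoder.

```php
<?php
/**
 * Hypothetical helper: build the Google News feed URL from the report,
 * percent-encoding the search term so that the URL is plain ASCII.
 */
function example_feed_url($term) {
  // rawurlencode() turns each non-ASCII byte into a %XX escape.
  return 'http://news.google.com/news?pz=1&hl=ar&q=' . rawurlencode($term) . '&cf=all&output=rss';
}

// Same term as in the report.
$url = example_feed_url('سوريا');
```

The resulting $url could then be passed to curl_init() in place of the raw URL used in the original examples.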
Comment #3
alex_b commented
#2 ;-) I hear you. So this seems to be a URL encoding issue after all? Are wget and PHP streams just smarter about the input they accept?
Comment #4
alex_b commented
SimplePie's replace_invalid_with_pct_encoding() may be worth pillaging to address this issue.
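A rough sketch of what such a helper might look like. This is not SimplePie's code: the function names below are hypothetical, and the sketch assumes the input is a UTF-8 string that is otherwise a well-formed URL. It percent-encodes every byte that may not appear literally in a URL and leaves delimiters and existing %XX escapes alone.

```php
<?php
/**
 * Hypothetical helper: percent-encode bytes that are not allowed to appear
 * literally in a URL. Unreserved characters, reserved delimiters and '%'
 * (so already-encoded input is not double-encoded) pass through unchanged.
 */
function example_pct_encode_invalid($url) {
  return preg_replace_callback(
    // No /u modifier: the pattern matches byte by byte, so every byte of a
    // multi-byte UTF-8 character is encoded on its own.
    '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/',
    'example_pct_encode_byte',
    $url
  );
}

/**
 * Callback for preg_replace_callback(): encode one matched byte as %XX.
 */
function example_pct_encode_byte($match) {
  return rawurlencode($match[0]);
}
```

One known limitation of this sketch: a stray '%' that is not followed by two hex digits is left as it is.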
Comment #5
kenorb commented
Closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.