Atom feed retrieving images

Gidgidonihah - May 13, 2008 - 19:23
Project: Aggregation
Version: 5.x-4.3
Component: Code
Category: support request
Priority: normal
Assigned: Unassigned
Status: active
Description

I need to get images from enclosures on Atom feeds, and I'm having two problems.
First, getting the images out of the enclosures.
Second, getting images with spaces in the filename to work.
I've made some progress, but some help would be appreciated.

Please don't get bored with this first issue. Take a gander at the second one as well.

Problem #1

Adjusting the code so that it gets images from enclosures.
The code from rss20.inc simply didn't work, so I did some debugging. Here is what I found.
This is what the news item looks like:

<?php
SimpleXMLElement::__set_state(array(
  'title' => 'CS grad student presents his research in France',
  'link' =>
  array (
    0 =>
    SimpleXMLElement::__set_state(array(
      '@attributes' =>
      array (
        'rel' => 'alternate',
        'type' => 'text/html',
        'href' => 'http://college.edu/article/2007-10-03-cs_grad_student_presents_his_research_in_france',
      ),
    )),
    1 =>
    SimpleXMLElement::__set_state(array(
      '@attributes' =>
      array (
        'rel' => 'enclosure',
        'type' => 'image/jpeg',
        'href' => 'http://college.edu/files/images/0609-59+043+.feature.jpg',
        'title' => '0609-59 043 .jpg',
      ),
    )),
  ),
  'id' => 'http://college.edu/article/2007-10-03-cs_grad_student_presents_his_research_in_france',
  'published' => '2007-10-03T10:25:47-06:00',
  'updated' => '2007-10-03T10:30:10-06:00',
  'author' =>
  SimpleXMLElement::__set_state(array(
    'name' => 'Computer Science Department',
  )),
  'category' =>
  SimpleXMLElement::__set_state(array(
    '@attributes' =>
    array (
      'term' => 'Feature Story',
    ),
  )),
  'summary' =>
  SimpleXMLElement::__set_state(array(
    '@attributes' =>
    array (
      'type' => 'html',
    ),
  )),
  'content' =>
  SimpleXMLElement::__set_state(array(
    '@attributes' =>
    array (
      'type' => 'html',
    ),
  )),
))
?>

So that shows me that the image enclosure I want to get is in the link array. Great. How do I get at it?

So using that info, I've hacked up some code to make it work. I'm sure it's not the best approach, but since atom feeds can have enclosures, shouldn't a method be included?
Here is my code:

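<?php
  $image = array();
  // Assumes the enclosure is the second <link> element, as in the item above.
  if ($news->link[1]) {
    foreach ($news->link[1]->attributes() as $name => $value) {
      if ($name == 'href') {
        // Cast to string so a plain URL is stored rather than a SimpleXMLElement.
        $image[] = array('url' => (string) $value);
        break;
      }
    }
  }
?>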

Now that works, but it has to loop through the attributes of the link element, which is not the most efficient. It also only works on the 2nd item in the link array, since that is where the enclosure is stored in the feed I'm working on.
Is there a better approach?
How can we make enclosure support standard?
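
One possible approach, as a rough sketch only: loop over every link element of the entry and keep those whose rel is "enclosure" and whose type starts with "image/", reading attributes with SimpleXML's array-style access instead of looping over attributes(). The function name example_atom_enclosure_images() and the sample entry are made up for illustration and are not part of the Aggregation module. The sample entry also leaves out the Atom namespace declaration for brevity.

```php
<?php
// Hypothetical helper (not part of Aggregation): collect image enclosure
// URLs from one Atom entry, wherever they sit in the list of <link> elements.
function example_atom_enclosure_images(SimpleXMLElement $entry) {
  $images = array();
  foreach ($entry->link as $link) {
    // Array-style access reads a single attribute directly.
    $rel = (string) $link['rel'];
    $type = (string) $link['type'];
    if ($rel == 'enclosure' && strpos($type, 'image/') === 0) {
      $images[] = array('url' => (string) $link['href']);
    }
  }
  return $images;
}

// Sample entry modelled on the item shown above (assumed markup).
$entry = simplexml_load_string(
  '<entry>'
  . '<title>CS grad student presents his research in France</title>'
  . '<link rel="alternate" type="text/html" href="http://college.edu/article/2007-10-03-cs_grad_student_presents_his_research_in_france"/>'
  . '<link rel="enclosure" type="image/jpeg" href="http://college.edu/files/images/0609-59+043+.feature.jpg" title="0609-59 043 .jpg"/>'
  . '</entry>'
);
$images = example_atom_enclosure_images($entry);
```

Because the check is on the rel and type attributes rather than on a fixed index, this would not depend on the enclosure being the 2nd link.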

Problem #2

The feed I'm using, unfortunately, has some images saved with spaces in the names. The spaces come through in the feed encoded as plus signs, e.g. example.com/image+1.jpg
That actually returns a 404 error. I've let the webmaster of that site know that he should fix that. In the meantime, I'll work around it on my end:

<?php
// Replace every plus sign with an encoded space and keep the result.
$url = preg_replace("/\+/", "%20", $url);
?>

That should do it. I tried that, and Aggregation now recognized that there was an image and created a node for it. However, the image wasn't downloaded. When I put the link in a browser as example.com/image%201.jpg, the location changed to example.com/image 1.jpg and only then did the image appear.

Odd. Well, let's not use the encoding then; we'll just use a space in the URL:

<?php
// Replace every plus sign with a literal space and keep the result.
$url = preg_replace("/\+/", " ", $url);
?>

The same thing happens. A node is created for the image, but the image isn't copied. It shows up as alt text instead.
Is this a problem Aggregation has with URLs that contain spaces?
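
As a variation on the workaround above, here is a sketch that rewrites plus signs and literal spaces only in the path part of the URL, so that a "+" inside a query string is left alone. The helper name example_fix_plus_in_path() is hypothetical, and whether this makes Aggregation download the image is not established here. It only normalizes the URL.

```php
<?php
// Hypothetical helper: encode '+' and ' ' in the path as %20.
// Note: user info and fragments are dropped in this simplified rebuild.
function example_fix_plus_in_path($url) {
  $parts = parse_url($url);
  if ($parts === FALSE || !isset($parts['path'])) {
    // Nothing to fix, or the URL could not be parsed.
    return $url;
  }
  $path = str_replace(array('+', ' '), '%20', $parts['path']);

  $fixed = '';
  if (isset($parts['scheme'])) {
    $fixed .= $parts['scheme'] . '://';
  }
  if (isset($parts['host'])) {
    $fixed .= $parts['host'];
  }
  if (isset($parts['port'])) {
    $fixed .= ':' . $parts['port'];
  }
  $fixed .= $path;
  if (isset($parts['query'])) {
    // The query string is kept untouched.
    $fixed .= '?' . $parts['query'];
  }
  return $fixed;
}

$fixed_url = example_fix_plus_in_path('http://college.edu/files/images/0609-59+043+.feature.jpg');
```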

#1

Gidgidonihah - May 13, 2008 - 21:13

Another problem I noticed is that no image is detected when the image link points to a thumbnail/cache script.
E.g. visiting this link from the Atom feed,
http://www.chem.byu.edu/Site/SiteFeatures/AligningAKrPumpedFCenterLaser/Image(resize)?height=200
redirects you to a cached version of the thumbnail at
http://www.chem.byu.edu/imagecache/jpg/707/a-h200-.jpg

On links like this, the imageid is 0 despite the URL being passed in on creation.

 
 
