The "Alternate text" and "Title" attributes of images, if present, should be added to the search index when a node is indexed. This could be done with code along these lines:

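/**
 * Implementation of hook_nodeapi().
 */
function imagefield_nodeapi(&$node, $op) {
  if ($op == 'update index') {
    // Collect the alt and title attributes from all image fields on $node.
    // Assumptions: 'field_image' is a placeholder name; text is in $item['data'].
    $texts = array();
    foreach (array('field_image') as $name) {
      foreach (isset($node->$name) ? (array) $node->$name : array() as $item) {
        foreach (array('alt', 'title') as $key) {
          if (!empty($item['data'][$key])) {
            $texts[] = $item['data'][$key];
          }
        }
      }
    }
    // Concatenate them, separated by spaces, and return the text.
    return implode(' ', $texts);
  }
}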

Comments

quicksketch

Status: Active » Postponed (maintainer needs more info)

ImageFields are already indexed in the search according to the display settings on your site at admin/content/node-type/[node-type]/display/search. Is additional text really necessary?

robertdouglass

I think it is, because 1) the text exists, 2) Google thinks it's important, and 3) it's in the browser but not in the search. Since those texts end up inside the img tag, they get stripped out by all search implementations and are lost. So the very easy implementation is worth it, in my opinion. The only part that wasn't clear to me is how to get a list of the imagefields a node has, so that you can most easily mine this information. The return value just has to be the texts concatenated together (separated by spaces).
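To illustrate the point about text inside the img tag getting lost, here is a minimal sketch in plain PHP. It assumes the indexer reduces rendered markup to text with something like strip_tags(). The markup and variable names are made up for the example and are not taken from the module.

```php
<?php
// Hypothetical rendered output of a node body plus an image field.
$rendered = '<p>Dessert of the day</p>'
  . '<img src="pie.png" alt="banana cream pie" title="Dessert special" />';

// Stripping tags keeps only text nodes. Attribute values such as alt and
// title disappear together with the tag that carries them.
$indexed = trim(strip_tags($rendered));

// The word "banana" only existed in the alt attribute, so it is gone.
var_dump(strpos($indexed, 'banana') !== FALSE); // bool(false)
```

Anything returned from the 'update index' operation is plain text, so it survives this step and becomes searchable.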

quicksketch

We should probably use hook_field($op == 'update index') rather than implementing our own hook_nodeapi($op == 'update index'); that way it will automatically affect ImageFields on a per-field basis. I'm not sure what effect, if any, this has on the display settings for search indexing.

quicksketch

Seems I just made up hook_field($op == 'update index'), so I guess we're back to hook_nodeapi().

Anyway, if I'm understanding this correctly, this is necessary because the core indexing process isn't smart enough to read alt and title attributes (which sounds like a bug in core)? While I know that returning strings of this text will technically work, it just seems really hacky. The title and alt attributes have different purposes, after all; returning them as plain strings gives them equal importance.

robertdouglass

Maybe it's a core bug. But taking responsibility for it here guarantees it works with all search implementations (solr, xapian, luceneapi). The importance issue isn't big to me. It's about discovery. If I write "banana cream pie" in my alt tag (but only there) and can't find the node later when I search for "banana", I'd be confused.

On the other hand, this is far from a critical issue. If you've got lingering doubts, maybe it's best to keep it postponed.

wuwei23

Subscribing.

This _is_ somewhat of a critical issue for me. I'm working on a project that is embedding scans of sections of historical journals into articles. The OCRed text needs to exist in the ALT tag, as we still want it to display in the absence of images.

Ideally, I'd like the ALT text in this instance to be searchable, rather than forcing the users to add it twice: once to the ALT and once to an invisible field for searching.

robertdouglass

Status: Postponed (maintainer needs more info) » Active

So I've now encountered more and more people who feel that this is a worthwhile feature. I'm moving back to active for you to weigh in on, quicksketch. If you green light the issue perhaps I can come up with a patch.

quicksketch

Okay, fair enough. I'd be happy to review a patch.

1websitedesigner

I can't help with a patch, but I would like to add myself to the list of people who feel this is important. I'm a website designer and work on a lot of artists' websites, and adding ALT text helps these sites rank higher in search engines' image results (which is how people tend to find art sites).

Thanks for your work on this!

aterchin

subscribing 100%

robertdouglass

Status: Active » Needs review
Attachment (status: new, size: 1.31 KB)

Here's a patch. Maybe there's a better way to use the content API to find out if any given node has an imagefield.

robertdouglass

Attachment (status: new, size: 1.36 KB)

Slightly more efficient version. It caches only the imagefields, thus lowering the memory footprint and speeding up the loop over each particular node.

wuwei23

Hey robert, thank you for the patch!

Should this be applied against the CVS HEAD? I initially tried it on the 6.x-3.0 official release but rebuilding the index gives me the following error repeatedly (it looks like once for each imagefield field):

warning: Invalid argument supplied for foreach() in /var/www/qhist/sites/q150.library.uq.edu.au/modules/imagefield/imagefield.module on line 143.

Update: the same error occurs with CVS HEAD.

robertdouglass

@wuwei23 - that looks like a problem in my code. I developed this against 6.x-3.x-dev, I believe. The code needs a check for if (!empty($fields)) { ... } around the second foreach loop to get rid of the warning. The code should provide the desired functionality, though, despite the warning. I'll reroll when I get a chance, or quicksketch can make the change when he tests it.
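For anyone following along, the guard described above could look roughly like this. It is only a sketch of the pattern, not the patch itself: example_field_names() and the shape of $fields are hypothetical.

```php
<?php
/**
 * Hypothetical helper: lists field names, tolerating a missing field list.
 */
function example_field_names($fields) {
  $names = array();
  // Without this check, a NULL $fields reaches foreach and triggers the
  // "Invalid argument supplied for foreach()" warning quoted in this thread.
  if (!empty($fields)) {
    foreach ($fields as $field) {
      $names[] = $field['field_name'];
    }
  }
  return $names;
}

// No field list was built: the loop is skipped and nothing is reported.
var_dump(example_field_names(NULL));

// A populated list is processed as before.
var_dump(example_field_names(array(array('field_name' => 'field_scan'))));
```

The check changes nothing for nodes that do have image fields; it only silences the warning when the list is empty or was never set.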

wuwei23

robert: Thanks for the reply, I'll add the conditional myself and test it here too.

Cheers!

quicksketch

Version: 6.x-3.x-dev » 6.x-3.2
Status: Needs review » Fixed
Attachment (status: new, size: 1.2 KB)

I gave this another look and corrected the issue mentioned in #14. I also used the existing filefield_get_field_list() function to pull in a list of fields rather than building a similar routine by hand. This will be in the 3.3 version of ImageField, which will be out shortly.
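As a rough illustration of that approach (not the committed code), the sketch below gathers alt and title text by looping over whatever filefield_get_field_list() returns. Several things here are assumptions: the stand-in function exists only so the snippet runs outside Drupal, the return shape (arrays with a 'field_name' entry) is assumed, alt and title are assumed to live under $item['data'], and example_imagefield_search_text() is a hypothetical name.

```php
<?php
// Stand-in so the sketch runs outside Drupal. The real function ships with
// FileField; the return shape used here is an assumption.
if (!function_exists('filefield_get_field_list')) {
  function filefield_get_field_list() {
    return array('field_scan' => array('field_name' => 'field_scan'));
  }
}

/**
 * Hypothetical helper: gathers alt and title text for the search index.
 */
function example_imagefield_search_text($node) {
  $texts = array();
  $fields = filefield_get_field_list();
  if (!empty($fields)) {
    foreach ($fields as $field) {
      $name = $field['field_name'];
      // Skip fields that this particular node does not carry.
      if (empty($node->$name) || !is_array($node->$name)) {
        continue;
      }
      foreach ($node->$name as $item) {
        foreach (array('alt', 'title') as $key) {
          if (!empty($item['data'][$key])) {
            $texts[] = $item['data'][$key];
          }
        }
      }
    }
  }
  // The indexer only needs the texts joined by spaces.
  return implode(' ', $texts);
}

// Example node with one image whose alt text should become searchable.
$node = new stdClass();
$node->field_scan = array(
  array('data' => array('alt' => 'banana cream pie', 'title' => '')),
);
echo example_imagefield_search_text($node), "\n";
```

In the module, the body of such a helper would sit inside the 'update index' branch of hook_nodeapi(), so the returned string is appended to the node's indexed text.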

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.