Javascript is being indexed for search

leetamus - August 18, 2008 - 16:56
Project:OpenPackage Video
Version:5.x-4.x-dev
Component:Code
Category:bug report
Priority:normal
Assigned:jbrown
Status:closed
Description

I was getting the embed info in the node description in search results, so I ended up modding the search function to not print the node description. The problem is that any words in the embed info still affect my search results. The big problem is that my site is a YouTube-like site for math-related videos, and I need to be able to search for the word 'volume', but this brings up EVERY video in my database because '&volume=80' is in the embed tag...

Is there some way to fix this?

#1

leetamus - August 25, 2008 - 18:12

If anyone wants to see this in action, check out the following link and search for 'volume':
http://schoolwaxtv.com

Thanks!
Leet

#2

dman - August 25, 2008 - 18:54

I dunno about op_video, but there is a facility in Drupal to present different content to the internal search indexer than what gets displayed on the page.
This can be leveraged to provide either more or less content for indexing.
I've usually seen it used with custom node types, however, so I'm not sure it'll work for you right now.

see Hide CCK field/s from search results and indexing
Excluding CCK fields from indexing?
... looks like you can get there with a theme function override, where you return a rendering of your node without the offending field to a special hook_search_item() func. I was hoping for a native CCK option...

#3

dman - August 25, 2008 - 18:57

Or this: http://drupal.org/project/field_indexer may be a real answer...
It sounds right from the blurb.

#4

leetamus - August 25, 2008 - 20:20

Thanks! You're... well... dman!

I'll look into those; sounds like I can mix something up now. My concern is that there's stuff in the body section that should be indexed, but I can at least make a custom field to put that stuff into instead. I'll make sure to post back with any fixes in case it can help someone else.

Cheers!
Leet

#5

leetamus - August 27, 2008 - 17:57

OK, I still haven't been able to get this working. I've looked into the suggested modules, and it seems like they only filter search results by node type, whereas I need to filter by field (I need the body section filtered out). On further reflection, it occurred to me that I've modded the search function within the template.php file. Mine returns the entire node content rather than just a title and snippet. Here's the function I'm using. Could it be that by returning the entire node, it's also searching the entire node?

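function schoolwaxTV_search_item($item, $type) {
  //$output = ' <dt class="title"><a href="'. check_url($item['link']) .'">'. check_plain($item['title']) .'</a></dt>';
  // Render the whole node as the "snippet" instead of the search excerpt.
  $item['snippet'] = theme('node', $item['node']);
  $output = ' <dd>'. ($item['snippet'] ? '<p>'. $item['snippet'] .'</p>' : '') .'</dd>';
  return $output;
}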

Sorry, I'm not up on my PHP; I just modified an existing piece of code I found.

Thanks for any of your insights!

UPDATE: OK, so I commented out the custom search function and I'm still getting the embed code in the search results. Guess my quest continues!

#6

dman - August 27, 2008 - 18:13

Yes, you should be theming your *search_item() function to only return the content you want ...
But remember that you'll have to re-index all your stuff every time you change that.
Searches are not done in real-time.

- You define what content gets searched - via the search func
- the cron process requests the pages one by one, and looks at your themed stuff
- found content is indexed
...
- when someone makes a search, the index is consulted.

... If you change the theme function and test straight away, it makes no difference to the results until the site is re-indexed!

You need to go to the search admin page and reset the index. Then run cron.php a few times. THEN see if you are getting better results.
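The index-then-search cycle described above can be modelled in a few lines. This is a toy sketch in modern PHP, not Drupal's search module: `render_for_index()`, `tokenize()`, `build_index()` and `search()` are hypothetical names, and the node data is made up apart from the `&volume=80` flashvar quoted earlier in this thread. It shows why changing what gets rendered has no visible effect until the index is rebuilt.

```php
<?php
// Toy model of cron-time indexing vs. search-time lookup.
// All function names are hypothetical; this is not Drupal's search module.

// What the indexer "sees" for a node; $include_embed mimics the bug.
function render_for_index(array $node, bool $include_embed): string {
    $text = $node['title'] . ' ' . $node['body'];
    if ($include_embed) {
        $text .= ' ' . $node['embed'];
    }
    return $text;
}

// Lowercase the text and split it into unique alphanumeric words.
function tokenize(string $text): array {
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return array_unique($m[0]);
}

// The "cron" pass: build a word => list-of-node-ids map.
function build_index(array $nodes, bool $include_embed): array {
    $index = [];
    foreach ($nodes as $id => $node) {
        foreach (tokenize(render_for_index($node, $include_embed)) as $word) {
            $index[$word][] = $id;
        }
    }
    return $index;
}

// A search only consults the stored index, never the live pages.
function search(array $index, string $word): array {
    return $index[strtolower($word)] ?? [];
}

// Made-up sample nodes; both carry the same flash embed.
$nodes = [
    1 => ['title' => 'Pilot Math 7', 'body' => 'Statistics and probability',
          'embed' => 'flashvars="&volume=80"'],
    2 => ['title' => 'Cylinders', 'body' => 'Finding the volume of a cylinder',
          'embed' => 'flashvars="&volume=80"'],
];

$stale = build_index($nodes, true);   // index built while embeds were included
$fixed = build_index($nodes, false);  // index rebuilt after the rendering change

echo implode(', ', search($stale, 'volume')), "\n";  // 1, 2
echo implode(', ', search($fixed, 'volume')), "\n";  // 2
```

Rebuilding the index in this model corresponds to resetting the index in the search admin and running cron.php until indexing reaches 100%.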

#7

leetamus - August 27, 2008 - 21:05

Ah, OK, I forgot to re-index before trying the search; thanks for that.

OK, so I commented out my custom search function, re-indexed my site via the search settings in the admin section, ran cron until I had 100% indexed, and it's still happening. Here's a sample of the output:

Pilot Math 7 - Statistics and Probability

... width:"450",height:"357",majorversion:"7",build:"0",bgcolor:"#FFFFFF", ... computer, Ed. This animation, from the Pilot Math 7 series, introduces the concept of Statistics by following Cameron as he ...

Video - Visitor - 08/15/2008 - 17:04 - 0 comments - 0 attachments

Title
Flash Embed info <-- the evil doer!
Body text
Stats

Hmm, well, I think my next task is to install a fresh copy of Drupal with only the op_video module. Then at least I can confirm whether it's an op_video issue...

#8

dman - August 27, 2008 - 21:33

Hm, looks like we've got the wrong end of the stick...
That theme function is only for changing the displayed results, not the content that goes into the index.
We CAN indeed adjust the content that goes in - which is what I recalled ...
But only by appending more stuff to it ... which is not what I thought.
Although there is a hook / callback we could intercept:

<?php
  $extra = node_invoke_nodeapi($node, 'update index');
?>

... it ONLY lets us add content :(

You are stuck.
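To illustrate why the 'update index' hook can't help here, below is a simplified PHP model of how a node's index text might be assembled around the `node_invoke_nodeapi($node, 'update index')` call quoted above. It assumes the rendered node is built first and each hook result is then appended; `assemble_index_text()` is a hypothetical name, and the markup is sample data modelled on the search snippet in #7, not OpenPackage Video's actual output.

```php
<?php
// Simplified, hypothetical model of assembling a node's index text.
// Assumption: the rendered node comes first and each module's
// 'update index' result is appended afterwards.
function assemble_index_text(string $title, string $rendered_node, array $extras): string {
    // Starting point: the title plus the fully rendered node, embed included.
    $text = '<h1>' . htmlspecialchars($title) . '</h1>' . $rendered_node;
    // Hook results are only ever concatenated onto the end.
    foreach ($extras as $extra) {
        $text .= $extra;
    }
    return $text;
}

// Sample rendered node: a body paragraph plus an embed script.
$rendered = '<p>Statistics by following Cameron</p>'
          . '<script type="text/javascript">var fo = {width:"450",height:"357",flashvars:"&volume=80"};</script>';

// Whatever the hooks return, the embed text is still in the result.
$text = assemble_index_text('Pilot Math 7', $rendered, ['<p>probability</p>']);
echo $text, "\n";
```

Under this model, the only way to keep the embed out is to change what the node renders for indexing, which is outside what the hook controls.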

#9

leetamus - August 27, 2008 - 22:39

Thanks so much for all your help! Hopefully when I try the fresh install it won't have this problem, or at least I'll know it's something to do with op_video and not the mishmash of modules I have running, lol.

#10

leetamus - September 15, 2008 - 20:26

It still happens on a fresh install... :( I thought OpenPackage was pretty stable, but I guess it's still too early to be used in any real capacity...

#11

leetamus - September 15, 2008 - 20:26

Just a final comment on this... it's definitely a bug in OpenPackage Video. It's happening on every site I've checked that uses the module, including the creator's website.

Example from his site, searching for 'flashvars':

http://openpackage.biz/search/node/flashvars

Leet

#12

jbrown - September 15, 2008 - 23:23
Title:swf embed info comes up in search results» Javascript is being indexed for search
Version:5.x-3.9» 5.x-4.x-dev
Assigned to:Anonymous» jbrown
Status:active» fixed

http://drupal.org/cvs?commit=140248

Fixed and backported to 5.x-3.13.
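The linked commit isn't reproduced here. As a hedged sketch of the general idea in the retitled issue (keeping JavaScript out of the index), the PHP below drops `<script>`, `<object>` and `<embed>` markup before the text is tokenized. `strip_embed_markup()` is a hypothetical helper, not part of Drupal or OpenPackage Video, and the sample body is made up around the flashvars seen earlier in the thread. The key detail is that PHP's `strip_tags()` removes tags but keeps the text between them, so script contents survive it.

```php
<?php
// Hypothetical pre-index filter (not the code from the linked commit):
// remove script/object blocks and embed tags before indexing.
function strip_embed_markup(string $html): string {
    // Paired elements: drop the tags AND everything between them.
    $html = preg_replace('#<(script|object)\b[^>]*>.*?</\1>#is', ' ', $html);
    // <embed> is usually a void element, so drop just the tag.
    $html = preg_replace('#<embed\b[^>]*>#i', ' ', $html);
    return $html;
}

// Made-up node body with an embed script like the one in #7.
$body = '<p>Statistics by following Cameron</p>'
      . '<script type="text/javascript">var fo = {width:"450",height:"357",'
      . 'majorversion:"7",build:"0",bgcolor:"#FFFFFF",flashvars:"&volume=80"};</script>';

// strip_tags() alone leaves the script's text, so "volume" would be indexed.
echo strip_tags($body), "\n";

// Filtering first leaves only the human-readable text.
echo trim(strip_tags(strip_embed_markup($body))), "\n";
```

A regex is adequate for a sketch like this but is not a full HTML parser; malformed or unusual markup could slip through.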

#13

leetamus - September 16, 2008 - 16:11

Woot! Thank you, jbrown! Works like a charm. ;)

#14

Anonymous (not verified) - September 30, 2008 - 16:11
Status:fixed» closed

Automatically closed -- issue fixed for two weeks with no activity.

 
 
