Currently the indexed content for a node is the result of a regular 'full node' render. Aside from the obvious limitations (you can't hide specific fields), an embarrassing consequence is that CCK field _labels_ get indexed. Give a CCK field the label 'foo', search the site for 'foo', and all the nodes of that content type show up as results. This is nonsense: field labels are not part of the node's actual content.
This patch simply adds a 'search_index_render' flag to the node before it is rendered. I do know this is a miserable hack. What we actually need here are Eaton's 'rendering styles', but as those look less and less likely for D6, it's the only way I can think of right now to avoid keeping the above behaviour for another release.
Plus of course this would open the door to the cool 'only index specific CCK fields' feature.
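To make the idea concrete, here is a minimal, language-neutral sketch in Python. It is not Drupal or CCK code and not the attached patch: the `render_node` function and the dictionary layout are assumptions made up for illustration, and only the 'search_index_render' flag name comes from the description above.

```python
# Illustrative only: a toy renderer, not Drupal's node rendering.
# 'search_index_render' is the flag name from this issue; everything
# else (render_node, the dict layout) is hypothetical.

def render_node(node):
    """Render a node's fields to plain text.

    Labels are included for a normal render and skipped when the node
    carries the 'search_index_render' flag.
    """
    for_index = node.get("search_index_render", False)
    parts = []
    for field in node["fields"]:
        if not for_index:
            # Normal display: show the field label before its value.
            parts.append(field["label"] + ":")
        parts.append(field["value"])
    return " ".join(parts)


node = {"fields": [{"label": "foo", "value": "hello world"}]}

# Regular 'full node' render: the label is part of the output.
full_text = render_node(node)

# Render for the search index: flag the node first, then render.
node["search_index_render"] = True
index_text = render_node(node)
```

With the flag set, a search for 'foo' would no longer match this node, because the label never reaches the indexed text.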
| Comment | File | Size | Author |
|---|---|---|---|
| #20 | node_build_mode.patch | 2.37 KB | gábor hojtsy |
| #16 | search_index_render_style_0.patch | 1.72 KB | yched |
| #13 | search_index_render_style_2.patch | 2.3 KB | pwolanin |
| #11 | search_index_render_style.patch | 1.72 KB | yched |
| #10 | search_index _render_style_1.patch | 1.77 KB | yched |
Comments
Comment #1
eaton commented
The node rendering and node styles patches do solve a lot of these niggly issues, both for indexing and for the (currently broken) generation of RSS feeds. Those patches are unlikely to go in, obviously, so this at least allows the problem to be worked around. Perhaps we could add an 'in_feed' flag too, similar to the 'in_preview' flag that lets modules see that a node is being previewed when they go to alter it.
Comment #2
gábor hojtsy
Can't we come up with a *simple* flag system instead? Dries will definitely have opinions on this.
Comment #3
eaton commented
Do you have anything in mind when you say 'simple'? Maybe something like consistent $node->build_for_preview, $node->build_for_index, and $node->build_for_feed?
Comment #4
gábor hojtsy
I would say $node->build_mode, set to either BUILD_JSON or BUILD_RSS and so on (constant values), or something along those lines. That is more extensible than having differently named properties, at least as it occurs to me. I am not sure the variable and constant names are good enough here, but this seems to be better than having many different properties...
Comment #5
freeman-1 commented
I've made a patch adopting the ideas here. It's really not that different from yched's original.
There's a corresponding patch for the CCK Field Permissions module that uses this 'build_mode'. See here => http://drupal.org/node/133113#comment-263225. It limits the core search indexing to publicly viewable CCK fields.
Note my patch is for the 5.1 core and not 6.x-dev, because I'm testing and using this on a few production sites.
Comment #6
freeman-1 commented
Have tested this in my setup and am changing status. Need more testers and feedback.
Comment #7
yched commented
I think this is more in line with what Gabor had in mind.
Patch is for D6.
I also updated the thread title to reflect the slightly broader angle this is taking. I'd like to point out, though, that IMO the real issue remains search index rendering, as I explained in my original post.
Comment #8
yched commented
As Moshe suggested, the current title is not really selling :-)
Comment #9
pwolanin commented
Why not use this flag for previews too?
Comment #10
yched commented
Actually, previews already use their own flag (in_preview = TRUE).
Attached patch unifies this with the other $node->build_mode flags.
I also removed the bitwise constants; they do not really make sense here, since render styles are mutually exclusive. We use plain old integer constants instead.
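As a rough illustration of why plain integers are enough, here is a language-neutral sketch in Python. It is not the patch: the constant names and values and the `wants_labels` helper are assumptions for the example. The point is that a node is in exactly one mode at a time, so modes are compared for equality rather than combined and tested with bitwise operators.

```python
# Hypothetical build modes; names and values are assumptions, not the
# constants defined by the patch. Plain consecutive integers are enough
# because a node is only ever built in one mode at a time.
BUILD_NORMAL = 0
BUILD_PREVIEW = 1
BUILD_SEARCH_INDEX = 2
BUILD_RSS = 3


def wants_labels(node):
    """Return True if field labels should be rendered for this node.

    A node without an explicit build_mode is treated as BUILD_NORMAL.
    """
    mode = node.get("build_mode", BUILD_NORMAL)
    # Equality check, not a bitmask test: modes are mutually exclusive.
    return mode != BUILD_SEARCH_INDEX


normal_node = {"title": "Example"}
preview_node = {"title": "Example", "build_mode": BUILD_PREVIEW}
indexed_node = {"title": "Example", "build_mode": BUILD_SEARCH_INDEX}
```

A single property holding one of these values replaces separate flags such as in_preview, and a module only has to check one place to find out how the node is being built.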
Comment #11
yched commented
Fixed whitespace in the patch filename, and removed the extraneous empty line after the defines - thx pwolanin :-)
Comment #12
pwolanin commented
Patch applies and looks fine - obviously the use cases are in contrib.
Comment #13
pwolanin commented
A good point from dmitrig01: a normal mode should be defined by default.
Comment #14
dmitrig01 commented
ready to go
Comment #15
chx commented
/me likes patchy
Comment #16
yched commented
rerolled
Comment #17
yched commented
Just bumping one last time before code freeze.
Amongst other things, this patch lets CCK fix a really annoying behaviour: field labels are indexed as being part of the node's content. Search for a word that is used as the label of a CCK field, and you get all the nodes of this content type as results, drowning out the nodes that are actually relevant for the word.
Additionally, it opens the door for interesting features like: select which fields get indexed, select which fields get included in feeds... Those are 'only' features. The previous point is a bug that can only be fixed in core.
Comment #18
douggreen commented
Extra text in the search index is a problem. I like the feature. I'm not thrilled about the implementation, but I don't have a better suggestion today, and I'd like to see something make it into D6. I would use this flag in vocabperms module, which is similar to cck_field_perms, but for taxonomies.
+1
Comment #19
douggreen commented
BTW, I pulled the latest CVS HEAD, applied the patch, enabled search.module, and ran cron.php, all without errors.
Comment #20
gábor hojtsy
Well, the NORMAL mode was missing from the reroll, but was an important piece IMHO, so I added it back in. Committed the attached patch!
Comment #21
yched commented
Oops, I could have sworn I added the NORMAL mode back. Sorry about that - thanks Gabor.