I'd like to know how one goes about making comments searchable. Any ideas?

Comments

gpk’s picture

Comments are automatically indexed and for search purposes are considered part of the node to which they are attached.

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

If the search terms are in a comment, and you have 50 to 100 comments attached to a node, it is almost impossible to locate the actual comment that contains them. You get the node back, but nothing tells you which individual comment contains your search words.

Has anything been done, or can anything be done, about this?

gpk’s picture

Even more complicated case: if the words you've searched for are in different comments ...

Some websites/systems highlight the searched-for words in colour, but I'm not aware of any Drupal modules that do this. (Yet.)

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

Anyone know of any work done in this area? Anything at all?

Or, if not, does anyone know if when the comments of a node are indexed, any notation at all is made with reference to the comment id?

gpk’s picture

If you look in the {search_index} table in the DB you will see that each "word" is referenced against a sid of type "node". The sid is the node id.

The 2 functions that really do the biz w.r.t. node indexing/searching are http://api.drupal.org/api/function/node_update_index/5 and http://api.drupal.org/api/function/node_search/5. You'd probably have to implement your own custom module if you want to index comments individually. Alternatively you might be able to do something along the lines of http://api.drupal.org/api/function/search_excerpt/5 to highlight search terms (this is called from the bottom of node_search()).

HTH

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

In order to get to where we want, we need the comment ID. Is that stored, or is there a way to store it and retrieve it?

>If you look in the {search_index} table in the DB you will see that each "word" is referenced against a sid of type "node". The sid is the node id.

gpk’s picture

Since the entire node body and all comments are treated as a single "item" of text for search purposes, the comment ID where the search terms appear is not stored by Drupal. In fact there may not be a single comment which contains all the search terms.

Given the node id (say, $nid) of a result then it's quite easy to get all attached comments:

$comments = db_query('SELECT * FROM {comments} WHERE nid = %d AND status = %d', $nid, COMMENT_PUBLISHED);

If you loop through the results of the query, fetching each $comment object, then $comment->cid, $comment->subject and $comment->comment are probably the properties you want. It should be relatively straightforward to scan the "subject" and "comment" for the search terms (keys), but obviously this requires a bit of coding.
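As a rough sketch of that scan (not tested inside Drupal): comment_key_hits() is a made-up helper name, and plain arrays stand in for the objects db_fetch_object() would return from the query above. It simply reports which keys turn up in which comment.

```php
<?php
// Hypothetical helper, not part of Drupal. Each entry of $comments is an
// array with 'cid', 'subject' and 'comment', standing in for a fetched row.
function comment_key_hits($comments, $keys) {
  $hits = array();
  foreach ($comments as $comment) {
    // Search subject and body together, tags stripped, case-insensitively.
    $text = strtolower(strip_tags($comment['subject'] . "\n" . $comment['comment']));
    foreach ($keys as $key) {
      if ($key !== '' && strpos($text, strtolower($key)) !== FALSE) {
        $hits[$comment['cid']][] = $key;
      }
    }
  }
  return $hits;
}

// Invented sample data.
$comments = array(
  array('cid' => 20, 'subject' => 'Login', 'comment' => '<p>Reset your <b>password</b> first.</p>'),
  array('cid' => 21, 'subject' => 'Re: Login', 'comment' => '<p>That did not help.</p>'),
);
print_r(comment_key_hits($comments, array('password', 'login')));
```

The result is keyed by cid, so a comment that contains only some of the keys still shows up, and you can decide afterwards whether that counts as a match.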

HTH,

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

Searching for the keywords actually isn't a bad idea, but I was thinking a more accurate approach would be to search for the text returned by search_excerpt (http://api.drupal.org/api/function/search_excerpt/5), which would contain an exact phrase that we could match on.

Problem is that search_excerpt returns text that is formatted for "snippets" when all I want is the text around the keywords.

But, the idea of searching comments from the node(s) returned seems very promising. Just have to figure out a logical way to search.

gpk’s picture

>search_excerpt returns text that is formatted for "snippets" when all I want is the text around the keywords
If you strip tags from the text in the snippet and explode it using "...", and also strip tags from the node body and individual comments, then you should find a match quite easily, I would think?

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

Logically won't work because the snippets sometimes contain words from either other comments or the node itself.

          drupal_set_message(' gid ' . $gid . ' | group_node ' . $group_node->title);
          $snippet = strip_tags(search_excerpt($keys, $node->body));
          $content = explode("...", $snippet);
          // Get comments
          $result = db_query('SELECT * FROM {comments} WHERE nid = %d AND status = %d', $node->nid, COMMENT_PUBLISHED);
          while ($comments = db_fetch_object($result)) {
            // Lower-case the comment once, rather than once per snippet line
            $subject = strtolower(strip_tags($comments->subject));
            $comment = strtolower(strip_tags($comments->comment));

            foreach ($content as $line) {
              $line = strtolower($line);

              drupal_set_message('line ' . $line);
              drupal_set_message('subject ' . $subject);
              drupal_set_message('comment ' . $comment);

              if (!empty($line) && (strpos($subject, $line) !== FALSE || strpos($comment, $line) !== FALSE)) {
                drupal_set_message($line . ' | comment cid = ' . $comments->cid);
              }
            }
          }

So, I'll have to search for the words. Problem is that I don't know whether to do AND or OR search once I get the keywords in $show_keys below:

      // Extract positive keywords and phrases
      preg_match_all('/ ("([^"]+)"|(?!OR)([^" ]+))/', ' '. $keys, $matches);
      $show_keys = array_merge($matches[2], $matches[3]);
      foreach ($show_keys as $key => $value) {
        drupal_set_message(' key ' . $key . ' | value ' . $value);
      }

gpk’s picture

>Logically won't work because the snippets sometimes contain words from either other comments or the node itself
But if you just look at (say) the first non-trivial $line, after stripping and trimming it and collapsing all whitespace to a single space, then AFAICS you should always get a "match" in either the node body or a comment. Is that good enough? I hadn't thought of searching for the snippet - looks good to me :-D

The only snag I can see is that the default snippet doesn't I think include the node title or comment subject even if that was where the search keys actually occurred.
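The stripping and collapsing step could look something like the sketch below. It assumes the snippet pieces are separated by "..." as in the code above; normalize_text(), first_fragment() and find_fragment_source() are hypothetical helpers rather than Drupal functions, and the sample texts are made up.

```php
<?php
// Hypothetical helper: strip tags, collapse whitespace, trim, lower-case.
// Tags are padded with spaces first so adjacent blocks do not run together.
function normalize_text($html) {
  $text = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $html));
  return strtolower(trim(preg_replace('/\s+/', ' ', $text)));
}

// Hypothetical helper: first snippet piece long enough to be worth matching.
function first_fragment($snippet, $min_length = 10) {
  foreach (explode('...', $snippet) as $piece) {
    $piece = normalize_text($piece);
    if (strlen($piece) >= $min_length) {
      return $piece;
    }
  }
  return '';
}

// Hypothetical helper: id of the first text containing that fragment, or NULL.
function find_fragment_source($snippet, $texts) {
  $fragment = first_fragment($snippet);
  if ($fragment === '') {
    return NULL;
  }
  foreach ($texts as $id => $html) {
    if (strpos(normalize_text($html), $fragment) !== FALSE) {
      return $id;
    }
  }
  return NULL;
}

// Invented sample data: one node body and two comments.
$snippet = '... button on your options page <b>that will</b> email your password ... If I can <b>solve the problem</b>, do I ...';
$texts = array(
  'body' => '<p>Welcome to the help page.</p>',
  'comment-20' => '<p>There is also a button on your options page that   will email your password to you.</p>',
  'comment-21' => '<p>If I can solve the problem, do I get the day off?</p>',
);
print find_fragment_source($snippet, $texts) . "\n";
```

Only the first usable fragment is checked here, so this finds where the snippet starts, not every comment that contributed to it.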

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

This is clunky, and only displaying info right now, but it is obtaining the cid if the keywords are found in the comment or comment subject:

      // Extract positive keywords and phrases into $show_keys
      // Count the non-empty search phrases in $search_terms
      $search_terms = 0;
      preg_match_all('/ ("([^"]+)"|(?!OR)([^" ]+))/', ' '. $keys, $matches);
      $show_keys = array_merge($matches[2], $matches[3]);
      foreach ($show_keys as $key => $value) {
        if (!empty($value)) $search_terms++;
        drupal_set_message(' key ' . $key . ' | value ' . $value);
      }

Then, later:

          $results[] = array(
            'link' => url('node/'. $item->sid),
            'type' => check_plain(node_get_types('name', $node)),
            'title' => $node->title,
            'event' => $event_link,
            'user' => theme('username', $node),
            'date' => $node->changed,
            'node' => $node,
            'extra' => $extra,
            'score' => $item->score / $total,
            'group' => l($group_node->title, 'node/'. $group_node->nid),
            'snippet' => search_excerpt($keys, $node->body),
          );

          // In case I ever need to use the snippet text
          //   $snippet=strip_tags(search_excerpt($keys, $node->body));
          //   $content = explode("...", $snippet);

          // Get comments for this node
          $result = db_query('SELECT * FROM {comments} WHERE nid = %d AND status = %d', $node->nid, COMMENT_PUBLISHED);
          while ($comments = db_fetch_object($result)) {

            $found = 0;

            // Change everything to lower case just to be sure
            $subject = strtolower(strip_tags($comments->subject));
            $comment = strtolower(strip_tags($comments->comment));

            // Go through keywords to see if they match this comment
            foreach ($show_keys as $key => $search) {
              $search = strtolower($search);

              drupal_set_message('search ' . $search);
              drupal_set_message('subject ' . $subject);

              if (!empty($search)) {
                // Look for match in subject or comment
                if (strpos($subject, $search) !== FALSE || strpos($comment, $search) !== FALSE) {
                  $found++;
                  drupal_set_message('FOUND MATCH! ' . $search . ' found = ' . $found . ' | search_terms = ' . $search_terms);
                }
              }
            }

            // If all the keywords are found in this comment, then get the cid.
            // Checked after the loop: $show_keys can end with empty entries
            // (e.g. when the last term is a quoted phrase), so a check inside
            // the loop could report the same comment more than once.
            if ($found == $search_terms && $search_terms > 0) {
              drupal_set_message('MATCHED | COMMENT CID = ' . $comments->cid);
            }
          }

        }

This works perfectly for AND searches for phrases or keywords. But what about OR searches?

I don't even know how to tell if the search is an OR search from the keyword array.
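One possible way to tell, sketched in plain PHP: assuming the search box treats a standalone upper-case OR between terms as the operator (which the (?!OR) in the regex above hints at), you can tokenise $keys yourself and note whether an OR token appears. parse_keys() and text_matches() are made-up helper names, and treating the whole query as an OR search as soon as one OR appears is a simplification.

```php
<?php
// Hypothetical helper: split $keys into lower-cased terms and note whether
// a standalone OR token was present. Quoted phrases stay in one piece.
function parse_keys($keys) {
  preg_match_all('/ ("([^"]+)"|([^" ]+))/', ' ' . $keys, $matches, PREG_SET_ORDER);
  $terms = array();
  $is_or = FALSE;
  foreach ($matches as $m) {
    $phrase = isset($m[2]) ? $m[2] : '';
    $word = isset($m[3]) ? $m[3] : '';
    if ($phrase !== '') {
      $terms[] = strtolower($phrase);
    }
    elseif ($word === 'OR') {
      $is_or = TRUE;
    }
    elseif ($word !== '') {
      $terms[] = strtolower($word);
    }
  }
  return array('terms' => $terms, 'or' => $is_or);
}

// Hypothetical helper: any term is enough for OR, all are needed for AND.
function text_matches($text, $terms, $is_or) {
  if (count($terms) == 0) {
    return FALSE;
  }
  $text = strtolower(strip_tags($text));
  $found = 0;
  foreach ($terms as $term) {
    if (strpos($text, $term) !== FALSE) {
      $found++;
    }
  }
  return $is_or ? $found > 0 : $found == count($terms);
}

$parsed = parse_keys('"that will" OR "solve the problem"');
var_dump($parsed['or']);
var_dump(text_matches('<p>If I can solve the problem</p>', $parsed['terms'], $parsed['or']));
```

Because empty entries never make it into the terms list, count($terms) can be used directly as the number of search terms.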

SomebodySysop’s picture

Well, I'm a lot further down the road. I'm actually listing comments on the search results. But, my links aren't working. I wish to link directly to the comment:

    'link' => url('node/'. $item->sid . '/' . $comments->cid . '#comment-' . $comments->cid),

I want it to return this:

    http://www.mysite.com/node/87/21#comment-21

But, instead, it returns this:

    http://www.mysite.com/node/87/21%2523comment-21

Which, of course, doesn't work.

Any suggestions?

Here's the code I'm rolling with so far:

      // Extract positive keywords and phrases into $show_keys
      // Count the non-empty search phrases in $search_terms
      $search_terms = 0;
      preg_match_all('/ ("([^"]+)"|(?!OR)([^" ]+))/', ' '. $keys, $matches);
      $show_keys = array_merge($matches[2], $matches[3]);
      foreach ($show_keys as $key => $value) {
        if (!empty($value)) $search_terms++;
      }

and

          $results[] = array(
            'link' => url('node/'. $item->sid),
            'type' => check_plain(node_get_types('name', $node)),
            'title' => $node->title,
            'event' => $event_link,
            'user' => theme('username', $node),
            'date' => $node->changed,
            'node' => $node,
            'extra' => $extra,
            'score' => $item->score / $total,
            'group' => l($group_node->title, 'node/'. $group_node->nid),
            'snippet' => search_excerpt($keys, $node->body),
          );

          // In case I ever need to use the snippet text
          //   $snippet=strip_tags(search_excerpt($keys, $node->body));
          //   $content = explode("...", $snippet);

          // Get comments for this node
          $result = db_query('SELECT * FROM {comments} WHERE nid = %d AND status = %d', $node->nid, COMMENT_PUBLISHED);
          while ($comments = db_fetch_object($result)) {

            $found = 0;

            // Change everything to lower case just to be sure
            $subject = strtolower(strip_tags($comments->subject));
            $comment = strtolower(strip_tags($comments->comment));

            // Go through keywords to see if they match this comment
            foreach ($show_keys as $key => $search) {
              $search = strtolower($search);

              if (!empty($search)) {
                // Look for match in subject or comment
                if (strpos($subject, $search) !== FALSE || strpos($comment, $search) !== FALSE) {
                  $found++;
                }
              }
            }

            // If all the keywords are found in this comment, then add it as a result.
            // Checked after the loop: $show_keys can end with empty entries
            // (e.g. when the last term is a quoted phrase), so a check inside
            // the loop could add the same comment more than once.
            if ($found == $search_terms && $search_terms > 0) {
              $results[] = array(
//                'link' => url('node/'. $item->sid . '/' . $comments->cid . '#comment-' . $comments->cid),
                'link' => url('node/'. $item->sid . '/' . $comments->cid),
                'type' => check_plain(node_get_types('name', $node)),
                'title' => $comments->subject,
                'event' => $event_link,
                'user' => theme('username', $node),
                'date' => $node->changed,
                'node' => $node,
                'extra' => $extra,
                'score' => $item->score / $total,
                'group' => l($group_node->title, 'node/'. $group_node->nid),
                'snippet' => search_excerpt($keys, $node->body),
              );
            }
          }

gpk’s picture

Try using the $fragment argument to http://api.drupal.org/api/function/url/5.
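For the link in question that might look like the sketch below. It assumes the Drupal 5 argument order url($path, $query, $fragment, $absolute) from the API page above; comment_link() is a hypothetical wrapper, and the fallback branch is only there so the example also runs outside Drupal.

```php
<?php
// Hypothetical wrapper: builds a link straight to one comment.
// The '#' is kept out of $path, because anything in the path gets URL-encoded.
function comment_link($nid, $cid) {
  $path = 'node/' . (int) $nid . '/' . (int) $cid;
  $fragment = 'comment-' . (int) $cid;
  if (function_exists('url')) {
    // Inside Drupal 5: path, query string, fragment.
    return url($path, NULL, $fragment);
  }
  // Outside Drupal: only illustrates the intended shape of the result.
  return '/' . $path . '#' . $fragment;
}

// Outside Drupal this prints /node/87/21#comment-21
print comment_link(87, 21) . "\n";
```

In the results array the same idea would read 'link' => url('node/'. $item->sid . '/' . $comments->cid, NULL, 'comment-' . $comments->cid).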

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

The $fragment argument is exactly what I needed. Thanks!

SomebodySysop’s picture

Here's a results snippet:

... options. There is also a button on your options page <b>that will</b> email your current password to you. Invalid Credentials (49) ... back online [12:40:41 PM] Ron Parker says: If I can <b>solve the problem</b>, do I get the rest of Labor day off? [12:41:19 PM] Mark A ...

The search was: "that will" and "solve the problem"

Now, the snippet you see above is derived from two different comments. Because all comments are considered part of the node, the two phrases cause a hit. However, because my comment search is an "AND" search, I don't get any comment hits because I'm looking for both phrases in one comment.

If I make the comments search an OR search, that will make it extremely difficult to do fine tuned searches. So, what to do?

Now I see why this hasn't been resolved.

gpk’s picture

I think we are probably agreeing that both phrases in the snippet won't necessarily be in one single comment, and nor will all the search keys necessarily be, in a default "AND" type search.

If you really want that then you'd need to implement your own custom module to search comments. Probably not too difficult - just a matter of massaging the search stuff already in node.module.

gpk
----
www.alexoria.co.uk

SomebodySysop’s picture

What I've got now is a search result that will include comments if all the keywords used are found in the comment or the comment subject. It's not perfect, but it's better than what I have now -- searches that return snippets from comments within a node with no way to locate the comment other than visually scanning the comment text. That's fine if your nodes only have 2 or 3 comments. But, in my case, I have a node with 93 comments!

In this case, the code I've created will help me to navigate directly to specific comments I'm looking for.

Someone else will have to take it to the next level.

@gpk, Thanks for all your help!

If anyone is interested in the final code, I'm happy to post it. The code as written assumes that the OG module is installed (it organizes content by groups), so it's fairly particular to my needs, but I'll share.

arminbw’s picture

Is any "official" code available?

shablm’s picture

I have a content type called 'ticket'; the node type used for its comments is 'interaction'.

I want a search interface which searches through only these two content types (ticket, interaction) and outputs the node link/title.

Also, even when matches are found in interactions, I should display the respective ticket node link/title (since interactions are actually the comments for the ticket node).

Can anybody please help me achieve this?

Thanks!