The apachesolr module provides a configurable 'related posts' block that shows related nodes from the same site. Relatedness can be based on author, tags, node title, node body, comments, and other fields. It also allows restrictions on node type, so the block shown next to issues could show only related issues, and the block shown alongside forum topics could show just forum topics, or forum topics plus issues, etc.

This block would be extremely useful for the issue queue to flag up potentially duplicate or closely related issues. I was reminded of this in the discussion of #1050616: Figure out backport workflow from Drupal 8 to Drupal 7 and also when I managed to open an issue with the exact same title as another one against core (except D6 so not in my default searches) and didn't notice for a couple of weeks.

Comments

yoroy’s picture

Priority: Normal » Major

Sub. This is one way of making better use of issue tags and improving queue workflow in general. Do major feature requests exist? Why, yes they do! We need to improve our core dev tools to better handle the scale we're at.

pwolanin’s picture

With a small amount of work, for issues we could limit the MLT suggestions to non-closed issues from the same project. Need to make sure the right data is in the index.

catch’s picture

I think I'd actually keep closed issues in there, especially if it's more work to exclude them. Sometimes issues are fixed and closed weeks before I find them on client sites, and those can be the hardest to actually find since they're excluded in quick scans of the issue queue too.

pwolanin’s picture

Ah, good point

yoroy’s picture

Issue tags: +prairie

tagification…

gdd’s picture

So what needs to be done for this to happen? It would be extremely useful as the initiatives get off the ground as well. I'm happy to help and have been wondering how to get started on d.o development for a while.

yoroy’s picture

Project: » Drupal.org infrastructure
Component: Download and Extend » Solr

Switching component for a bit, requesting feedback from d.o. engineers on how to proceed here.

I just reviewed our 'improve issue queue work' options again and this seems to be the most actionable and doable quick & big win we should work on.

Don't let any Prairie big pictures hold this back. "Show related stuff" is in scope :)

As for designing this:
Would it be helpful to think of this as a block on your d.o. dashboard?
What are the 3 sane defaults for how to relate things and filter them?

pwolanin’s picture

So, the main question is whether this needs to be limited to issues filed against the same module or not.

That might not be hard, depending on what Project module is putting in the index, but is not something you could do in the current UI, since it doesn't have any options to filter based on properties of the node being viewed.

catch’s picture

To be honest, I could see use cases for not limiting it by module. That would be useful for bugs where there's an error in form.inc or field.module but it's due to various issues in contrib code; seeing the other issues in that case would help with cross-project duplicates. There are also modules that are closely related, like media and media gallery.

If it turns out this meant there was a lot of unrelated noise, then we could look at limiting it later. Another option would be to add the project name to the index if it's not there already, and weight that really high for relevancy. But these all seem like things that could be looked at after giving it a trial run.
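To make the two options above concrete, here is a rough sketch of how a MoreLikeThis request could carry either a hard project filter or a relevancy boost on the project name. This is not the apachesolr module's code. It only assembles standard Solr parameters (q, fq, mlt.fl, mlt.qf, mlt.boost) in Python, and the field names (title, body, project_name) and the document ID format are assumptions about what the index contains.

```python
from urllib.parse import urlencode


def build_mlt_params(doc_id, fields, boosts=None, project=None, rows=5):
    """Assemble parameters for a Solr MoreLikeThis request.

    doc_id  -- ID of the document being viewed (format is an assumption)
    fields  -- index fields used to compute similarity (hypothetical names)
    boosts  -- optional {field: weight}; boosted fields must also be in `fields`
    project -- optional project name used as a hard filter
    """
    params = [
        ("q", f'id:"{doc_id}"'),          # the source document
        ("mlt.fl", ",".join(fields)),     # fields to compare on
        ("mlt.mintf", "1"),               # minimum term frequency
        ("mlt.mindf", "1"),               # minimum document frequency
        ("rows", str(rows)),
    ]
    if boosts:
        # Soft option: weight some fields higher instead of filtering.
        params.append(("mlt.boost", "true"))
        params.append(
            ("mlt.qf", " ".join(f"{name}^{weight}" for name, weight in boosts.items()))
        )
    if project is not None:
        # Hard option: only return documents from the same project.
        params.append(("fq", f'project_name:"{project}"'))
    return params


def to_query_string(params):
    """URL-encode the parameter list for an HTTP GET request."""
    return urlencode(params)
```

Calling `build_mlt_params("node/766382", ["title", "body", "project_name"], boosts={"project_name": 10})` gives the "weight it really high" variant, while passing `project="drupal"` without boosts gives the strict same-project filter.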

pwolanin’s picture

OK, so it would be really easy to enable if we don't try to filter based on the properties of the current node. Perhaps let's go ahead with that?

Do we need to test it on the staging site?

yoroy’s picture

I too have no objections to starting out without limiting by module/project first and adapting from there

Any chance of a screenshot of some default output then? Just linked node titles? What does the header say? How should the 'read more' link to the full results listing be handled?

Because once I started wireframing, the first thing I did was pull in some properties of the current node :)

(Attached wireframe: relatedissuesblock - Justinmind Prototyper)

This might well be entirely too tag-based thinking, but that's what we use now to relate issues into Topics.
(Benefit: tags are created and assigned by humans. Cons: tags are created and assigned by humans)

catch’s picture

Status: Active » Needs work

I now have a d.o sandbox. I've created a new recommended content post on there.

http://dashboard-drupal.redesign.devdrupal.org/node/1143706 - this shows it in action.

Issues:

- What is the deal with deploying a block like this on d.o? Does it have to be in a feature? Update function? Are blocks still enabled/disabled via the UI?

- We need to ensure this only shows on issues and/or forum topics. That's easy to do with PHP block visibility, or with a custom block that does its own logic then pulls in the content of the one provided by solr.

- It is not possible to restrict the content of the block by node type. This means that project pages, handbook pages, forum topics, issues can all be in there. I actually think this might be somewhat of a feature - we want to minimize duplicate support requests, feature requests etc. so if they're relevant they're relevant. If it didn't work out so great, that could be worked on later.

- The sandbox has only a very small number of actual issues (I think it is x number of recent issues), not the full d.o database, so the actual relevancy of results can't really be tested accurately there.

catch’s picture

These two issues would be good ones to check since they both ran for months and ended up containing exactly the same patch.

http://drupal.org/node/766382

and http://drupal.org/node/1013864

catch’s picture

Project: Drupal.org infrastructure » Drupal.org site moderators
Component: Solr » Redesign

I'm also moving this back out of the infra queue; whatever we do, this isn't ready for deployment.

catch’s picture

Discussed this with webchick in IRC:

For block visibility, we need a custom block in the drupalorg module and to handle it there.
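As an illustration only, the visibility rule could look like the following. The real implementation would live in PHP in the drupalorg module; this Python sketch just shows the logic. The machine name `forum` for forum topics and the function name are assumptions, while `project_issue` is the type used in the queries in this thread.

```python
# Node types on which the related-content block should appear.
# "project_issue" comes from this thread; "forum" is an assumed machine name.
ALLOWED_NODE_TYPES = frozenset({"project_issue", "forum"})


def related_block_visible(path_args, node_type):
    """Return True only when viewing a single node of an allowed type.

    path_args -- path components of the current page, e.g. ["node", "1143706"]
    node_type -- machine name of the node's type, or None if not a node page
    """
    is_node_view = (
        len(path_args) == 2
        and path_args[0] == "node"
        and path_args[1].isdigit()
    )
    return is_node_view and node_type in ALLOWED_NODE_TYPES
```

Edit pages such as `node/1143706/edit` are excluded because they have a third path component.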

Manual setup for the actual apachesolr block on Drupal.org is probably OK.

For results, webchick wanted a node type filter to limit this to project_issue only. I'll try to summarize why I think we should avoid that if possible:

- We can show the block both on issues and forum posts.

- Duplicates tend to get posted across both issues and forums. Automatic cross-linking in both directions hopefully has some value (showing forum posts that should have been issues, and showing people in forum posts that there's a relevant issue).

- Some people will open issues (especially support and feature requests) for things that are documented in the handbook, such as the troubleshooting guide. If we show handbook pages that are actually relevant, that could be nice.

The main source of what is probably going to be genuine noise is actual project pages - at least on the sandbox that is really bad.

I am really hopeful that this is also a data issue. I ran these queries on the sandbox:

mysql> SELECT COUNT(*) FROM node WHERE type = 'project_project';
+----------+
| COUNT(*) |
+----------+
|    11779 | 
+----------+
1 row in set (0.09 sec)

mysql> SELECT COUNT(*) FROM node WHERE type = 'project_issue';
+----------+
| COUNT(*) |
+----------+
|    23558 | 
+----------+

That is only two issues per project on the sandbox, so project pages are heavily over-represented. Hopefully that is why we're getting such weird results.

Here's what I'd like to do to get this deployed:

1. drupalorg patch to handle the block visibility so it only shows up on project_issue and forum topic node types.
2. If there is a scratch site with the full database, move it to there for testing with the real-ish data set.
2b. Deploy it on Drupal.org but with the block disabled by default - a handful of people could then enable it for their accounts and compare notes on how the results look.

If the results are good enough or better than nothing, then we could make the block enabled by default - then open followup issues to refine the results (I haven't done much or any solr work so help from someone familiar with that would be great).

webchick also mentioned some kind of indicator of node type next to the titles of nodes. That could be useful; not sure how much work that is to implement.

It would be pretty cool to have a 'more' link and go to a special page, but that is definitely going to require custom solr work.

The other concern here is how much load this would create on solr. I'd think we could definitely cache the block for 24 hours per-page or similar, but it will be a lot of solr requests regardless.
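A minimal sketch of the per-page caching idea, assuming results are cached per node ID for 24 hours. The class and parameter names are hypothetical, and a real deployment would use Drupal's own cache layer rather than an in-memory dictionary.

```python
import time


class RelatedBlockCache:
    """Cache related-content results per node for a fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds=24 * 60 * 60, clock=time.time):
        self._fetch = fetch      # callable that actually queries Solr
        self._ttl = ttl_seconds  # 24 hours by default
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # nid -> (stored_at, result)

    def get(self, nid):
        now = self._clock()
        entry = self._entries.get(nid)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: no Solr request
        result = self._fetch(nid)
        self._entries[nid] = (now, result)
        return result
```

With this scheme each issue or forum node costs at most one Solr request per 24 hours, however many times it is viewed. The first view of every distinct node still hits Solr, which is the load concern raised above.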

catch’s picture

Project: Drupal.org site moderators » Drupal.org customizations
Version: » 6.x-3.x-dev
Component: Redesign » Code
Status: Needs work » Active

pwolanin’s picture

Regarding the load on the Solr server - there should be Varnish caching when the same MLT query is made multiple times. However, if we assume lots of people browsing lots of pages (so low cache hit rate), it's still going to be close to 1 extra query per issue or forum node visited.

The basic code for the MLT block is pretty trivial. We can request the node type back in the results if we want to concat that to the title somehow.
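For illustration, appending the node type to each title could be as simple as the sketch below. It assumes each result comes back as a dictionary with `title` and `type` keys. The label mapping and the `book` and `forum` type names are hypothetical.

```python
# Human-readable labels per node type; names other than "project_issue"
# and "project_project" are assumptions.
TYPE_LABELS = {
    "project_issue": "Issue",
    "project_project": "Project",
    "forum": "Forum topic",
    "book": "Handbook page",
}


def label_title(doc):
    """Return the title with a node type indicator, if the type is known."""
    label = TYPE_LABELS.get(doc.get("type"))
    title = doc["title"]
    return f"{title} ({label})" if label else title


def label_results(docs):
    """Apply label_title to every result document."""
    return [label_title(doc) for doc in docs]
```

Unknown or missing types fall back to the bare title, so the block still renders if the type field is not requested from Solr.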

Gerhard Killesreiter’s picture

We'll deploy a new pair of solr servers shortly, maybe we can postpone that until then?

dww’s picture

See also #19386: Automatically search for duplicate issues/questions before submitting new issue/question -- that seems like a more useful time/place to be asking solr for duplicates. And http://drupal.org/project/uniqueness already exists for exactly this purpose (as linked in #19386).

catch’s picture

@dww I think this is a different thing. No matter when we install something like uniqueness, there are going to be thousands of open, duplicate issues in the system already.

Also, if you look at issues like #561422: Replace strtr() with str_replace() for db prefixing and #42827: optimize/enhance db prefixing, those have ended up as duplicates but didn't start out that way, so even with the search-first method a block could well be useful.

dww’s picture

@catch: agreed. I didn't say this was a dup. Just cross-linking since folks that care about this probably also care about that. ;) I agree with all your points on why we want both.

Cheers,
-Derek

pwolanin’s picture

Are the new servers deployed? Which version of Solr are they running?

mgifford’s picture

Version: 6.x-3.x-dev » 7.x-3.x-dev
Issue summary: View changes

This would be very useful!

mgifford’s picture

Issue tags: +apach solr, +search

mgifford’s picture

Just adding in a link back to the GDO page where this is being discussed
https://groups.drupal.org/node/133169#comment-445024

catch’s picture