I have created a recommendations block with PHP for the Pivots recommendation system. The block is only visible to site maintainers on http://drupal.org/project/*

Please provide feedback.

Comment   File                 Size       Author
#30       snapshot3.png        2.12 KB    oadaeh
#1        Recommendation.JPG   43.25 KB   Heine

Comments

Heine’s picture

FileSize
43.25 KB

It looks like the block is missing some styling (even after forced refreshes). The rectangular heading clashes with the rounded corners of the containing block. The heading also wraps in an ugly way on smaller window sizes (screenshot) and makes the block feel even more cramped than usual.

The list items do not seem to have proper margins (top margin) and, when wrapped, create a rather messy, difficult-to-read view.

webchick’s picture

Initial impressions:
- Overall I find this interesting, as a module developer. It has "bubbled up" conversations about my module that I had no idea existed, since most non-developers use forums as a support outlet and not the issue queues. It also will help pull developers into the forums again, since we mostly stick to issue queues exclusively.

- It also presents an opportunity for a developer to find places in the forums people have talked about offering funding for development on a module, since most "normal" users don't really "get" the concept of issue queues; at least not right away. Developers think they prefer issue queues for support requests, but very few of us are so on top of our issue queues that we end up answering questions before someone else. Therefore, these probably are actually better off in the forums, really.

- The "other modules mentioned" seems to be largely filled with totally irrelevant project names.
For example: http://drupal.org/project/og

Not a single one of those projects (with the exception of "Groups," which is unpublished) has anything remotely to do with Organic Groups. I would expect to see projects here like OG Roles and OG Vocabulary and so on. I notice similar results on every single module I view. This algorithm either needs to be tweaked heavily, or else done away with imo.

- Also, it's a bug to show unpublished projects in this list. ;)

- The list of related forum topics is weighted rather weirdly. I would expect posts that mention my module in the title to be listed above those that list it somewhere in the node, listed again above ones that just have a mention in a comment. Of course, if it's mentioned *multiple* times in the comments, that might bubble it higher.
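To make the ordering suggestion above concrete, here is a minimal Python sketch. It is not the pivots code: the weights and the field names (`title`, `body`, `comments`) are hypothetical, and plain case-insensitive substring matching stands in for whatever matching the Indexer really does.

```python
# Hypothetical weights: a title mention outranks a body mention,
# which outranks a single comment mention.
TITLE_WEIGHT = 10
BODY_WEIGHT = 5
COMMENT_WEIGHT = 1


def mention_score(topic, module_name):
    """Score a forum topic by where and how often it mentions the module."""
    name = module_name.lower()
    score = 0
    if name in topic["title"].lower():
        score += TITLE_WEIGHT
    if name in topic["body"].lower():
        score += BODY_WEIGHT
    # Every comment that mentions the module adds a little, so a topic
    # with many comment mentions can bubble up.
    for comment in topic["comments"]:
        if name in comment.lower():
            score += COMMENT_WEIGHT
    return score


def rank_topics(topics, module_name):
    """Return topics that mention the module, highest score first."""
    scored = [(mention_score(t, module_name), t) for t in topics]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored]
```

With these made-up weights, six or more comment mentions would outrank a single body mention, which is one way to get the "mentioned multiple times in the comments" effect.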

Overall, promising, needs some work.

sun’s picture

I'm looking at http://drupal.org/project/img_assist

As mentioned in IRC:
- I like this block.
- I'm not entirely sure it will help anyone if it stays in this particular position, containing links to forum topics.

Ideas raised:
- Displaying this block on forum threads, pointing to possibly related project issues and forum threads would help.
- Displaying this block on project issues, pointing to possibly related issues, would be awesome.

Caveats raised:
- Determining the topic of a forum thread or project issue is a hard task (however, that would be a real challenge for universities).

oadaeh’s picture

I like the "Recent discussions about this module" part, but I think the "Projects referenced in the same conversations" part is not nearly as useful. I would much rather have links to where the module was mentioned, rather than to the module's project page, which I think I can find easily enough on my own.

@webchick: they appear to me to be sorted by creation date of the item (node or comment) that mentions the module, with the newest one at the top, which makes sense, but I see your point, too.

aclight’s picture

Regarding webchick's comment about other projects listed at http://drupal.org/project/og, Groups isn't the only unpublished project on that list. There's also File, Taxonomy, and Profile, all of which are now unpublished.

I don't know how the algorithm for the related projects works, but it seems to me that it preferentially picks projects with names that are short. This isn't terribly surprising, but it also makes it much less useful, IMHO. Single word modules like Event, as an example, might be "mentioned" all over the place, but my guess is that the percentage of the time a user is referring to the Event module when "event" is found in text is small. In a small sample of project pages I looked at, the only 3 modules I saw recommended that were more than 1 word were CCK, TinyMCE, and Organic groups. Those three all stand out in that the individual words in their titles are probably less likely to be used outside of the context of mentioning the module.

christefano’s picture

Will this be deployed for all users at some point? I can do without the "projects referenced in the same discussions" list but I think it would actually be useful (if unpublished nodes are removed from the results, of course) for a large percentage of the community.

My initial thoughts:

  1. At first the Recommendations block only had a "Loading..." message. I had to reconfigure NoScript before Firefox would load the script from scratch.drupal.org.
  2. Posts in deprecated forums are shown. I'm not sure how that should be addressed.
  3. I'd like to see issues as well as forum posts. Right now the issue queues are more important to me than the general forum.

pivots_test’s picture

As the developer of this feature, I really appreciate all the comments so far. I'll improve the code based on the feedback.

Here are the survey results from 47 responses:
* Whether “related conversations” are useful -- 32 Yes, 8 No, 7 N/A
* Whether “related projects” are useful -- 35 Yes, 7 No, 5 N/A

Later, I will post all the code here for public review.

fgm’s picture

I like it a lot: it showed me two forum discussions about my module that I had not been aware of, and this allowed me to answer the users, who would otherwise have been left without help.

Actually, it might be even better if there were

  • a "read more" link to all matches on the same criteria on a separate page
  • a "new" flag for unread posts

catch’s picture

As far as I understand the algorithm, it finds projects based on them appearing in conversations together in the forums so:

"I'm using organic groups and views and I'm trying to do foo" or "I've got [.. some huge list of modules ..] installed and I'm getting fatal error memory limit exhausted tried to allocate xxx bytes" is probably going to show up a lot more often than "I'm using Organic Groups and OG Roles". I'm not convinced that forum conversations can provide this kind of information usefully at all - maybe as a ranking factor alongside something like similar by terms (dependent on freetagging for projects) or scraping .info files for similar dependencies.

I agree that the 'useful' thing is subjective though - it might be useful just to see a list of modules that other people happen to be using together even if they're not related if you're a new user.

As well as filtering out unpublished projects, ideally we could filter out modules with no official releases, or only 4.7 releases etc.

However, big +1 to recent discussions referencing this module, seems to work really well.
Big +1 to having this displayed on forum topics with links to projects - I've opened a feature request for that.

@sun - I asked about project issues (and groups.drupal.org posts) at DrupalCon, but as far as I understand it, only forum posts are parsed at the moment. I agree that having this for issues would be amazing. Opened a feature request: http://drupal.org/node/266579

By the way, project page for pivots: http://drupal.org/project/pivots

webchick’s picture

* Whether “related conversations” are useful -- 32 Yes, 8 No, 7 N/A
* Whether “related projects” are useful -- 35 Yes, 7 No, 5 N/A

Well that's a bit silly. Did those people who rated "related projects" useful actually find useful related projects on the page in question? I've been looking at this block each time I go on a project page, and it's been relevant maybe 1 out of every 20 times. I wonder more if people are rating the "idea" of related projects useful. If it's intended to be, "Are the related conversations/projects for THIS module useful", that probably needs to be worded differently (and shortened to just "Was this helpful or not and why?", not a litany of questions).

I had assumed it was a general survey about the "idea" of pivots, and the length of it made me ignore it entirely, so I've never clicked on it. :P If you want me to start clicking on it every page with unhelpful results to help balance the stats, I'd be happy to. ;)

webchick’s picture

Yeah, the wording of this question is:

"*Would* links to other related modules be useful?"

Not

"*Are* the links to other related modules useful?"

So IMO those survey results have no bearing on anything, if they're intended to be used as a means of gauging quality of the algorithm.

webchick’s picture

Btw, here's one thought for making that more relevant.

http://drupal.org/project/activeedit

It shows "CCK" as a related module, which isn't remotely related.

However, right in the project page itself, it mentions Javascript Tools, Popups, and Formfilter. All of those ARE related.

Could "Are these modules mentioned anywhere on the project page itself" be factored into the algorithm? This would help a lot to distinguish between "Ok, even the maintainer thinks these are related" vs. "Hm. Lots of people were talking about 'events' that day."
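As a rough Python illustration of that idea, the sketch below re-ranks candidate modules and boosts any whose name appears in the project page text. The `boost` factor, the `(name, score)` input shape, and the function name are all hypothetical and not taken from the pivots code.

```python
def boost_page_mentions(candidates, project_page_text, boost=2.0):
    """Re-rank (name, score) pairs, boosting names found on the project page."""
    page = project_page_text.lower()
    boosted = [
        (name, score * boost if name.lower() in page else score)
        for name, score in candidates
    ]
    # Highest boosted score first.
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)
```

Plain substring matching like this would still misfire on short, generic module names, so a real version would probably need stricter matching.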

catch’s picture

+1 for using project nodes as a (hopefully heavily weighted) source of related modules. Agree about the survey results as well. Would be better to zero out the results and start from scratch with new questions on that one.

webchick’s picture

One other thought is parsing module .info files for dependencies. Although this would actually be good for the project page to show in general, come to think of it. But anyway, might be another idea for identifying related modules.
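A minimal Python sketch of the .info idea, assuming the `dependencies[] = name` line format used in Drupal 6 .info files. The function names and the `info_files` mapping (module name to .info text) are hypothetical.

```python
import re

# Matches lines such as: dependencies[] = og
DEPENDENCY_LINE = re.compile(r"^\s*dependencies\[\]\s*=\s*(\S+)\s*$")


def parse_dependencies(info_text):
    """Return the set of module names listed as dependencies in .info text."""
    deps = set()
    for line in info_text.splitlines():
        if line.lstrip().startswith(";"):  # .info comment line
            continue
        match = DEPENDENCY_LINE.match(line)
        if match:
            deps.add(match.group(1).strip('"'))
    return deps


def modules_depending_on(info_files, module_name):
    """List modules whose .info text declares a dependency on module_name."""
    return sorted(
        name
        for name, text in info_files.items()
        if module_name in parse_dependencies(text)
    )
```

Modules that declare a dependency on the current project could then be listed as related, independent of any forum co-mentions.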

catch’s picture

@webchick You mean like #9? ;) I've opened a groups discussion here: http://groups.drupal.org/node/11998

webchick’s picture

LOL Sorry. The one sentence I must not have read in #9. ;) Cool, thanks.

pivots_test’s picture

Thank you, webchick and catch, for the suggestions on algorithm improvement and survey design. I'm trying to work on that now. Hope together we can make it a valuable service to the community :).

pivots_test’s picture

Here's an explanation of the pivots system architecture. It is meant to show that deploying the system has little impact on the existing D.O infrastructure, and thus keeps the risk of introducing security vulnerabilities to a minimum.

The system consists of three components, as shown in the attached figure. The first component is the Indexer (written in Java). Its job is to run periodically and generate related conversations and related projects. It runs at the backend, and it only has read-only permission on five D.O database tables. The results are saved to another separate database. The second component is "action.php". It is placed in a separate Apache folder and it's not part of a Drupal system. Its job is to generate the HTML content of the pivots block from the pivots database. The third component is a PHP block on D.O. Its job is to display the HTML content from "action.php" through AJAX calls. It only has 18 lines of code.

When a user browses a module page on D.O (Step 1), the pivots block forwards an AJAX request to "action.php" (Step 2). "action.php" then fetches data from the pivots database (Step 3), wraps it in HTML, and renders the pivots block on the webpage through the AJAX response (Step 4).
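Steps 3 and 4 can be pictured with the following Python sketch. It is only an analog of what "action.php" is described as doing: the table name `related_conversations`, its columns, and the function name are hypothetical, and SQLite stands in for the real pivots database.

```python
import html
import sqlite3


def render_pivots_block(conn, project_nid, limit=5):
    """Fetch related conversations for a project and wrap them in HTML."""
    rows = conn.execute(
        "SELECT title, url FROM related_conversations "
        "WHERE project_nid = ? ORDER BY created DESC LIMIT ?",
        (project_nid, limit),
    ).fetchall()
    # Escape everything that came from the database before it reaches
    # the page, since the titles originate from user posts.
    items = "".join(
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(url, quote=True), html.escape(title)
        )
        for title, url in rows
    )
    return "<ul>{}</ul>".format(items)
```

The query is parameterized and the output is escaped, so in this sketch the script only ever reads from its own database.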

This structure is a little complex, but it has minimal impact on the existing D.O infrastructure. It interacts with D.O in only two places: 1) a read-only D.O DB account on five tables, and 2) the 18-line PHP code block on D.O. It is not likely that the read-only DB account could introduce security problems. As for the PHP block, we can look at it line by line at http://danithaca.pastebin.com/m56cb7e87. There are only two places where users can inject illegal input: arg(0) and arg(1). However, note line 2: if arg(0) is not the string 'node' or arg(1) is not a number, the code stops. So it's not likely to have security problems here either.
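For readers who don't want to open the pastebin, the guard described above amounts to something like this Python analog. It is not the PHP block itself: the function name and the path-string input are hypothetical stand-ins for Drupal's arg(0) and arg(1).

```python
def project_nid_from_path(path):
    """Return the node ID for a path like 'node/3060', else None."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        return None
    arg0, arg1 = parts[0], parts[1]
    # Mirror the described check: stop unless arg(0) is 'node'
    # and arg(1) is a number.
    if arg0 != "node" or not (arg1.isascii() and arg1.isdigit()):
        return None
    return int(arg1)
```

Anything that is not exactly 'node' followed by a plain ASCII integer is rejected before the value is used.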

As for "action.php" and the Indexer (which make up 99% of the coding work), both of them are placed on scratchvm for the time being and are isolated from D.O. Even if they are placed on the D.O production servers, they are not likely to cause security issues for D.O. Note that "action.php" is not part of a Drupal system, so it doesn't have access to the Drupal database. The worst it can do is damage its own database and mess up the webpage on the client side. As for the Indexer, it's isolated from the front end and doesn't have write permissions on the D.O database. The worst it can do is cause performance issues.

pivots_test’s picture

Thanks to all the feedback so far, I have put together a to-do list and a development plan, as follows.

1. Add the "read more" link.
2. Remove unpublished projects from the "related projects" block.
3. Fix some performance issues of the Indexer.

At that point, I will publish all the code to the issue queue as well as the development mailing list for public review. Then, hopefully, if the code meets all security standards and if the community likes it, we can provide it as a beta service to all the registered D.O users who are interested. In the following two months, I will collect feedback and work on the following features:

* Add issue queues, related handbook entries, etc. to the "related conversations" block as separate tabs.
* Improve the "related projects" algorithm so that its results are more relevant.
* Improve the ordering algorithm of the "related conversations" block.
* Add some interface features, such as making the block collapsible, displaying "new" post status, etc.

At the end of the two months, if we can show that the community accepts it as a valuable approach for module recommendations, then we'll migrate the system from scratchvm to the production server and stabilize it.

Your comments would be much appreciated. Thank you!

pivots_test’s picture

Hi Amazon, would it be possible to turn it back on for advanced users who are interested? It would help me collect more feedback and make improvements. And I hope it won't introduce security issues, as I have explained above. Thanks!

pivots_test’s picture

Some updates to the pivots system:

1. Added the "more" link.
2. Removed unpublished projects from the "related projects" block.
3. Fixed some performance issues of the Indexer.
4. Rearranged the "survey" link.

Is it possible to re-activate the pivots block? It would help to collect feedback on this implementation. Thanks!

Amazon’s picture

I've re-enabled the block.

pivots_test’s picture

The code of the pivots system is published to CVS: contributions/modules/pivots4do

Or, for a quick browse, you might go to the following links.
PHP:
http://danithaca.pastebin.com/m798c291d
http://danithaca.pastebin.com/m2ea9911a
Java Indexer:
http://danithaca.pastebin.com/m1b147b25
http://danithaca.pastebin.com/m53f9621b
http://danithaca.pastebin.com/m592c46ec

For a short description of the architecture and security concerns, please refer to:
http://drupal.org/node/265450#comment-884264

Your comments would be much appreciated. Thanks!

jbrauer’s picture

The email to the devel list indicated this is available for d.o maintainers and cvs account holders but as a cvs account holder I don't seem to have access to the block. It looks like some awesome work.

webchick’s picture

Unless the algorithm for "Related modules" gets significantly improved, I think deploying this en masse is only going to result in more user confusion and more support requests for the webmasters team to handle. :( http://drupal.org/project/og is still filled with totally unrelated modules (though at least "Taxonomy Access Control" is there, as another node access module option).

I would support deploying only the area of "recent forum discussions about this" though; that part seems to work well and is pretty cool in its own right.

catch’s picture

I'd agree with restricting it only to 'recent forum discussions'. IMO 'related modules' is likely to need combining with similarbyterms.module and/or .info parsing to give meaningful results. And better no results than misleading ones.

Michelle’s picture

I love the recent forum discussions but wish it was updated more often. It's the same ones there all the time. The related modules bit is useless to me, but I suppose I'm not its intended audience.

Michelle

Amazon’s picture

I've enabled the pivots recommendation system for all authenticated users. For more details see Daniel's latest blog post: http://mrzhou.cms.si.umich.edu/node/15

If you find problems with pivots, and we expect you to, please cite specific examples and Daniel will make improvements to the algorithms. Keep making recommendations and we will keep fine-tuning to try to get it working for everyone. Your feedback is welcome and encouraged, and it is an important part of helping new users, who don't sleep, eat, and breathe Drupal, to find relevant information on Drupal.org.

stephthegeek’s picture

Cool! The related discussions on module pages are looking really relevant.

A few suggestions for the survey that pops up on the (?) links, to make things a little more polished/professional:
- add a margin:15px; to the body definition in the CSS file so things aren't smushed against the edges of the popup
- wording change: "Latest posts on top." to "Latest posts are at the top."
- "We hope it would help" to "We hope it will help"
- spelling: appologize -> apologize
- "Decide whether i want this module" -> capitalize "I"
- "Other (pls specify below)" -> "please", not pls.
- Question 2, I think "conversations" should be "discussions", since that's what it's called previously. Ditto for question 3 and "related projects" rather than "modules"

oadaeh’s picture

FileSize
2.12 KB

Can the block be hidden, or have something like "No related discussions found." and "No related projects found.", if there is nothing in it, rather than showing a blank block, as the attached image shows?

greggles’s picture

@oadaeh - re hiding the block...I believe this is a result of the way that the block is implemented because it is javascript which pulls data from scratch.drupal.org (this was done for a variety of reasons - mostly limiting impact on the servers). drupal.org always thinks that the block has data because it always has the JavaScript code. The list of data isn't generated until your browser executes the JavaScript.

If you want to try fixing that problem here is the code for it: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/drupalorg/b...

pivots_test’s picture

To webchick and catch: I'm working on improving the "related modules" section. Currently the algorithm puts the most co-mentioned modules at the top, so popular modules such as CCK always get a spot at the top. We are thinking of using the "cosine similarity" algorithm to tackle that problem. Thank you for pointing out the problem.
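To show why cosine similarity helps with the popular-module problem, here is a small Python sketch. It assumes each conversation has already been reduced to a set of mentioned module names. That input format and the function name are hypothetical, and this is not the Indexer's code.

```python
import math
from collections import Counter
from itertools import combinations


def related_by_cosine(conversations, module):
    """Rank other modules by cosine similarity of their mention vectors.

    Each module is a binary vector over conversations, so the cosine is
    co_mentions / sqrt(mentions_a * mentions_b).
    """
    mentions = Counter()
    co_mentions = Counter()
    for mods in conversations:
        mods = set(mods)
        mentions.update(mods)
        for a, b in combinations(sorted(mods), 2):
            co_mentions[(a, b)] += 1

    scores = {}
    for other in mentions:
        if other == module:
            continue
        pair = tuple(sorted((module, other)))
        shared = co_mentions[pair]
        if shared:
            scores[other] = shared / math.sqrt(mentions[module] * mentions[other])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Dividing by the square root of each module's total mentions penalizes modules that are mentioned everywhere. A widely mentioned module can then rank below a rarely mentioned one that almost always appears alongside the current project, even if its raw co-mention count is higher.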

To Michelle and stephthegeek: I really appreciate your feedback and suggestions. I'm going to fix those problems in a moment.

To oadaeh and greggles: Just as greggles pointed out, the block problem is because of Javascript. When we migrate it from s.d.o. to d.o. (hopefully), we could re-write part of the code to use PHP rather than Javascript. Then the problem will go away.

Thanks everybody for your help and feedback! I will continue to make improvements, and hope the pivots system could really be something useful to the community.

wmostrey’s picture

The "Related discussions" block is a gift for module maintainers!

In the same way that unpublished projects are no longer displayed in the "Related projects" block, I believe projects that don't offer a download (that is, projects that only offer a HEAD release) shouldn't be displayed either. Those modules are either completely outdated or not yet ready for public use. An example is the media module, which hasn't received an update since November 2006 but appears on the asset module's related modules list.
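In Python terms, the proposed filter might look like the sketch below. The `published` and `releases` fields are a hypothetical data shape, not the real project schema.

```python
def displayable_projects(projects):
    """Keep published projects that have at least one non-HEAD release."""
    return [
        p for p in projects
        if p["published"] and any(r != "HEAD" for r in p["releases"])
    ]
```

A project with an empty release list is dropped as well, since `any()` over an empty list is false.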

catch’s picture

I agree 100% with hiding modules that don't have any valid releases from the block.

pivots_test’s picture

Thanks, wmostrey and catch, for the comments. They will be incorporated in the next version. Thanks!

Gerhard Killesreiter’s picture

We've had to disable the block today since scratch.d.o got moved behind htaccess.

We'll need to move the pivots engine to our main install.

Amazon’s picture

Status: Active » Fixed

Now working on this issue. http://drupal.org/node/265450 Marking fixed.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.