A node's most semantically similar nodes

This project is not covered by Drupal’s security advisory policy.

Overview

Semantic Similarity automatically computes a semantic similarity score for every pair of nodes, which helps to detect duplicate content (high scores) or complementary content (medium scores).

Features

  • computes scores at cron time
  • provides blocks [1] on node page to display a node's most/least semantically similar nodes
  • provides a page [2] that lists the most/least semantically similar nodes
  • integrated with Views
  • supports English only

Background technologies

These scores are computed by integrating Drupal with the R Project for Statistical Computing and its Latent Semantic Analysis package. The semantic similarity scores are obtained from a Latent Semantic Analysis (LSA) algorithm, which is well established in text mining, and they are an effective measure of semantic relatedness.

The Semantic Similarity module applies the LSA algorithm to approximate the meaning of nodes, thereby exposing semantic structure to computation. LSA combines the classical vector-space model, well known in computational linguistics, with a Singular Value Decomposition (SVD), a two-mode factor analysis. Bag-of-words representations of nodes can thus be mapped into a modified vector space that is assumed to reflect semantic structure. The module then computes the distance between the vectors, and this distance serves as a measure of semantic relatedness between texts.
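
The pipeline described above (bag-of-words counts, rank reduction by SVD, then a comparison of the resulting vectors) can be illustrated with a minimal sketch. It is not the module's implementation, which relies on R and its lsa package. The sketch is pure Python and makes several assumptions: raw term counts with no weighting or stop-word removal, cosine similarity as the comparison measure, and a truncated SVD obtained from the eigenvectors of the text-by-text Gram matrix via power iteration. All function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_document_matrix(texts):
    """Bag-of-words counts: one row per term, one column per text."""
    tokenized = [re.findall(r"[a-z]+", text.lower()) for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    return [[count[term] for count in counts] for term in vocabulary]


def top_eigenpairs(matrix, k, iterations=200):
    """Largest k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration with deflation (adequate for tiny examples)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        # Deterministic, non-uniform start vector (an arbitrary choice).
        vector = [float(i + 1) for i in range(n)]
        for _step in range(iterations):
            product = [sum(work[i][j] * vector[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in product))
            if norm < 1e-12:
                break
            vector = [x / norm for x in product]
        length = math.sqrt(sum(x * x for x in vector))
        vector = [x / length for x in vector]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        eigenvalue = sum(
            vector[i] * sum(work[i][j] * vector[j] for j in range(n)) for i in range(n)
        )
        if eigenvalue < 1e-12:
            break
        pairs.append((eigenvalue, vector))
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * vector[i] * vector[j]
    return pairs


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def lsa_similarity(texts, rank=2):
    """Pairwise cosine similarity of texts in a rank-reduced LSA space."""
    a = term_document_matrix(texts)
    n = len(texts)
    # If A = U S V^T, then A^T A = V S^2 V^T: the eigenvectors of the Gram
    # matrix are the right singular vectors, the eigenvalues are S^2.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, rank)
    # Coordinates of each text in the latent space: rows of V_k S_k.
    coords = [[math.sqrt(value) * vec[d] for value, vec in pairs] for d in range(n)]
    return [[cosine(coords[i], coords[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    sample = [
        "the cat sat on the mat",
        "a cat and a kitten sat on the rug",
        "stock markets fell as interest rates rose",
        "interest rates and stock prices rose",
    ]
    for row in lsa_similarity(sample, rank=2):
        print(" ".join(f"{score:6.2f}" for score in row))
```

With these sample texts, the two texts about cats score close to each other, as do the two about finance, while scores across the two topics stay near zero. The rank parameter controls how many latent dimensions are kept, which is the kind of factor the module lets a site administrator tune.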

The algorithm is split into three steps (pre-process, process and post-process) and is easy to configure. It offers a choice of 2400 combinations of factors, so a user with the proper permissions can fine-tune the relevance of the scores.

Dependencies

Installation

See INSTALL.txt

Roadmap

  • integrate with Views (to expose scores to the power of Views): done
  • add setup of methods to reduce the rank of the term-node matrix (to get even more than 2400 possible combinations of factors): done
  • outsource loading of Raphael JS to the Raphael module: done
  • translate into French (to save Molière)
  • add a PHP fallback to R/lsa logic (to lower the module installation entry barrier)

Notes

  1. Both blocks are provided either by the module's default view or by the module itself. The module's own block includes a radial map; the view's block does not.
  2. Provided by the module's default view.

Project information

  • Created by benoit.borrel

Releases