This function was an attempt to abstract the algorithm, but I fear that this abstraction isn't possible without sacrificing performance. So let's remove it; perhaps we can revisit abstracting the algorithm later.

Comments

ngaur

A sophisticated similarity calculation is probably going to be a performance issue, but that can presumably be kept under control by making sure you don't process many nodes at once. E.g., the linkchecker module does a pretty decent job of this.

Scott Reynolds

Yes, it processes a configurable number of nodes at once, and it has a configurable time limit as well: it will abort if it has spent too much time.
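For illustration only, the two limits described here could be sketched roughly as follows. This is Python rather than the module's PHP, and `process_batch`, `process_node`, `max_nodes`, `max_seconds`, and `clock` are made-up names for the sketch, not the module's actual functions or settings.

```python
import time
from typing import Callable, Iterable, List


def process_batch(
    node_ids: Iterable[int],
    process_node: Callable[[int], None],
    max_nodes: int = 50,          # hypothetical "nodes per run" setting
    max_seconds: float = 10.0,    # hypothetical time-limit setting
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Process at most max_nodes nodes, aborting once max_seconds has elapsed.

    Returns the IDs that were actually processed so a later run can
    pick up where this one stopped.
    """
    started = clock()
    done: List[int] = []
    for node_id in node_ids:
        if len(done) >= max_nodes:
            break  # node limit for this run reached
        if clock() - started >= max_seconds:
            break  # time budget used up; leave the rest for the next run
        process_node(node_id)
        done.append(node_id)
    return done
```

Checking both limits before each node means a run never starts work it has no budget for, although a single slow node can still overrun the time limit.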

This function is a holdover from when the calculations were done in PHP memory. We now do them in the database using temporary tables.