Use real HTML-Diff algorithm in Drupal Diff module

AndreU - May 10, 2009 - 11:27
Project:Diff
Version:6.x-2.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:yhahn
Status:postponed
Description

Suggestion

Daisy Diff can produce a real diff of HTML code instead of just stripping out HTML tags. During GSoC 2008, the author wrote a PHP version for Wikipedia: http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/diff/

Maybe it could also be used for the Drupal Diff module?

Daisy Diff Description

Daisy Diff is a Java library that diffs (compares) HTML files. It highlights added and removed words and annotates changes to the styling.

  • Works with badly formed HTML that can be found "in the wild".
  • The diffing is more specialized for HTML than XML tree diffing tools. Changing part of a text node will not cause the entire node to be marked as changed.
  • In addition to the default visual diff, HTML source can be diffed coherently.
  • Provides easy to understand descriptions of the changes.
  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.
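The key difference from tag stripping is that changes are tracked word by word while the markup is kept. Below is a minimal sketch of that idea in Python. It is not Daisy Diff's algorithm: Daisy Diff builds a tree from the HTML, copes with malformed markup and annotates style changes, none of which is done here. The sketch simply runs the standard library's difflib.SequenceMatcher over a token stream of tags and words. The tokenizer regex and the names tokenize and html_word_diff are hypothetical, chosen for illustration only.

```python
import difflib
import re

# Hypothetical, simplified tokenizer: a tag ("<p>", "</b>"), a run of
# whitespace, or a run of other characters (a "word"). It does not handle
# comments, <script> bodies, or attributes containing ">".
TOKEN_RE = re.compile(r"<[^>]*>|\s+|[^<\s]+")


def tokenize(markup: str) -> list[str]:
    return TOKEN_RE.findall(markup)


def _mark(tokens: list[str], tag: str, keep_tags: bool) -> list[str]:
    """Wrap word tokens in <tag>; pass whitespace through; keep or drop tags."""
    out = []
    for tok in tokens:
        if tok.startswith("<"):
            if keep_tags:
                out.append(tok)
        elif tok.isspace():
            out.append(tok)
        else:
            out.append(f"<{tag}>{tok}</{tag}>")
    return out


def html_word_diff(old: str, new: str) -> str:
    """Word-level diff of two HTML fragments using <del>/<ins> markers.

    Tags from the old version that were removed are dropped, so the output
    follows the tag sequence of the new version.
    """
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out: list[str] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if op in ("delete", "replace"):
            out.extend(_mark(a[i1:i2], "del", keep_tags=False))
        if op in ("insert", "replace"):
            out.extend(_mark(b[j1:j2], "ins", keep_tags=True))
    return "".join(out)


if __name__ == "__main__":
    # Only the changed word is marked, not the whole <p> node.
    print(html_word_diff("<p>The quick fox</p>", "<p>The slow fox</p>"))
    # <p>The <del>quick</del><ins>slow</ins> fox</p>
```

Dropping deleted tags is a design choice of this sketch. Every token of the new version is emitted in order, so the output is as well-formed as the new HTML. The cost is that a pure styling change (for example <b> replaced by <i>) is silently applied rather than annotated, which is exactly the kind of case Daisy Diff handles.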

Demo of HTML-Diff: http://code.google.com/p/daisydiff/wiki/Examples

#1

scoorch - June 16, 2009 - 08:56

I had a detailed look at daisydiff and started integrating it into the diff module. The problem is that the PHP implementation of daisydiff is very slow for longer documents (>4 screen pages). I talked to Guy, the developer of daisydiff, and he confirmed this. The comparison does not deliver good results for longer documents. Therefore, I cannot recommend using daisydiff for the diff module.

#2

yhahn - July 10, 2009 - 14:59
Assigned to:Anonymous» yhahn
Status:active» postponed

Thanks for this evaluation - please feel free to make active again if/when the performance of this library is improved.
