Use real HTML-Diff algorithm in Drupal Diff module
AndreU - May 10, 2009 - 11:27
| Project: | Diff |
| Version: | 6.x-2.x-dev |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | yhahn |
| Status: | postponed |
Description
Suggestion
Daisy Diff can make a real diff of HTML code instead of just stripping out HTML tags. In GSoC 2008 the author wrote a PHP version for Wikipedia: http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/diff/
Maybe it could also be used for the Drupal Diff module?
Daisy Diff Description
Daisy Diff is a Java library that diffs (compares) HTML files. It highlights added and removed words and annotates changes to the styling.
- Works with badly formed HTML that can be found "in the wild".
- The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
- In addition to the default visual diff, HTML source can be diffed coherently.
- Provides easy to understand descriptions of the changes.
- The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.
Demo of HTML-Diff: http://code.google.com/p/daisydiff/wiki/Examples
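To illustrate the difference between stripping tags and diffing the HTML itself, here is a rough sketch in Python. It is not Daisy Diff's algorithm, just a token-level comparison built on the standard difflib module; the names tokenize, strip_tags and html_changes are made up for this example.

```python
import difflib
import re

# A token is either a whole tag or a run of non-space text,
# so tags survive the comparison as single units.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(html):
    """Split HTML into tag tokens and word tokens (hypothetical helper)."""
    return TOKEN_RE.findall(html)


def strip_tags(html):
    """Naive tag stripping: what a text-only diff would compare."""
    return re.sub(r"<[^>]+>", "", html)


def html_changes(old_html, new_html):
    """List added and removed tokens, tags included (illustrative only)."""
    old, new = tokenize(old_html), tokenize(new_html)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            changes.append(("removed", old[i1:i2]))
        if op in ("insert", "replace"):
            changes.append(("added", new[j1:j2]))
    return changes


old = "<p>Hello world</p>"
new = "<p>Hello <b>world</b></p>"

# After stripping tags both revisions look identical,
# so a text-only diff reports nothing.
print(strip_tags(old) == strip_tags(new))

# The token-level comparison still reports the styling change.
print(html_changes(old, new))
```

With tags stripped, a change that only adds bold styling is invisible, while the token-level comparison reports the added tags. A real HTML diff such as Daisy Diff goes further than this sketch (badly formed HTML, readable change descriptions).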

#1
I had a detailed look at daisydiff and started integrating it into the diff module. The problem is that the PHP implementation of daisydiff is very slow when documents are longer (>4 screen pages). I talked to Guy, the developer of daisydiff, and he confirmed this. The comparison does not deliver good results for longer documents. Therefore, I cannot recommend using daisydiff for the diff module.
#2
Thanks for this evaluation - please feel free to make active again if/when the performance of this library is improved.