Right now HTML is not stripped from fields before the module looks for matches. This results in:
1) Terms not being matched when they should be, because HTML markup runs into the term and makes it look to the module like a different word altogether.
2) Terms being matched that shouldn't be. This happens when an HTML tag name happens to be identical to one of the vocabulary terms. This happened to me: I have a vocabulary that is basically a list of abbreviations. One of these abbreviations is "UL", and it was getting flagged erroneously in posts containing unordered lists.
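
To make the false-positive case concrete, here is a minimal sketch in Python. It is not the module's own code, and every name in it (`strip_html`, `find_terms`, the sample `body`) is made up for illustration. It assumes a matcher that does a case-insensitive whole-word search, which may differ from how the module actually matches.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes only; tags and attributes are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    # Hypothetical helper: join text nodes with a space so that
    # "<li>a</li><li>b</li>" does not collapse into "ab".
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


def find_terms(text, vocabulary):
    # Assumed matcher: case-insensitive, whole-word search per term.
    return [
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
    ]


body = "<ul><li>First item</li><li>See the <b>FAQ</b></li></ul>"
vocabulary = ["UL", "FAQ"]

raw_matches = find_terms(body, vocabulary)                 # searches the markup itself
clean_matches = find_terms(strip_html(body), vocabulary)   # searches the visible text only
```

With the raw markup, "UL" is reported because of the `<ul>` tag; after stripping, only "FAQ" remains, since it is the only term that appears in the visible text.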

Comments

sja1

Title: Stip HTML before searching for matches » Strip HTML before searching for matches

fix typo in issue title

sdrycroft

Version: 6.x-1.27 » 6.x-2.0

This should be very easy to add. I'll do it once I've applied your proposed patch from #811522: Code generates false positives, causing nodes to be tagged with erroneous terms.

sdrycroft

Version: 6.x-2.0 » 7.x-3.0
Issue summary: View changes

HTML is still not stripped from the text before it is searched. This is potentially still an issue, although I have advised people to use SPAN tags to split words/phrases that they do not want tagged, so the current behaviour could also be considered a feature.
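
The interaction between stripping and the SPAN workaround can be sketched as follows. This is a Python illustration only, not the module's code: `strip_tags`, `contains_term` and the sample `post` are hypothetical, the tag pattern is deliberately crude, and the whole-word matcher is an assumption. The point is that whether stripping breaks the workaround depends on what replaces each tag.

```python
import re

# Crude tag pattern, good enough for this illustration only.
TAG = re.compile(r"<[^>]+>")


def strip_tags(markup, separator):
    # Replace every tag with the given separator ("" or " ").
    return TAG.sub(separator, markup)


def contains_term(text, term):
    # Assumed matcher: case-insensitive, whole-word search.
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


# An author splits "UL" with an empty SPAN so that it is not tagged.
post = "Press the U<span></span>L button"

unstripped_hit = contains_term(post, "UL")                # current behaviour: no match
joined_hit = contains_term(strip_tags(post, ""), "UL")    # tags removed outright
spaced_hit = contains_term(strip_tags(post, " "), "UL")   # tags replaced by a space
```

Removing tags outright rejoins the two halves into "UL" and the term is matched again, which would defeat the workaround. Replacing each tag with a space keeps the halves apart, so the workaround would continue to work.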