Right now HTML is not stripped from fields before looking for matches. This results in:
1) Terms not being matched when they should be, because HTML runs into the term, making it look to the module like a different word altogether.
2) Terms being matched that shouldn't be. This happens when an HTML tag name happens to be identical to one of the vocabulary terms. This happened to me: I have a vocabulary that is basically a list of abbreviations. One of these abbreviations is "UL", and it was getting flagged erroneously in posts containing unordered lists.
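To illustrate both problems, here is a minimal sketch of stripping markup before matching. It is written in Python for illustration only and is not the module's actual code; the names `strip_html` and `find_terms` are hypothetical, and whole-word, case-insensitive matching is an assumption about how terms are compared.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; tag names and attributes never arrive here.
        self.chunks.append(data)


def strip_html(markup):
    """Return the text content of markup, with each tag acting as a word break."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join with spaces so text from adjacent elements does not fuse into one word.
    return " ".join(parser.chunks)


def find_terms(markup, terms):
    """Return the terms that occur as whole words in the stripped text."""
    text = strip_html(markup)
    found = []
    for term in terms:
        # Assumption: a term matches only when not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Problem 2: the <ul> tag itself no longer triggers the "UL" term.
print(find_terms("<ul><li>Apples</li><li>Pears</li></ul>", ["UL", "Apples"]))
# prints ['Apples']

# Problem 1: a term wrapped tightly in markup is still found.
print(find_terms("See <b>UL</b>-listed parts", ["UL"]))
# prints ['UL']
```

Because every tag is treated as a word break in this sketch, a term deliberately split by a tag (for example `U<span></span>L`) would still not match. Joining the text chunks with an empty string instead would change that behaviour.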
Comments
Comment #1
sja1 commented: fix typo in issue title
Comment #2
sdrycroft commented: This should be very easy to get in. I'll add it once I've added your proposed patch from #811522: Code generates false positives, causing nodes to be tagged with erroneous terms.
Comment #3
sdrycroft commented: HTML is still not stripped from the text before it is searched. This is potentially still an issue, although I have advised people to use SPAN tags to split words/phrases that they do not want tagged, so it could also be considered a feature.