Currently, this module converts any word prefixed by a hash (#) into a link and adds it as a tag.

This behaviour causes problems when you use links internal to the current page, since HTML uses # to identify anchors. We want the module to ignore words prefixed by hashes when they appear inside HTML tags.

For instance, if you have something like:

<a id="reference"></a>Foo bar
[CUT]
<a href="#reference">A link to go back to foo bar</a>

The hashtag module breaks the last <a> tag and gives you something like:

<a href="<a href="/web/tag/reference" class="hashtag">#reference</a>">A link to go back to foo bar</a>

This shouldn't happen. The module should ignore hashes inside HTML tags.
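
To make the failure concrete, here is a minimal sketch that reproduces it. The function name naive_hashtag_links() is hypothetical and this is not the module's actual code; it only assumes that every "#word" is replaced with a link, using the simple pattern and the link markup quoted in this issue.

```php
<?php
// Hypothetical helper (an assumption, not the module's real code):
// turn every "#word" into a tag link, without checking whether the
// hash sits inside an HTML tag.
function naive_hashtag_links($text) {
  return preg_replace(
    '/#([0-9A-Za-z_]+)/',
    '<a href="/web/tag/$1" class="hashtag">#$1</a>',
    $text
  );
}

$html = '<a href="#reference">A link to go back to foo bar</a>';

// The href attribute value is rewritten too, which nests a link
// inside the attribute and breaks the original tag.
echo naive_hashtag_links($html), "\n";
```

The replacement has no notion of HTML structure, so the fragment in the href attribute is treated like any other hashtag.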

Comments

Raul Cano

I subscribe to this. Is there no solution for it?
Unfortunately, it renders the module completely unusable!

Leopold-2

Fully agree. It prevents us from using the module, which is a pity. How come this issue is still unassigned?

Raul Cano

I made some modifications to the module, and the issue stated here is now partially solved:
HTML tags are no longer considered, so the body is not broken, BUT some content had problems being published. That is to say, the content is displayed as empty when you view it, and if you try to edit it, you see only part of the content. Strange...

hashtags.module

/*
 * Create and return a comma-separated string from hashtag words (#some_word).
 */
function hashtags_get_tags($text) {

  $tags_list = array();
  // Start -- 06.12.2012
  // We are not interested in:
  // - hashes inside HTML tags, e.g. <a href="#fragment">Other text</a>
  //   => we strip the HTML from the text with strip_tags()
  // - HTML entities, e.g. &#8122;

  // $pattern = "/#[0-9A-Za-z_]+/";
  // preg_match_all($pattern, $text, $tags_list);
  $pattern = "/(^#|[^&]#)([0-9A-Za-z_]+)/";
  preg_match_all($pattern, strip_tags($text), $tags_list);
  // End -- 06.12.2012
 
  $result = implode(',', $tags_list[0]);  
  return $result;
}

letapjar

Re #3: you may want to use
preg_match_all($pattern, strip_tags(html_entity_decode($text,ENT_QUOTES)), $tags_list);

instead.

I found that any time I had body text with apostrophes, the ' would get flagged as a hashtag. Very annoying.

Raul Cano

Thank you very much! Works like a charm. This is now the function in
hashtags.module

/*
 * Create and return a comma-separated string from hashtag words (#some_word).
 */
function hashtags_get_tags($text) {

  $tags_list = array();
  // Start -- 23.01.2013
  // We are not interested in:
  // - hashes inside HTML tags, e.g. <a href="#fragment">Other text</a>
  //   => we strip the HTML from the text with strip_tags()
  // - HTML entities, e.g. &#8122;

  // $pattern = "/#[0-9A-Za-z_]+/";
  // preg_match_all($pattern, $text, $tags_list);
  $pattern = "/(^#|[^&]#)([0-9A-Za-z_]+)/";
  // preg_match_all($pattern, strip_tags($text), $tags_list);
  preg_match_all($pattern, strip_tags(html_entity_decode($text, ENT_QUOTES)), $tags_list);
  // End -- 23.01.2013

  $result = implode(',', $tags_list[0]);  
  return $result;
}

Leopold-2

Thanks a lot! Works perfectly fine now.

Raul Cano

Another correction. I found that, with the previous code, the hashtags were not extracted correctly: the character immediately before the # sign was also retrieved, which was fine if that character was a space, but not if it was a colon or anything else. I suspect this was also producing some duplicate terms, but I could not track down the exact reason.
Here is a correction that avoids both:

function hashtags_get_tags($text) {

  $tags_list = array();
  // Start -- Raul Cano - 04.03.2012
  // We are not interested in:
  // - hashes inside HTML tags, e.g. <a href="#fragment">Other text</a>
  //   => we strip the HTML from the text with strip_tags()
  // - HTML entities, e.g. &#8122;

  // Remove HTML entities.
  $text = preg_replace("/&#?[a-z0-9]{2,8};/i", "", $text);
  // Extract hashtags (after removing HTML tags).
  $pattern = "/(#[^\s[:punct:]]+)/";
  preg_match_all($pattern, strip_tags($text), $tags_list);
  // End -- Raul Cano - 04.03.2012
  
  $result = implode(',', $tags_list[0]);  
  return $result;
}
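
The difference between the two patterns can be checked with a small side-by-side sketch. The function names tags_with_old_pattern() and tags_with_new_pattern() and the sample string are made up for illustration; the patterns and calls are the ones posted in this thread.

```php
<?php
// Earlier pattern from this thread: the whole match ($m[0]) includes
// the character in front of "#", because [^&] consumes it.
function tags_with_old_pattern($text) {
  preg_match_all('/(^#|[^&]#)([0-9A-Za-z_]+)/', strip_tags($text), $m);
  return implode(',', $m[0]);
}

// Corrected pattern from this thread: entities are removed first,
// and each match starts at the "#" itself.
function tags_with_new_pattern($text) {
  $text = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $text);
  preg_match_all('/(#[^\s[:punct:]]+)/', strip_tags($text), $m);
  return implode(',', $m[0]);
}

// Illustrative input: an in-page link, a colon before one hashtag,
// and a space before the other.
$sample = '<a href="#top">Topics</a>:#drupal and #php';

echo tags_with_old_pattern($sample), "\n"; // :#drupal, #php
echo tags_with_new_pattern($sample), "\n"; // #drupal,#php
```

In both cases strip_tags() drops the href="#top" fragment before matching. One thing to be aware of with the corrected pattern: [:punct:] includes the underscore, so a tag such as #some_word would be cut to #some.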

radamiel

Assigned: Unassigned » radamiel
Status: Active » Fixed

Sorry for the late response. Fixed in 6.x-1.0.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous

Issue summary: View changes

corrected a typo