I am using a views exposed filter with an entityreference field. If my referenced entities have & in the title, they get double escaped in the form: & causing the user to see something like Recreation & Entertainment
<select id="edit-sector" name="area" class="form-select">
<option value="All" selected="selected">- Any -</option>
<option value="1">Accommodation</option>
<option value="2">Food &amp; Beverage</option>
<option value="3">Recreation &amp; Entertainment</option>
</select>
The attached patch fixes it, but please review it.
My first patch posted to drupal.org. Please be nice.
Comments
Comment #1
modestmoes commentedComment #2
damien tournoud commentedIt is actually needed here, but we need to reverse it into the select handler.
Comment #3
modestmoes commentedis this something that will be worked on, or should I submit another patch attempt?
Comment #4
joachim commentedI'm assuming what the maintainer means is that this patch needs more work.
The check_plain() is still required *somewhere*, but only for some types of widget.
My take on it is that getReferencableEntities() should be usable by other things too, so shouldn't do the check_plain().
Comment #5
colanLet's start with this. Other than the one I removed, there are two (2) more instances in the module file, but I'm not sure if those need to go as well.
Comment #6
joachim commentedRight, but if the options are used in a checkboxes or radio buttons form element, the check_plain is needed there. Or does options module take care of that for us?
Also, the docs for the method should state that the titles are raw.
Comment #7
colanI'm not sure about the Options module, but yes, if those titles are raw (and are not being checked elsewhere), we need to document that somewhere. Or better yet, check them properly if we need to.
Comment #8
joachim commentedIt's not about better yet, it's about anything else that calls that function knowing what it's receiving :)
IIRC select boxes shouldn't be sanitized, but checkboxes/radio buttons should.
Comment #9
damien tournoud commentedThe check_plain() removed in #5 is legit.
entity_label()returns a plain-text value that needs to be encoded before displayed in an HTML context.As far as I know, the only place we are not doing the right thing right now is in the Views select list handler. We need something like the attached patch.
Comment #11
star-szrHere's a reroll of #9 with a syntax fix and return value.
Comment #12
gaëlgWhich ones of the above patches need to be applied in order to get correct entity reference selection widgets?
Comment #13
star-szrNone of them worked for me, I ended up with a hook_form_alter(). Will post some code later for the workaround, because I'm not sure where the fix would go.
Comment #14
star-szrIt's not pretty, but here's what I used to work around what I believe is the originally reported bug. Inside an implementation of
hook_form_alter()orhook_form_FORM_ID_alter(), replace YOUR_FIELD_MACHINE_NAME. If you copy the entire hook below, change MYMODULE and FORM_ID as well.Comment #15
gaëlgThank you for the workaround, but it's a pain to do that for every little field.
As far as I understand, the problem is that a check_plain is done in form_select_options(), so it shouldn't be done before.
Actually, hook_options_list() is used to be able to display the options as a select list (no check_plain needed before) or radios, etc. (check_plain needed). BUT hook_options_list()'s documentation says no sanitize is needed because it will be managed by the options module.
So that getReferencableEntities(), called by entityreference_options_list(), should ideally not call check_plain(), or at least have an argument to disable it.
Am I wrong?
Comment #16
gaëlgSomething like this?
Comment #17
jfrederick commentedI applied the patch in #16 but I am still seeing a problem :/ I think it is the same issue.
In my entity reference select widget, which is using an entity reference view for its options, I am seeing
User's Listinstead ofUser's List.Comment #18
gaëlgI don't use the views widget but the default one. The patch #11 may help for the views one?
Comment #19
damien tournoud commentedLooking into this, Entity Reference seems to do the right thing here. Both Field API options lists and Views exposed filters options are supposed to be HTML.
But neither core nor Views properly handle using this HTML in a
<select>. They both attempt to do so and fail:strip_tags()without adecode_entities()strip_tags(decode_entities($label))instead ofdecode_entities(strip_tags($label));, losing some of the initial text; see #1793236: views_handler_filter::prepare_filter_select_options() incorrectly converts HTML to plaintextI haven't opened a core bug yet, let's close this when it's done.
Comment #20
rudiedirkx commentedJust my 2 cents:
getReferencableEntities()methods shouldn't html encode OR strip tags. They should return literal keys & values.entityreference_options_list()shouldn't encode $bundle_label becauseform_select_options()should do that (and does!).strip_tags()should never ever be called anywhere.strip_tags()actually CHANGES labels, which is very odd to me. By extension IMO _options_properties() should never ever return[strip_tags => TRUE](and/or _options_prepare_options() should never use it).The problem ofcourse is backward compatibility. I hate it so much.
Comment #21
rudiedirkx commentedIt's much more complicated than that unfortunately... A View can return a combination of encoded (e.g. term name with &) and not encoded (an
<img>). Just encoding, decoding and stripping tags at the end won't work. If the 'preprocess' ends with a strip_tags, you always lose tags even if they're in the entity title.I honestly don't know a solution that would fit all customers. A safe dropdown should be a possible result, but also a full HTML within a
<label>for after a checkbox. The problem goes beyond dropdowns (encode all): it can be a bunch of checkboxes too (encode only parts).I don't think a fixed #1793236: views_handler_filter::prepare_filter_select_options() incorrectly converts HTML to plaintext will solve all.
Comment #22
rob230 commentedAny news on this? Just noticed it happening on one of my sites (when I put '&' in an entity title then when it appears in an entity reference field using a select widget, the option appears with
&in it, i.e. escaped twice).I am using the latest dev version of Entity reference. This is not happening in a Views exposed filter for me! It's just happening in a normal form created with field_attach_form().
Comment #23
damien tournoud commentedAs I said in #19, this is not a Entity Reference bug.
Comment #24
star-szrQuoting @rudiedirkx in #20:
hook_options_list API docs:
So at the very least we should be removing the check_plain() from entityreference_options_list().
Comment #25
damien tournoud commentedI explained in #19 that core is buggy.
If you look at the documentation more closely, hook_options_list() clearly specify that the labels are HTML:
(emphasis mine)
Comment #26
damien tournoud commentedThe core bug is in _options_prepare_options(): this function tries to downcase HTML to plain-text, but it does only a
strip_tags()without adecode_entities().Comment #27
star-szrWhether core's sanitization is buggy or not, having characters like ampersands encoded twice should IMO be fixed by giving hook_options_list() what it wants, un-sanitized labels.If we opened an issue against core and said "Passing already sanitized option labels through hook_options_list() results in double encoding", we'd probably get pointed to the API docs and closed (works as designed), wouldn't we?I'd be happy to write a test for this, it's very easy to reproduce and Entity Reference in D8 core does the same thing (so there needs to be a core issue to fix it there as well), whereas the "Term reference" field is fine.Edit: patch doesn't work. Nevermind.
Comment #28
alexweber commented@Damien, I understand your position but this is a bug that's very noticeable mainly when using Entity Reference so I really don't get why you won't commit a patch (at least temporarily until the core bug gets fixed) to fix this instead of forcing others to manually patch the module whenever they have to use it.
Comment #29
damien tournoud commentedSorry, but we cannot fix it in Entity Reference: select boxes and checkboxes/radios require different encoding (Select box requires plain-text and checkboxes/radios require HTML), so we cannot fix one without breaking the other.
Comment #30
damien tournoud commented@Cottser: you are confusing sanitization (which is a security requirement) and encoding (which is a correctness requirement).
hook_options_list()wants HTML, so we *have* to usecheck_plain()(inEntityReference_SelectionHandler_Generic::getReferencableEntities(), which is I assume the one you meant to remove) to encode the plain-text into HTML.Comment #31
star-szr@Damien - Thanks for your response. I'm really trying to get to the bottom of this issue, the main thing I'm failing to understand is this:
If the cause of this issue is really a core bug, why does the term reference field in core have none of these issues? The term reference field provides select box and checkbox/radio widgets just like Entity Reference does.
Comment #32
JvE commentedI think the misunderstanding is caused by the fact that the check_plain() in entityreference_options_list() is escaping the vocabulary label, not the term labels.
Seems like some people missed that point.
As stated before the real problem here is the use of getReferencableEntities() which returns
The documentation clearly states that the option labels should not be sanitized. Unfortunately entityreference_options_list() has no access to unsanitized labels.
The bug is still in Entityreference (returning sanitized labels in stead of raw ones). So here's a new patch that decodes the labels again.
An alternative would be the patch from #16 which lets developers decide if they want sanitized or raw values from getReferencableEntities().
Comment #33
damien tournoud commented@JvE: please re-read my comments before commenting again. You are contracting yourself:
The documentation for
hook_options_list()clearly states that the labels are *HTML*:As explained, this is a core bug. And I already explained what needs to be fixed. Please, someone open a core issue and propose a patch.
Comment #34
damien tournoud commentedAlso, read #30 that explains the difference between sanitization (which is about security) and plain-text vs. HTML (which is about encoding).
Comment #35
damien tournoud commented@Cottser#31: The taxonomy reference field definitely doesn't work.
Try this simple experiment: create a term named
<test>(literally). Visit the term page, you will see<test>in the title. In a taxonomy reference widget (either select box or radio button), you will see the term rendered as blank, like in the attached screenshot.Comment #36
JvE commented@Damien Take a deep breath and take another look.
Yes, there is a core bug that strips tags from the labels. This patch does not fix that. In fact this whole issue has nothing to do with that afaict.
However, there is also a bug in entityreference. Entityreference_option_list() returns encoded labels when it should not. This leads to double encoding.
I cannot make it easier to understand than that.
As you can see, core makes the <> disappear, but it is entityreference that is making a mess of the rest. Simply beacuse it returns encoded labels when it should not (both docuemtnation and implementation confirm this).
Comment #37
damien tournoud commented@JvE: you are confusing encoding and sanitization. The documentation *clearly* states the return value of
hook_options_list()needs to be HTML. To transform the label (that is plaintext) to HTML, we run it throughcheck_plain(). There is no bug anywhere in this process.The bug - as you described it yourself! - is that core pipes HTML (that it gets from
hook_options_list()), into a<select>element that only accepts plain-text. Transform your select widget into a radio button, and you will see that the output is correct in that case (because radio buttons and checkboxes expects the options to be HTML).Comment #38
star-szr@Damien - Can we please stop fighting with the issue status on this and have a real discussion?
The top field is an entity reference field, the bottom field is a term reference field, both pointing to the same vocabulary.

I personally think it's fine that core strips out tags from
<select>elements and agree with @JvE, that's probably a separate issue.Comment #39
star-szrIn the example above, switching to a checkbox/radio button widget indeed works fine, and I don't see any discernible difference in rendering or markup between the two field types. Both show properly encoded characters in the source (
&,>).So what's the purpose of Entity Reference encoding the options through check_plain() ahead of time? As far as I can see core handles that encoding just fine (other than perhaps the tag stripping bug). I think many of us are still struggling to understand why this is a won't fix.
Edit: In the first paragraph I meant with and without the patch in #32.
Comment #40
star-szrThis is due for a re-title, this is at least closer to the original report.
Comment #41
star-szrThat's not quite right, let's just remove the word 'encode', shall we? :)
Comment #42
gaëlg@Damien OK, encoding and sanitization are not the same. Actually the name of check_plain() itself is confusing. It's not for checking but for encoding. In the real world, you can check twice or more with no problem (it's just useless), but you can't encode twice and keep the result unchanged. Well, this function is so much used that I don't think a function renaming patch would have any chance to get applied, would it?
AFAIU, hook_options_list() says it needs unsanitized HTML. We have potentially unsanitized plain text. Ideally, all we need to do is convert plain text to HTML. check_plain() do that. But, we observe that options.module does not correctly handle what we provide. It claims it needs HTML, but if we give HTML, it renders a mess. Looks like it sanitizes and encodes. That's a core bug. Am I wrong?
Now, we can submit a core bug issue for that. Actually, I don't feel comfortable enough with all this stuff to do it. And while waiting for that issue to be fixed, we need a workaround. There seem to be an easy one that other modules seemingly use already. This workaround is to take options.module for what it currently do: encode plain text to HTML.
Using a core bug workaround in a contrib module or not, that is the question, right?
Comment #43
rob230 commentedIf it is indeed a core bug, is there an issue for it anywhere? Would be interesting to see what people working on core have to say about the expected behaviour. The bug could be with the documentation rather than the functionality.
Comment #44
star-szrOkay, how about this:
options.test.
Creates a few fields whose option labels contain unescaped markup including an ampersand.
Tests that radio button and check box labels are filtered properly.
Also tests the option labels for select widgets, single and multiple.
I remain unconvinced that hook_options_list() wants encoded options.
#37:
Where? All I see is this:
hook_options_list():
To me this says HTML tags may be filtered out, but does not read as "you need to provide the options as HTML, encoded appropriately".
Comment #45
damien tournoud commented@Cottser: really, it's either HTML or it's plaintext. There is no middle ground :)
Comment #46
gaëlgYes. There is no way for options.module to know if the given string encoding is HTML or plain text. Currently the code acts as if it's plain text given, and encodes it. If you give HTML, it encodes twice. So it currently allow only one format: plain text.
If we change the core, it will act as if it's HTML given, and never encode it. There will be a problem if you give it plain text, because if won't be encoded at all, and at the end, browsers need HTML. It will still allow only one format: HTML.
Or you would have to indicate the format as a new function argument.
For now, the hook documentation is a bit unclear. It should said that the string needed must be plain text. It may contain HTML tags, but with plain text encoding. Actually, this is the only core bug from that point of view.
Ideally, it would be clearer if options.module needed an HTML encoded string which might contain HTML tags. In that case we would change the code and not the doc. That's what Damien means if I'm right.
Comment #47
damien tournoud commented@GaëlG: No, the options module expects HTML. It properly uses
filter_xss()on it to pass it to checkboxes and radio buttons. It *tries* to down-case it to plain-text to pass it to<select>but does that incorrectly. It only runsstrip_tags()and forgets to rundecode_entities()on it, as explained back in #26.Comment #48
alexweber commented@Damien, can't you also "forget" to run decode_entities()? :)
Comment #49
damien tournoud commented@alexweber: It is not possible to provide
hook_options_list()something that would work for both the broken<select>code path and the correct radios/checkboxes one.I have provided all the information necessary to open a core bug and submit a patch. Can someone run this home?
Comment #50
joachim commented> It is not possible to provide hook_options_list() something that would work for both the broken
code path and the correct radios/checkboxes one.
Taxonomy module seems to do fine?
Here is what options module is getting when it invokes hook_options_list() in _options_get_options():
The docs for hook_options_list() don't say that HTML entities should be escaped:
Comment #51
damien tournoud commented@joachim, please see #35.
... if it's HTML, entities need to be escaped. Else it's plaintext. The option module definitely expects HTML, it just wrongly deals with it in one case. And the taxonomy module also provides the wrong information in its
hook_options_list().Comment #52
joachim commentedSo you're saying that there are two bugs, one in options module and one in taxonomy module, which **together** combine to give correct output to the user, while entityref is doing nothing wrong?
Comment #53
damien tournoud commented@joachim: No. I am saying there are two bugs in core, and they render the taxonomy reference field completely broken. At least entity reference works right in one case (Radio/Checkboxes).
But do feel free to read this whole thread before displaying your little understanding of the subject.
Comment #54
star-szr@Damien - I must say I'm not a fan of the condescending comments, but despite that here is a core issue as you requested:
#1919338: Select widget (from the options module) prone to double encoding
Feel free to update as you see fit.
Comment #55
damien tournoud commented@Cottser: Fair enough, but I must say I'm not a fan of saying the same thing over and over again either.
Thanks for opening this issue!
Comment #56
JvE commented@Damien: thank you for your patience in trying to clarify this issue.
Core does a check_plain() on the option labels in form_select_options() (called by theme_select()) which is why the double encoding occurs.
So perhaps the documentation of hook_option_list() needs to be clarified to avoid leading developers like yourself to encode the labels as well.
I am in favor of doing check_plain in outputting function like theme and render and not in lower-level api functions like getReferencableEntities() but perhaps that function could be considered a render/theme function.
For checkboxes/radios the labels go through theme_form_element_label() which does not check_plain them (it does do a filter_xss_admin()). This allows for some features like using images with your radio/checkbox.
If you feel that core should not check_plain select options then please raise a core issue yourself rather than asking other to do it for you.
I do agree with you that the core behaviour does have some issues like preventing select elements from showing "Some <tags>" in select options, but I consider that another issue, not related to the issue that entityreference_option_list() returns encoded labels when it should not.
PS: You earlier comments seem to suggest you are looking at the html through a plugin like firebug which hides the double encoding from you. Please check the actual source in stead to avoid confusion.
Comment #57
damien tournoud commented@JvE: you are confusing encoding and sanitization.
This issue is closed, both the Drupal core bug and the Views bug have been opened. There is nothing to do on the Entity Reference side.
Comment #58
JvE commentedNo I am not confusing encoding and sanitization.
Is it okay if I re-open this one when #1919338: Select widget (from the options module) prone to double encoding is closed as "won't fix" or "works as designed"?
Comment #59
damien tournoud commented@JvE: you are working of a false premise... What you call "raw, unsanitized, unencoded values" does not exist. The value can be HTML or plain-text, that's it. "Raw values" cannot be interpreted in any meaningful way.
hook_options_list()requires HTML. It is a good thing, because it means that you can have rich values, at least in for some widgets / display contexts (for example, facets generated by Facet API). But if your source data is plain-text, which is the case of Entity Reference when using the base selection mode (the Views mode can output HTML) and Taxonomy Term Reference, you have to transform it into HTML before passing it tohook_options_list(). That's what thischeck_plain()call does.That's really the end of the story. There is nothing to fix on Entity Reference, because everything works as designed.
Comment #60
bunthorne commented"Working as designed" does not mean that it is working as desired. Many of us are still left with a crappy select list. When can we expect it to begin working as desired? Or should we look for another module to handle our needs?
sorry to be a bit snippy, but I'm more interested in a good website than who is winning an argument over whose bug it is. This issue has been open since June 29, 2012--over 7 months. Should I try one of the patches attached to this issue, while waiting for Core and Views to act? Are the Core and Views issues linked here for us to follow?
Comment #61
damien tournoud commentedThere is no argument over whose bug this is, nor where it needs to be fixed.
Here are the referenced bugs:
Help roll patches and review them there!
Comment #62
capellicI just wrote some jquery to deal with the problem.
Comment #63
mikefyfer commentedObviously would still like to see the core bug fixed, but for now, I did like the above, but a little more. This replaces all occurances, rather than just one. Also replaces the broken quotes and apostrophes.
Comment #66
alauddin commentedpatch from #32 still works on 7.x-1.1
Comment #67
dewalt commentedThe string values should be escaped before output, on render stage. Assigning '#options' element or using hook_options_list() isn't a part of render - it is provides data for render, that should be unescaped.
There are some cases with this issue:
a) EntityReference_SelectionHandler_Generic usage
b) EntityReference_SelectionHandler_Views usage
In first case unnessesary HTML escape functions should be removed, in the other case should be used view result (unescaped), not view output (escaped).
Comment #70
troybthompson commentedI don't know about correctness, but #69 solved the problem for me.
Comment #71
damien tournoud commentedComment #72
damien tournoud commented@dewalt and @erlendoos: Please see the history of this issue, especially the links in #61.
@erlendoos: I'm unpublishing your patch as it can only introduce security issues.
Comment #73
damien tournoud commentedComment #74
dewalt commented@damien-tournoud, we can discuss 3 years more, where the problem should be fixed, here or in #1919338 issue. But discussion doesn't resolve it! And all this time many users of entityreference module suffers from this bug.
As I know, entityreference module is the only one (from popular modules), where the issue takes place. Proceeding from this I purpose to fix the bug within entityreference module scope, while situation with #1919338 isn't clear. Guys, what do you think about it?
My proposal to fix is attached in patch.
Comment #75
dewalt commentedComment #76
damien tournoud commented*sigh*
Comment #77
dewalt commentedComment #78
dewalt commented@damien-tournoud - I propose to discuss, I propose it for all interested people. While you close the issue, the problem is still exists! And it exist for whole community! Why do you deside individually?
Comment #79
damien tournoud commented@dewalt: The discussion already happened. All 77 comments of them. The way forward has been described in #61, 2 years ago. Every time you come back to it, you delay the actual issues from being fixed (in Views and in Drupal core). Please move the discussion *there*. It is not *that* hard.
Comment #80
dewalt commentedCore issue #1919338: Select widget (from the options module) prone to double encoding isn't resolved yet, it is in discussion - and there is a case, it wouldn't ever fixed.
The issue is in entityreference thread, it doesn't touch core and views. IMHO, issue, that can be solved on module side, but exists on core thread, delays the actual issues from being fixed.
Comment #81
mpdonadio#80, patches that fix upstream issues in the downstream modules are harmful in two ways.
1. Other modules downstream of the actual problem will still have the problem (there is a decent chance this is happening in elsewhere, too).
2. If this is also fixed upstream in Core and/or Views, then there could be a regression in this module (and any other module with this problem that implements a fix). Regressions are sometimes hard to detect, especially in D7 where test coverage isn't as good as D8.
The proper thing to do is for someone to act on #1919338: Select widget (from the options module) prone to double encoding and actually post a patch for review. It looks like two on line changes. I try post a patch tonight.
[Edited point 2 for clarity].
Comment #82
mpdonadioOf the two issues mentioned in #61, the #1919338: Select widget (from the options module) prone to double encoding now has a green patch. Someone needs to review it. #1793236: views_handler_filter::prepare_filter_select_options() incorrectly converts HTML to plaintext has had a green patch for about a year waiting review. I just triggered a retest there to see if it still applies and works. If this comes back green, someone should review it.
Comment #83
lionslair commentedYes it works. Just tested it and confirm it does solve the issue.
Comment #84
netw3rker commented@lionslair, you should read the whole issue here or at least the last comment right before your post. The patches within this issue are not relevant since this is actually a core bug. The patches here will potentially cause security vulnerabilities (among other problems).
see @mpdonadio's comment directly before yours in #82:
The core patch in #1919338 should fix the bulk of the use-cases people keep reporting here. Users considering this issue should instead look to patches in both of those referenced issues and apply those instead of the ones here. Because of this, *this issue needs to remain closed, and not RTBC*.
Comment #85
tessa bakkerComment #86
i.bajrai commentedPatch here will fix your problems:
https://www.drupal.org/node/1919338#comment-10443571
Comment #87
capfive commentedJust an update, this has been fixed in drupal Core as of 7.42
Comment #88
capfive commented