truncating field name in apachesolr_index_key() causes Solr Error "multiple values encountered for non multiValued field" [#1707404]

Problem

If a field name in a content type definition is near the maximum length (32 characters), the apachesolr_index_key() function prepends the data type and plurality hints (e.g. "bs_" for boolean, single) to the field name, then truncates the result to the maximum length of a field name.

Sample error message when indexing nodes where the machine name is close to 32 characters long:

"400" Status: Bad RequestApache Tomcat/6.0.32 - Error report HTTP Status 400 - ERROR: [doc=4v3uk8/node/47489] : [0, 1, 1, 0, 0]type Status reportmessage ERROR: [doc=4v3uk8/node/47489] multiple values encountered for non multiValued field bs_field_siderbar_1_related_tab_: [0, 1, 1, 0, 0]description The request sent by the client was syntactically incorrect (ERROR: [doc=4v3uk8/node/47489] multiple values encountered for non multiValued field bs_field_siderbar_1_related_tab_: [0, 1, 1, 0, 0]).Apache Tomcat/6.0.32

Proposed resolution

Changing the routine to account for the maximum size of a field name + length of the "type hint" + length of the "plurality hint" resolves the issues. A patch is attached as a comment.

Comment	File	Size	Author
#10	apachesolr-field-name-truncates-1707404-9.patch	447 bytes	hedinfaok
#1	apachesolr-field-name-truncates-1707404.patch	454 bytes	hedinfaok

Comments

Comment #1

hedinfaok commented 1 August 2012 at 07:07

Status	File	Size
new	apachesolr-field-name-truncates-1707404.patch	454 bytes

Comment #2

hedinfaok commented 1 August 2012 at 07:08

Status:

Active

» Needs review

Comment #3

ianthomas_uk

he/him

English

Brighton, UK

commented 1 August 2012 at 16:06

Won't that cause problems elsewhere? You're effectively increasing the string length to 36 characters, but I think there are database columns that are 32 characters wide.

If not, then why is the limit there are all?

Comment #4

hedinfaok commented 2 August 2012 at 01:59

Won't that cause problems elsewhere?

I'm running this in production, but it would be awesome if you could apply the patch and see if it causes problems for you. The more QA the better, I always say. As it stands, the original line of code causes indexing failures if the field name is close to the maximum machine name for a field (32 chars).

You're effectively increasing the string length to 36 characters, but I think there are database columns that are 32 characters wide.

Yup, the field name is a database column (in SQL databases), which has a maximum limit of 32 characters. The function apachesolr_index_key() returns an array, which is fed into the Apache Solr web service. Solr does not have a 32 character limit on it's DynamicField names. (The limit is the Java limit for a string.)

If not, then why is the limit there are all?

You'd have to ask @Nick_vh, as he's the author of the commit, according to git log. My belief is that there is nothing wrong with limiting the length of a variable, and I believe it is generally a good practice. I personally don't think it's necessary, but I like to err on the side of the author's original intentions on this one.

Comment #5

nick_vh

he/him

Ghent

commented 8 August 2012 at 13:01

Status:

Needs review

» Needs work

The issue here is the block delta that is being used in facet api. As it states above the code :

Block deltas are limited to 32 chars

Facet API does hash the block deltas if they become bigger but preferably we do not want this practice. The current patch allows the field to grow > 32 chars so I don't think this is a good solution?

Do we have any other route to go to?

Comment #6

hedinfaok commented 8 August 2012 at 18:33

Thanks for clearing that up :)

I'm assuming that it is preferred to avoid block delta hashing to keep block hook implementations readable. Is this correct? I guess I would argue that it is reasonable to have hashes for deltas as part of the Drupal experience. I admit, it's ugly to read and the resulting hook function is not self describing, but it's expected behavior (See Views module).

If "hash dodging" is more important than the indexing function working for field names that are of legal value, then
It seems to me then, that the best option is to create a static map of "Apache solr truncated field names" and actual drupal field names. Because end characters will be truncated, collision is possible, so that would need to be accounted for as well. This would allow both 32 character truncated deltas and the ability to reference the original drupal field name.

Is this overhead worth the readability? If not, I'd have to push for the pragmatic solution as previously submitted.

Thanks again, I appreciate the input.

Comment #7

ianthomas_uk

he/him

English

Brighton, UK

commented 8 August 2012 at 22:40

I've had a closer look at this, and would now advocate removing the substr() altogether.

Block deltas are limited to 32 chars

But this function isn't returning a block delta, it's returning a field name for apachesolr so only needs to respect apachesolr's limits, which I think is practically unlimited (the only reference to a limit that I could find was a performance reference saying the field names would be included in the response).

There could be a problem if something takes the value from this function (or the field name pulled from Solr) and uses it straight as a block delta, but that would be a bug in the code calling this function, rather than the function itself. I've had a look through facetapi/query_type_numeric_range.inc and it's only used for two purposes: sent as a parameter to apachesolr, and used as part of a cache key. Neither should suffer from longer strings.

apachesolr.index.inc doesn't look like it would have any problems, and even adds _end to the key for dates.
apachesolr.module does use it as the key in a couple of arrays, and I'm not sure if it would cause a problems.
There are no other calls to apachesolr_index_key in apachesolr or facetapi.

If we do change the index key then already indexed content will have the wrong key. Do we need to handle that, or do we just add a release note advising a full reindex if you have long field names?

Comment #8

pwolanin commented 10 August 2012 at 19:14

I think this is a legacy of the much earlier code that used these directly as block deltas.

Comment #9

pwolanin commented 10 August 2012 at 19:21

Version:	7.x-1.0-rc2	» 7.x-1.x-dev
Assigned:	hedinfaok	» Unassigned

so, I don't think it's a big problem to change this t stop truncating, but it might causing existing indexes and facets to stop working.

Comment #10

hedinfaok commented 10 August 2012 at 19:39

Status:

Needs work

» Needs review

Status	File	Size
new	apachesolr-field-name-truncates-1707404-9.patch	447 bytes

/me capitulates

I've re-rolled the patch. I'll look into providing some test coverage on this.

Comment #11

pwolanin commented 11 August 2012 at 15:35

Well, look at the prior issue, if we are changing the keys we might also need an update function to check wither any of the facetapi keys changed and fix those, plus at least warn the user if re-indexing is required. See: #1161538: The numeric field id should not be used for Solr index field names

Comment #12

nick_vh

he/him

Ghent

commented 27 August 2012 at 19:18

I'm ok with this change, but I agree this needs a huge warning as it might change indexes that are currently running

Comment #13

heacu commented 24 November 2012 at 18:19

any chance we'll see this patch incorporated into the next release?

Comment #14

nick_vh

he/him

Ghent

commented 7 October 2013 at 17:10

Before we roll this change we need an update function and I'm not seeing this here yet. It's very tricky to suddenly change this and not tell people in advance.

Perhaps we could set a flag in the index_key function that tells it not to truncate while the default behaviour is still truncating?

Comment #15

nick_vh

he/him

Ghent

commented 7 October 2013 at 17:10

Status:

Needs review

» Needs work

Comment #16

nick_vh

he/him

Ghent

commented 23 October 2013 at 20:12

Status:

Needs work

» Closed (duplicate)

working on #2118947: Allow mapping to accept multiple index_callbacks that makes this issue a duplicate

truncating field name in apachesolr_index_key() causes Solr Error "multiple values encountered for non multiValued field"

Problem

Sample error message when indexing nodes where the machine name is close to 32 characters long:

Proposed resolution

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

News items

Our community

Documentation

Drupal code base

Governance of community